http://dev.plutext.org/svn/docx4j/trunk ... /viaXSLFO/ contains the XSLT and the Conversion class which invokes it.
The XSLT produces a .fo file, which Apache FOP then converts to a PDF. So the general approach is to convert docx markup to the appropriate XSL formatting objects.
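To make the docx-to-FO idea concrete, here is a minimal sketch using only the JDK's built-in XSLT processor. The stylesheet below is a toy written for this post, not the one in docx4j, and the class and method names (DocxToFoSketch, toFo) are made up for illustration. It just maps w:p to fo:block and w:r to fo:inline.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DocxToFoSketch {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    // Toy stylesheet (an assumption for illustration, NOT the docx4j XSLT):
    // each paragraph becomes an fo:block, each run an fo:inline.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:w='" + W_NS + "' xmlns:fo='" + FO_NS + "'"
        + " exclude-result-prefixes='w'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='w:p'>"
        + "<fo:block><xsl:apply-templates select='w:r'/></fo:block>"
        + "</xsl:template>"
        + "<xsl:template match='w:r'>"
        + "<fo:inline><xsl:value-of select='w:t'/></fo:inline>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the toy stylesheet over a WordML fragment and returns the FO markup.
    static String toFo(String wml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(wml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String p = "<w:p xmlns:w='" + W_NS + "'><w:r><w:t>Hello</w:t></w:r></w:p>";
        System.out.println(toFo(p));
    }
}
```

A real .fo file also needs an fo:root, a layout-master-set and a page-sequence around the blocks before FOP will accept it; the sketch only shows the paragraph/run mapping.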
The XSLT processes a slightly modified version of the main document part (document.xml). Basically, Word sections are changed from point tags to containers (i.e. the paragraphs and tables sit inside a <section>, and each <section> sits inside the <sections> tag).
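The point-tag-to-container change can be sketched roughly like this with plain DOM. This is not how docx4j itself does it; the class and method names are hypothetical, and the sketch assumes the usual WordML layout where a section ends either at a paragraph carrying w:pPr/w:sectPr or at the final w:sectPr directly under w:body.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SectionContainerSketch {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Sample body: two paragraphs in section 1, one paragraph in section 2.
    static final String SAMPLE =
        "<w:document xmlns:w='" + W_NS + "'><w:body>"
        + "<w:p/>"
        + "<w:p><w:pPr><w:sectPr/></w:pPr></w:p>"
        + "<w:p/>"
        + "<w:sectPr/>"
        + "</w:body></w:document>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Moves the children of w:body into <section> containers under one <sections> element.
    static Element toSections(Document doc) {
        Element body = (Element) doc.getElementsByTagNameNS(W_NS, "body").item(0);
        Element sections = doc.createElementNS(null, "sections");
        Element current = doc.createElementNS(null, "section");
        Node child = body.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before the node is moved
            current.appendChild(child);         // appendChild detaches it from w:body
            if (endsSection(child)) {
                sections.appendChild(current);
                current = doc.createElementNS(null, "section");
            }
            child = next;
        }
        if (current.hasChildNodes()) {
            sections.appendChild(current);      // trailing content with no sectPr
        }
        return sections;
    }

    // True for a body-level w:sectPr, or a w:p whose w:pPr contains a w:sectPr.
    static boolean endsSection(Node n) {
        if (!(n instanceof Element) || !W_NS.equals(n.getNamespaceURI())) return false;
        if ("sectPr".equals(n.getLocalName())) return true;
        if (!"p".equals(n.getLocalName())) return false;
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element && W_NS.equals(c.getNamespaceURI())
                    && "pPr".equals(c.getLocalName())) {
                return ((Element) c).getElementsByTagNameNS(W_NS, "sectPr").getLength() > 0;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        Element sections = toSections(parse(SAMPLE));
        System.out.println("sections: " + sections.getChildNodes().getLength());
    }
}
```

With containers like these, a template can match each section as a unit, which is awkward when the section break is just a marker inside the last paragraph.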
If you look at the XSLT, you'll see that most of the hard work is done by extension functions in the Conversion class.
There is an extension function to create the XSL FO layout-master-set (for page margins, headers/footers), but it is probably easiest to start by looking at <xsl:template match="w:p"> and the one for w:r (i.e. paragraph/run level stuff).
The paragraph and run properties are worked out by extension functions. The idea is that bits of XML are passed into the extension functions, where they are unmarshalled into docx4j wml objects (so the rest of docx4j can be used to process them). The extension functions generally return XSL FO XML, which is then inserted into the output XML document.
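Here is a rough sketch of the shape such an extension function takes: XML in, XSL FO fragment out. It is not the code in the Conversion class. The names (ParagraphExtensionSketch, createBlockForPPr) are hypothetical, and to stay self-contained it reads w:jc straight from the DOM instead of unmarshalling to docx4j objects as the real functions do.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ParagraphExtensionSketch {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    static DocumentBuilderFactory factory() {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f;
    }

    static Document parse(String xml) throws Exception {
        return factory().newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical extension function: takes the w:pPr node handed over by the
    // XSLT and returns a fragment holding an fo:block with matching properties.
    public static DocumentFragment createBlockForPPr(Node pPr) throws Exception {
        Document out = factory().newDocumentBuilder().newDocument();
        Element block = out.createElementNS(FO_NS, "fo:block");
        if (pPr instanceof Element) {
            NodeList jc = ((Element) pPr).getElementsByTagNameNS(W_NS, "jc");
            if (jc.getLength() > 0) {
                String val = ((Element) jc.item(0)).getAttributeNS(W_NS, "val");
                block.setAttribute("text-align", toTextAlign(val));
            }
        }
        DocumentFragment fragment = out.createDocumentFragment();
        fragment.appendChild(block);
        return fragment;
    }

    // Maps a few w:jc values to XSL FO text-align values; anything else falls back to left.
    static String toTextAlign(String jc) {
        switch (jc) {
            case "both":   return "justify";
            case "center": return "center";
            case "right":  return "right";
            default:       return "left";
        }
    }

    public static void main(String[] args) throws Exception {
        Document in = parse("<w:pPr xmlns:w='" + W_NS + "'><w:jc w:val='both'/></w:pPr>");
        Element block = (Element) createBlockForPPr(in.getDocumentElement()).getFirstChild();
        System.out.println("text-align=" + block.getAttribute("text-align"));
    }
}
```

In the stylesheet, the template for w:p would call a function like this and copy the returned fragment into the result tree, then apply templates to the runs inside it.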
When implementing new stuff, it is useful to know what XSL FO you are aiming for, and to compare that to what you've created. To see the XSL FO, invoke:
public void setSaveFO(File save)
before running the transform, or enable log4j logging on the Conversion class.
cheers .. Jason