I am trying to convert a docx to a PDF using the Docx4J.toFO() method. Everything works as expected except for the treatment of white space: any run of consecutive spaces is collapsed to a single space, so the resulting PDF loses its formatting. It is essential that the PDF retain the white space. I have spent a LOT of time researching this and am fairly confident that, with the current settings, the white space should be preserved, but it isn't, so now I'm here. Below is a detailed explanation of every step I take to produce the PDF, in the hope that it helps more experienced people see where my problem lies.
First, the docx is created using a mail merge (output = org.docx4j.model.fields.merge.MailMerger.getConsolidatedResultCrude(wordMLPackage, data);). I have a docx template that gets merged with some text, since the end goal is to produce many documents with different text but the same header. The docx template is attached here and is called "Docx_Template.docx". The merge works by parsing an incoming text file with a .index extension (Incoming_Text.index). The file contains a key-value pair whose key is the same as the mail merge field in the template: mfcpty. The text is mapped to the field successfully, and the resulting docx looks exactly as it is supposed to. The final docx is called "Incoming_Text.docx" and is attached.
Next, the docx is sent to the code that handles the conversion to PDF. Here is a snippet of the code that does the conversion:
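// Load the merged docx produced by the mail merge step
wxmlPackage = WordprocessingMLPackage.load(convFile);
// Map the fonts used in the document to fonts available on this machine
IdentityPlusMapper fontMapper = new IdentityPlusMapper();
wxmlPackage.setFontMapper(fontMapper);
PhysicalFonts.discoverPhysicalFonts();
// Configure the FO export and dump the intermediate XSL-FO for debugging
FOSettings foSettings = Docx4J.createFOSettings();
foSettings.setFoDumpFile(new java.io.File("foSettings.xml"));
foSettings.setWmlPackage(wxmlPackage);
// Render the PDF via the XSLT-based exporter
OutputStream os = new java.io.FileOutputStream(pdfFileName);
Docx4J.toFO(foSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);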
The resulting PDF is called Output.pdf and is attached. As you can see, the white space is not preserved. To debug, I dumped the intermediate FOSettings file. On inspection it clearly shows, in multiple places, the attributes white-space-collapse="false" and white-space="pre". I did some research on this and learned that some people have had problems with the attribute white-space-treatment="preserve" not doing its job. That was the attribute that was all over my original FOSettings (FOSettings.xml), so I went as far as altering the docx4j and docx4j-export-fo jars in order to change the way the PDF is created. I managed to change the resulting FO settings file so that it uses the white-space attributes you see in the attached file. I am stumped as to why white space is not being preserved even though the file so clearly says everywhere that it will be. Any help is greatly appreciated.
Thank you
Edit: For some more clarification, if I add any number of extra spaces between the words, those spaces are also collapsed and disappear.
Also, the attached FOSettings file is the one produced AFTER I altered the docx4j jars. The original is slightly different, but the resulting Output.pdf is identical.
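In case it helps anyone reproduce the inspection of the FO dump described above, here is a minimal sketch of how the check could be done programmatically instead of by eye. It uses only the JDK's XML parser. The class name FoWhitespaceCheck and its method names are made up for this post and are not part of docx4j. It counts the fo:block elements that carry a given white-space attribute, and the text nodes that still contain a run of two or more spaces between words. My assumption is that if the runs of spaces are still present in the dump, they are being lost when the FO is rendered, and if they are already gone, they are being lost earlier, in the docx to FO step.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for inspecting an XSL-FO dump; not part of docx4j.
class FoWhitespaceCheck {

    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /** Parses an XSL-FO document with namespace awareness switched on. */
    public static Document parse(InputSource source) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(source);
    }

    /** Counts fo:block elements whose attribute attr has exactly the given value. */
    public static int countBlocksWith(Document doc, String attr, String value) {
        NodeList blocks = doc.getElementsByTagNameNS(FO_NS, "block");
        int count = 0;
        for (int i = 0; i < blocks.getLength(); i++) {
            Element block = (Element) blocks.item(i);
            if (value.equals(block.getAttribute(attr))) {
                count++;
            }
        }
        return count;
    }

    /** Counts text nodes containing two or more spaces between non-space characters. */
    public static int countTextNodesWithSpaceRuns(Node node) {
        int count = 0;
        if (node.getNodeType() == Node.TEXT_NODE
                && node.getNodeValue().matches("(?s).*\\S {2,}\\S.*")) {
            count++;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countTextNodesWithSpaceRuns(child);
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        InputSource source;
        if (args.length > 0) {
            // Path to the FO dump, e.g. the file passed to setFoDumpFile
            source = new InputSource(new File(args[0]).toURI().toASCIIString());
        } else {
            // Tiny built-in sample so the sketch runs without any input file
            source = new InputSource(new StringReader(
                "<fo:root xmlns:fo=\"" + FO_NS + "\">"
                + "<fo:block white-space-collapse=\"false\">A     B</fo:block>"
                + "</fo:root>"));
        }
        Document doc = parse(source);
        System.out.println("blocks with white-space-collapse=false: "
            + countBlocksWith(doc, "white-space-collapse", "false"));
        System.out.println("blocks with white-space-treatment=preserve: "
            + countBlocksWith(doc, "white-space-treatment", "preserve"));
        System.out.println("text nodes with runs of spaces: "
            + countTextNodesWithSpaceRuns(doc));
    }
}
```

Running it with the path to the dump as the only argument prints the three counts. Leading spaces at the start of a text node are deliberately not counted, so that the indentation of the XML itself does not show up as a false positive.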