Problem converting docx to pdf (missing parts)

by **Lostman** » Thu Dec 15, 2016 7:17 pm

Hi guys,

I am using docx4j to generate a report. I can generate it as a .docx, but I have problems when converting it to PDF.

Let me explain what I do. First, I have a template with some keywords to replace. Some are replaced with text, but others are replaced with images and even tables.

To search for keywords I do this:

```java
template = WordprocessingMLPackage.load(stream);
HashMap<String, String> values = setValuesMap(route);
replacePlaceholders(template, values);

List<Object> texts = template.getMainDocumentPart()
        .getJAXBNodesViaXPath(XPATH_TO_SELECT_TEXT_NODES, true);
for (Object obj : texts) {
    List<?> objContent = getAllElementFromObject(obj, Text.class);
    for (Object obj1 : objContent) {
        Text text = (Text) obj1;
        String textValue = text.getValue();
        if (textValue.contains("KEYWORD_IMAGE")) {
            // Clear node content
            List<Object> content = ((R) obj).getContent();
            content.clear();
            // Add image replacement
            // Get a string with base 64 value from an image
            String image = getBase64Image();
            String imageDataBytes = image.substring(image.indexOf(",") + 1);
            byte[] decoded = org.apache.commons.codec.binary.Base64.decodeBase64(imageDataBytes.getBytes());
            R r = newImageR(template, decoded, null, null, 0, 1, 2000);
            content.add(r);
        }
        if (textValue.contains("KEYWORD_TABLE")) {
            List<Object> content = ((R) obj).getContent();
            content.clear();
            content.addAll(createTableFromData(data.getTableData()));
        }
    }
}
```
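As an aside on the Base64 step: if you would rather not depend on commons-codec, the JDK's `java.util.Base64` (Java 8+) can decode the same payload. This is a minimal sketch that assumes `getBase64Image()` returns a data URI such as `data:image/png;base64,...`, which is what the substring-after-the-comma logic suggests. `DataUriDecoder` is a hypothetical helper name, not part of docx4j.

```java
import java.util.Base64;

/**
 * Hypothetical helper: decodes the payload of a Base64 data URI such as
 * "data:image/png;base64,iVBORw0...". Assumes the input has that shape.
 */
final class DataUriDecoder {
    private DataUriDecoder() {}

    static byte[] decode(String dataUri) {
        // Everything after the first comma is the Base64 payload.
        // If there is no comma, indexOf returns -1 and the whole string is used.
        String payload = dataUri.substring(dataUri.indexOf(',') + 1);
        // The MIME decoder ignores line breaks and other non-alphabet characters,
        // which some encoders insert into long Base64 strings.
        return Base64.getMimeDecoder().decode(payload);
    }
}
```

In the loop, `byte[] decoded = DataUriDecoder.decode(getBase64Image());` would stand in for the two lines that strip the prefix and call commons-codec. The MIME decoder is used because it is lenient about line breaks, as commons-codec is; `Base64.getDecoder()` would throw on them.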

This is the function that creates the image element:

```java
public static R newImageR(
        WordprocessingMLPackage wordMLPackage,
        byte[] bytes,
        String filenameHint, String altText,
        int id1, int id2, long cx) throws Exception {

    BinaryPartAbstractImage imagePart =
            BinaryPartAbstractImage.createImagePart(wordMLPackage, bytes);

    Inline inline = imagePart.createImageInline(
            filenameHint, altText, id1, id2, cx, false);

    // Now add the inline in w:p/w:r/w:drawing
    org.docx4j.wml.ObjectFactory factory = Context.getWmlObjectFactory();
    org.docx4j.wml.R run = factory.createR();
    org.docx4j.wml.Drawing drawing = factory.createDrawing();
    run.getContent().add(drawing);
    drawing.getAnchorOrInline().add(inline);

    return run;
}
```

And these are the functions that create the table:

```java
private List<Object> createTableFromData(Set<Data> tableData) throws Exception {
    List<Object> elements = new ArrayList<Object>();
    Tbl tbl = createTable();
    for (Data row : tableData) {
        // Create new row
        Tr tr = createTrData(row);
        tbl.getContent().add(tr);
    }
    elements.add(tbl);
    return elements;
}

private Tbl createTable() throws JAXBException {
    String text = "<w:tbl xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
            + "<w:tblPr>"
            + "<w:tblW w:type=\"dxa\" w:w=\"8930\"/>"
            + "<w:tblInd w:type=\"dxa\" w:w=\"344\"/>"
            + "<w:tblBorders>"
            + "<w:top w:color=\"BFBFBF\" w:space=\"0\" w:sz=\"4\" w:themeColor=\"background1\" w:themeShade=\"BF\" w:val=\"single\"/>"
            + "<w:bottom w:color=\"BFBFBF\" w:space=\"0\" w:sz=\"4\" w:themeColor=\"background1\" w:themeShade=\"BF\" w:val=\"single\"/>"
            + "<w:insideH w:color=\"BFBFBF\" w:space=\"0\" w:sz=\"4\" w:themeColor=\"background1\" w:themeShade=\"BF\" w:val=\"single\"/>"
            + "<w:insideV w:color=\"BFBFBF\" w:space=\"0\" w:sz=\"4\" w:themeColor=\"background1\" w:themeShade=\"BF\" w:val=\"single\"/>"
            + "</w:tblBorders>"
            + "<w:shd w:color=\"auto\" w:fill=\"FFFFFF\" w:themeFill=\"background1\" w:val=\"clear\"/>"
            + "<w:tblCellMar>"
            + "<w:top w:type=\"dxa\" w:w=\"15\"/>"
            + "<w:left w:type=\"dxa\" w:w=\"15\"/>"
            + "<w:bottom w:type=\"dxa\" w:w=\"15\"/>"
            + "<w:right w:type=\"dxa\" w:w=\"15\"/>"
            + "</w:tblCellMar>"
            + "<w:tblLook w:firstColumn=\"1\" w:firstRow=\"1\" w:lastColumn=\"0\" w:lastRow=\"0\" w:noHBand=\"0\" w:noVBand=\"1\" w:val=\"04A0\"/>"
            + "</w:tblPr>"
            + "<w:tblGrid>"
            + "<w:gridCol w:w=\"3130\"/>"
            + "<w:gridCol w:w=\"5800\"/>"
            + "</w:tblGrid>"
            + "</w:tbl>";
    Tbl tbl = (Tbl) XmlUtils.unmarshalString(text);
    return tbl;
}

private Tr createTrData(Data row) throws Exception {
    ObjectFactory wmlObjectFactory = new ObjectFactory();
    Tr tr = createTableRow(156);

    Tc tc1 = createTableCell(2126);
    JAXBElement<org.docx4j.wml.Tc> tcWrapped1 = wmlObjectFactory.createTrTc(tc1);
    tr.getContent().add(tcWrapped1);
    if (data.getTypesOfEquipment() != null && !data.getTypesOfEquipment().isEmpty()) {
        tc1.getContent().add(createParagrahBulletListHeader("Types of equipment"));
        for (EquipmentType et : data.getTypesOfEquipment()) {
            tc1.getContent().add(createBulletedParagraph(et.getName()));
        }
    }

    Tc tc2 = createTableCell(6804);
    JAXBElement<org.docx4j.wml.Tc> tcWrapped2 = wmlObjectFactory.createTrTc(tc2);
    tr.getContent().add(tcWrapped2);
    tc2.getContent().addAll(convertStringToParagraphs(data.getComments()));
    return tr;
}
```

I convert to PDF with this:

```java
String inputfilepath = "C:\\Users\\user\\Downloads\\SPAIN_GENERATE_TEST_2.docx";
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File(inputfilepath));

FOSettings foSettings = Docx4J.createFOSettings();
foSettings.setFoDumpFile(new java.io.File(inputfilepath + ".fo"));
// foSettings.setWmlPackage(template);
foSettings.setWmlPackage(wordMLPackage);

// Replace the ".docx" extension with ".pdf"; try-with-resources closes the stream
try (OutputStream os = new java.io.FileOutputStream(
        inputfilepath.substring(0, inputfilepath.length() - 5) + ".pdf")) {
    Docx4J.toFO(foSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);
}
```

I haven't posted all the code, but I think this is enough to show what I do. With it I generate a .docx correctly, but when I convert it to PDF the images and tables don't show. After some tries I found that if I open the generated .docx in Word and save it as a new .docx (without changing anything else), then generate the PDF from that duplicate, the images and tables do show.
Why is this happening? What am I doing wrong?

I attach four files:

- SPAIN_GENERATE_TEST.docx (generated file from code)
- SPAIN_GENERATE_TEST.pdf (pdf generated from previous file)
- SPAIN_GENERATE_TEST_2.docx (duplicate created using Word 2010)
- SPAIN_GENERATE_TEST_2.pdf (pdf generated from duplicated file)

I unzipped both .docx files and found some differences. First, the content of [Content_Types].xml differs: the generated file includes references to the image file that the duplicate doesn't, and the image name has also changed. I think these differences are related to the fact that the duplicate can be converted to PDF with all its parts, but I don't know whether there is a way to reproduce this behaviour using docx4j.
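To compare the two packages more systematically than by unzipping them by hand, a small JDK-only sketch like the following lists the parts (ZIP entries) of each .docx and reports which part names exist in only one of them. `DocxPartDiff` is a hypothetical helper; it only compares part names, so differing content inside `[Content_Types].xml` or `word/document.xml` still needs a text diff.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: compares the part names of two .docx (ZIP) packages. */
final class DocxPartDiff {
    private DocxPartDiff() {}

    /** Returns every ZIP entry name in the package, sorted. */
    static SortedSet<String> partNames(String docxPath) throws IOException {
        SortedSet<String> names = new TreeSet<>();
        try (ZipFile zip = new ZipFile(docxPath)) {
            for (ZipEntry e : Collections.list(zip.entries())) {
                names.add(e.getName());
            }
        }
        return names;
    }

    /** Names present in {@code a} but not in {@code b}. */
    static SortedSet<String> onlyIn(SortedSet<String> a, SortedSet<String> b) {
        SortedSet<String> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }
}
```

For example, `DocxPartDiff.onlyIn(DocxPartDiff.partNames("SPAIN_GENERATE_TEST.docx"), DocxPartDiff.partNames("SPAIN_GENERATE_TEST_2.docx"))` returns the part names that appear only in the generated file, such as the original image name.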

Can anyone help me?

Thanks a lot for your help

And thanks a lot for your work.

by **jason** » Sat Dec 17, 2016 7:40 pm

You have your drawing in a run in a run, which is wrong:


```xml
<w:r>
  <w:r>
    <w:drawing>
```

Word evidently fixes that.

For a table cell, you have:


```xml
<w:tc>
  <w:tcPr>
    <w:tcW w:w="6804" w:type="dxa"/>
  </w:tcPr>
  <w:p>
    <w:pPr>
      <w:rPr>
        <w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi"/>
        <w:color w:val="595959" w:themeColor="text1" w:themeTint="A6"/>
        <w:sz w:val="20"/>
      </w:rPr>
    </w:pPr>
    <w:r>
      <w:rPr>
        <w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi"/>
        <w:color w:val="595959" w:themeColor="text1" w:themeTint="A6"/>
        <w:sz w:val="20"/>
      </w:rPr>
      <w:p>
        <w:pPr>
          <w:numPr>
            <w:numId w:val="1"/>
          </w:numPr>
        </w:pPr>
        <w:r>
          <w:r>
            <w:rPr>
              <w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi"/>
              <w:color w:val="595959" w:themeColor="text1" w:themeTint="A6"/>
              <w:sz w:val="20"/>
            </w:rPr>
            <w:t>ETSI</w:t>
          </w:r>
        </w:r>
      </w:p>
    </w:r>
  </w:p>
</w:tc>
```

i.e. a w:p inside a w:r. That's wrong as well, and you've done it in a number of places.

I stopped looking at line 1422.
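Rather than reading `document.xml` by hand, a short JDK-only check can list every place where a `w:r` directly contains another `w:r`, a `w:p`, or a `w:tbl`. These are the patterns described above; a table is a block-level element too, so it cannot sit inside a run either. This is a sketch, not a schema validator: `RunNestingCheck` is a hypothetical helper, it only inspects the direct children of runs, and it assumes the main part is `word/document.xml`, which is the usual name in docx4j- and Word-generated files.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

/** Hypothetical helper: reports w:r, w:p or w:tbl elements nested directly inside a w:r. */
final class RunNestingCheck {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private RunNestingCheck() {}

    static List<String> findProblems(InputStream documentXml)
            throws IOException, SAXException, ParserConfigurationException {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // required for getNamespaceURI()/getLocalName()
        List<String> problems = new ArrayList<>();
        walk(f.newDocumentBuilder().parse(documentXml).getDocumentElement(), "", problems);
        return problems;
    }

    // Depth-first walk; records the element path of each bad child of a run.
    private static void walk(Element el, String path, List<String> problems) {
        String here = path + "/" + el.getTagName();
        boolean isRun = W.equals(el.getNamespaceURI()) && "r".equals(el.getLocalName());
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element child = (Element) n;
            if (isRun && W.equals(child.getNamespaceURI())) {
                String name = child.getLocalName();
                if (name.equals("r") || name.equals("p") || name.equals("tbl")) {
                    problems.add(here + "/" + child.getTagName());
                }
            }
            walk(child, here, problems);
        }
    }
}
```

Against the generated file, something like `try (ZipFile z = new ZipFile("SPAIN_GENERATE_TEST.docx")) { RunNestingCheck.findProblems(z.getInputStream(z.getEntry("word/document.xml"))).forEach(System.out::println); }` prints one element path per problem, for example `/w:document/w:body/w:p/w:r/w:r`. Only direct children are checked because text boxes (`w:txbxContent`) legitimately contain paragraphs deeper inside a run's drawing.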

If you correct the above and continue to have problems, please feel free to post again.

by **jason** » Sat Dec 17, 2016 7:46 pm

https://github.com/plutext/docx4j/commi ... 9b601f44e5 guards against this.

It will be in the next nightly build and v3.3.2 (when released).

by **Lostman** » Wed Dec 21, 2016 1:44 am

Hi Jason,

Your help has been extremely useful, thanks; I solved all my missing-parts problems. But now I am facing a new problem: I want to generate a PDF that prevents copying and editing.

Looking through the examples, I found how to protect a generated Word document from editing (I implemented it as a test), but not from copying (you can still select all the content and copy it into a new file). And when I convert it to PDF, I can do the same thing: select the whole document and copy it into an empty one.

So far I haven't found any clue about this. Do you know how to prevent copying from a generated PDF?

Really, thanks for your help.

by **jason** » Sat Dec 24, 2016 5:50 pm

You could try PDFBox:

https://pdfbox.apache.org/docs/1.8.11/j ... nt(boolean)

Usage example: https://pdfbox.apache.org/1.8/cookbook/encryption.html
