Hi all,
I'm using docx4j 3.3.6.
I have the attached docx file, and when I try to convert it to PDF with the following code, I get the exception shown below:
public static void main(String[] args) throws Exception {
    String regex = null;
    PhysicalFonts.setRegex(regex);
    String inputfilepath = System.getProperty("user.dir") + "/pdf/InvalidXmlCharacter.docx";
    String outputfilepath = System.getProperty("user.dir") + "/pdf/InvalidXmlCharacter.pdf";
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File(inputfilepath));
    // Refresh the values of DOCPROPERTY fields
    FieldUpdater updater = new FieldUpdater(wordMLPackage);
    updater.update(true);
    // All methods write to an output stream
    OutputStream os = new java.io.FileOutputStream(outputfilepath);
    System.out.println("Attempting to use XSL FO");
    Mapper fontMapper = new IdentityPlusMapper();
    wordMLPackage.setFontMapper(fontMapper);
    PhysicalFont font = PhysicalFonts.get("Arial Unicode MS");
    FOSettings foSettings = Docx4J.createFOSettings();
    foSettings.setFoDumpFile(new File(inputfilepath + ".fo"));
    foSettings.setWmlPackage(wordMLPackage);
    Docx4J.toFO(foSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);
    System.out.println("Saved: " + outputfilepath);
}
Exception exporting package: org.docx4j.openpackaging.exceptions.Docx4JException: Exception writing Document to OutputStream: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 13372; Character reference "" is an invalid XML character.
at org.docx4j.utils.XmlSerializerUtil.serialize(XmlSerializerUtil.java:50)
at org.docx4j.utils.XmlSerializerUtil.serialize(XmlSerializerUtil.java:14)
at org.docx4j.convert.out.fo.renderers.FORendererApacheFOP.render(FORendererApacheFOP.java:209)
at org.docx4j.convert.out.fo.renderers.FORendererApacheFOP.render(FORendererApacheFOP.java:159)
at org.docx4j.convert.out.fo.AbstractFOExporter.postprocess(AbstractFOExporter.java:168)
at org.docx4j.convert.out.fo.AbstractFOExporter.postprocess(AbstractFOExporter.java:47)
at org.docx4j.convert.out.common.AbstractExporter.export(AbstractExporter.java:82)
at org.docx4j.Docx4J.toFO(Docx4J.java:597)
at org.docx4j.Docx4J.toPDF(Docx4J.java:612)
...
Caused by: javax.xml.transform.TransformerException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 13372; Character reference "" is an invalid XML character.
at org.docx4j.org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:506)
at org.docx4j.utils.XmlSerializerUtil.serialize(XmlSerializerUtil.java:47)
... 94 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 13372; Character reference "" is an invalid XML character.
at org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1219)
at __redirected.__XMLReaderFactory.parse(__XMLReaderFactory.java:176)
at org.docx4j.org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:489)
... 95 more
The cause of the exception seems to be a long lettered list in the docx (A, B, ..., Z, AA). If I remove the AA item, the conversion succeeds. The problem doesn't occur if I use a numbered multilevel list (1, 1.1, 1.1.1, 2, 3, ...).
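To narrow this down, I could scan the dumped .fo file (the one written via setFoDumpFile) for numeric character references that XML 1.0 does not allow, since that is what the SAXParseException complains about. The following is only a rough sketch: the class and method names are my own and not part of docx4j, and it assumes the .fo content has already been read into a String.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a docx4j class): lists numeric character
// references such as &#27; or &#x1; that are illegal in XML 1.0.
class InvalidCharRefFinder {

    // Matches decimal (&#27;) and hexadecimal (&#x1B;) character references.
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(x[0-9A-Fa-f]+|[0-9]+);");

    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns every character reference in the text whose code point is
    // not a legal XML 1.0 character, in order of appearance.
    static List<String> findInvalidRefs(String xml) {
        List<String> bad = new ArrayList<>();
        Matcher m = CHAR_REF.matcher(xml);
        while (m.find()) {
            String body = m.group(1);
            int cp;
            try {
                cp = body.startsWith("x")
                        ? Integer.parseInt(body.substring(1), 16)
                        : Integer.parseInt(body);
            } catch (NumberFormatException e) {
                cp = -1; // value too large for an int, so certainly invalid
            }
            if (!isValidXml10(cp)) {
                bad.add(m.group());
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Made-up sample text, not taken from my real .fo dump.
        String sample = "<fo:block>A&#65;&#x1;&#27;</fo:block>";
        System.out.println(findInvalidRefs(sample));
    }
}
```

If this reports a reference inside the label of the AA list item, that would at least confirm the invalid character comes from the list numbering and not from the document text.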
Any ideas on how to solve this problem?
Thanks