I'm using XML bindings with docx4j to generate Word docx reports.
In some cases, when the XML is large (2 MB, about 40,000 lines), heap usage rises to 1-2 GB and generating the report takes 50-60 minutes.
The slowest part is the binding step: BindingHandler.applyBindings(templateWordPackage.getMainDocumentPart());
I did some memory profiling:
The thread java.lang.Thread @ 0x79100a248 Thread-127 keeps local variables with total size 1,362,944,720 (84.19%) bytes.
The memory is accumulated in one instance of "java.lang.Thread" loaded by "<system class loader>".
The stacktrace of this Thread is available. See stacktrace :
Thread-127
at org.apache.xml.dtm.ref.dom2dtm.DOM2DTM.nextNode()Z (DOM2DTM.java:539)
at org.apache.xml.dtm.ref.DTMDefaultBase._nextsib(I)I (DTMDefaultBase.java:563)
at org.apache.xml.dtm.ref.DTMDefaultBase.getNextSibling(I)I (DTMDefaultBase.java:1140)
at org.apache.xml.dtm.ref.DTMDefaultBaseTraversers$ChildTraverser.next(II)I (DTMDefaultBaseTraversers.java:461)
at org.apache.xpath.axes.AxesWalker.getNextNode()I (AxesWalker.java:333)
at org.apache.xpath.axes.AxesWalker.nextNode()I (AxesWalker.java:361)
at org.apache.xpath.axes.WalkingIterator.nextNode()I (WalkingIterator.java:192)
at org.apache.xpath.axes.NodeSequence.nextNode()I (NodeSequence.java:281)
at org.apache.xpath.axes.NodeSequence.item(I)I (NodeSequence.java:471)
at org.apache.xpath.objects.XNodeSet.str()Ljava/lang/String; (XNodeSet.java:272)
at org.apache.xpath.jaxp.XPathImpl.getResultAsType(Lorg/apache/xpath/objects/XObject;Ljavax/xml/namespace/QName;)Ljava/lang/Object; (XPathImpl.java:311)
at org.apache.xpath.jaxp.XPathImpl.evaluate(Ljava/lang/String;Ljava/lang/Object;Ljavax/xml/namespace/QName;)Ljava/lang/Object; (XPathImpl.java:276)
at org.apache.xpath.jaxp.XPathImpl.evaluate(Ljava/lang/String;Ljava/lang/Object;)Ljava/lang/String; (XPathImpl.java:365)
at org.docx4j.openpackaging.parts.XmlPart.xpathGetString(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String; (XmlPart.java:157)
at org.docx4j.model.datastorage.BindingHandler.xpathGetString(Lorg/docx4j/openpackaging/packages/WordprocessingMLPackage;Ljava/util/Map;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String; (BindingHandler.java:253)
at org.docx4j.model.datastorage.BindingHandler.xpathGenerateRuns(Lorg/docx4j/openpackaging/packages/WordprocessingMLPackage;Lorg/docx4j/openpackaging/parts/JaxbXmlPart;Ljava/util/Map;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Lorg/w3c/dom/traversal/NodeIterator;ZLjava/lang/String;)Lorg/w3c/dom/DocumentFragment; (BindingHandler.java:384)
at org.docx4j.model.datastorage.BindingHandler.xpathGenerateRuns(Lorg/docx4j/openpackaging/packages/WordprocessingMLPackage;Lorg/docx4j/openpackaging/parts/JaxbXmlPart;Ljava/util/Map;Lorg/docx4j/openpackaging/parts/opendope/XPathsPart;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Lorg/w3c/dom/traversal/NodeIterator;Z)Lorg/w3c/dom/DocumentFragment; (BindingHandler.java:702)
....
I'm using 2 nested repeaters, with 10-15 different XPaths inside them, to process the XML.
Something like:
repeat /report/docs
    bind 4-5 fields
    for each /report/docs/entities/entity
        bind 10-15 XML elements to create the formatted entity text
When the number of entities is high (over 1000), the process takes a very long time and a lot of memory.
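To make the access pattern concrete, here is a minimal stand-in written with plain JAXP rather than docx4j. It builds a small report with the same nesting and evaluates one positional XPath per bound field against the DOM. The element names title, name and type, and the class and method names, are made up for illustration; I'm only assuming that each bound field ends up as a separate XPath evaluation from the document root, which is what the stack trace above suggests.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NestedRepeatSketch {

    // Builds <report><docs><title/><entities><entity>...</entity></entities></docs>...</report>
    static String buildXml(int docs, int entitiesPerDoc) {
        StringBuilder sb = new StringBuilder("<report>");
        for (int d = 1; d <= docs; d++) {
            sb.append("<docs><title>doc").append(d).append("</title><entities>");
            for (int e = 1; e <= entitiesPerDoc; e++) {
                sb.append("<entity><name>n").append(d).append('_').append(e)
                  .append("</name><type>t</type></entity>");
            }
            sb.append("</entities></docs>");
        }
        return sb.append("</report>").toString();
    }

    // Evaluates one XPath per bound field and returns the number of evaluations.
    static int bindAll(String xml, int docs, int entitiesPerDoc) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        String[] entityFields = {"name", "type"}; // hypothetical; my real template has 10-15
        int lookups = 0;
        for (int d = 1; d <= docs; d++) {
            xpath.evaluate("/report/docs[" + d + "]/title", dom);
            lookups++;
            for (int e = 1; e <= entitiesPerDoc; e++) {
                for (String field : entityFields) {
                    // Every lookup starts again at the root and walks to the e-th entity.
                    xpath.evaluate("/report/docs[" + d + "]/entities/entity[" + e + "]/" + field, dom);
                    lookups++;
                }
            }
        }
        return lookups;
    }

    public static void main(String[] args) throws Exception {
        int docs = 2, entities = 1000;
        long start = System.nanoTime();
        int lookups = bindAll(buildXml(docs, entities), docs, entities);
        System.out.println(lookups + " lookups in " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```

The number of lookups grows with docs x entities x fields, and each one has to walk the sibling list to reach the indexed entity, so I would expect the total work to grow faster than linearly with the entity count.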
1. Is there any way to improve performance and memory usage?
2. Is there any way to pre-format the entity text before the data binding? For example, use an XSLT to generate XHTML and bind that, keeping the formatting (bold, italic, etc.)?
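For question 2, this is roughly the pre-formatting step I have in mind, sketched with the standard javax.xml.transform API only. The stylesheet, the name and type child elements, and the class name are placeholders invented for the example. Whether the binding step can then import such markup with the formatting intact is exactly what I'm asking.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class EntityPreformatSketch {

    // Hypothetical stylesheet: renders one entity as a paragraph with bold and italic parts.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/entity'>"
        + "<p><b><xsl:value-of select='name'/></b> (<i><xsl:value-of select='type'/></i>)</p>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Transforms a single <entity> element into an XHTML-style fragment.
    static String toXhtml(String entityXml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(entityXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXhtml("<entity><name>Acme</name><type>company</type></entity>"));
    }
}
```

The idea would be to run this once per entity (or once over the whole report) before binding, so that the template needs a single bound field per entity instead of 10-15.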
Thanks