1. WordprocessingMLPackage.load(bais) - takes 500ms at best with a test docx
2. wordMLPackage.clone() - similar to load(), about 500ms with a test docx
3. Serializing the result of WordprocessingMLPackage.load(bais) - not possible, since WordprocessingMLPackage is not serializable
4. Creating the wordMLPackage from an org.docx4j.xmlPackage.Package using FlatOpcXmlImporter - takes 50ms at best with a test docx
5. Pooling wordMLPackage objects for re-use - overkill at this stage
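To compare the options on equal terms, a best-case figure can be collected with a small helper along these lines. This is only a sketch: BestOf and bestMillis are made-up names, and the Runnable stands in for whichever option is being timed.

```java
final class BestOf {
    /** Runs the task the given number of times and returns the fastest wall-clock time in milliseconds. */
    static long bestMillis(int runs, Runnable task) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            best = Math.min(best, elapsed); // keep only the fastest run
        }
        return best / 1_000_000; // nanoseconds to whole milliseconds
    }
}
```

The Runnable would wrap one of the calls listed above, with any checked exceptions handled inside the lambda. Taking the minimum rather than the average keeps JIT warm-up and GC pauses out of the number, which matches the "at best" figures quoted.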
Option 4 seems promising; however, I am concerned that this approach is not thread safe. In this basic test harness, each FlatOpcXmlImporter is constructed from the same unmarshalled wmlPackageEl Package object, and I am unsure whether this creates a deep clone or simply references the same Part objects, which could cause threading issues. Does anyone have an opinion on this?
Code:
// Read the whole Flat OPC XML file into memory (inputfilepath is defined elsewhere).
byte[] b;
try (RandomAccessFile f = new RandomAccessFile(inputfilepath, "r")) {
    b = new byte[(int) f.length()];
    f.readFully(b); // read() alone is not guaranteed to fill the buffer
}
StreamSource source = new StreamSource(new ByteArrayInputStream(b));
// Unmarshal once; u is the JAXB Unmarshaller created elsewhere.
org.docx4j.xmlPackage.Package wmlPackageEl = ((JAXBElement<org.docx4j.xmlPackage.Package>) u
        .unmarshal(source)).getValue();
// Time building a WordprocessingMLPackage from the same unmarshalled Package, 100 times.
for (int i = 0; i < 100; i++) {
    long start = System.currentTimeMillis();
    FlatOpcXmlImporter xmlPackage = new FlatOpcXmlImporter(wmlPackageEl);
    wordMLPackage = (WordprocessingMLPackage) xmlPackage.get();
    times.add(Long.valueOf(System.currentTimeMillis() - start));
}
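One way to probe the deep-clone question would be to build two packages from the same wmlPackageEl and compare something inside them by reference. The helper below is only a generic sketch of that idea: SharedStateCheck and sharesInstance are made-up names, and the factory and extractor lambdas would have to wrap the docx4j calls (for example new FlatOpcXmlImporter(wmlPackageEl).get() and a getter on the resulting package), which are left out here so the block stands alone.

```java
import java.util.function.Function;
import java.util.function.Supplier;

final class SharedStateCheck {
    /**
     * Builds two objects with the same factory and reports whether the value
     * extracted from each one is the very same instance (reference equality).
     */
    static <T> boolean sharesInstance(Supplier<T> factory, Function<T, ?> extractor) {
        Object first = extractor.apply(factory.get());
        Object second = extractor.apply(factory.get());
        // == compares references, not contents; null is treated as "nothing shared"
        return first != null && first == second;
    }
}
```

A true result would mean the two packages point at the same underlying object, which is exactly the case that would worry me under multiple threads. A false result for one extracted object does not prove that everything is copied, so several parts would need checking.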
Also, are there other options that have been used successfully to improve the performance of obtaining WordprocessingMLPackage objects while avoiding the constant rebuilding of parts and relationships?