Office pptx/xlsx/docx to PDF in docx4j 8.2.3
September 5th, 2020 by Jason
docx4j 8.2.3 facilitates three distinct ways to convert Microsoft Word docx documents to PDF. There are also possibilities for converting pptx or xlsx to PDF.
The three approaches:
- export-fo: the content is converted to XSL FO, and from there, to PDF (or any of the other formats supported by Apache FOP)
- documents4j: since 8.2.0, use Microsoft Word to do the conversion
- via-Microsoft-Graph: new in 8.2.3, use java-docx-to-pdf-using-Microsoft-Graph to do the conversion
So which should you choose? The following table covers some of the things you might want to consider:
| | export-FO | Microsoft Graph | documents4j |
|---|---|---|---|
| Overview | Conversion of docx to XSL FO, then uses Apache FOP to convert to PDF | Uses Microsoft’s cloud | Uses your Microsoft Office installation |
| Fidelity | Suitable for simple documents (text, tables, supported image types, header/footers) | 100% (Microsoft’s fidelity) | 100% (Microsoft’s fidelity) |
| Suitability | simple docx | docx, pptx, xlsx | docx, pptx, xlsx |
| License considerations | ASL v2 | Refer applicable Microsoft cloud terms | Refer Microsoft EULA governing your Office install (increasingly restricted with each release) |
| Cost | Free | Microsoft cloud costs | (Microsoft Office) |
| Confidentiality | documents don’t leave your server | documents go to Microsoft cloud | documents need not leave your servers |
| Other advantages | Fast XSL FO/PDF templating for high volume PDF creation; open source, so can be extended | Microsoft encourages this approach; Microsoft cloud handles scalability | Can update a docx table of contents; can convert RTF and binary .doc |
| Other disadvantages | Two step (docx to XSL FO to PDF) processing is slower (except for XSL FO templating) | Dependency on 3rd party cloud; currently can’t update docx table of contents (vote to fix) | Not supported by Microsoft |
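To make the trade-offs in the table concrete, here is a minimal sketch of how the choice might be encoded in plain Java. It is not part of docx4j: the class `ConversionChooser`, the `Approach` enum and the three boolean inputs are hypothetical names invented for illustration, and the rules are only a rough reading of the Fidelity, Suitability, Confidentiality and Overview rows above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a docx4j class: it only restates the table as code.
final class ConversionChooser {

    /** The three approaches compared in the table. */
    enum Approach { EXPORT_FO, MICROSOFT_GRAPH, DOCUMENTS4J }

    /**
     * Returns the approaches that remain viable for the given constraints.
     *
     * @param needsFullFidelity  complex docx layout, or pptx/xlsx input
     * @param mustStayOnPremises documents may not be sent to a third-party cloud
     * @param hasLocalOffice     a Microsoft Office installation is available
     */
    static List<Approach> candidates(boolean needsFullFidelity,
                                     boolean mustStayOnPremises,
                                     boolean hasLocalOffice) {
        List<Approach> result = new ArrayList<>();
        // export-FO: suitable for simple docx only; documents never leave your server.
        if (!needsFullFidelity) {
            result.add(Approach.EXPORT_FO);
        }
        // Microsoft Graph: Microsoft's fidelity, but documents go to Microsoft's cloud.
        if (!mustStayOnPremises) {
            result.add(Approach.MICROSOFT_GRAPH);
        }
        // documents4j: Microsoft's fidelity, but relies on your own Office installation.
        if (hasLocalOffice) {
            result.add(Approach.DOCUMENTS4J);
        }
        return result;
    }

    public static void main(String[] args) {
        // Example: a complex document that must stay in-house, with Office installed.
        System.out.println(candidates(true, true, true));
    }
}
```

An empty result means no approach in the table satisfies all three constraints at once, for example a pptx that must stay on premises when no Office installation is available. Licensing and cost are deliberately left out of the sketch, since they depend on your own agreements with Microsoft.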