Mar 16 2020
documents4j for PDF output
Generating high-fidelity PDF output from Office documents has always been a challenge, given the “long tail” of features that can appear in docx/pptx/xlsx files.
For Word documents, it is easy enough to output paragraphs of text, tables and images. But add in VML, DrawingML, equations, SmartArt, and fidelity becomes a challenge.
If your documents are constrained, you may be able to find a suitable conversion tool. Plutext’s PDF Converter was a good example of this. It worked well on a growing range of documents.
But ultimately, if you want great fidelity on an unconstrained set of files, you need to be using Microsoft’s own Office layout engine.
There are various ways to do that, for example https://developer.microsoft.com/en-us/graph/examples/document-conversion
For Java developers, a good solution is documents4j.
It uses Office and the Microsoft Scripting Host for VBS on the conversion machine, so that machine must run Microsoft Windows.
Documents4j can run either a “LocalConverter” or a “RemoteConverter”.
Using a LocalConverter is as simple as:
import java.io.File;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

import com.documents4j.api.DocumentType;
import com.documents4j.api.IConverter;
import com.documents4j.job.LocalConverter;

public class ToPDF {

    public static void main(String[] args) {

        File wordFile = new File(System.getProperty("user.dir") + "/input.docx");
        File target = new File(System.getProperty("user.dir") + "/output.pdf");

        IConverter converter = LocalConverter.builder()
                .baseFolder(new File("C:\\temp"))
                .workerPool(20, 25, 2, TimeUnit.SECONDS)
                .processTimeout(30, TimeUnit.SECONDS)
                .build();

        Future<Boolean> conversion = converter
                .convert(wordFile).as(DocumentType.MS_WORD)
                .to(target).as(DocumentType.PDF)
                .prioritizeWith(1000) // optional
                .schedule();
    }
}
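Note that schedule() only queues the job and returns a Future<Boolean>; the example above never checks whether the conversion actually succeeded. The following is a minimal sketch of one way to wait for that result, using only java.util.concurrent. The class name ConversionAwaiter and the method awaitResult are hypothetical helpers, not part of documents4j, and the Future in main is a stand-in so the sketch runs without Office installed.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of documents4j.
class ConversionAwaiter {

    /**
     * Blocks until the job finishes or the timeout elapses.
     * Returns true only if the job completed and reported success.
     */
    static boolean awaitResult(Future<Boolean> conversion, long timeout, TimeUnit unit) {
        try {
            return Boolean.TRUE.equals(conversion.get(timeout, unit));
        } catch (TimeoutException e) {
            conversion.cancel(true); // give up on a job that is taking too long
            return false;
        } catch (ExecutionException | CancellationException e) {
            return false; // the job threw an exception or was cancelled
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the Future returned by schedule(); no Office call is made here.
        Future<Boolean> conversion = CompletableFuture.supplyAsync(() -> true);
        System.out.println("converted: " + awaitResult(conversion, 30, TimeUnit.SECONDS));
    }
}
```

In the LocalConverter example you would pass the conversion variable to such a helper. Once all jobs are finished, it is probably also worth calling converter.shutDown() so that the worker threads and the Word bridge are released.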
From Maven, you just need these dependencies:
<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-local</artifactId>
    <version>1.1.1</version>
</dependency>
<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-transformer-msoffice-word</artifactId>
    <version>1.1.1</version>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-simple</artifactId>
</dependency>
For a successful conversion, your logs will contain:
[main] INFO com.documents4j.conversion.msoffice.MicrosoftWordBridge - From-Microsoft-Word-Converter was started successfully
[main] INFO com.documents4j.job.LocalConverter - The documents4j local converter has started successfully
[pool-1-thread-1] INFO com.documents4j.conversion.msoffice.MicrosoftWordBridge - Requested conversion from input.docx (application/vnd.com.documents4j.any-msword) to output.pdf (application/pdf)