Generating high-fidelity PDF output from Office documents has always been a challenge, given the “long tail” of features which are possible in docx/pptx/xlsx files.
For Word documents, it is easy enough to output paragraphs of text, tables and images. But add in VML, DrawingML, equations and SmartArt, and fidelity becomes much harder to achieve.
If your documents are constrained, you may be able to find a suitable conversion tool. Plutext’s PDF Converter was a good example of this. It worked well on a growing range of documents.
But ultimately, if you want great fidelity on an unconstrained set of files, you need to be using Microsoft’s own Office layout engine.
There are various ways to do that, for example https://developer.microsoft.com/en-us/graph/examples/document-conversion
For Java developers, a good solution is documents4j.
It uses Office and the Microsoft Scripting Host for VBS on the conversion machine, so that machine must run Microsoft Windows.
Documents4j can run either a “LocalConverter” or a “RemoteConverter”.
Using a LocalConverter is as simple as:
    import java.io.File;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;

    import com.documents4j.api.DocumentType;
    import com.documents4j.api.IConverter;
    import com.documents4j.job.LocalConverter;

    public class ToPDF {

        public static void main(String[] args) {

            // Input and output files, resolved against the working directory
            File wordFile = new File( System.getProperty("user.dir")+"/input.docx" );
            File target = new File( System.getProperty("user.dir")+"/output.pdf" );

            // Configure a converter which drives the local Office installation
            IConverter converter = LocalConverter.builder()
                    .baseFolder(new File("C:\\temp"))
                    .workerPool(20, 25, 2, TimeUnit.SECONDS)
                    .processTimeout(30, TimeUnit.SECONDS)
                    .build();

            // Queue the conversion; schedule() returns immediately with a Future
            Future<Boolean> conversion = converter
                    .convert(wordFile).as(DocumentType.MS_WORD)
                    .to(target).as(DocumentType.PDF)
                    .prioritizeWith(1000) // optional
                    .schedule();
        }
    }
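Because schedule() hands back a Future<Boolean> rather than the finished PDF, the calling code usually wants to wait for the result at some point. The following is a minimal sketch of one way to do that, using only the JDK. The class name ConversionWait and the method awaitResult are hypothetical helpers invented for this illustration (they are not part of documents4j), and the CompletableFuture in main is a stand-in for the Future returned by schedule(), so no Office installation is needed to run it.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not part of documents4j.
class ConversionWait {

    /**
     * Blocks until the job finishes or the timeout elapses.
     * Returns true only if the Future completed with Boolean.TRUE.
     */
    static boolean awaitResult(Future<Boolean> conversion, long timeout, TimeUnit unit) {
        try {
            return Boolean.TRUE.equals(conversion.get(timeout, unit));
        } catch (TimeoutException e) {
            conversion.cancel(true); // give up on a job that is taking too long
            return false;
        } catch (ExecutionException | CancellationException e) {
            return false; // the job threw an exception or was cancelled
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the Future returned by schedule(); not a real conversion.
        Future<Boolean> finished = CompletableFuture.completedFuture(true);
        System.out.println("converted: " + awaitResult(finished, 30, TimeUnit.SECONDS));
    }
}
```

In the ToPDF example, the conversion variable could be passed to a helper like this in place of the stand-in future.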
From Maven, you just need these dependencies:
    <dependency>
        <groupId>com.documents4j</groupId>
        <artifactId>documents4j-local</artifactId>
        <version>1.1.1</version>
    </dependency>
    <dependency>
        <groupId>com.documents4j</groupId>
        <artifactId>documents4j-transformer-msoffice-word</artifactId>
        <version>1.1.1</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
    </dependency>
For a successful conversion, your logs will contain:
[main] INFO com.documents4j.conversion.msoffice.MicrosoftWordBridge - From-Microsoft-Word-Converter was started successfully
[main] INFO com.documents4j.job.LocalConverter - The documents4j local converter has started successfully
[pool-1-thread-1] INFO com.documents4j.conversion.msoffice.MicrosoftWordBridge - Requested conversion from input.docx (application/vnd.com.documents4j.any-msword) to output.pdf (application/pdf)