I am trying to generate a Word document from a template with docx4j. After replacing content through docx4j, opening the generated document in Word shows this error/warning: "Word found unreadable content in .docx. Do you want to recover the contents of this document? If you trust the source of this document, click Yes".
The error only occurs in one particular case: when the HTML being converted was copied from the internet and contains an image. Below is the HTML content that I used:
<p style="margin-top:0.5em;margin-bottom:0.5em;color:#202122;font-family:sans-serif;font-size:14px;background-color:#ffffff;">ter the border was defined so to make the northern portion of the territory concerned part of the French mandated territory that became Lebanon, many Zionist geographers — and Israeli geographers in the state's early years — continued to speak of "The Upper Galilee" as being "the northern sub-area of the <a href="https://en.wikipedia.org/wiki/Galilee" title="Galilee" style="text-decoration-line:none;color:#0645ad;background:none;">Galilee</a> region of <a href="https://en.wikipedia.org/wiki/Israel" title="Israel" style="text-decoration-line:none;color:#0645ad;background:none;">Israel</a> and <a href="https://en.wikipedia.org/wiki/Lebanon" title="Lebanon" style="text-decoration-line:none;color:#0645ad;background:none;">Lebanon</a>".</p><p style="margin-top:0.5em;margin-bottom:0.5em;color:#202122;font-family:sans-serif;font-size:14px;background-color:#ffffff;"><img src="" /></p><p style="margin-top:0.5em;margin-bottom:0.5em;color:#202122;font-family:sans-serif;font-size:14px;background-color:#ffffff;">Under this definition, "The Upper Galilee" covers an area spreading over 1,500 km², about 700 in Israel and the rest in Lebanon. 
This included the highland region of <a href="https://en.wikipedia.org/wiki/Belad_Bechara" title="Belad Bechara" style="text-decoration-line:none;color:#0645ad;background:none;">Belad Bechara</a> in <a href="https://en.wikipedia.org/wiki/Jabal_Amel" title="Jabal Amel" style="text-decoration-line:none;color:#0645ad;background:none;">Jabal Amel</a> located in <a href="https://en.wikipedia.org/wiki/South_Lebanon" class="mw-redirect" title="South Lebanon" style="text-decoration-line:none;color:#0645ad;background:none;">South Lebanon</a>,<sup id="cite_ref-4" class="reference" style="line-height:1;unicode-bidi:isolate;white-space:nowrap;font-size:11.2px;"><a href="https://en.wikipedia.org/wiki/Upper_Galilee#cite_note-4" style="text-decoration-line:none;color:#0645ad;background:none;">[4]</a></sup> </p>
I am using Java 8 and the following docx4j libraries:
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
    <version>8.2.9</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-ImportXHTML</artifactId>
    <version>8.2.1</version>
</dependency>
I used the code below to replace the HTML content in the Word document:
public static void replaceCustomContent(WordprocessingMLPackage wordMLPackage, MainDocumentPart documentPart,
        String customContentFieldName, String replacedValue) {
    List<Object> textElements = getAllElementFromObject(documentPart, Text.class);
    for (Object textElement : textElements) {
        Text text = (Text) textElement;
        if (text.getValue().contains(customContentFieldName)) {
            try {
                // Walk up from the placeholder text to its run, paragraph and table cell.
                R run = (R) (text.getParent());
                P p = (P) (run.getParent());
                Tc tc = (Tc) p.getParent();
                int cellIndex = tc.getContent().indexOf(p);
                if (cellIndex != -1) {
                    // Remove the placeholder paragraph; the converted HTML takes its position.
                    tc.getContent().remove(cellIndex);
                    XHTMLImporter xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
                    // Wrap the fragment and let Jsoup re-serialize it as well-formed XHTML.
                    replacedValue = "<html><head></head><body>" + replacedValue + "</body></html>";
                    final Document document = Jsoup.parse(replacedValue);
                    document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
                    document.outputSettings().escapeMode(EscapeMode.xhtml);
                    replacedValue = document.html();
                    // Convert the XHTML to docx objects and insert them in order.
                    List<Object> objects = xHTMLImporter.convert(replacedValue, null);
                    for (Object object : objects) {
                        tc.getContent().add(cellIndex, object);
                        cellIndex++;
                    }
                }
            } catch (Docx4JException e) {
                log.error("Docx4j exception while converting template");
                throw new ApiRuntimeException(TemplateServiceException.DOCX4J_TEMPLATE_CONERSION_EXCEPTION,
                        new Object[] {}, HttpStatus.INTERNAL_SERVER_ERROR.value(), e);
            }
            // Only the first matching placeholder is replaced.
            break;
        }
    }
}
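
Since the problem only shows up when the HTML contains an image, and the sample above has an `<img src="" />`, one way to narrow it down is to strip `<img>` tags that have no usable `src` before the HTML is passed to `replaceCustomContent`, then check whether Word still complains. This is only a diagnostic sketch: that the empty `src` is what produces the unreadable content is an assumption, not something confirmed here. The class name `EmptyImageFilter` and its methods are hypothetical, and the sketch uses only the Java 8 standard library (a simple regex, not a full HTML parser, so a `>` inside an attribute value would confuse it).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of docx4j or Jsoup.
public class EmptyImageFilter {

    // Matches a whole <img ...> tag, self-closing or not.
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Captures a src value: double-quoted, single-quoted, or unquoted.
    // The leading \s keeps attributes such as data-src from matching.
    private static final Pattern SRC_ATTR = Pattern.compile(
            "\\ssrc\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))",
            Pattern.CASE_INSENSITIVE);

    // True if the tag has a src attribute with a non-blank value.
    static boolean hasUsableSrc(String imgTag) {
        Matcher m = SRC_ATTR.matcher(imgTag);
        if (!m.find()) {
            return false;
        }
        for (int g = 1; g <= 3; g++) {
            String value = m.group(g);
            if (value != null) {
                return !value.trim().isEmpty();
            }
        }
        return false;
    }

    // Returns the HTML with every <img> lacking a usable src removed.
    public static String stripEmptyImages(String html) {
        Matcher m = IMG_TAG.matcher(html);
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            String tag = m.group();
            String replacement = hasUsableSrc(tag) ? tag : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>before</p><p><img src=\"\" /></p><p>after</p>";
        System.out.println(stripEmptyImages(html));
    }
}
```

If the document opens cleanly once such tags are removed, the image handling is the likely culprit; if the warning persists, the cause is elsewhere in the converted content.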