HTML to docx conversion bookmark, duplicate, style issue.

by **sagarmakwana** » Mon Jun 05, 2023 5:49 pm

Hi Team, I need some help fixing a few issues. I am developing a feature that lets users edit docx files online using a JavaScript editor. The workflow is as follows.

1. I upload a docx file and convert it to HTML.
2. I load the converted HTML string into the JavaScript editor and add content and images.
3. I convert the edited HTML back to docx and download the converted file.

Below are the issues I am facing:

1. Images in the header and the body show up as bookmark images.
2. When I convert the same content multiple times, the previous content is added to every converted document inside an <a> tag, and the image size also decreases.
3. CSS content also shows up as text in the converted docx file.
4. There are some style issues.
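For issue 3, one thing that could be tried is removing `<style>` blocks from the editor HTML before it is imported. The sketch below is only a diagnostic idea, not a confirmed fix: it assumes the visible CSS text comes from a `<style>` element that the editor keeps in its output, and the class and method names (`StyleBlockStripper`, `stripStyleBlocks`) are made up for illustration. It uses only the standard library and leaves inline `style="..."` attributes untouched.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of docx4j.
final class StyleBlockStripper {

    // (?i) case-insensitive, (?s) lets '.' match line breaks.
    // Matches <style ...> ... </style>, non-greedy, so each block is removed separately.
    private static final Pattern STYLE_BLOCK =
            Pattern.compile("(?is)<style\\b[^>]*>.*?</style\\s*>");

    private StyleBlockStripper() {
    }

    // Returns the HTML with every <style> element removed; null becomes "".
    static String stripStyleBlocks(String html) {
        if (html == null) {
            return "";
        }
        return STYLE_BLOCK.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<html><head><style type=\"text/css\">p { color: red; }</style></head>"
                + "<body><p>Hello</p></body></html>";
        // prints <html><head></head><body><p>Hello</p></body></html>
        System.out.println(stripStyleBlocks(html));
    }
}
```

The cleaned string would then be passed to the importer in place of `editorContent`. Note the trade-off: any formatting that depended on rules inside the removed `<style>` block is lost, which could make the style issues in item 4 worse, so this is mainly useful for confirming where the stray CSS text comes from.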

Java = 1.8
docx4j-core = 8.3.9
docx4j-JAXB-ReferenceImpl = 8.3.9
docx4j-ImportXHTML = 8.3.8

Below is the code that converts the docx to HTML the first time
--------------------------------------------------------------------

Code:

    WordprocessingMLPackage wordMLPackage = Docx4J.load(file.getInputStream());
    String classpath = applicationContext.getResource("classpath:").getFile().getPath();

    SdtWriter.registerTagHandler("HTML_ELEMENT", new SdtToListSdtTagHandler());

    HTMLSettings htmlSettings = Docx4J.createHTMLSettings();
    htmlSettings.setOpcPackage(wordMLPackage);
    htmlSettings.setImageDirPath(classpath + "/images/");
    htmlSettings.setImageTargetUri("http://localhost:8080/docupload/resources/images/");
    Docx4jProperties.setProperty("docx4j.Convert.Out.HTML.OutputMethodXML", true);

    ByteArrayOutputStream os = new ByteArrayOutputStream();
    Docx4J.toHTML(htmlSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);
    String htmlContent = new String(os.toByteArray(), StandardCharsets.UTF_8);

Below is the code that converts the HTML string to docx (and back to HTML), which runs one or more times
---------------------------------------------------------------------------------------

Code:

    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();

    // Converting HTML to docx first.
    XHTMLImporterImpl xhtmlImporterImpl = new XHTMLImporterImpl(wordMLPackage);
    List<Object> elements = xhtmlImporterImpl.convert(editorContent, null);
    wordMLPackage.getMainDocumentPart().getContent().addAll(elements);

    // Converting docx to HTML back.
    String classpath = applicationContext.getResource("classpath:").getFile().getPath();
    SdtWriter.registerTagHandler("HTML_ELEMENT", new SdtToListSdtTagHandler());

    HTMLSettings htmlSettings = Docx4J.createHTMLSettings();
    htmlSettings.setOpcPackage(wordMLPackage);
    htmlSettings.setImageDirPath(classpath + "/images/");
    htmlSettings.setImageTargetUri("http://localhost:8080/docupload/resources/images/");
    Docx4jProperties.setProperty("docx4j.Convert.Out.HTML.OutputMethodXML", true);

    ByteArrayOutputStream os = new ByteArrayOutputStream();
    Docx4J.toHTML(htmlSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);
    String htmlContent = new String(os.toByteArray(), StandardCharsets.UTF_8);
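Since the XHTML importer expects well-formed XML, it may help to check the editor output before it reaches `convert(...)`, because browser-based editors often emit things like `<br>` or `&nbsp;` that are valid HTML but not well-formed XML. The following is a minimal sketch using only the JDK's XML parser. The class and method names (`XhtmlWellFormedCheck`, `isWellFormed`) are hypothetical, and it assumes the editor content has no DOCTYPE declaration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of docx4j.
final class XhtmlWellFormedCheck {

    private XhtmlWellFormedCheck() {
    }

    // Returns true only if the string parses as well-formed XML.
    static boolean isWellFormed(String xhtml) {
        if (xhtml == null || xhtml.trim().isEmpty()) {
            return false;
        }
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xhtml)));
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<div><p>Hello<br/></p></div>")); // prints true
        System.out.println(isWellFormed("<div><p>Hello<br></p></div>"));  // prints false
    }
}
```

If the check fails for `editorContent`, the problem is in the editor output rather than in the docx4j conversion itself, which narrows down where the duplicated or stray content is being introduced.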

I have also attached a zip file that contains several docx files.

If anyone has any ideas, please reply.

Thanks.
