I am trying to convert DOCX to HTML in a web application (PEGA).
I imported the JARs that my search indicated were needed.
Issue 1: Some of the converted HTML loses its formatting.
Issue 2: If I use a byte array instead of a file, the resulting HTML is messed up.
I am not sure where I am going wrong.
Code:
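// Requires java.io.*, org.docx4j.Docx4J, org.docx4j.convert.out.HTMLSettings,
// and org.docx4j.openpackaging.packages.WordprocessingMLPackage.
// Decode the Base64 case document passed in as the Word parameter
byte[] bs = java.util.Base64.getDecoder().decode(Word.getBytes());
InputStream is = new ByteArrayInputStream(bs);
WordprocessingMLPackage wordMLPackage = Docx4J.load(is);
HTMLSettings htmlSettings = Docx4J.createHTMLSettings();
// Note: FilePath is used both as the image directory and as the output file
htmlSettings.setImageDirPath(FilePath);
htmlSettings.setWmlPackage(wordMLPackage);
// Write the HTML once to memory and once to a file
ByteArrayOutputStream os = new ByteArrayOutputStream();
OutputStream ou = new FileOutputStream(FilePath);
Docx4J.toHTML(htmlSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);
Docx4J.toHTML(htmlSettings, ou, Docx4J.FLAG_EXPORT_PREFER_XSL);
ou.close();
// Read the in-memory copy (assuming UTF-8 output); ou.toString() on the
// FileOutputStream would only return the stream object's identity, not the HTML
String result = new String(os.toByteArray(), java.nio.charset.StandardCharsets.UTF_8);
return result;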
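To separate Issue 2 from docx4j itself, here is a minimal stand-alone sketch of the byte-array path using only the JDK. The class name ByteArrayHtmlDemo and its methods are made up for illustration, and copy() is just a placeholder standing in for the Docx4J.toHTML call. It also assumes the HTML is UTF-8. The point it shows is that the text has to be read from the ByteArrayOutputStream with an explicit charset, not from the stream object's toString().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; it does not call docx4j at all.
final class ByteArrayHtmlDemo {

    // Decode a Base64 string into an InputStream, as done for the Word parameter.
    static InputStream decodeToStream(String base64) {
        return new ByteArrayInputStream(Base64.getDecoder().decode(base64));
    }

    // Placeholder for the conversion step: copies the input bytes unchanged.
    static void copy(InputStream in, ByteArrayOutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    // Read the buffered bytes back as text with an explicit charset (UTF-8 assumed).
    static String toUtf8String(ByteArrayOutputStream out) {
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Base64 in, text out, entirely in memory.
    static String roundTrip(String base64) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = decodeToStream(base64)) {
            copy(in, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return toUtf8String(out);
    }

    public static void main(String[] args) {
        String html = "<p>caf\u00e9</p>";
        String base64 = Base64.getEncoder()
                .encodeToString(html.getBytes(StandardCharsets.UTF_8));
        // Compare the recovered text with the original.
        System.out.println(roundTrip(base64).equals(html));
    }
}
```

If the non-ASCII character survives this round trip but the real output is still broken, the problem is more likely in the conversion settings than in the stream handling.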
Generated HTML File
HTML from the byte array output
Source content