Hi
At a glance, it looks to me like you ought to be able to use unmarshallFromTemplate or unmarshalString to create an org.docx4j.wml.P object from the w:p you have listed. The only thing which looks odd is that some of the quotation marks in your snippet are escaped, and others aren't. Fix that, then tell us what happens?
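As a quick way to check the quoting before you try to unmarshal, here is a minimal sketch that uses only the JDK's XML parser. The class and method names (QuoteCheck, isWellFormed, unescapeQuotes) and the sample string are made up for illustration, and it assumes the stray escapes are backslash-escaped double quotes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class QuoteCheck {

    /** Returns true if the string parses as namespace-aware XML. */
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them
            db.setErrorHandler(new DefaultHandler());
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    /** Assumption: the stray escapes are backslash + double quote. */
    public static String unescapeQuotes(String s) {
        return s.replace("\\\"", "\"");
    }

    public static void main(String[] args) {
        // Hypothetical snippet: first attribute has escaped quotes, second does not
        String pasted =
            "<w:p xmlns:w=\\\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\\\">"
          + "<w:r><w:t xml:space=\"preserve\">Hello</w:t></w:r></w:p>";

        System.out.println(isWellFormed(pasted));                 // false
        System.out.println(isWellFormed(unescapeQuotes(pasted))); // true
    }
}
```

Once the string parses cleanly, that cleaned string is the one to hand to unmarshalString.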
If your objective is to convert HTML to docx, you might have a look at the proof of concept org.docx4j.openpackaging.packages.html2wordml.xslt.
(Are you already using XSLT to solve the problem, or some other approach?)
That transforms html (assumed to be just one or more <p>) into a w:sdtContent. The code below gets an org.docx4j.wml.SdtContentBlock object from it.
// Convert htmlString to WordML via XSLT
// Strip any &nbsp; since Xalan doesn't like these undeclared
if (htmlString.indexOf("&nbsp;") > -1) {
    htmlString = htmlString.replace(html_nbsp, html_nbsp_replacement);
}
// .. so we need
javax.xml.transform.stream.StreamSource ss
= new javax.xml.transform.stream.StreamSource(new java.io.StringReader(htmlString));
java.io.InputStream xslt = org.docx4j.utils.ResourceUtils.getResource("org/docx4j/openpackaging/packages/html2wordml.xslt");
javax.xml.transform.dom.DOMResult wmlResult = new javax.xml.transform.dom.DOMResult();
// Debug
//javax.xml.transform.stream.StreamResult wmlResult = new javax.xml.transform.stream.StreamResult(pw);
org.docx4j.XmlUtils.transform(ss, xslt, null, wmlResult);
// For convenience, that results in an sdtContent element
JAXBContext jc = org.plutext.Context.jcTransforms;
Unmarshaller u = jc.createUnmarshaller();
u.setEventHandler(new org.docx4j.jaxb.JaxbValidationEventHandler());
org.docx4j.wml.SdtContentBlock sdtContent = (org.docx4j.wml.SdtContentBlock)u.unmarshal(wmlResult.getNode());
You quite probably don't want to make an sdtContent object, in which case you'll need to change the transform a bit.
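To give an idea of what a changed transform could look like, here is a rough sketch using only the JDK's built-in XSLT processor. The stylesheet is a made-up minimal one, not the real html2wordml.xslt: it maps each HTML p to a plain w:p with a single run, and wraps the result in w:body only so the output has a single root. It assumes the input has one wrapper element around the paragraphs, and it replaces &nbsp; with the numeric character reference &#160; so the parser accepts it. The names HtmlToWmlSketch and toWml are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HtmlToWmlSketch {

    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical stylesheet for illustration; NOT docx4j's html2wordml.xslt
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:w='" + W_NS + "'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      // Root wrapper element becomes w:body; only its p children are converted
      + "<xsl:template match='/*'>"
      + "<w:body><xsl:apply-templates select='p'/></w:body>"
      + "</xsl:template>"
      // Each p becomes one paragraph with one run (inline formatting is dropped)
      + "<xsl:template match='p'>"
      + "<w:p><w:r><w:t><xsl:value-of select='.'/></w:t></w:r></w:p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    public static String toWml(String html) throws Exception {
        // &nbsp; is not declared in plain XML, so swap it for a numeric reference
        String xml = html.replace("&nbsp;", "&#160;");

        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toWml("<div><p>First&nbsp;para</p><p>Second para</p></div>"));
    }
}
```

In the docx4j code above you would keep the DOMResult and unmarshal the resulting node instead of serialising to a string; the string output here is just to make the result easy to inspect.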
If you were trying to convert entire HTML documents (rather than a paragraph or two), a more general approach might be appropriate: have the XSLT output the entire main document part. Alternatively, output in pkg format (since this includes all the parts in a single XML document, you could potentially convert CSS into a styles part), and then use org.docx4j.convert.in.XmlPackage to create a new WordML package.
Hope this helps,
Jason