G'day Word document lovers!
I'm using org.docx4j.Docx4J.toHTML() to convert a .docx file to HTML, but I've become a tad saddened by how lists in the source .docx file are converted, especially nested lists. They come out as a single "flat" list, with inline styling that uses indentation to try to reconstruct the nesting (*yuck!*). HTML supports nested lists directly (by nesting <ol> and/or <ul> tags inside <li> elements), so I'm a little surprised that the toHTML method doesn't use them. I'm well aware that there are corner cases in the .docx format that don't map directly onto HTML's model (e.g. a list whose very first item is at a deep nesting level, which HTML can't represent cleanly), but I'm willing to live with that.
My questions are:
1. Are there any conversion options (or other settings) that will produce nested <ol> and/or <ul> tags? I've looked around and experimented a bit, but haven't found anything yet.
2. If this isn't directly supported by org.docx4j.Docx4J.toHTML(), what's the best way for me to roll this myself? I'd prefer to use Java, but if XSLT is more suitable I'd consider that too.
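One possible shape for the roll-your-own route in Java is below. It's a minimal sketch, not docx4j API: the `NestedListBuilder` class, `ListItem` record and `toNestedHtml` method are hypothetical names, and it assumes each list paragraph has already been reduced to its 0-based nesting level (the `w:ilvl` value in the paragraph's `w:numPr`) plus whether that level is numbered or bulleted (from the `w:numFmt` in the numbering definitions). It keeps a stack of open lists and opens/closes `<ol>`/`<ul>` as the level changes. For the deep-first-item corner case it inserts empty `<li>` elements for skipped levels (reusing the item's list type, another assumption) so the output stays valid HTML. Records need Java 16+.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper: rebuilds nested <ol>/<ul> markup from a flat run of list paragraphs. */
final class NestedListBuilder {

    /**
     * One list paragraph (assumed to be extracted beforehand). level is 0-based,
     * as in w:ilvl; ordered is true for numbered levels, false for bullets.
     */
    record ListItem(int level, boolean ordered, String text) {}

    static String toNestedHtml(List<ListItem> items) {
        StringBuilder html = new StringBuilder();
        // Tags ("ol"/"ul") of the currently open lists, innermost first.
        // Between items, every open list has exactly one open <li>.
        Deque<String> open = new ArrayDeque<>();

        for (ListItem item : items) {
            // Number of lists that must be open for this item (negative levels treated as 0).
            int depth = Math.max(0, item.level()) + 1;
            String tag = item.ordered() ? "ol" : "ul";

            // 1. Leaving deeper levels: close each deeper list's open <li>, then the list.
            while (open.size() > depth) {
                html.append("</li></").append(open.pop()).append('>');
            }
            // 2. Same level: close the previous sibling's <li>, or close the whole
            //    list if it switches between numbered and bulleted.
            if (open.size() == depth) {
                if (open.peek().equals(tag)) {
                    html.append("</li>");
                } else {
                    html.append("</li></").append(open.pop()).append('>');
                }
            }
            // 3. Going deeper: open lists until the item's level is reached. A skipped
            //    level gets an empty <li> so every nested list sits inside an <li>.
            while (open.size() < depth) {
                html.append('<').append(tag).append('>');
                open.push(tag);
                if (open.size() < depth) {
                    html.append("<li>");
                }
            }
            // 4. Emit the item, leaving its <li> open in case children follow.
            html.append("<li>").append(escape(item.text()));
        }
        // Close everything still open at the end of the run.
        while (!open.isEmpty()) {
            html.append("</li></").append(open.pop()).append('>');
        }
        return html.toString();
    }

    /** Minimal escaping of text content for HTML. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

For example, items at levels 0 (numbered "One"), 1 (bulleted "One-a"), 1 (bulleted "One-b") and 0 (numbered "Two") come out as `<ol><li>One<ul><li>One-a</li><li>One-b</li></ul></li><li>Two</li></ol>`. The stack-based approach means only one pass over the paragraphs is needed; the harder part in practice is likely grouping consecutive list paragraphs and resolving each level's format from the numbering definitions.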
Thanks in advance!
Peter