You can create your own XSLT from scratch if you wish, but you may find it easier to adapt docx4j's existing HTML output code.
docx4j 2.8.1 contains two ways of generating HTML output.
The first (and primary) way is via XSLT. See HTMLExporterNG2 and docx2xhtmlNG2.xslt.
The other is HtmlExporterNonXSLT, which was added as a proof of concept, since running Xalan extension functions on Android is problematic. It is not as complete or mature as the XSLT-based approach.
A comment looks something like:
<w:p>
  <w:r>
    <w:t xml:space="preserve">The </w:t>
  </w:r>
  <w:commentRangeStart w:id="0"/>
  <w:r>
    <w:t xml:space="preserve">quick brown </w:t>
  </w:r>
  <w:commentRangeEnd w:id="0"/>
  <w:r>
    <w:rPr>
      <w:rStyle w:val="CommentReference"/>
    </w:rPr>
    <w:commentReference w:id="0"/>
  </w:r>
  <w:r>
    <w:t>fox.</w:t>
  </w:r>
</w:p>
Since a comment is marked by "point" tags (commentRangeStart and commentRangeEnd) rather than by a single XML container, your challenge is to know when you are 'in' a comment, i.e. you have to maintain state.
Using XSLT, you can maintain state by passing a param between templates, by going into a specific mode, or via extension functions. Any of these approaches will work (though I'd look at using the modelStates object).
State is easier to maintain using HtmlExporterNonXSLT, since that is pure Java. But putting comments aside, you should try it first to see how far off the HTML output is from your needs.
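To illustrate the "maintain state" idea in plain Java, here is a minimal sketch. It is not docx4j code and does not use HtmlExporterNonXSLT; it just uses the JDK's DOM parser to walk the children of a single w:p in document order and keeps a set of open comment ids. The class name CommentStateSketch, the method paragraphToHtml and the "comment" CSS class are made up for the example, and it assumes the w prefix is bound to the usual WordprocessingML namespace.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only (not part of docx4j).
public class CommentStateSketch {

    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Converts one w:p element to HTML, wrapping commented text in a span. */
    public static String paragraphToHtml(String paragraphXml) {
        Document doc;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            doc = dbf.newDocumentBuilder()
                     .parse(new InputSource(new StringReader(paragraphXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse paragraph XML", e);
        }

        // The state: ids of comments whose range has started but not yet ended.
        Set<String> openComments = new HashSet<>();
        StringBuilder html = new StringBuilder("<p>");

        for (Node n = doc.getDocumentElement().getFirstChild();
                n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE
                    || !W_NS.equals(n.getNamespaceURI())) {
                continue; // skip whitespace text nodes and foreign elements
            }
            Element e = (Element) n;
            String name = e.getLocalName();
            if ("commentRangeStart".equals(name)) {
                openComments.add(e.getAttributeNS(W_NS, "id"));
            } else if ("commentRangeEnd".equals(name)) {
                openComments.remove(e.getAttributeNS(W_NS, "id"));
            } else if ("r".equals(name)) {
                String text = runText(e);
                if (text.isEmpty()) {
                    continue; // e.g. the run holding w:commentReference
                }
                if (openComments.isEmpty()) {
                    html.append(escape(text));
                } else {
                    html.append("<span class=\"comment\">")
                        .append(escape(text))
                        .append("</span>");
                }
            }
        }
        return html.append("</p>").toString();
    }

    /** Concatenates the text of all w:t elements inside a run. */
    private static String runText(Element run) {
        StringBuilder sb = new StringBuilder();
        NodeList ts = run.getElementsByTagNameNS(W_NS, "t");
        for (int i = 0; i < ts.getLength(); i++) {
            sb.append(ts.item(i).getTextContent());
        }
        return sb.toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String xml = "<w:p xmlns:w=\"" + W_NS + "\">"
            + "<w:r><w:t xml:space=\"preserve\">The </w:t></w:r>"
            + "<w:commentRangeStart w:id=\"0\"/>"
            + "<w:r><w:t xml:space=\"preserve\">quick brown </w:t></w:r>"
            + "<w:commentRangeEnd w:id=\"0\"/>"
            + "<w:r><w:t>fox.</w:t></w:r>"
            + "</w:p>";
        System.out.println(paragraphToHtml(xml));
    }
}
```

A set is used rather than a boolean flag because comment ranges can overlap. A real exporter would also need to handle ranges that span several paragraphs, so the state would have to live outside the per-paragraph method.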
Hope this helps.. Jason