I was doing this with Apache POI, but reading the font type and size was unreliable, and my research suggested docx4j handles that better.
My issues so far:
1. I can't get the text of a line (paragraph) consistently; the text is split across multiple org.docx4j.wml.P/R/Text objects even though the 'line' is unbroken in the document
2. I am not getting the font type and size consistently; they are often null ...
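
Both symptoms fit how WordprocessingML stores text: a paragraph is a sequence of runs, an editor may split visually continuous text into several runs, and a run only carries font properties that were applied to it directly. Anything else is inherited from styles or document defaults, so it reads as null on the run itself. The sketch below shows that idea in plain Java. It is not docx4j code: Run, Para and FontSketch are hypothetical stand-ins, the fallback chain is simplified to run, then paragraph style, then document default, and the default values are assumptions.

```java
import java.util.List;

// Hypothetical stand-in for a run (w:r); NOT org.docx4j.wml.R.
final class Run {
    final String text;        // text carried by this run
    final String font;        // directly applied font, or null when inherited
    final Integer halfPoints; // directly applied size in half-points, or null when inherited

    Run(String text, String font, Integer halfPoints) {
        this.text = text;
        this.font = font;
        this.halfPoints = halfPoints;
    }
}

// Hypothetical stand-in for a paragraph (w:p) plus the values its style supplies.
final class Para {
    final List<Run> runs;
    final String styleFont;        // font from the paragraph style, may be null
    final Integer styleHalfPoints; // size from the paragraph style, may be null

    Para(List<Run> runs, String styleFont, Integer styleHalfPoints) {
        this.runs = runs;
        this.styleFont = styleFont;
        this.styleHalfPoints = styleHalfPoints;
    }
}

final class FontSketch {
    // Assumed document defaults; a real file defines its own in its styles part.
    static final String DEFAULT_FONT = "Calibri";
    static final int DEFAULT_HALF_POINTS = 22;

    // Rebuild the paragraph's text by concatenating every run in order.
    static String text(Para p) {
        StringBuilder sb = new StringBuilder();
        for (Run r : p.runs) {
            sb.append(r.text);
        }
        return sb.toString();
    }

    // Fall back from the run to the paragraph style to the document default.
    static String effectiveFont(Run r, Para p) {
        if (r.font != null) return r.font;
        if (p.styleFont != null) return p.styleFont;
        return DEFAULT_FONT;
    }

    // Sizes are stored in half-points, so divide by two to get points.
    static double effectiveSizePt(Run r, Para p) {
        int hp;
        if (r.halfPoints != null) {
            hp = r.halfPoints;
        } else if (p.styleHalfPoints != null) {
            hp = p.styleHalfPoints;
        } else {
            hp = DEFAULT_HALF_POINTS;
        }
        return hp / 2.0;
    }
}
```

In a real document the chain also includes character styles and basedOn style inheritance, so a null on the run most likely means "look further up" rather than "no font".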