docx4j gives you exactly what it finds in the document.xml part (unzip your docx and look at word/document.xml to see for yourself). What you see on the document surface in Word (or, for that matter, LibreOffice/OpenOffice) may be simpler than what that program actually creates at the XML level.
If you were creating the content in docx4j, you'd likely do it as you describe.
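
If you'd rather not unzip by hand, here is a minimal sketch that prints the raw main document part using only the JDK's zip classes (no docx4j involved). The class name `DocxPeek` and the helper `readDocumentXml` are made up for illustration, and it assumes the part is stored at `word/document.xml` and encoded as UTF-8, which is the usual case but not guaranteed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class, not part of docx4j.
class DocxPeek {

    // Returns the raw XML of word/document.xml, or null if the entry is not found.
    // Assumption: the part is UTF-8 encoded.
    static String readDocumentXml(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    // read() returns -1 at the end of the current zip entry
                    while ((n = zip.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return out.toString(StandardCharsets.UTF_8.name());
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java DocxPeek <file.docx>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(readDocumentXml(in));
        }
    }
}
```

Run it against the file Word saved and compare the output with what you expected; any extra elements you see there are what docx4j will hand back to you.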