Great to hear you got it working. I'm sure a blog post would be useful to, and appreciated by, anyone following in your footsteps.
In answer to your question:
https://github.com/plutext/docx4j/blob/ ... ilter.java may be a start, but you still need to join up adjacent w:r/w:t elements. It may be enough for your purposes to join only runs that have no run properties (w:rPr), but a more complete solution would also join runs that have the same run properties.
I haven't written anything like this, because I prefer the content control custom XML binding approach, where this problem does not arise.
XSLT (used by Filter) probably isn't a good way to join up adjacent w:r/w:t elements. You're better off doing this with a method in Java; it's pretty easy List manipulation (for a Java programmer anyway!).
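
To give a rough idea of the List manipulation involved, here is a minimal sketch. It is not docx4j code: RunJoiner and Run are hypothetical stand-ins, with each run reduced to a string for its w:rPr (null when it has none) and a string for its w:t text. Mapping this onto docx4j's own run objects, and deciding when two w:rPr elements count as "the same", are left as assumptions you would need to fill in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class RunJoiner {

    /** Hypothetical stand-in for a w:r element (not a docx4j class). */
    public static final class Run {
        public final String rPr;   // serialized run properties, or null if the run has no w:rPr
        public final String text;  // contents of the run's w:t

        public Run(String rPr, String text) {
            this.rPr = rPr;
            this.text = text;
        }
    }

    /** Returns a new list in which adjacent runs with equal run properties are merged. */
    public static List<Run> joinAdjacent(List<Run> runs) {
        List<Run> out = new ArrayList<>();
        for (Run r : runs) {
            int last = out.size() - 1;
            // Objects.equals treats two null rPr values as equal, which covers
            // the simple "no run properties" case as well as identical properties.
            if (last >= 0 && Objects.equals(out.get(last).rPr, r.rPr)) {
                Run prev = out.get(last);
                out.set(last, new Run(prev.rPr, prev.text + r.text));
            } else {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Run> runs = Arrays.asList(
                new Run(null, "Hel"),
                new Run(null, "lo"),
                new Run("<w:b/>", " bold"));
        for (Run r : joinAdjacent(runs)) {
            System.out.println(r.rPr + " -> " + r.text);
        }
    }
}
```

Only neighbouring runs are merged, so two runs with the same properties separated by a differently formatted run stay separate, which is what you want if the text order is to be preserved.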