Hi Justin
jbeltran wrote: 1. Parsing text in documents (i.e. in paragraphs, tables, etc.)
Can you explain what you want to do a bit more? Certainly you can get and manipulate the text ...
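To give a feel for what "getting the text" involves, here is a minimal sketch that uses only the JDK, not docx4j's own API. It assumes you already have the contents of word/document.xml as a string, and the class and method names (ParagraphText, paragraphs) are made up for illustration. Text lives in w:t elements inside w:p paragraphs, and paragraphs inside table cells are ordinary w:p elements too, so one pass picks up both.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, plain JDK only; not part of docx4j.
class ParagraphText {
    // The WordprocessingML main namespace (the usual "w:" prefix).
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of every w:p in document order, including
    // paragraphs that sit inside table cells.
    static List<String> paragraphs(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(documentXml)));

            List<String> result = new ArrayList<>();
            NodeList paras = doc.getElementsByTagNameNS(W_NS, "p");
            for (int i = 0; i < paras.getLength(); i++) {
                Element p = (Element) paras.item(i);
                // A paragraph's text is usually split across several runs.
                NodeList texts = p.getElementsByTagNameNS(W_NS, "t");
                StringBuilder sb = new StringBuilder();
                for (int j = 0; j < texts.getLength(); j++) {
                    sb.append(texts.item(j).getTextContent());
                }
                result.add(sb.toString());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document XML", e);
        }
    }
}
```

Note that a paragraph containing a text box would also include the text box's text here, so treat this as a starting point rather than a complete extractor.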
jbeltran wrote: 2. Merging different word documents
You can certainly add/remove parts, and manipulate their contents.
Update Nov 2010: I've created a paid extension for docx4j which provides a general solution to the problem of merging documents. See
http://dev.plutext.org/blog/2010/11/mer ... documents/ for details.
jbeltran wrote: 3. Creating hyperlinks (not to external URLs, but to other places in document)
These are bookmarks, I think, aren't they? docx4j does support those.
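For reference, this is roughly what the markup for an internal link looks like: a bookmarkStart/bookmarkEnd pair sharing an id around the target, and a w:hyperlink whose w:anchor names the bookmark. The sketch below just builds those fragments as strings with the plain JDK. InternalLink and its methods are hypothetical names, not docx4j API, and the fragments assume the w: prefix is already bound in the document they go into.

```java
// Hypothetical helper, plain JDK only; not part of docx4j.
class InternalLink {
    // Escapes the characters that are unsafe in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // A single run of text; xml:space="preserve" keeps leading/trailing spaces.
    static String run(String text) {
        return "<w:r><w:t xml:space=\"preserve\">" + escape(text) + "</w:t></w:r>";
    }

    // The link target: a paragraph whose text is wrapped in a bookmark.
    // The id must be unique among the bookmarks in the document.
    static String bookmarkedParagraph(int id, String name, String text) {
        return "<w:p>"
            + "<w:bookmarkStart w:id=\"" + id + "\" w:name=\"" + escape(name) + "\"/>"
            + run(text)
            + "<w:bookmarkEnd w:id=\"" + id + "\"/>"
            + "</w:p>";
    }

    // The link itself: w:anchor refers to the bookmark name, so no
    // external relationship is involved.
    static String linkParagraph(String anchor, String text) {
        return "<w:p><w:hyperlink w:anchor=\"" + escape(anchor) + "\">"
            + run(text)
            + "</w:hyperlink></w:p>";
    }
}
```

For example, bookmarkedParagraph(0, "section2", "Section 2") gives the target and linkParagraph("section2", "Jump to section 2") gives a paragraph that links to it.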
jbeltran wrote: 4. Creating table of contents
I haven't done this, but I expect it would be fairly straightforward. Will Word be the ultimate consumer? If so, maybe you just add the TOC field code (iirc) and let Word generate the table of contents. If you need to populate the table yourself, do you need up-to-date page numbers? If you have merged documents, you won't be able to rely on lastRenderedPageBreak to help with the page counting, since those markers only reflect how Word last laid out each source document.
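If you go the field code route, the markup is a complex field: a "begin" fldChar, the TOC instruction in instrText, a "separate" fldChar, some placeholder result text, and an "end" fldChar. Here is a sketch that builds such a paragraph as a string using only the JDK. TocField and tocParagraph are hypothetical names, not docx4j API. I've set w:dirty="true" on the begin marker on the assumption that Word will then offer to update the field when the document is opened; I haven't verified that against every Word version.

```java
// Hypothetical helper, plain JDK only; not part of docx4j.
class TocField {
    // One run holding a field character marker (begin, separate or end).
    static String fldChar(String type, boolean dirty) {
        return "<w:r><w:fldChar w:fldCharType=\"" + type + "\""
            + (dirty ? " w:dirty=\"true\"" : "")
            + "/></w:r>";
    }

    // Builds a paragraph containing a TOC field covering the given
    // heading levels (1 to 9). Word fills in the entries when the field
    // is updated; the text between "separate" and "end" is a placeholder.
    static String tocParagraph(int minLevel, int maxLevel) {
        if (minLevel < 1 || maxLevel > 9 || minLevel > maxLevel) {
            throw new IllegalArgumentException(
                "levels must satisfy 1 <= min <= max <= 9");
        }
        // \o "min-max": build from heading levels; \h: make entries hyperlinks.
        String instr = " TOC \\o \"" + minLevel + "-" + maxLevel + "\" \\h ";
        return "<w:p>"
            + fldChar("begin", true)
            + "<w:r><w:instrText xml:space=\"preserve\">" + instr
            + "</w:instrText></w:r>"
            + fldChar("separate", false)
            + "<w:r><w:t>Update this field to build the table of contents.</w:t></w:r>"
            + fldChar("end", false)
            + "</w:p>";
    }
}
```

Calling tocParagraph(1, 3) gives a field covering Heading 1 to Heading 3. As with any fragment like this, it assumes the w: prefix is bound in the enclosing document, and it leaves the page numbers entirely to Word.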
cheers
Jason