One thing that makes this a bit tricky is that a word can be fragmented across several runs of text. For example, "hello" might be stored as <w:r><w:t>hell</w:t></w:r><w:r><w:t>o</w:t></w:r>. Such fragmentation is typically caused by things like change tracking, formatting, and spell checking.
org/docx4j/TextUtils.java contains a method
public static void extractText(Object o, Writer w, JAXBContext jc)
which gives you a plain text representation of an object.
If you run your find against that, you'll at least know whether the string is present. You'd still need to do the replacement in the real JAXB objects, though (or be prepared to replace the paragraph contents with the plain text and lose all formatting).
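To make the idea concrete, here is a rough sketch of a replace that works across fragmented runs. It is not docx4j code: the class name RunTextReplacer and the method replaceFirst are made up for illustration, and each run is modelled as a plain String standing in for the text of one w:t element. In real code you would read and write those strings on the JAXB text objects instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of docx4j. Each String stands in for
// the text content of one run (w:r / w:t).
class RunTextReplacer {

    /**
     * Replaces the first occurrence of find, even when it spans several runs.
     * The replacement text goes into the first run the match touches, so it
     * would inherit that run's formatting; later runs only lose the matched
     * characters. The number of runs is unchanged.
     */
    static List<String> replaceFirst(List<String> runs, String find, String replacement) {
        List<String> result = new ArrayList<>(runs);
        if (find.isEmpty()) {
            return result;
        }
        // Search the concatenated plain text, as suggested above.
        String plain = String.join("", runs);
        int matchStart = plain.indexOf(find);
        if (matchStart < 0) {
            return result; // string not present, nothing to do
        }
        int matchEnd = matchStart + find.length();

        boolean inserted = false;
        int runStart = 0; // offset of the current run within the plain text
        for (int i = 0; i < runs.size(); i++) {
            String text = runs.get(i);
            int runEnd = runStart + text.length();
            if (runEnd > matchStart && runStart < matchEnd) {
                // This run overlaps the match: keep what lies outside it.
                String before = text.substring(0, Math.max(matchStart - runStart, 0));
                String after = text.substring(Math.min(matchEnd, runEnd) - runStart);
                String middle = inserted ? "" : replacement;
                inserted = true;
                result.set(i, before + middle + after);
            }
            runStart = runEnd;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> runs = List.of("say hell", "o wor", "ld");
        System.out.println(String.join("|", replaceFirst(runs, "hello", "goodbye")));
    }
}
```

Putting the replacement into the first affected run is just one possible policy. It keeps the run structure intact, at the cost of the new text taking on only that run's formatting.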
A find/replace method would be a nice addition to the code base, so feel free to post whatever you come up with as a contribution, and we can help to polish it.
In this respect, please note the DocumentModel class. Right now, it doesn't deal with the document at this level of detail, but you could imagine a class associated with each org.docx4j.wml.P (paragraph) object, containing the plain text representation of the paragraph (and for other purposes, the vertical space the paragraph will occupy on the page).
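As a starting point, such a per-paragraph class might look something like the sketch below. Everything in it is an assumption for illustration: the name ParagraphText, its methods, and the use of plain Strings in place of the runs of a real org.docx4j.wml.P. It holds the paragraph's plain text and remembers where each run begins, so that a hit in the plain text can be traced back to the run that contains it.

```java
import java.util.List;

// Hypothetical companion for one paragraph; not an existing docx4j class.
class ParagraphText {

    private final String plainText;
    private final int[] runStarts; // offset in plainText where each run begins

    // runTexts: the text of each run in the paragraph, in document order.
    ParagraphText(List<String> runTexts) {
        StringBuilder sb = new StringBuilder();
        runStarts = new int[runTexts.size()];
        for (int i = 0; i < runTexts.size(); i++) {
            runStarts[i] = sb.length();
            sb.append(runTexts.get(i));
        }
        plainText = sb.toString();
    }

    String getPlainText() {
        return plainText;
    }

    boolean contains(String s) {
        return plainText.contains(s);
    }

    /** Index of the run holding the character at the given offset, or -1 if out of range. */
    int runIndexAt(int offset) {
        if (offset < 0 || offset >= plainText.length()) {
            return -1;
        }
        int index = -1;
        // The last run starting at or before the offset is the one that holds it;
        // empty runs are skipped naturally because a later run shares their start.
        for (int i = 0; i < runStarts.length; i++) {
            if (runStarts[i] <= offset) {
                index = i;
            } else {
                break;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        ParagraphText p = new ParagraphText(List.of("hell", "o"));
        int hit = p.getPlainText().indexOf("hello");
        System.out.println("found at " + hit + ", starts in run " + p.runIndexAt(hit));
    }
}
```

The layout information mentioned above (vertical space on the page) is left out here, since that would need real rendering data.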