Traversing DocX-File

by **tori** » Sat Nov 16, 2013 1:02 am

Hi community,

I created a simple docx file consisting of four lines of plain text. I tried to manipulate parts of the text by traversing all org.docx4j.wml.Text elements:

Code:

    WordprocessingMLPackage wpml = WordprocessingMLPackage.load(pathToDocX.toFile());
    MainDocumentPart mdp = wpml.getMainDocumentPart();

    Finder finder = new Finder(Text.class);
    new TraversalUtil(mdp.getContent(), finder);

    public static class Finder extends CallbackImpl {

        protected Class<?> typeToFind;
        public List<Object> results = new ArrayList<>();

        protected Finder(Class<?> typeToFind) {
            this.typeToFind = typeToFind;
        }

        public List<Object> apply(Object o) {
            // Adapt as required
            if (o.getClass().equals(typeToFind)) {
                results.add(o);
            }
            return null;
        }
    }

I got all the Text elements, but the text does not come back in one piece. I believe it's because of my Word document:

Code:

    <w:document mc:Ignorable="w14 wp14">
      <w:body>
        <w:p>
          <w:r>
            <w:t>Das ist Text 1!</w:t>
          </w:r>
        </w:p>
        <w:p>
          <w:r>
            <w:t>Das ist Text 2!</w:t>
          </w:r>
        </w:p>
        <w:p>
          <w:r>
            <w:t>Das ist Text3!</w:t>
          </w:r>
        </w:p>
        <w:p>
          <w:r>
            <w:t>Das ist Text 4!</w:t>
          </w:r>
        </w:p>
        <w:p/>
        <w:p>
          <w:r>
            <w:t xml:space="preserve">Hier ein ganz langer </w:t>
          </w:r>
          <w:r>
            <w:t>Text!</w:t>
          </w:r>
          <w:bookmarkStart w:name="_GoBack" w:id="0"/>
          <w:bookmarkEnd w:id="0"/>
        </w:p>
        <w:sectPr>
          <w:pgSz w:w="11906" w:h="16838"/>
          <w:pgMar w:top="1417" w:right="1417" w:bottom="1134" w:left="1417" w:header="708" w:footer="708" w:gutter="0"/>
          <w:cols w:space="708"/>
          <w:docGrid w:linePitch="360"/>
        </w:sectPr>
      </w:body>
    </w:document>

How can I handle such docx files when traversing? Should I traverse the paragraphs instead?

Thanks for help.

by **jason** » Sat Nov 16, 2013 7:17 am

What is your objective?

by **tori** » Mon Nov 18, 2013 9:06 pm

Thanks for your reply, jason, and sorry for my unclear post.

My objective is to replace special placeholders in a docx file that I want to use as a template. For example:

Code:

    This is my <placeholder1> and this is my <placeholder2>!

I traversed the docx file to find all Text.class elements. Next, I used the getValue method to check whether the value of each Text element contains one of the placeholders I want to replace. I expected to get a single Text element with the value "This is my <placeholder1> and this is my <placeholder2>!". However, the Text elements I found are:

Code:

    org.docx4j.wml.Text@20498030 --> with a value --> "This is "
    org.docx4j.wml.Text@4a58fee6 --> with a value --> "my <placeholder1"
    org.docx4j.wml.Text@4a58fee6 --> with a value --> "> and this is my <"
    org.docx4j.wml.Text@4a58fee6 --> with a value --> "placeholder2>!"

I used Word 2010 and docx4j version 2.8.1.
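To make the symptom concrete, here is a minimal sketch that models the four reported Text values as plain strings (no docx4j involved; the class and method names are made up for illustration). It shows why a per-element check never sees a complete placeholder, while the joined paragraph text does:

```java
import java.util.Arrays;
import java.util.List;

public class SplitRunDemo {

    // The four fragments reported above, modelled as plain strings
    static final List<String> FRAGMENTS = Arrays.asList(
            "This is ", "my <placeholder1", "> and this is my <", "placeholder2>!");

    /** True if any single fragment contains the whole placeholder. */
    static boolean foundInSingleFragment(List<String> fragments, String placeholder) {
        for (String fragment : fragments) {
            if (fragment.contains(placeholder)) {
                return true;
            }
        }
        return false;
    }

    /** True if the placeholder appears once the fragments are joined in order. */
    static boolean foundInJoinedText(List<String> fragments, String placeholder) {
        return String.join("", fragments).contains(placeholder);
    }

    public static void main(String[] args) {
        System.out.println(foundInSingleFragment(FRAGMENTS, "<placeholder1>")); // false
        System.out.println(foundInJoinedText(FRAGMENTS, "<placeholder1>"));     // true
    }
}
```

In a real document the fragments would be the getValue results of the Text elements inside one paragraph, collected in document order.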

by **jason** » Mon Nov 18, 2013 9:16 pm

Yes, the problem with variable replacement approaches is "split runs".

You may be able to work around some/most issues by running your docx through https://github.com/plutext/docx4j/blob/ ... epare.java

Alternatively, you could use content control data binding instead, which isn't affected by split runs.
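As a rough illustration of the general idea behind working around split runs (not the actual VariablePrepare implementation, and again using plain strings instead of docx4j objects), one could join the text of a paragraph's runs, replace the placeholders in the joined string, and write the result back into the first run while emptying the others. The class name, method name, and replacement values below are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeAndReplace {

    /**
     * Joins the fragments of one paragraph, replaces every placeholder,
     * then returns a list of the same size: the full text in the first
     * slot and empty strings in the remaining slots.
     */
    static List<String> replaceAcrossFragments(List<String> fragments,
                                               Map<String, String> values) {
        List<String> result = new ArrayList<>();
        if (fragments.isEmpty()) {
            return result;
        }
        String joined = String.join("", fragments);
        for (Map.Entry<String, String> entry : values.entrySet()) {
            joined = joined.replace(entry.getKey(), entry.getValue());
        }
        result.add(joined);
        for (int i = 1; i < fragments.size(); i++) {
            result.add("");
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> fragments = Arrays.asList(
                "This is ", "my <placeholder1", "> and this is my <", "placeholder2>!");
        Map<String, String> values = new LinkedHashMap<>();
        values.put("<placeholder1>", "cat");
        values.put("<placeholder2>", "dog");

        System.out.println(replaceAcrossFragments(fragments, values).get(0));
        // prints: This is my cat and this is my dog!
    }
}
```

Note the trade-off: collapsing everything into the first run discards any formatting differences between the original runs, so this only suits paragraphs whose runs share the same formatting.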

by **tori** » Mon Nov 18, 2013 9:42 pm

Thanks jason,

I will try to fix it via VariablePrepare. The only VariablePrepare class available to me is org.docx4j.samples.VariablePrepare, which has no static prepare() method. Can I find that method in a jar file, or should I copy and paste the implementation you posted?
