Specifically, I want to check that all the text in the docx is written in Arial, size 12, with double line spacing. The formatting can be applied directly on the run, on the paragraph, through the document defaults, or through custom styles created in the docx.
From what I've read, the way to do it in docx4j is the following (please correct me if I'm wrong):
- Load the document from disk:
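import java.io.File;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
// docPath is the path to the .docx file to check
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File(docPath));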
- Iterate over its paragraphs or runs:
- (paragraphs):
//Paragraphs
import java.util.List;
import org.docx4j.XmlUtils;
import org.docx4j.model.PropertyResolver;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.P;
import org.docx4j.wml.PPr;

MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
PropertyResolver propertyResolver = new PropertyResolver(wordMLPackage);
final String XPATH_TO_SELECT_PARAGRAPH_NODES = "//w:p";
List<Object> jaxbNodes = documentPart.getJAXBNodesViaXPath(XPATH_TO_SELECT_PARAGRAPH_NODES, true);
int i = 1;
//Iterate over each paragraph.
for (Object jaxbNode : jaxbNodes) {
    P paragraph = (P) XmlUtils.unwrap(jaxbNode);
    System.out.println("[Start]: " + paragraph);
    //Direct paragraph properties (may be null if nothing is set on the paragraph itself)
    PPr paragraphProperties = paragraph.getPPr();
    if (paragraphProperties != null && paragraphProperties.getPStyle() != null) {
        String style = paragraphProperties.getPStyle().getVal();
        if (style != null) {
            System.out.println("The style of paragraph " + i + " is: " + style);
        }
    }
    //Obtain effective properties
    PPr estiloPPr = propertyResolver.getEffectivePPr(paragraphProperties);
    i++; //advance the paragraph counter used in the message above
}
- (runs):
//Runs
import org.docx4j.wml.R;
import org.docx4j.wml.RPr;

final String XPATH_TO_SELECT_RUN_NODES = "//w:r";
//wordMLPackage is the package loaded in the first snippet; no need to load it again
propertyResolver = new PropertyResolver(wordMLPackage);
documentPart = wordMLPackage.getMainDocumentPart();
List<Object> jaxbRunNodes = documentPart.getJAXBNodesViaXPath(XPATH_TO_SELECT_RUN_NODES, true);
for (Object jaxbRun : jaxbRunNodes) {
    R run = (R) XmlUtils.unwrap(jaxbRun);
    if (run != null) {
        //getRPr() returns only the properties set directly on this run
        RPr runProps = run.getRPr();
        if (runProps != null) {
            //log is assumed to be an existing logger field
            log.info("Run fonts: " + runProps.getRFonts());
            log.info("Font size: " + runProps.getSz());
            log.info("Run style: " + runProps.getRStyle());
            //On a run, spacing is character spacing; line spacing is a paragraph property
            log.info("Run spacing: " + runProps.getSpacing());
        } else {
            log.info("Run doesn't have its own styling.");
        }
    }
}
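For the comparison itself, my understanding is that the underlying XML stores the font size (w:sz) in half-points and the line spacing (w:spacing, attribute w:line) in 240ths of a line when w:lineRule is "auto" or absent, so the target values would be 24 and 480. A minimal sketch of that comparison follows. The class and method names are hypothetical helpers of mine, not part of docx4j, and the inputs are assumed to be the values read from the effective properties:

```java
import java.math.BigInteger;

// Hypothetical helper, not part of docx4j: compares raw WordprocessingML
// values against the targets "Arial, 12 pt, double line spacing".
final class FormattingTargets {
    // w:sz is expressed in half-points, so 12 pt is stored as 24.
    static final BigInteger SIZE_12PT = BigInteger.valueOf(24);
    // With lineRule "auto", w:line is in 240ths of a line, so double spacing is 480.
    static final BigInteger DOUBLE_SPACING = BigInteger.valueOf(480);

    static boolean isArial(String fontName) {
        return fontName != null && fontName.trim().equalsIgnoreCase("Arial");
    }

    static boolean isTwelvePoint(BigInteger szVal) {
        return SIZE_12PT.equals(szVal);
    }

    // lineRule is the string form of w:lineRule; null means the attribute is absent.
    static boolean isDoubleSpaced(BigInteger line, String lineRule) {
        boolean auto = lineRule == null || lineRule.equals("auto");
        return auto && DOUBLE_SPACING.equals(line);
    }

    public static void main(String[] args) {
        System.out.println(isArial("Arial"));
        System.out.println(isTwelvePoint(BigInteger.valueOf(24)));
        System.out.println(isDoubleSpaced(BigInteger.valueOf(480), "auto"));
    }
}
```

A null argument means the property was not found at all, and the helpers treat that as a failed check rather than guessing a default.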
Even though I've checked the "Getting Started" guide and done some research into how the export mechanism obtains the styling properties for each piece of text when exporting to PDF or HTML, I'm finding it quite difficult to implement these styling checks.
I've read about using the PropertyResolver class to obtain the effective styling applied to a given run, but I don't fully understand how to use it, because I can't see the complete styling information when debugging the snippets above.
I've also read the "Traversing a docx" example (OpenMainDocumentAndTraverse.java), but, as I said, I still can't work out how to do what I want.
Any advice is really appreciated, even if it means solving the problem another way. (I'm new to docx4j, so feel free to suggest any approach you consider better.)
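To make clear what I mean by "effective" styling: as far as I understand, a run's own properties hold only its direct formatting, and any value missing there falls back to the character style, then the paragraph style, then the document defaults. The toy model below illustrates that fallback order. It is my own simplification with hypothetical names, not docx4j code, and it ignores basedOn chains, table styles, and numbering:

```java
// Toy model of the formatting cascade; NOT docx4j's PropertyResolver.
final class StyleCascadeSketch {
    // One formatting level; a null field means "not set at this level".
    static final class Layer {
        final String font;
        final Integer halfPoints;

        Layer(String font, Integer halfPoints) {
            this.font = font;
            this.halfPoints = halfPoints;
        }
    }

    // For each property, takes the first non-null value, walking the
    // layers from most specific to least specific. Null layers are skipped.
    static Layer resolve(Layer... mostSpecificFirst) {
        String font = null;
        Integer size = null;
        for (Layer layer : mostSpecificFirst) {
            if (layer == null) continue;
            if (font == null) font = layer.font;
            if (size == null) size = layer.halfPoints;
        }
        return new Layer(font, size);
    }

    public static void main(String[] args) {
        Layer direct = new Layer(null, 24);            // run sets only the size
        Layer characterStyle = null;                   // no character style applied
        Layer paragraphStyle = new Layer("Arial", 22); // paragraph style sets both
        Layer docDefaults = new Layer("Calibri", 22);  // document defaults
        Layer effective = resolve(direct, characterStyle, paragraphStyle, docDefaults);
        System.out.println(effective.font + " " + effective.halfPoints); // prints "Arial 24"
    }
}
```

If this model is right, it would explain why logging the run properties alone shows so little: in the example the font never appears on the run, yet the effective font is still Arial.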
Thanks in advance.