I'm completely new to docx4j, so please excuse my naive question.
I am trying to convert an MS Word 2007 docx file to HTML.
The docx files I want to convert contain mathematical equations that I created with the MS Word equation editor, so the equations are not images. When I use the sample code to convert a docx file to HTML, it works very well, except that the equations are ignored: there is no HTML output where the equations should be. The logs suggest that certain math tags are ignored:
```
7003 [main] WARN org.docx4j.convert.out.html.HtmlExporterNG2 - NOT IMPLEMENTED: support for m:oMath;
7205 [main] WARN org.docx4j.convert.out.html.HtmlExporterNG2 - NOT IMPLEMENTED: support for m:oMathPara;
```
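Because the warnings name `m:oMath` and `m:oMathPara`, one quick way to confirm that the equations really are native Office Math (OMML) rather than images is to count those elements directly in `word/document.xml`. The following is a minimal sketch that uses only the JDK, not docx4j; the class name `OmmlCounter` and the method `countMath` are hypothetical. It assumes the equations live in the main document part (equations in headers, footers or footnotes sit in other parts and are not counted). Each `m:oMathPara` wraps one or more `m:oMath` elements, so the two counts overlap.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: counts OMML equation elements in a .docx without using docx4j.
public class OmmlCounter {

    // Namespace URI that the "m:" prefix is bound to in WordprocessingML documents.
    static final String MATH_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math";

    static final String[] NAMES = {"oMathPara", "oMath"};

    /** Reads (and closes) a .docx stream; returns element counts found in word/document.xml. */
    static Map<String, Integer> countMath(InputStream docx) throws Exception {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : NAMES) {
            counts.put(name, 0);
        }
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                    dbf.setNamespaceAware(true); // required for getElementsByTagNameNS
                    // The zip stream signals end-of-data at the end of this entry,
                    // so the parser only sees the main document part.
                    Document doc = dbf.newDocumentBuilder().parse(zip);
                    for (String name : NAMES) {
                        counts.put(name, doc.getElementsByTagNameNS(MATH_NS, name).getLength());
                    }
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java OmmlCounter path/to/IndividualQuestion.docx
        System.out.println(countMath(new FileInputStream(args[0])));
    }
}
```

Non-zero counts indicate that the equations are stored as OMML markup, which matches what the HtmlExporterNG2 warnings report as unsupported.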
I see that docx4j has a package 'org.docx4j.math' with lots of classes, so I assume there is a way to make equations work with docx4j, but it is not obvious to me.
I have attached the docx file I'm trying to convert.
This is my code, copied from the samples:
```java
package docxconverter;

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import javax.xml.transform.stream.StreamResult;

import org.docx4j.convert.out.Containerization;
import org.docx4j.convert.out.html.AbstractHtmlExporter;
import org.docx4j.convert.out.html.AbstractHtmlExporter.HtmlSettings;
import org.docx4j.convert.out.html.HtmlExporterNG2;
import org.docx4j.convert.out.html.SdtWriter;
import org.docx4j.convert.out.html.TagSingleBox;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

/**
 * If the source docx contained a WMF, that
 * will get converted to inline SVG. In order
 * to see the SVG in your browser, you'll need
 * to rename the file to .xml or serve
 * it with MIME type application/xhtml+xml
 *
 */
public class DocxToHtmlConverter {

    public static void main(String[] args) throws Exception {

        // new File(parent, child) does not throw IllegalArgumentException,
        // so no try/catch is needed here
        File inputfilepath = new File("C:/Users/Keith/Desktop", "IndividualQuestion.docx");
        System.out.println(inputfilepath);

        boolean save = true;

        // Load .docx or Flat OPC .xml
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(inputfilepath);

        AbstractHtmlExporter exporter = new HtmlExporterNG2();
        HtmlSettings htmlSettings = new HtmlSettings();
        htmlSettings.setImageDirPath(inputfilepath + "_files");
        htmlSettings.setUserBodyTop("<H1>TOP!</H1>");
        htmlSettings.setUserBodyTail("<H1>TAIL!</H1>");

        // Sample sdt tag handler (tag handlers insert specific
        // html depending on the contents of an sdt's tag).
        // This will only have an effect if the sdt tag contains
        // the string @class=XXX
        // SdtWriter.registerTagHandler("@class", new TagClass() );
        SdtWriter.registerTagHandler(Containerization.TAG_BORDERS, new TagSingleBox());
        SdtWriter.registerTagHandler(Containerization.TAG_SHADING, new TagSingleBox());

        OutputStream os;
        if (save) {
            os = new FileOutputStream(inputfilepath + ".html");
        } else {
            os = System.out;
        }
        try {
            StreamResult result = new StreamResult(os);
            exporter.html(wordMLPackage, result, htmlSettings);
        } finally {
            // Close the HTML file we opened, but never System.out
            if (save) {
                os.close();
            }
        }

        if (save) {
            System.out.println("Saved: " + inputfilepath + ".html using " + exporter.getClass().getName());
        }
    }
}
```
Here is the complete log of the output from using the above code together with the attached file:
```
C:\Users\Keith\Desktop\IndividualQuestion.docx
log4j:WARN No appenders could be found for logger (org.docx4j.utils.ResourceUtils).
log4j:WARN Please initialize the log4j system properly.
11 [main] INFO org.docx4j.utils.Log4jConfigurator - Since your log4j configuration (if any) was not found, docx4j has configured log4j automatically.
291 [main] INFO org.docx4j.jaxb.Context - JAXB: RI not present. Trying Java 6 implementation.
291 [main] INFO org.docx4j.jaxb.Context - JAXB: Using Java 6 implementation.
291 [main] INFO org.docx4j.jaxb.Context - loading Context jc
5380 [main] INFO org.docx4j.jaxb.Context - loaded com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl .. loading others ..
5575 [main] INFO org.docx4j.jaxb.Context - .. others loaded ..
5584 [main] INFO org.docx4j.openpackaging.contenttype.ContentTypeManager - Detected WordProcessingML package
5594 [main] INFO org.docx4j.openpackaging.parts.Part - /_rels/.rels
5595 [main] INFO org.docx4j.openpackaging.parts.relationships.RelationshipsPart - unmarshalling org.docx4j.openpackaging.parts.relationships.RelationshipsPart
5603 [main] INFO org.docx4j.openpackaging.parts.Part - /docProps/app.xml
5604 [main] INFO org.docx4j.openpackaging.parts.DocPropsExtendedPart - unmarshalling org.docx4j.openpackaging.parts.DocPropsExtendedPart
5606 [main] INFO org.docx4j.openpackaging.parts.Part - /docProps/core.xml
5607 [main] INFO org.docx4j.openpackaging.parts.DocPropsCorePart - unmarshalling org.docx4j.openpackaging.parts.DocPropsCorePart
5610 [main] INFO org.docx4j.openpackaging.parts.Part - /word/document.xml
5610 [main] INFO org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart - For MDP, unmarshall via binder
5765 [main] INFO org.docx4j.openpackaging.parts.Part - /word/_rels/document.xml.rels
5765 [main] INFO org.docx4j.openpackaging.parts.relationships.RelationshipsPart - unmarshalling org.docx4j.openpackaging.parts.relationships.RelationshipsPart
5767 [main] INFO org.docx4j.openpackaging.parts.Part - /word/webSettings.xml
5769 [main] INFO org.docx4j.openpackaging.parts.Part - /word/settings.xml
5774 [main] INFO org.docx4j.openpackaging.parts.Part - /word/styles.xml
5792 [main] INFO org.docx4j.openpackaging.parts.Part - /word/theme/theme1.xml
5793 [main] INFO org.docx4j.openpackaging.parts.ThemePart - unmarshalling org.docx4j.openpackaging.parts.ThemePart
5806 [main] INFO org.docx4j.openpackaging.parts.Part - /word/fontTable.xml
6488 [main] WARN org.docx4j.fonts.PhysicalFonts - file:/C:/Windows/FONTS/35191___.TTF is not embeddable; ignoring this font.
6489 [main] WARN org.docx4j.fonts.PhysicalFonts - file:/C:/Windows/FONTS/40240___.TTF is not embeddable; ignoring this font.
6491 [main] WARN org.docx4j.fonts.PhysicalFonts - file:/C:/Windows/FONTS/50416___.TTF is not embeddable; ignoring this font.
6492 [main] WARN org.docx4j.fonts.PhysicalFonts - file:/C:/Windows/FONTS/51253___.TTF is not embeddable; ignoring this font.
6495 [main] WARN org.docx4j.fonts.PhysicalFonts - file:/C:/Windows/FONTS/57930___.TTF is not embeddable; ignoring this font.
6495 [main] WARN org.docx4j.fonts.PhysicalFonts - file:/C:/Windows/FONTS/57961___.TTF is not embeddable; ignoring this font.
6497 [main] WARN org.docx4j.fonts.PhysicalFonts - file:/C:/Windows/FONTS/63193___.TTF is not embeddable; ignoring this font.
6498 [main] WARN org.docx4j.fonts.PhysicalFonts - file:/C:/Windows/FONTS/65659___.TTF is not embeddable; ignoring this font.
6499 [main] WARN org.docx4j.fonts.PhysicalFonts - file:/C:/Windows/FONTS/70214___.TTF is not embeddable; ignoring this font.
6500 [main] WARN org.docx4j.fonts.PhysicalFonts - file:/C:/Windows/FONTS/70729___.TTF is not embeddable; ignoring this font.
6501 [main] WARN org.docx4j.fonts.PhysicalFonts - file:/C:/Windows/FONTS/75678___.TTF is not embeddable; ignoring this font.
6503 [main] WARN org.docx4j.fonts.PhysicalFonts - file:/C:/Windows/FONTS/78640___.TTF is not embeddable; ignoring this font.
6503 [main] WARN org.docx4j.fonts.PhysicalFonts - file:/C:/Windows/FONTS/78936___.TTF is not embeddable; ignoring this font.
6504 [main] WARN org.docx4j.fonts.PhysicalFonts - file:/C:/Windows/FONTS/89198___.TTF is not embeddable; ignoring this font.
6507 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/ALGER.TTF (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6539 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/BAUHS93.TTF (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6542 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/BERNHC.TTF (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6561 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/BROADW.TTF (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6580 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/CHILLER.TTF (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6620 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/ELEPHNTI.TTF (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6635 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/Gabriola.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6645 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/GIGI.TTF (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6656 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/HARLOWSI.TTF (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6657 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/HARNGTON.TTF (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6658 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/HATTEN.TTF (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6659 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/HP%20PSG.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6661 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/impact.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6664 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/ITCBLKAD.TTF (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6666 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/JOKERMAN.TTF (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6666 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/JUICE___.TTF (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6732 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/PLAYBILL.TTF (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6755 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/SNAP____.TTF (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6755 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/STENCIL.TTF (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6763 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/TEMPSITC.TTF (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6770 [main] WARN org.docx4j.fonts.PhysicalFonts - Aborting: file:/C:/Windows/FONTS/TT0131M_.TTF (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
6818 [main] INFO org.docx4j.openpackaging.parts.WordprocessingML.StyleDefinitionsPart - Style with name Normal, id 'Normal' is default paragraph style
6818 [main] INFO org.docx4j.openpackaging.parts.WordprocessingML.StyleDefinitionsPart - Style with name Default Paragraph Font, id 'DefaultParagraphFont' is default character style
6911 [main] INFO org.docx4j.convert.out.html.HtmlExporterNG2 - /pkg:package
6932 [main] INFO org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart - Preparing StyleTree
6940 [main] WARN org.docx4j.model.properties.Property - Font 'Arial' is not mapped to a physical font.
6940 [main] WARN org.docx4j.model.properties.Property - No mapping from null
6940 [main] WARN org.docx4j.convert.out.html.AbstractHtmlExporter - ! null rPr for character style DefaultParagraphFont
7003 [main] WARN org.docx4j.convert.out.html.HtmlExporterNG2 - NOT IMPLEMENTED: support for m:oMath;
7205 [main] WARN org.docx4j.convert.out.html.HtmlExporterNG2 - NOT IMPLEMENTED: support for m:oMathPara;
7207 [main] WARN org.docx4j.convert.out.html.HtmlExporterNG2 - NOT IMPLEMENTED: support for m:oMathPara;
7210 [main] WARN org.docx4j.convert.out.html.HtmlExporterNG2 - NOT IMPLEMENTED: support for m:oMathPara;
7212 [main] WARN org.docx4j.convert.out.html.HtmlExporterNG2 - NOT IMPLEMENTED: support for m:oMathPara;
7220 [main] INFO org.docx4j.convert.out.html.HtmlExporterNG2 - wordDocument transformed to xhtml ..
Saved: C:\Users\Keith\Desktop\IndividualQuestion.docx.html using org.docx4j.convert.out.html.HtmlExporterNG2
```
If anyone has any pointers, I would be very grateful.
Keith