I am using docx4j 3.2.1 to convert .docx files into .html files in real time. In other words, when the web browser requests the file, Tomcat reads the .docx file, uses docx4j to convert it into an HTML string, then streams the content back to the browser. This is running on Linux 3.0.0-26-server. My issue is that although the text looks fine, numbers (phone numbers, ZIP codes, addresses, etc.) are displayed as strange, Farsi-looking characters. For example, the .docx file uses Arial to display the ZIP code 48174, and docx4j converts this to:
<span class="" style="">٤٨١٧٤</span>
Previous posts related to this issue suggested that I might need the mscorefonts package, though those posts seemed to describe trouble converting all text, not just numbers. I installed mscorefonts anyway, and I now get:
<span class="" style="font-family: 'Times New Roman';">٤٨١٧٤</span>
So as you can see, a font has been added to the style, but the digits themselves are still converted to the same wrong characters.
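To narrow this down, it may help to check exactly which code points end up in the generated HTML. The following is a minimal diagnostic sketch, not part of docx4j: the class name `DigitCheck` and both helper methods are hypothetical and use only the standard `java.lang.Character` API. It lists the code points of a string and, as a possible stopgap, maps any non-ASCII decimal digit back to its ASCII equivalent. The characters in the output above are written as `\u` escapes so the example does not depend on the source file encoding.

```java
// Hypothetical helper for troubleshooting; not part of docx4j.
final class DigitCheck {

    // Returns the code points of s in "U+XXXX" form, separated by spaces.
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // Replaces every non-ASCII Unicode decimal digit with its ASCII
    // counterpart and leaves all other characters untouched.
    static String toAsciiDigits(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp > 0x7F && Character.isDigit(cp)) {
                out.append((char) ('0' + Character.digit(cp, 10)));
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The five characters from the <span> in the converted HTML.
        String converted = "\u0664\u0668\u0661\u0667\u0664";
        System.out.println(describe(converted));      // U+0664 U+0668 U+0661 U+0667 U+0664
        System.out.println(toAsciiDigits(converted)); // 48174
    }
}
```

The code points fall in the range U+0660 to U+0669, the Arabic-Indic digits, and they map one-to-one onto 48174. That suggests the digits are being substituted during conversion rather than rendered with a missing font, which would fit with installing mscorefonts making no difference. Running the output through `toAsciiDigits` would only hide the symptom, and it would also rewrite any digits that are legitimately non-ASCII in the source document.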
Can anyone suggest what might be wrong, or how I might go about troubleshooting this?
Thanks!
Steve