Greetings,
I'm new to docx4j and am using version 6.0.1. I'm working on a text conversion project for a library of 20+ year old documents in .doc format, which use fonts of that era. I open each document in Word 2016 and save it in .docx format. The documents look fine in Word, but I've run into odd encoding issues.
Unzipping the .docx file and reviewing word/document.xml, I found that within w:t tags the letter "A" is encoded as 0xF041, for example. I think it has something to do with the older document font: I can create an entirely new document with that font, type "A", and get the same result.
Scanning the text string and applying the bit mask 0x00FF & myChar to each character turns 0xF041 back into 0x41 as expected. How is this situation normally handled? Does docx4j have utilities to decode text like this?
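For reference, here is a minimal sketch of the workaround I have in mind. It is plain Java with no docx4j calls, and it rests on an assumption I have not confirmed: that every affected character falls in the range 0xF000-0xF0FF, so only that range should be masked and everything else left alone. The class and method names (PuaDecoder, decode) are just placeholders of mine.

```java
class PuaDecoder {

    // Assumption: the affected characters all sit in 0xF000-0xF0FF.
    // Anything outside that range is copied through unchanged.
    public static String decode(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 0xF000 && c <= 0xF0FF) {
                // Keep only the low byte, e.g. 0xF041 -> 0x41 ("A")
                out.append((char) (c & 0x00FF));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two shifted characters followed by an ordinary "C"
        System.out.println(decode("\uF041\uF042C")); // prints "ABC"
    }
}
```

Restricting the mask to that range is deliberate: applying 0x00FF to every character would corrupt any legitimate non-ASCII text in the same document.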
Thank you,
-Daniel