Arabic Characters in PDF

by **mmshabeer** » Fri Apr 24, 2015 3:00 am

Attachments:
testarabic.pdf: output file (26.3 KiB)
testarabic.docx: input file (13.84 KiB)

Hi,

Using docx4j 3.2.0, I tried to generate a PDF from a docx with 'Docx4J.toFO'.
The Arabic characters are missing from the resulting PDF.

Code snippet:

    private static void createTestPDF() throws Exception {
        FOSettings foSettings = Docx4J.createFOSettings();
        InputStream is = new FileInputStream(new File("testarabic.docx"));
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(is);

        // Print all available physical fonts
        PhysicalFonts.discoverPhysicalFonts();
        Map<String, PhysicalFont> physicalFonts = PhysicalFonts.getPhysicalFonts();
        Iterator<Entry<String, PhysicalFont>> availableFonts = physicalFonts.entrySet().iterator();
        while (availableFonts.hasNext()) {
            Entry<String, PhysicalFont> font = availableFonts.next();
            String key = font.getKey();
            PhysicalFont pFont = font.getValue();
            System.out.println("Key is " + key + ";; Name " + pFont.getName());
        }

        // Map Times New Roman to Arial Unicode MS
        Mapper fontMapper = new IdentityPlusMapper();
        PhysicalFont font = PhysicalFonts.get("Arial Unicode MS");
        fontMapper.put("Times New Roman", font);
        wordMLPackage.setFontMapper(fontMapper);

        foSettings.setWmlPackage(wordMLPackage);
        OutputStream pdfOutputStream = new FileOutputStream("testarabic.pdf");
        System.out.println(foSettings.getSettings());
        Docx4J.toFO(foSettings, pdfOutputStream, Docx4J.FLAG_EXPORT_PREFER_XSL);
        System.out.println(" Done !!!!");
    }

The input and output documents are attached.

Environment : Windows 7
Java Version: 1.6
Kindly help.

Thanks.

by **jason** » Thu Apr 30, 2015 8:31 pm

Please see attached png, which shows that my docx4j PDF output is the same as the Word docx (as it appears on my system).

Note that I have font "Arabic Typesetting" installed on this machine, and it is used in the PDF output.

I'm not sure whether you see the асдфас text on your side?

This is using the current docx4j source code. If there are any differences from 3.2.0, that would be because of changes in https://github.com/plutext/docx4j/blob/ ... ector.java (click the history button).

I see you already have something like:

    // Example of mapping the font Times New Roman, which lacks certain Arabic glyphs, e.g.
    //   Glyph "ي" (0x64a, afii57450) not available in font "TimesNewRomanPS-ItalicMT".
    //   Glyph "ج" (0x62c, afii57420) not available in font "TimesNewRomanPS-ItalicMT".
    // to a font which has them.
    PhysicalFont font = PhysicalFonts.get("Arial Unicode MS");
    // Make sure this font is in your regex (if any)!
    if (font != null) {
        fontMapper.put("Times New Roman", font);
        //fontMapper.put("Arial", font);
    }

Do you have Arial Unicode MS installed?

Try uncommenting fontMapper.put("Arial", font); since Arial is the font used for the асдфас text.
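
If it is unclear whether a candidate replacement font is installed and covers the Arabic glyphs, a quick check with plain Java (no docx4j involved) may help narrow things down. The following is only a rough sketch: the class name ArabicGlyphCheck and both helper methods are made up for illustration, and it asks java.awt.Font about glyph coverage, which is an assumption-laden proxy, since the fonts AWT sees are not necessarily resolved the same way by docx4j and FOP.

```java
import java.awt.Font;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of docx4j: inspects text and installed fonts via the JDK only.
public class ArabicGlyphCheck {

    /** Returns the distinct code points of text that lie in the basic Arabic block (U+0600..U+06FF). */
    public static Set<Integer> arabicCodePoints(String text) {
        Set<Integer> found = new TreeSet<Integer>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.ARABIC) {
                found.add(cp);
            }
            i += Character.charCount(cp);
        }
        return found;
    }

    /** True if AWT resolves the requested family (no silent fallback) and it can display all of text. */
    public static boolean canRender(String family, String text) {
        Font font = new Font(family, Font.PLAIN, 12);
        // AWT falls back to a default font when the family is unknown, so compare the names.
        if (!font.getFamily().equalsIgnoreCase(family)) {
            return false;
        }
        // canDisplayUpTo returns -1 when every character can be displayed.
        return font.canDisplayUpTo(text) == -1;
    }

    public static void main(String[] args) {
        String sample = "\u064A\u062C"; // the two glyphs named in the warnings above
        for (int cp : arabicCodePoints(sample)) {
            System.out.println("needs glyph 0x" + Integer.toHexString(cp));
        }
        for (String family : new String[] {"Times New Roman", "Arial", "Arial Unicode MS"}) {
            System.out.println(family + " -> " + canRender(family, sample));
        }
    }
}
```

A family that reports false here is unlikely to be a useful target for fontMapper.put; one that reports true is a candidate, but the PDF output still needs to be checked.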

by **Asttle** » Fri Feb 02, 2018 9:46 pm

Can you list the dependencies for this code? I am using the same code, but I am getting exceptions.
