RithanyaLaxmi wrote: Thanks Jason. The problem I am having is that I am creating the .docx XML on my own rather than generating it from a .docx file, so doing the relationships part manually is tough. How do I get correct XML, with matching relationships and parts, to generate a PDF?
Assuming the header/footer and the rest of your document always stay the same, and all you want to do is insert the data into it (as a table, say), one approach would be to save as XML from Word 2007 to get the XML file (this is Flat OPC XML). Then save the Flat OPC XML up to your data table as one String, and the part from there to the end as another String.
Now you can create your entire XML file = string1 + datatable + string2.
But that approach doesn't need docx4j at all, at least until you decide you want PDF output.
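A minimal sketch of that string-splicing idea, in plain Java with no docx4j. The class and method names (FlatOpcSplice, tableXml, splice) are made up for illustration, and the table markup is the barest w:tbl that should be schema-valid. It assumes the w: prefix is already declared in string1, and it does not set widths or borders, so treat it as a starting point rather than a finished table.

```java
import java.util.List;

final class FlatOpcSplice {

    /** Escapes the characters that cannot appear as-is in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds a bare w:tbl: one w:tr per row, one w:tc per cell (hypothetical helper). */
    static String tableXml(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<w:tbl><w:tblPr/><w:tblGrid/>");
        for (List<String> row : rows) {
            sb.append("<w:tr>");
            for (String cell : row) {
                // Each cell needs at least one paragraph; the text goes in a run.
                sb.append("<w:tc><w:p><w:r><w:t xml:space=\"preserve\">")
                  .append(escape(cell))
                  .append("</w:t></w:r></w:p></w:tc>");
            }
            sb.append("</w:tr>");
        }
        return sb.append("</w:tbl>").toString();
    }

    /** string1 = Flat OPC XML up to the table, string2 = everything after it. */
    static String splice(String string1, String dataTable, String string2) {
        return string1 + dataTable + string2;
    }
}
```

Usage would be something like splice(string1, tableXml(rowsFromDb), string2), with the result written out as the .xml file. Escaping the cell text matters here, since an unescaped & or < in the database values would make the whole file unreadable.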
The other approach is to use docx4j. You could use it to build the document from scratch (i.e. creating header parts etc., if you need them). Or you could use docx4j to open an existing document, add in your data table, and save the document (possibly under a new name).
RithanyaLaxmi wrote: I am using the MS Word API to generate a .docx containing data fetched from the DB, to which I apply the respective styles, fonts, symbols, etc. If the data fetched from the DB is quite large, there is a problem displaying it in the .docx file.
Problem in Word? How many pages is your document? Are you displaying the data in a table? If so, how many rows?
RithanyaLaxmi wrote: I found that MS Word 2007 internally writes some content through tags which may not be needed to display the data. So I am working out which MS Word tags are actually necessary when converting to an .xml file, so that I can avoid the unnecessary ones and build only the tags needed to display the data. That is why I am planning to write my own .xml with just the MS Word tags that are needed, rather than generating the .xml from a .docx file.
My queries are:
1) Is it right that MS Word generates some tags that are not needed when converting a .docx to document.xml, and that this makes the file heavy? If so, which tags are they, so that I can avoid them when I write my own .xml file?
2) Please send links that explain the MS Word tags and their advantages, and which tags are needed and which are not.
Well, certainly you can do without the rsids and the proofing marks (grammar/spelling).
And Word does add a number of parts which aren't strictly necessary. But I'm not convinced that trying to get rid of tags will make much difference.
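For anyone who wants to try stripping those anyway, here is a rough sketch using only the JDK's DOM and Transformer classes. It assumes the rsids show up as attributes in the WordprocessingML namespace whose local names start with "rsid" (w:rsidR, w:rsidRPr and so on) and that the proofing marks are w:proofErr elements. The class name RsidStripper is made up, and nothing here measures whether the smaller file actually loads or renders any faster.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RsidStripper {
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the XML with w:rsid* attributes and w:proofErr elements removed. */
    static String strip(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed so getLocalName/getNamespaceURI work
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));
            clean(doc.getDocumentElement());
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    private static void clean(Element el) {
        // Collect first, then remove, so the live attribute map is not
        // modified while it is being iterated.
        NamedNodeMap attrs = el.getAttributes();
        List<Attr> doomed = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            if (W_NS.equals(a.getNamespaceURI()) && a.getLocalName().startsWith("rsid")) {
                doomed.add(a);
            }
        }
        for (Attr a : doomed) {
            el.removeAttributeNode(a);
        }

        // Same snapshot trick for child nodes.
        NodeList kids = el.getChildNodes();
        List<Node> snapshot = new ArrayList<>();
        for (int i = 0; i < kids.getLength(); i++) {
            snapshot.add(kids.item(i));
        }
        for (Node child : snapshot) {
            if (!(child instanceof Element)) {
                continue;
            }
            Element ce = (Element) child;
            if (W_NS.equals(ce.getNamespaceURI()) && "proofErr".equals(ce.getLocalName())) {
                el.removeChild(ce);
            } else {
                clean(ce);
            }
        }
    }
}
```

Running this over a Flat OPC file and comparing the sizes before and after is a cheap way to see how much those attributes really contribute.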
As far as I know, the content of a docx is identical to the Flat OPC XML (on a semantic level that is - of course the former is zipped up etc).
It'd be interesting to know if there are any performance limitations with the Flat OPC XML format. I'm not aware of any.
RithanyaLaxmi wrote: 3) Is my approach of writing a new .xml similar to document.xml (as in the .docx conversion) worth pursuing, so that I can build the .xml with only the tags I need and improve the performance of the data display?
See above.
RithanyaLaxmi wrote: 4) I want to know whether the "bleed" which is done through the Word API is supported in WordML, and if so, what the tag for it is. I tested with vertAlign but it didn't work. Similarly, are "WaterMark" and "Pagebreak" supported in WordML? Are there any links showing which features WordML supports compared to the Word API?
What is bleed? WordML can represent anything that Word can put in a document. So, add bleed (whatever it is) to the document, save it, and open it in package explorer to see how it is represented. Watermark might be found in the header part? Pagebreaks are definitely supported.
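On the page break specifically: in WordprocessingML it is a w:br element with w:type="page" inside a run. A tiny sketch for anyone assembling the XML by hand follows (the class and method names are made up, and the w: prefix is assumed to be declared on an enclosing element):

```java
final class WordMlBreaks {

    /** A paragraph containing nothing but a page break (hypothetical helper). */
    static String pageBreakParagraph() {
        return "<w:p><w:r><w:br w:type=\"page\"/></w:r></w:p>";
    }
}
```

Dropping that string between two blocks of body content starts the following content on a new page.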
RithanyaLaxmi wrote: After some investigation, I found a few points that need to be taken care of:
1) Unnecessary namespaces can be avoided; by default, a file generated through Word includes all the namespaces.
2) The <w:sectPr> element at the end, which contains the layout information, is created by default when the file is generated through Word. This can be avoided.
As I mentioned above, I think you are wasting your time trying to get rid of these things.