siva19185 wrote: Sorry, currently I am using only HtmlExporterNG.
OK, but any fixes I work on will be in NG2; I'll port them to NG only if it is easy.
siva19185 wrote: But how do I handle the empty space between words, other than tabs? Tabs we can identify, but how do I measure the number of spaces between words when there is more than one? In document.xml I only find <w:t xml:space="preserve"> for sentences that have more than one space between their words!
Word writes @xml:space="preserve" in certain circumstances (where, in terms of the XML spec, the whitespace is significant).
One is where a w:p contains more than one w:r, and one of those w:r elements ends in a space.
Your example also makes sense.
I had a look at Word 2007's HTML output. It doesn't contain anything special unless there are adjacent spaces; in that case, Word outputs <span style='mso-spacerun:yes'> </span>
You or I need to do some experiments to see whether various browsers collapse multiple spaces to one, whether @style='mso-spacerun:yes' makes any difference, and whether the browser's parsing mode (strict or tag soup) makes a difference. If you could look into this, I can reflect your findings in the XSLT. I guess we can try
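Whatever the experiments show, the default CSS behaviour (white-space: normal) is to collapse a run of spaces into one. One common workaround, sketched here in Python as an assumption rather than what the XSLT currently does, is to turn all but the last space in each run into a non-breaking space when writing out the w:t text. The helper name preserve_spaces is hypothetical and does not exist in HtmlExporterNG or NG2.

```python
import html
import re


def preserve_spaces(text: str) -> str:
    """Escape w:t text for HTML so that runs of spaces survive collapsing.

    Hypothetical helper (not part of HtmlExporterNG/NG2): in every run of two or
    more spaces, all but the last space become &nbsp;, leaving one ordinary
    space so the browser can still wrap the line there.
    """
    escaped = html.escape(text, quote=False)  # escapes &, < and > only

    def widen(match: re.Match) -> str:
        run = match.group(0)
        return "&nbsp;" * (len(run) - 1) + " "

    return re.sub(r" {2,}", widen, escaped)


print(preserve_spaces("one   two"))  # one&nbsp;&nbsp; two
```

An alternative is to put white-space:pre-wrap on the paragraph, which keeps every space as written; which approach renders best across browsers is exactly what the experiments above would need to confirm.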
I think tabs are more of a challenge!
siva19185 wrote: Regarding questions 2 and 3, I made a docx and converted it to HTML using Word itself. In Word's conversion, line spacing of 3 becomes line-height:300% in the HTML, line spacing of 2 becomes line-height:200%, and so on. This is applied to the style of the <p> tags.
Similarly, I found a default margin: 0 0 10pt in the converted HTML, applied to the <p> tag in every Word document converted to HTML.
I couldn't find which part of document.xml represents the above styles.
line-height:115% comes from w:spacing/@w:line="276" divided by 240 (276/240 = 1.15). See the spec. The w:spacing/@w:line="276" comes from w:pPrDefault in the styles part.
I don't know where the magic number 240 comes from. Maybe it's because we're using an 11pt font: (11pt + 1pt) x 20 = 240? Experimenting with other font sizes might explain this.
Interestingly, Microsoft's XSLT uses a hard-coded /20 (the resulting unit is pt):
    <xsl:template match="w:spacing[@w:lineRule or @w:line]" mode="ppr">
      <xsl:choose>
        <xsl:when test="not(@w:lineRule) or @w:lineRule = 'exact'">
          line-height:<xsl:value-of select="@w:line div 20"/>pt;
        </xsl:when>
      </xsl:choose>
    </xsl:template>
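The template above only handles an exact (or absent) @w:lineRule. The OOXML spec defines w:spacing/@w:line by rule: when @w:lineRule is auto (the value assumed when it is omitted and @w:line is present), @w:line is measured in 240ths of a line, which would account for the 240 independently of font size; for exact and atLeast it is in twentieths of a point, which is where the /20 comes from. A Python sketch of a mapping that covers both cases follows; the function name is hypothetical, and mapping atLeast to a plain line-height is only an approximation because CSS has no minimum line height.

```python
def line_height_css(line: int, line_rule: str = "auto") -> str:
    """Map w:spacing/@w:line (plus @w:lineRule) to a CSS line-height declaration.

    Hypothetical helper. Per the OOXML spec:
      - auto (the default when lineRule is omitted): line is in 240ths of a line
      - exact / atLeast: line is in twentieths of a point (twips)
    """
    if line_rule == "auto":
        percent = line * 100 / 240  # 276 -> 115, 480 -> 200, 720 -> 300
        return f"line-height:{percent:g}%"
    if line_rule in ("exact", "atLeast"):
        # CSS has no "minimum" line-height, so atLeast is only approximated here.
        return f"line-height:{line / 20:g}pt"
    raise ValueError(f"unknown w:lineRule: {line_rule!r}")


print(line_height_css(276))  # line-height:115%
```

Assuming Word stores double and triple spacing as w:line="480" and w:line="720" with lineRule auto, this reproduces the 200% and 300% seen in Word's own HTML export.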
siva19185 wrote: Similarly, I found a default margin: 0 0 10pt in the converted HTML, applied to the <p> tag in every Word document converted to HTML.
I couldn't find which part of document.xml represents the above styles.
The margin-bottom comes from w:spacing/@w:after="200", again in w:pPrDefault. 200 means 200 twips, i.e. 10pt, since there are 20 twips in a point.
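Because a twip is 1/20 of a point, the conversion to Word's margin: 0 0 10pt is a single division. A minimal Python sketch, assuming w:before is 0 when absent and ignoring left/right indentation (which lives in w:ind); both helper names are hypothetical:

```python
def twips_to_css(twips: int) -> str:
    """Convert a twips value (1/20 pt) to a CSS length; zero is written bare."""
    if twips == 0:
        return "0"
    return f"{twips / 20:g}pt"


def spacing_margin_css(before: int = 0, after: int = 0) -> str:
    """Build a margin shorthand from w:spacing/@w:before and @w:after (twips).

    Uses the three-value CSS shorthand: top, left/right, bottom.
    Left/right are fixed at 0 here; real values would come from w:ind.
    """
    return f"margin:{twips_to_css(before)} 0 {twips_to_css(after)}"


print(spacing_margin_css(after=200))  # margin:0 0 10pt
```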
NG2 doesn't use w:pPrDefault; I will fix that. (IIRC, NG does, but it doesn't handle @w:line.)
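Picking up those defaults mostly means reading w:docDefaults/w:pPrDefault/w:pPr/w:spacing from the styles part. The Python sketch below uses the standard-library ElementTree and is an illustration, not how NG or NG2 actually does it; the function name is hypothetical. Style-level and paragraph-level w:spacing would still need to override these values attribute by attribute.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def ppr_default_spacing(styles_xml: str) -> dict:
    """Return the w:spacing attributes from w:pPrDefault in styles.xml.

    Keys are local attribute names ("after", "line", "lineRule", ...);
    values stay as strings. Returns {} when no default spacing is defined.
    """
    root = ET.fromstring(styles_xml)
    spacing = root.find("w:docDefaults/w:pPrDefault/w:pPr/w:spacing", NS)
    if spacing is None:
        return {}
    prefix = "{" + W_NS + "}"
    # ElementTree reports namespaced attributes as "{uri}local"; strip the URI.
    return {
        key[len(prefix):] if key.startswith(prefix) else key: value
        for key, value in spacing.attrib.items()
    }


SAMPLE = f"""<w:styles xmlns:w="{W_NS}">
  <w:docDefaults>
    <w:pPrDefault>
      <w:pPr><w:spacing w:after="200" w:line="276" w:lineRule="auto"/></w:pPr>
    </w:pPrDefault>
  </w:docDefaults>
</w:styles>"""

print(ppr_default_spacing(SAMPLE))  # {'after': '200', 'line': '276', 'lineRule': 'auto'}
```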
cheers
Jason