From http://www.documentinteropinitiative.or ... ec913.aspx:
Fields shall be implemented in XML using either of two approaches:
• As a simple field implementation, using the fldSimple element, or
• As a complex field implementation, using a set of runs involving the fldChar and instrText elements.
For a simple field implementation, only one element, fldSimple, shall be used, in which case, its instr attribute shall contain a field, and the body of the element shall contain the most recently updated field result. [Example: Here is the corresponding XML for a simple field implementation of DATE:
<w:fldSimple w:instr="DATE">
<w:r>
<w:t>12/31/2005</w:t>
</w:r>
</w:fldSimple>
end example]
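A field parser for docx4j could start with the simple case. The following Java sketch uses only the JDK's DOM API (javax.xml.parsers and org.w3c.dom, not docx4j's JAXB classes) to read the instruction and cached result of the first fldSimple in a fragment. The class and method names are hypothetical, and the fragment is assumed to bind the w prefix to the WordprocessingML main namespace.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SimpleFieldReader {
    // WordprocessingML main namespace, as used by Word and docx4j.
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns {instr, cached result} of the first w:fldSimple, or null if there is none. */
    static String[] readFirstSimpleField(String xml) {
        Element root;
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // required for the *NS lookups below
            root = f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        NodeList fields = root.getElementsByTagNameNS(W, "fldSimple");
        if (fields.getLength() == 0) return null;
        Element fld = (Element) fields.item(0);
        String instr = fld.getAttributeNS(W, "instr").trim();
        // getTextContent() concatenates all descendant text, so this works whether
        // the result is bare text or wrapped in w:r/w:t runs (a w:fldData child,
        // if present, would also be included).
        String result = fld.getTextContent().trim();
        return new String[] { instr, result };
    }

    public static void main(String[] args) {
        String xml = "<w:p xmlns:w=\"" + W + "\">"
                + "<w:fldSimple w:instr=\"DATE\"><w:r><w:t>12/31/2005</w:t></w:r></w:fldSimple>"
                + "</w:p>";
        String[] fld = readFirstSimpleField(xml);
        System.out.println(fld[0] + " -> " + fld[1]); // prints: DATE -> 12/31/2005
    }
}
```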
For a complex field implementation, a set of runs shall be used which together contain, in sequence, the following elements:
• fldChar with attribute fldCharType value begin,
• One or more instrText elements, which, collectively, contain a complete field,
• Optionally,
  • fldChar with attribute fldCharType value separate, which separates the field from its field result,
  • Any number of runs and paragraphs that contain the most recently updated field result, and
• fldChar with attribute fldCharType value end.
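The ordering rule in the list above can be checked mechanically. A minimal Java sketch, assuming a hypothetical Marker enum with one value per list item (RESULT stands for any run or paragraph carrying part of the cached result) and covering a single, non-nested field only:

```java
import java.util.List;

public class ComplexFieldSequence {
    // Hypothetical marker kinds, one per item in the list above. RESULT stands
    // for any run or paragraph that carries part of the cached field result.
    enum Marker { BEGIN, INSTR, SEPARATE, RESULT, END }

    /** True if m is one non-nested complex field: BEGIN INSTR+ [SEPARATE RESULT*] END. */
    static boolean isWellFormed(List<Marker> m) {
        int i = 0, n = m.size();
        if (n == 0 || m.get(0) != Marker.BEGIN) return false;
        i++;
        int instrCount = 0;
        while (i < n && m.get(i) == Marker.INSTR) { i++; instrCount++; }
        if (instrCount == 0) return false;            // at least one instrText is required
        if (i < n && m.get(i) == Marker.SEPARATE) {   // the optional result part
            i++;
            while (i < n && m.get(i) == Marker.RESULT) i++;
        }
        return i == n - 1 && m.get(i) == Marker.END;  // END closes the field and comes last
    }

    public static void main(String[] args) {
        // The DATE example: begin, instrText, separate, one result run, end.
        System.out.println(isWellFormed(List.of(Marker.BEGIN, Marker.INSTR,
                Marker.SEPARATE, Marker.RESULT, Marker.END)));                      // true
        // A display-only field that stores no result is also valid.
        System.out.println(isWellFormed(List.of(Marker.BEGIN, Marker.INSTR, Marker.END))); // true
        // No instrText at all: invalid.
        System.out.println(isWellFormed(List.of(Marker.BEGIN, Marker.END)));               // false
    }
}
```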
[Note: Fields that are for display purposes only have no need to, and do not, store a field result. end note]
[Example: Here is the corresponding XML for a complex field implementation of DATE:
<w:r>
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
<w:instrText xml:space="preserve"> DATE </w:instrText>
</w:r>
<w:r>
<w:fldChar w:fldCharType="separate"/>
</w:r>
<w:r>
<w:t>12/31/2005</w:t>
</w:r>
<w:r>
<w:fldChar w:fldCharType="end"/>
</w:r>
end example]
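Reading the DATE example back out of its run sequence can be sketched the same way with the JDK's DOM API: walk the runs in document order, collect instrText between begin and separate, and collect w:t text between separate and end. This hypothetical ComplexFieldReader assumes a single, non-nested field; a nested field would need a stack of open fields instead of two boolean flags.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ComplexFieldReader {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns {instruction, result} of a single, non-nested complex field in the fragment. */
    static String[] read(String xml) {
        Element root;
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            root = f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        StringBuilder instr = new StringBuilder();
        StringBuilder result = new StringBuilder();
        boolean inInstr = false, inResult = false;
        NodeList runs = root.getElementsByTagNameNS(W, "r");   // all w:r, in document order
        for (int i = 0; i < runs.getLength(); i++) {
            for (Node c = runs.item(i).getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeType() != Node.ELEMENT_NODE || !W.equals(c.getNamespaceURI())) continue;
                Element e = (Element) c;
                switch (e.getLocalName()) {
                    case "fldChar": {
                        String type = e.getAttributeNS(W, "fldCharType");
                        inInstr = type.equals("begin");          // instruction follows begin
                        inResult = type.equals("separate");      // result follows separate
                        break;                                   // "end" clears both flags
                    }
                    case "instrText":                            // may be split across runs
                        if (inInstr) instr.append(e.getTextContent());
                        break;
                    case "t":
                        if (inResult) result.append(e.getTextContent());
                        break;
                    default:
                        break;                                   // rPr and other run content
                }
            }
        }
        return new String[] { instr.toString().trim(), result.toString() };
    }

    public static void main(String[] args) {
        String xml = "<w:p xmlns:w=\"" + W + "\">"
                + "<w:r><w:fldChar w:fldCharType=\"begin\"/></w:r>"
                + "<w:r><w:instrText xml:space=\"preserve\"> DATE </w:instrText></w:r>"
                + "<w:r><w:fldChar w:fldCharType=\"separate\"/></w:r>"
                + "<w:r><w:t>12/31/2005</w:t></w:r>"
                + "<w:r><w:fldChar w:fldCharType=\"end\"/></w:r>"
                + "</w:p>";
        String[] fld = read(xml);
        System.out.println(fld[0] + " -> " + fld[1]); // prints: DATE -> 12/31/2005
    }
}
```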
[Note: Every simple field implementation for a given field has a corresponding complex field implementation. However, not every complex field implementation has a corresponding simple field implementation. If some characters in a field have different run properties than others, the field must be split across multiple runs, which requires a complex field implementation. For an example, see §2.16.4.3, where the first letter of a DATE field is made bold, underlined, and red, while the other letters have none of these properties. end note]
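Because every simple field has a complex equivalent, converting one into the other is mechanical. A minimal sketch that builds the run sequence as a string; the class name is hypothetical, run properties (w:rPr) are ignored, and the whole cached result goes into one run, so per-character formatting is not reproduced.

```java
public class SimpleToComplex {
    /** Escapes text for use as XML character content. */
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps run content in a w:r element (no w:rPr in this sketch). */
    static String run(String content) {
        return "<w:r>" + content + "</w:r>";
    }

    /**
     * Builds the complex-field runs equivalent to a simple field with the given
     * instruction and cached result; pass null for a field that stores no result.
     */
    static String toComplex(String instr, String result) {
        StringBuilder sb = new StringBuilder();
        sb.append(run("<w:fldChar w:fldCharType=\"begin\"/>"));
        sb.append(run("<w:instrText xml:space=\"preserve\"> " + esc(instr) + " </w:instrText>"));
        if (result != null) {
            sb.append(run("<w:fldChar w:fldCharType=\"separate\"/>"));
            // xml:space="preserve" keeps any leading or trailing spaces in the result
            sb.append(run("<w:t xml:space=\"preserve\">" + esc(result) + "</w:t>"));
        }
        sb.append(run("<w:fldChar w:fldCharType=\"end\"/>"));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Complex equivalent of the simple DATE field with cached result 12/31/2005
        System.out.println(toComplex("DATE", "12/31/2005"));
    }
}
```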
As shown in §2.16.1, the instruction of one field can be another field, allowing fields to nest. In such cases, the XML run sequence for the inner field is defined at the point of reference for that inner field, inside the outer field's XML run sequence. [Example: Consider the following sentence, in which braces mark the start and end of each field:
It's { IF { DATE \@ "M-d" }<>"1-1" "not " } new year's day.
The IF field contains the nested field DATE \@ "M-d". When the fields are updated on January 1 of any year, the resulting sentence is "It's new year's day." On all other days of the year, the resulting sentence is "It's not new year's day." end example]
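To make the IF example concrete, here is a Java sketch of what the two fields compute. It evaluates them by hand with java.time rather than implementing IF or DATE in general; the class name is hypothetical, <> is treated as a plain string comparison, and it relies on the java.time pattern "M-d" happening to give the same text as the field picture "M-d" (month and day without leading zeros).

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class NewYearExample {
    // Assumption: java.time's "M-d" matches the field picture "M-d" here.
    // The two picture languages are not interchangeable in general.
    static final DateTimeFormatter M_D = DateTimeFormatter.ofPattern("M-d");

    /** Evaluates the example sentence for the given date. */
    static String sentence(LocalDate today) {
        String dateResult = today.format(M_D);                 // inner DATE field result
        // Outer IF: true-text "not ", no false-text, so the false result is empty.
        String ifResult = !dateResult.equals("1-1") ? "not " : "";
        return "It's " + ifResult + "new year's day.";
    }

    public static void main(String[] args) {
        System.out.println(sentence(LocalDate.of(2006, 1, 1)));  // It's new year's day.
        System.out.println(sentence(LocalDate.of(2006, 3, 5)));  // It's not new year's day.
    }
}
```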
Note that fields can be nested.
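The rule that an inner field's run sequence sits at its point of reference inside the outer field's sequence can be illustrated by generating that sequence from a tree. A sketch under assumed names: the hypothetical Field class holds instruction parts that are either plain text or another Field, and each emitted string stands for one run (begin, instrText, separate, result text, end).

```java
import java.util.ArrayList;
import java.util.List;

public class NestedFieldWriter {
    // Hypothetical in-memory model: each instruction part is either literal
    // instruction text (String) or a nested Field.
    static final class Field {
        final List<Object> instruction;
        final String result; // null for a field that stores no result
        Field(List<Object> instruction, String result) {
            this.instruction = instruction;
            this.result = result;
        }
    }

    /** Appends f's run sequence, one marker per run, writing nested fields in place. */
    static void emit(Field f, List<String> out) {
        out.add("begin");
        for (Object part : f.instruction) {
            if (part instanceof Field) {
                emit((Field) part, out); // inner field's whole sequence sits at its point of reference
            } else {
                out.add("instr:" + part);
            }
        }
        if (f.result != null) {
            out.add("separate");
            out.add("result:" + f.result);
        }
        out.add("end");
    }

    public static void main(String[] args) {
        // The IF/DATE example as it might look after an update on March 5
        Field date = new Field(List.<Object>of(" DATE \\@ \"M-d\" "), "3-5");
        Field ifField = new Field(List.<Object>of(" IF ", date, "<>\"1-1\" \"not \" "), "not ");
        List<String> out = new ArrayList<>();
        emit(ifField, out);
        out.forEach(System.out::println);
    }
}
```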
The preceding section of the spec (§2.16.1) gives an overview of the syntax:
The general syntax of a field is as follows:
field:
    field-type [ instruction ]
field-type:
    date-and-time
    document-automation
    document-information
    equations-and-formulas
    index-and-tables
    links-and-references
    mail-merge
    numbering
    user-information
    form-field
date-and-time:
    CREATEDATE | DATE | EDITTIME | PRINTDATE | SAVEDATE | TIME
document-automation:
    COMPARE | DOCVARIABLE | GOTOBUTTON | IF | MACROBUTTON | PRINT
document-information:
    AUTHOR | COMMENTS | DOCPROPERTY | FILENAME | FILESIZE | INFO
    | KEYWORDS | LASTSAVEDBY | NUMCHARS | NUMPAGES | NUMWORDS | SUBJECT
    | TEMPLATE | TITLE
equations-and-formulas:
    = formula | ADVANCE | EQ | SYMBOL
index-and-tables:
    INDEX | RD | TA | TC | TOA | TOC | XE
links-and-references:
    AUTOTEXT | AUTOTEXTLIST | BIBLIOGRAPHY | CITATION | HYPERLINK | INCLUDEPICTURE | INCLUDETEXT
    | LINK | NOTEREF | PAGEREF | QUOTE | REF | STYLEREF
mail-merge:
    ADDRESSBLOCK | ASK | COMPARE | DATABASE | FILLIN | GREETINGLINE | IF
    | MERGEFIELD | MERGEREC | MERGESEQ | NEXT | NEXTIF | SET | SKIPIF
numbering:
    AUTONUM | AUTONUMLGL | AUTONUMOUT | BARCODE | LISTNUM | PAGE | REVNUM
    | SECTION | SECTIONPAGES | SEQ
user-information:
    USERADDRESS | USERINITIALS | USERNAME
form-field:
    FORMCHECKBOX | FORMDROPDOWN | FORMTEXT
instruction:
    field
    field-argument
    switches
    field-argument switches
    switches field-argument
field-argument:
    [ " ] text [ " ]
switches:
    switch
    switch switches
switch:
    formatting-switch
    field-specific-switch
formatting-switch:
    date-and-time-formatting-switch
    numeric-formatting-switch
    general-formatting-switch
field-specific-switch:
    \field-switch-character [ field-argument ]
field-switch-character:
    !
    one or two Latin letters
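Once a switch token has been split off, the switch rules above decide what kind of switch it is. A Java sketch with a hypothetical SwitchKind class; it additionally assumes, from the formatting-switch section (§2.16.4, not quoted here), that \@, \# and \* introduce the date-and-time, numeric and general formatting switches respectively.

```java
public class SwitchKind {
    enum Kind { DATE_TIME_FORMAT, NUMERIC_FORMAT, GENERAL_FORMAT, FIELD_SPECIFIC, INVALID }

    /** Classifies a switch token (backslash included), e.g. "\@" or "\o". */
    static Kind classify(String token) {
        if (token.length() < 2 || token.charAt(0) != '\\') return Kind.INVALID;
        String s = token.substring(1);
        switch (s) {
            // formatting-switch (assumed mapping, see the lead-in)
            case "@": return Kind.DATE_TIME_FORMAT;   // e.g. \@ "M-d"
            case "#": return Kind.NUMERIC_FORMAT;     // e.g. \# "0.00"
            case "*": return Kind.GENERAL_FORMAT;     // e.g. \* MERGEFORMAT
            // field-specific-switch: "!" or one or two Latin letters
            case "!": return Kind.FIELD_SPECIFIC;
            default:
                return s.matches("[A-Za-z]{1,2}") ? Kind.FIELD_SPECIFIC : Kind.INVALID;
        }
    }

    public static void main(String[] args) {
        for (String t : new String[] { "\\@", "\\*", "\\o", "\\!", "\\xyz" }) {
            System.out.println(t + " -> " + classify(t));
        }
    }
}
```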
formula is discussed in §2.16.3, and formatting-switch is discussed in §2.16.4.
If the text in a field-argument contains white space, the delimiting double-quote characters shall be present; otherwise, they are optional. To include a double-quote character in text, it shall be preceded with a backslash (\). [Example: The field argument "\"name\"" results in the argument's actually being "name". end example] To include a backslash character in text, it shall be preceded with another backslash (\). [Example: File system pathnames on some systems use a backslash as a directory separator, as in the field
INCLUDETEXT "E:\\ReadMe.txt"
in which case, each such separator needs to be preceded with a backslash, as shown above. end example]
Any amount of white space, including none at all, can occur before the first token, after the last token, and between successive tokens.
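The quoting, escaping and white-space rules above are enough to split an instruction into tokens. A minimal Java sketch with a hypothetical FieldTokenizer: quoted arguments lose their delimiting quotes, \" and \\ are unescaped, and a token starting with a backslash that is not an escape is kept whole as a switch. It does not split comparison operators such as <> out of adjacent text and does not handle nested fields; both would be jobs for a field-specific parser built on top.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldTokenizer {
    /** True if s[i] is a backslash escaping a double quote or another backslash. */
    static boolean isEscape(String s, int i) {
        return s.charAt(i) == '\\' && i + 1 < s.length()
                && (s.charAt(i + 1) == '"' || s.charAt(i + 1) == '\\');
    }

    /** Splits field instruction text into tokens. */
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0, n = s.length();
        while (i < n) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; } // any amount, including none
            boolean quoted = c == '"';                        // delimited field-argument
            boolean isSwitch = c == '\\' && !isEscape(s, i);  // e.g. \@ or \o, kept verbatim
            if (quoted) i++;                                  // drop the opening quote
            StringBuilder tok = new StringBuilder();
            while (i < n) {
                char d = s.charAt(i);
                // a quoted token ends at an unescaped quote; others also end at white space
                if (quoted ? d == '"' : (Character.isWhitespace(d) || d == '"')) break;
                if (!isSwitch && isEscape(s, i)) {            // backslash-quote or backslash-backslash
                    tok.append(s.charAt(i + 1));
                    i += 2;
                } else {
                    tok.append(d);
                    i++;
                }
            }
            if (quoted && i < n) i++;                         // drop the closing quote
            tokens.add(tok.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(" DATE \\@ \"M-d\" "));              // [DATE, \@, M-d]
        System.out.println(tokenize("INCLUDETEXT \"E:\\\\ReadMe.txt\"")); // [INCLUDETEXT, E:\ReadMe.txt]
    }
}
```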
The first step to adding higher-level support for fields to docx4j would be to create a field parser, preferably one supporting nested fields. This requires parsing either the flat JAXB element representation or the XML representation...
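One way to do this, sketched here under assumptions: first flatten the content into a sequence of items (one per fldChar, instrText and result text, in document order; that flattening step from docx4j's JAXB objects or from the XML is not shown), then rebuild the nesting with a stack of open fields. All names are hypothetical, not docx4j API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class FlatFieldParser {
    // Hypothetical flattened view: one item per fldChar, instrText or result text.
    enum Kind { BEGIN, INSTR, SEPARATE, TEXT, END }

    static final class Item {
        final Kind kind;
        final String text;
        Item(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    static final class Field {
        final List<Object> instruction = new ArrayList<>(); // String or nested Field
        final StringBuilder result = new StringBuilder();
        boolean separated;                                   // fldCharType="separate" seen
    }

    static Item item(Kind k, String text) { return new Item(k, text); }

    /** Rebuilds the field tree; returns top-level fields, ignoring text outside any field. */
    static List<Field> parse(List<Item> items) {
        List<Field> top = new ArrayList<>();
        Deque<Field> open = new ArrayDeque<>();              // innermost open field on top
        for (Item it : items) {
            Field cur = open.peek();                         // null when outside every field
            switch (it.kind) {
                case BEGIN:
                    open.push(new Field());
                    break;
                case INSTR:
                    if (cur != null && !cur.separated) cur.instruction.add(it.text);
                    break;
                case SEPARATE:
                    if (cur != null) cur.separated = true;
                    break;
                case TEXT:
                    if (cur != null && cur.separated) cur.result.append(it.text);
                    break;
                case END: {
                    if (cur == null) throw new IllegalStateException("end without begin");
                    open.pop();
                    Field outer = open.peek();
                    if (outer == null) top.add(cur);
                    else if (!outer.separated) outer.instruction.add(cur); // nested in instruction
                    else outer.result.append(cur.result);   // nested in result: keep its text
                    break;
                }
            }
        }
        if (!open.isEmpty()) throw new IllegalStateException("unterminated field");
        return top;
    }

    /** Renders a field's instruction with braces marking nested field boundaries. */
    static String render(Field f) {
        StringBuilder sb = new StringBuilder("{");
        for (Object part : f.instruction) {
            sb.append(part instanceof Field ? render((Field) part) : (String) part);
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
                item(Kind.BEGIN, null), item(Kind.INSTR, " IF "),
                item(Kind.BEGIN, null), item(Kind.INSTR, " DATE \\@ \"M-d\" "),
                item(Kind.SEPARATE, null), item(Kind.TEXT, "3-5"), item(Kind.END, null),
                item(Kind.INSTR, "<>\"1-1\" \"not \" "),
                item(Kind.SEPARATE, null), item(Kind.TEXT, "not "), item(Kind.END, null));
        Field f = parse(items).get(0);
        // prints: { IF { DATE \@ "M-d" }<>"1-1" "not " } => not
        System.out.println(render(f) + " => " + f.result);
    }
}
```

The stack is what makes nesting work: an end marker always closes the innermost open field, which is then attached to whichever field is now on top.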
It would also be worth articulating the use cases we are trying to support.