2008-09-23
Tyng-Ruey Chuang
trc@iis.sinica.edu.tw
Institute of Information Science
Academia Sinica, Taipei, Taiwan
We shall study Extensible Markup Language (XML) 1.0 (Fourth Edition),
a W3C Recommendation published on August 16, 2006.
Link:
http://www.w3.org/TR/2006/REC-xml-20060816
For a gentle introduction to XML, read pages 1-85 of XML in a Nutshell (3rd edition) by Elliotte Rusty Harold & W. Scott Means. This desktop reference is published by O'Reilly and is highly recommended.
Subjects to be covered this week:
Case studies:
Document:
document ::= prolog element Misc*
Prolog:
prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
Eq ::= S? '=' S?
VersionNum ::= '1.0'
Misc ::= Comment | PI | S
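As a rough illustration of the XMLDecl production, the following Python sketch uses the Expat binding from the standard library to report what a parser reads from the XML declaration. The helper name read_xml_decl is made up for this example and is not part of the Recommendation.

```python
from xml.parsers import expat

def read_xml_decl(text):
    """Return (version, encoding, standalone) from the XML declaration, or None."""
    seen = []
    parser = expat.ParserCreate()
    # Expat calls this handler once if the document starts with an XML declaration.
    # standalone is 1 for "yes", 0 for "no", and -1 when SDDecl is omitted;
    # encoding is None when EncodingDecl is omitted.
    parser.XmlDeclHandler = lambda version, encoding, standalone: seen.append(
        (version, encoding, standalone))
    parser.Parse(text, True)
    return seen[0] if seen else None

print(read_xml_decl('<?xml version="1.0" encoding="UTF-8" standalone="yes"?><a/>'))
print(read_xml_decl("<?xml version='1.0'?><a/>"))
print(read_xml_decl("<a/>"))
```

The third call has no XML declaration at all, which the prolog production permits (XMLDecl is optional).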
Element & Content:
element ::= EmptyElemTag | STag content ETag
content ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*
White Space:
S ::= (#x20 | #x9 | #xD | #xA)+
The text in an XML document consists of intermingled character data and markup. All text that is not markup constitutes the character data of the document.
Character Data:
CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
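The CharData production excludes "<", "&", and the sequence "]]>". A minimal Python sketch of how text might be made safe before it is placed in element content is shown below; make_element is a hypothetical helper, and escape() comes from the standard xml.sax.saxutils module.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def make_element(tag, text):
    """Serialize <tag>text</tag>, escaping characters not allowed in CharData."""
    # escape() rewrites & as &amp;, < as &lt;, and > as &gt;
    # (the last one also keeps "]]>" from appearing literally).
    return "<{0}>{1}</{0}>".format(tag, escape(text))

raw = "if a < b && b > c then x[y[0]]>0"
xml_text = make_element("code", raw)
print(xml_text)

# Parsing the escaped form gives back the original string.
print(ET.fromstring(xml_text).text == raw)
```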
Markup takes the form of:
start-tags,
end-tags,
empty-element tags,
entity references,
character references,
comments,
CDATA section delimiters,
document type declarations,
processing instructions,
XML declarations,
text declarations, and
any white space that is at the top level of the document entity (that is, outside the document element and not inside any other markup).
(General) Entity references (for example):
"&lt;" is used for "<"
"&amp;" is used for "&"
"&gt;" is used for ">"
"&quot;" is used for '"'
"&apos;" is used for "'"
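To see the five predefined entity references being resolved, here is a small Python sketch using the standard ElementTree parser. The element name q and its content are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The five predefined entities need no declaration; the parser replaces
# each reference with the character it stands for.
root = ET.fromstring("<q>&lt;a href=&quot;x&quot;&gt; &amp; &apos;y&apos;</q>")
print(root.text)
```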
(Numeric) Character references (for example):
"&#38;" is used for "&"
"&#x26;" is used for "&"
"&#x5803;" is used for "堃"
Comments (for example):
<!-- declarations for <head> & <body> -->
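The comment above contains "<" and "&" without escaping, which is allowed inside comments. A short Python sketch, assuming the default behaviour of the standard ElementTree parser (which discards comments), shows this; the surrounding html element and the helper parses are added only for the example.

```python
import xml.etree.ElementTree as ET

doc = "<html><!-- declarations for <head> & <body> --><head/><body/></html>"
root = ET.fromstring(doc)
# By default ElementTree drops comments, so only the two elements remain.
child_tags = [child.tag for child in root]
print(child_tags)

def parses(text):
    """Return True if ElementTree accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# The string "--" may not appear inside a comment.
print(parses("<a><!-- one -- two --></a>"))
```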
CDATA sections (for example):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN"
  "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd" [
  <!ATTLIST polygon
    area          CDATA #IMPLIED
    circumference CDATA #IMPLIED>
]>
<svg width="100" height="100" viewBox="0 0 100 100"
     xmlns="http://www.w3.org/2000/svg">
  <defs>
    <style type="text/css"><![CDATA[
      polygon { fill: green; stroke: black; stroke-width: 1}
    ]]>
    </style>
  </defs>
  <polygon id="_10001001" points="10,10 10,90 90,90 90,10"
           area="6400" circumference="320"/>
</svg>
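Inside a CDATA section, "<" and "&" are treated as plain character data. The following Python sketch, with an invented expr element, suggests that a CDATA section and the equivalent escaped text look the same to an application.

```python
import xml.etree.ElementTree as ET

with_cdata = ET.fromstring("<expr><![CDATA[a < b && b > c]]></expr>")
with_escapes = ET.fromstring("<expr>a &lt; b &amp;&amp; b &gt; c</expr>")

# Both forms deliver the same character data to the application.
print(with_cdata.text)
print(with_cdata.text == with_escapes.text)
```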
Processing instructions (for example):
<?xml-stylesheet href="person.css" type="text/css"?>
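A processing instruction consists of a target and a data string that the XML parser passes through uninterpreted. As a sketch, the standard xml.dom.minidom module exposes both parts; the person document element is assumed here only to make the input a complete document.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml-stylesheet href="person.css" type="text/css"?><person/>')
pi = doc.firstChild

# The target names the application; the rest is opaque data. The parser
# does not split href and type into attributes.
print(pi.target)
print(pi.data)
```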
A valid document includes a document type declaration that identifies the DTD (Document Type Definition) that the document satisfies. A DTD contains or points to the following kinds of markup declarations.
Element Type declarations: An element type can have element content, mixed content, EMPTY content, or ANY content.
<!ELEMENT br EMPTY>
<!ELEMENT container ANY>
<!ELEMENT div1 (head, (p | list | note)*, div2*)>
<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
<!ELEMENT b (#PCDATA)>
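An element-content model such as the one for div1 behaves like a regular expression over the names of child elements. The Python sketch below hand-translates that one model into a regular expression. It is only an approximation of what a validating parser does: it ignores character data between children, and the names DIV1_MODEL and matches_div1_model are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the content model (head, (p | list | note)*, div2*)
# into a regular expression over space-terminated child names.
DIV1_MODEL = re.compile(r"head (?:(?:p|list|note) )*(?:div2 )*")

def matches_div1_model(element):
    """Check the sequence of child element names against DIV1_MODEL."""
    names = "".join(child.tag + " " for child in element)
    return DIV1_MODEL.fullmatch(names) is not None

good = ET.fromstring("<div1><head/><p/><note/><list/><div2/></div1>")
bad = ET.fromstring("<div1><head/><div2/><p/></div1>")  # p after div2
print(matches_div1_model(good), matches_div1_model(bad))
```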
Attribute-List declarations specify the name, data type, and default value (if any) of each attribute associated with a given element type. There are 10 attribute types: CDATA, NMTOKEN, NMTOKENS, enumeration, ENTITY, ENTITIES, ID, IDREF, IDREFS, NOTATION.
General Entity declarations (for example):
<!ENTITY lt "&#38;#60;">
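A general entity declared in the internal DTD subset can be referenced in the document body. The Python sketch below declares a hypothetical entity named course and lets the standard ElementTree parser expand it.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE note [
  <!ENTITY course "XML 1.0">
]>
<note>Welcome to &course;!</note>"""

# The parser substitutes the replacement text for the reference &course;.
note = ET.fromstring(doc)
print(note.text)
```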
CDATA:
<!ELEMENT img EMPTY>
<!ATTLIST img
  src CDATA #REQUIRED
  alt CDATA #REQUIRED>
<img src="../images/MonaLisa.png" alt="Mona Lisa"/>
NMTOKEN/NMTOKENS:
<!ELEMENT performance (#PCDATA)>
<!ATTLIST performance dates NMTOKENS #REQUIRED>
<performance dates="2001-08-21 2001-08-23 2001-08-27">King Lear</performance>
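An NMTOKENS value is a white-space-separated list of name tokens. The following Python sketch checks a value against an ASCII-only approximation of the Nmtoken production; the real production also admits many non-ASCII characters, and is_nmtokens is a name invented for this example.

```python
import re

# ASCII-only approximation of Nmtoken: one or more name characters.
# Unlike a Name, a name token may begin with a digit.
NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")

def is_nmtokens(value):
    """Return True if value is a non-empty list of (ASCII) name tokens."""
    tokens = value.split()
    return bool(tokens) and all(NMTOKEN.fullmatch(t) for t in tokens)

print(is_nmtokens("2001-08-21 2001-08-23 2001-08-27"))
print(is_nmtokens("8/21/2001"))  # "/" is not a name character
```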
Enumeration:
<!ELEMENT date EMPTY>
<!ATTLIST date year (2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008) #REQUIRED>
<!ATTLIST date month (Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec) #REQUIRED>
<!ATTLIST date day (1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31) #REQUIRED>
<date year="2007" month="Mar" day="6" />
ID/IDREF/IDREFS:
<!ELEMENT animal EMPTY>
<!ELEMENT person EMPTY>
<!ELEMENT angel EMPTY>
<!ATTLIST animal name CDATA #REQUIRED id ID #REQUIRED>
<!ATTLIST person name CDATA #REQUIRED id ID #REQUIRED>
<!ATTLIST angel name CDATA #REQUIRED id ID #REQUIRED>
<!ELEMENT team EMPTY>
<!ATTLIST team name CDATA #REQUIRED members IDREFS #REQUIRED>
<animal name="pooh" id="_777"/>
<animal name="dodo" id="_778"/>
<person name="andrea" id="_012"/>
<person name="dodo" id="_013"/>
<angel name="winnie" id="_912"/>
<team name="fantasy" members="_777 _912 _012 _013"/>
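The validity constraints on these types say that an ID value must match the Name production (so it cannot begin with a digit), that no two ID values in a document may be equal, and that every IDREF must match some ID. A validating parser learns the attribute types from the DTD; the Python sketch below instead assumes that the attribute named id is of type ID and that members is of type IDREFS. The function check_ids, the zoo wrapper element, and the ASCII-only NAME pattern are all simplifications made for this example.

```python
import re
import xml.etree.ElementTree as ET

# ASCII-only approximation of the Name production.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")

def check_ids(root, id_attr="id", idrefs_attr="members"):
    """Return a list of problems found in ID and IDREFS attribute values."""
    problems = []
    seen = set()
    for el in root.iter():
        value = el.get(id_attr)
        if value is None:
            continue
        if not NAME.fullmatch(value):
            problems.append("ID '%s' is not a Name" % value)
        if value in seen:
            problems.append("ID '%s' is declared twice" % value)
        seen.add(value)
    for el in root.iter():
        for ref in (el.get(idrefs_attr) or "").split():
            if ref not in seen:
                problems.append("IDREF '%s' has no matching ID" % ref)
    return problems

doc = """<zoo>
  <animal name="pooh" id="a1"/>
  <person name="andrea" id="p1"/>
  <team name="fantasy" members="a1 p1 x9"/>
</zoo>"""
problems = check_ids(ET.fromstring(doc))
print(problems)
```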
Attribute Defaults: #IMPLIED, #REQUIRED, #FIXED, Literal.
<!ATTLIST termdef
  id   ID    #REQUIRED
  name CDATA #IMPLIED>
<!ATTLIST list type (bullets|ordered|glossary) "ordered">
<!ATTLIST form method CDATA #FIXED "POST">
#IMPLIED: This attribute is optional. An element may or may not provide a value for the attribute. No default value is provided.
#REQUIRED: This attribute is required. Each element must provide a value for the attribute. No default value is provided.
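#FIXED: The attribute value is a constant. An element may or may not include this attribute. If included, it must have the declared value; if omitted, the declared value is supplied.
Literal: The default value is given as a quoted string. It is used whenever an element does not specify the attribute.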
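To see how declared defaults reach an application, the Python sketch below feeds a small document with an internal DTD subset to the Expat binding in the standard library, which by default reports defaulted attributes along with specified ones. The page wrapper element and the helper attributes_seen are invented for illustration.

```python
from xml.parsers import expat

def attributes_seen(text):
    """Map each element name to the attributes Expat reports for it."""
    seen = {}
    parser = expat.ParserCreate()

    def start(name, attrs):
        seen[name] = dict(attrs)

    parser.StartElementHandler = start
    parser.Parse(text, True)
    return seen

doc = """<!DOCTYPE page [
  <!ELEMENT page (list, form)>
  <!ELEMENT list EMPTY>
  <!ELEMENT form EMPTY>
  <!ATTLIST list type (bullets|ordered|glossary) "ordered"
                 id   ID #IMPLIED>
  <!ATTLIST form method CDATA #FIXED "POST">
]>
<page><list/><form/></page>"""

# Neither element writes any attribute; the literal and #FIXED defaults
# are supplied by the parser, while the #IMPLIED attribute stays absent.
result = attributes_seen(doc)
print(result)
```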
An example of external DTD:
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xml:lang="en-US">
  <head><title>An Example</title></head>
  <body><p>This is a test.</p></body>
</html>
An example of internal DTD:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
]>
<greeting>Hello, world!</greeting>
Both used at the same time:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE greeting SYSTEM "hello.dtd" [
  <!ELEMENT greeting (#PCDATA)>
]>
<greeting>Hello, world!</greeting>
Entity and attribute-list declarations in the internal subset take precedence over those in the external subset.
A standalone XML document: No external markup declarations affect the information passed from the XML processor to the application.
<?xml version="1.0" standalone="yes"?>
Every XML document must be well-formed. The document must adhere to rules like the following:
This is not an exhaustive list! For a complete list of well-formedness constraints, check out the XML Recommendation.
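A non-validating parser is enough to test well-formedness. As a minimal sketch, the Python function below (is_well_formed is a hypothetical name) reports whether the standard ElementTree parser accepts a document; the sample inputs are a few common violations chosen for this example, not a complete list.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

samples = [
    "<a><b></b></a>",     # properly nested
    "<a><b></a></b>",     # overlapping elements
    "<a x=1/>",           # attribute value not quoted
    '<a x="1" x="2"/>',   # attribute name repeated in one tag
    "<a/><b/>",           # more than one root element
]
for sample in samples:
    print(sample, is_well_formed(sample))
```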
An XML document is valid if, in addition to being well-formed, the document satisfies all the constraints specified in the XML document's DTD. Such constraints state, for example, that an element's children must occur in the order required by the element's content model, and that an element can only have attributes that are declared for it. For a complete list of validity constraints, check out the XML Recommendation.