2007-03-06 & 2007-03-13
Tyng-Ruey Chuang,
<trc@iis.sinica.edu.tw>
Institute of Information Science
Academia Sinica, Taipei, Taiwan
XML shall be straightforwardly usable over the Internet.
XML shall support a wide variety of applications.
XML shall be compatible with SGML.
It shall be easy to write programs which process XML documents.
The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
XML documents should be human-legible and reasonably clear.
The XML design should be prepared quickly.
The design of XML shall be formal and concise.
XML documents shall be easy to create.
Terseness in XML markup is of minimal importance.
W3C Recommendation (16 August 2006, edited in place 29 September 2006)
Link:
http://www.w3.org/TR/2006/REC-xml-20060816
The text in an XML document consists of intermingled character data and markup.
All text that is not markup constitutes the character data of the document.
Character Data:
CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
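The CharData production can be read as "any run of characters without '<' or '&', except runs that contain ']]>'". A minimal Python sketch of that reading follows; the function name is_char_data is hypothetical and is not part of any XML library.

```python
import re

# The [^<&]* part: any run of characters with no '<' and no '&'.
_NO_MARKUP_START = re.compile(r"[^<&]*")


def is_char_data(text: str) -> bool:
    """Return True if `text` matches the CharData production.

    The grammar's "- (... ']]>' ...)" is a set difference: strings
    containing the CDATA-section-close delimiter are excluded.
    """
    return _NO_MARKUP_START.fullmatch(text) is not None and "]]>" not in text


if __name__ == "__main__":
    for sample in ["Hello, world!", "a < b", "Tom & Jerry", "x ]]> y", ""]:
        print(repr(sample), is_char_data(sample))
```

A real parser applies this rule while tokenizing; the sketch only checks a string in isolation.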
Markup takes the form of:
start-tags,
end-tags,
empty-element tags,
entity references,
character references,
comments,
CDATA section delimiters,
document type declarations,
processing instructions,
XML declarations,
text declarations, and
any white space that is at the top level of the document entity (that is, outside the document element and not inside any other markup).
(General) Entity References (for example):
"&lt;" is used for "<"
"&amp;" is used for "&"
"&gt;" is used for ">"
"&quot;" is used for the double quote (")
"&apos;" is used for the apostrophe (')
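As an illustration of the five predefined entities, the following sketch escapes and unescapes text with Python's xml.sax.saxutils. The helper names to_markup_safe and from_markup_safe are hypothetical; escape and unescape handle '&', '<', and '>' by default, and the two quote characters are supplied through the optional mapping.

```python
from xml.sax.saxutils import escape, unescape

# escape()/unescape() cover &, < and > on their own; quotes are opt-in.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {"&quot;": '"', "&apos;": "'"}


def to_markup_safe(text: str) -> str:
    """Replace the five special characters with entity references."""
    return escape(text, QUOTES)


def from_markup_safe(text: str) -> str:
    """Inverse of to_markup_safe()."""
    return unescape(text, UNQUOTES)


if __name__ == "__main__":
    original = 'if a < b & c > "d" then \'e\''
    escaped = to_markup_safe(original)
    print(escaped)
    print(from_markup_safe(escaped) == original)
```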
(Numeric) Character References (for example):
"&#38;" is used for "&"
"&#x26;" is used for "&"
"&#x5803;" is used for "堃"
Comments (for example):
<!-- declarations for <head> & <body> -->
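A small sketch of how a parser treats the character references and comments shown above: each reference is replaced by the character it names, and a comment contributes nothing to the character data. The helper char_ref and the element name r are hypothetical; parsing uses Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


def char_ref(ch: str, hexadecimal: bool = True) -> str:
    """Build a numeric character reference for a single character."""
    code = ord(ch)
    return f"&#x{code:X};" if hexadecimal else f"&#{code};"


# Decimal and hexadecimal references to '&', a comment, then a CJK character.
doc = (
    "<r>"
    + char_ref("&", hexadecimal=False)
    + char_ref("&")
    + "<!-- a comment -->"
    + char_ref("堃")
    + "</r>"
)

# itertext() concatenates the character data and skips the comment.
text = "".join(ET.fromstring(doc).itertext())

if __name__ == "__main__":
    print(doc)
    print(text)
```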
CDATA section delimiters (for example):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN"
  "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd" [
  <!ATTLIST polygon
    area          CDATA #IMPLIED
    circumference CDATA #IMPLIED>
]>
<svg width="100" height="100" viewBox="0 0 100 100"
     xmlns="http://www.w3.org/2000/svg">
  <defs>
    <style type="text/css"><![CDATA[
      polygon { fill: green; stroke: black; stroke-width: 1}
    ]]>
    </style>
  </defs>
  <polygon id="_10001001" points="10,10 10,90 90,90 90,10"
           area="6400" circumference="320"/>
</svg>
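Inside a CDATA section, '<' and '&' are plain character data, so the section is an alternative to escaping each character. The sketch below (the element content is made up for illustration) parses both spellings with xml.etree.ElementTree and compares the resulting text.

```python
import xml.etree.ElementTree as ET

# Same character data, written two ways.
with_cdata = "<style><![CDATA[ a < b && c ]]></style>"
with_entities = "<style> a &lt; b &amp;&amp; c </style>"

cdata_text = ET.fromstring(with_cdata).text
entity_text = ET.fromstring(with_entities).text

if __name__ == "__main__":
    print(repr(cdata_text))
    print(cdata_text == entity_text)
```

Only the delimiters "<![CDATA[" and "]]>" count as markup; everything between them reaches the application unchanged.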
Processing Instruction (for example):
<?xml-stylesheet href="person.css" type="text/css"?>
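A processing instruction consists of a target and a data string that the parser passes to the application as is. A minimal sketch with Python's xml.dom.minidom follows; the document element person is an assumption made for the example.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<?xml-stylesheet href="person.css" type="text/css"?><person/>'
)

# Processing instructions in the prolog are children of the document node.
pis = [n for n in doc.childNodes if n.nodeType == Node.PROCESSING_INSTRUCTION_NODE]
pi = pis[0]

if __name__ == "__main__":
    print(pi.target)
    print(pi.data)
```

Note that the pseudo-attributes href and type are not parsed as attributes; the application has to interpret the data string itself.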
A valid document includes a document type declaration that identifies the DTD (Document Type Definition) that the document satisfies. A DTD contains or points to the following kinds of markup declarations.
Element Type declarations: An element type can have element content, mixed content, EMPTY content, or ANY content.
<!ELEMENT br EMPTY>
<!ELEMENT container ANY>
<!ELEMENT div1 (head, (p | list | note)*, div2*)>
<!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*>
<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
<!ELEMENT b (#PCDATA)>
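An element content model such as (head, (p | list | note)*, div2*) is in effect a regular expression over the names of the child elements. The sketch below hand-translates that one model into a Python regular expression; it is not a DTD validator, it ignores character data between children, and the names DIV1_MODEL and matches_div1 are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# (head, (p | list | note)*, div2*) over child names, each followed by a space.
DIV1_MODEL = re.compile(r"head ((p|list|note) )*(div2 )*")


def matches_div1(elem: ET.Element) -> bool:
    """Check the sequence of child element names against the div1 model."""
    names = "".join(child.tag + " " for child in elem)
    return DIV1_MODEL.fullmatch(names) is not None


if __name__ == "__main__":
    good = ET.fromstring("<div1><head/><p/><note/><div2/></div1>")
    bad = ET.fromstring("<div1><head/><div2/><p/></div1>")
    print(matches_div1(good), matches_div1(bad))
```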
Attribute-List declarations specify the name, data type, and default value (if any) of each attribute associated with a given element type. There are 10 attribute types: CDATA, NMTOKEN, NMTOKENS, enumeration, ENTITY, ENTITIES, ID, IDREF, IDREFS, NOTATION.
General Entity declarations (for example):
<!ENTITY lt "&#38;#60;">
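To see a general entity declaration at work, the sketch below declares an entity in the internal subset and lets xml.etree.ElementTree expand the reference. The entity name who and the greeting document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE greeting [
  <!ENTITY who "world">
]>
<greeting>Hello, &who;!</greeting>"""

# The parser replaces &who; with the entity's replacement text.
text = ET.fromstring(doc).text

if __name__ == "__main__":
    print(text)
```

Because entity expansion can be abused (deeply nested entities can blow up memory), documents from untrusted sources are usually parsed with entity expansion restricted.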
Parameter Entity declarations (for example):
<!ENTITY % draft 'INCLUDE' >
<!ENTITY % final 'IGNORE' >
<![%draft;[
<!ELEMENT book (comments*, title, body, supplements?)>
]]>
<![%final;[
<!ELEMENT book (title, body, supplements?)>
]]>

<!ENTITY % HTMLlat1 PUBLIC
  "-//W3C//ENTITIES Latin 1 for XHTML//EN"
  "xhtml-lat1.ent">
%HTMLlat1;
Notation declarations.
CDATA:
<!ELEMENT img EMPTY>
<!ATTLIST img
  src CDATA #REQUIRED
  alt CDATA #REQUIRED>
<img src="../images/MonaLisa.png" alt="Mona Lisa"/>
NMTOKEN/NMTOKENS:
<!ELEMENT performance (#PCDATA)>
<!ATTLIST performance
  dates NMTOKENS #REQUIRED>
<performance dates="2001-08-21 2001-08-23 2001-08-27">King Lear</performance>
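An NMTOKENS value is a white-space-separated list of name tokens. The sketch below splits and checks such a value; the regular expression is an ASCII-only approximation of the Nmtoken production (the real production also admits many non-ASCII name characters), and parse_nmtokens is a hypothetical helper.

```python
import re

# ASCII-only approximation of Nmtoken: one or more name characters.
NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")


def parse_nmtokens(value: str) -> list[str]:
    """Split an NMTOKENS attribute value and check each token."""
    tokens = value.split()
    if not tokens:
        raise ValueError("NMTOKENS requires at least one token")
    for token in tokens:
        if NMTOKEN.fullmatch(token) is None:
            raise ValueError(f"not a name token: {token!r}")
    return tokens


if __name__ == "__main__":
    print(parse_nmtokens("2001-08-21 2001-08-23 2001-08-27"))
```

Unlike a Name, a name token may start with a digit, which is why dates such as 2001-08-21 are acceptable here.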
Enumeration:
<!ELEMENT date EMPTY>
<!ATTLIST date
  year (2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008) #REQUIRED>
<!ATTLIST date
  month (Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec) #REQUIRED>
<!ATTLIST date
  day (1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31) #REQUIRED>
<date year="2007" month="Mar" day="6" />
ID/IDREF/IDREFS:
<!ELEMENT animal EMPTY>
<!ELEMENT person EMPTY>
<!ELEMENT angel EMPTY>
<!ATTLIST animal
  name CDATA #REQUIRED
  id   ID    #REQUIRED>
<!ATTLIST person
  name CDATA #REQUIRED
  id   ID    #REQUIRED>
<!ATTLIST angel
  name CDATA #REQUIRED
  id   ID    #REQUIRED>
<!ELEMENT team EMPTY>
<!ATTLIST team
  name    CDATA  #REQUIRED
  members IDREFS #REQUIRED>
<animal name="pooh" id="_777"/>
<animal name="dodo" id="_778"/>
<person name="andrea" id="_012"/>
<person name="dodo" id="_013"/>
<angel name="winnie" id="_912"/>
<team name="fantasy" members="_777 _912 _012 _013"/>
Values of type ID must be XML names, so they cannot begin with a digit; the leading underscore keeps the identifiers valid.
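The two validity constraints behind ID and IDREFS are that every ID value is unique in the document and that every IDREFS token matches some ID. The sketch below checks both by hand; since xml.etree.ElementTree does not expose DTD attribute types, the attribute names are passed in as parameters, and the wrapper element cast and the function dangling_idrefs are hypothetical.

```python
import xml.etree.ElementTree as ET


def dangling_idrefs(root, id_attr="id", idrefs_attr="members"):
    """Return IDREFS tokens that match no ID; raise on duplicate IDs."""
    ids = [e.get(id_attr) for e in root.iter() if e.get(id_attr) is not None]
    if len(ids) != len(set(ids)):
        raise ValueError("ID values must be unique within a document")
    known = set(ids)
    return [
        ref
        for e in root.iter()
        for ref in e.get(idrefs_attr, "").split()
        if ref not in known
    ]


doc = ET.fromstring(
    "<cast>"
    '<animal name="pooh" id="a1"/>'
    '<person name="andrea" id="p1"/>'
    '<team name="fantasy" members="a1 p1 x9"/>'
    "</cast>"
)

if __name__ == "__main__":
    print(dangling_idrefs(doc))
```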
Attribute Defaults: #IMPLIED, #REQUIRED, #FIXED, Literal.
<!ATTLIST termdef
  id   ID    #REQUIRED
  name CDATA #IMPLIED>
<!ATTLIST list
  type (bullets|ordered|glossary) "ordered">
<!ATTLIST form
  method CDATA #FIXED "POST">
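A literal default is supplied by the parser whenever the attribute is absent from the element. The sketch below relies on the expat-based parser behind xml.etree.ElementTree reporting defaults declared in the internal subset; the empty list element and the helper list_type are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Internal subset declaring a default value for the "type" attribute.
DTD = """<!DOCTYPE list [
  <!ELEMENT list EMPTY>
  <!ATTLIST list type (bullets|ordered|glossary) "ordered">
]>
"""


def list_type(instance: str) -> str:
    """Parse a <list> instance under the DTD above and return its type."""
    return ET.fromstring(DTD + instance).get("type")


if __name__ == "__main__":
    print(list_type("<list/>"))
    print(list_type('<list type="bullets"/>'))
```

The parser here is non-validating: it fills in defaults but does not reject a value outside the enumeration.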
An example of external DTD:
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xml:lang="en-US">
  <head><title>An Example</title></head>
  <body><p>This is a test.</p></body>
</html>
An example of internal DTD:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
]>
<greeting>Hello, world!</greeting>
Both used at the same time:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE greeting SYSTEM "hello.dtd" [
  <!ELEMENT greeting (#PCDATA)>
]>
<greeting>Hello, world!</greeting>
Entity and attribute-list declarations in the internal subset take precedence over those in the external subset.
An example:
<?xml version="1.0"?>
<!-- both namespace prefixes are available throughout -->
<bk:book xmlns:bk='urn:loc.gov:books'
         xmlns:isbn='urn:ISBN:0-395-36341-6'>
  <bk:title>Cheaper by the Dozen</bk:title>
  <isbn:number>1568491379</isbn:number>
</bk:book>
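A namespace-aware parser replaces each prefix with the namespace name it is bound to, so the prefix itself carries no meaning. The sketch below shows the expanded names that xml.etree.ElementTree reports for the book example, in its "{namespace}local" notation; the prefix b used in the lookup is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

doc = """<bk:book xmlns:bk='urn:loc.gov:books'
         xmlns:isbn='urn:ISBN:0-395-36341-6'>
  <bk:title>Cheaper by the Dozen</bk:title>
  <isbn:number>1568491379</isbn:number>
</bk:book>"""

root = ET.fromstring(doc)
tags = [root.tag] + [child.tag for child in root]

# Lookups may use any prefix, as long as it maps to the same namespace name.
title = root.find("b:title", {"b": "urn:loc.gov:books"}).text

if __name__ == "__main__":
    print(tags)
    print(title)
```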
Namespace defaulting:
<?xml version="1.0"?>
<!-- initially, the default namespace is "books" -->
<book xmlns='urn:loc.gov:books'
      xmlns:isbn='urn:ISBN:0-395-36341-6'>
  <title>Cheaper by the Dozen</title>
  <isbn:number>1568491379</isbn:number>
  <notes>
    <!-- make HTML the default namespace for some commentary -->
    <p xmlns='http://www.w3.org/1999/xhtml'>
      This is a <i>funny</i> book!
    </p>
  </notes>
</book>
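A default namespace declaration applies to the element that carries it and to all unprefixed descendants until another declaration overrides it. The sketch below parses a shortened version of the example with xml.etree.ElementTree and lists the expanded name of every element in document order.

```python
import xml.etree.ElementTree as ET

doc = """<book xmlns='urn:loc.gov:books'>
  <title>Cheaper by the Dozen</title>
  <notes>
    <p xmlns='http://www.w3.org/1999/xhtml'>This is a <i>funny</i> book!</p>
  </notes>
</book>"""

root = ET.fromstring(doc)

# iter() walks the tree in document order: book, title, notes, p, i.
tags = [elem.tag for elem in root.iter()]

if __name__ == "__main__":
    for tag in tags:
        print(tag)
```

The i element has no declaration of its own; it inherits the XHTML default namespace from its parent p.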