Homework Assignment 5 (XML Document Processing)

Note: You must prepare your homework as plain text files and send them to the TA at evirt@iis.sinica.edu.tw by 14:20, December 31, 2008, Taipei Time. Your e-mail to the TA must have an empty body and four attachments: readme.html, letter.dtd, letter.xsd, and letter.rng. The e-mail must have the exact Subject [TRC98XML-5]:student-id, where student-id is your student ID (for example, [TRC98XML-5]:B94705001). No late homework will be accepted.

This homework assignment asks you to write schema documents in three schema languages (XML DTD, XML Schema, and Relax NG) that best specify the structure of the following "customer letter" XML documents. The document structure of these letters is described informally below.

In total, there are 8 element types allowed in the document: letter, title, body, customer, name, product, bold, italic. The document root element must be a letter element.
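
As an informal aid for testing your sample documents (not a substitute for the schemas you are asked to write), the vocabulary rule can be checked in a few lines of Python. This is a minimal sketch assuming Python 3 and the standard xml.etree.ElementTree module; the function name vocabulary_problems is hypothetical.

```python
import xml.etree.ElementTree as ET

# The eight element types permitted by the assignment.
ALLOWED = {"letter", "title", "body", "customer", "name", "product", "bold", "italic"}

def vocabulary_problems(root):
    """Hypothetical helper: report a wrong root or any element type outside ALLOWED."""
    problems = []
    if root.tag != "letter":
        problems.append(f"root must be letter, got {root.tag}")
    # root.iter() visits the root itself and every descendant element.
    for elem in root.iter():
        if elem.tag not in ALLOWED:
            problems.append(f"unexpected element {elem.tag}")
    return problems

doc = ET.fromstring("<letter><title>Hi</title><body>Dear <para/></body></letter>")
print(vocabulary_problems(doc))  # ['unexpected element para']
```

This checks only element names; the content models described next are what the schemas must actually enforce.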

The letter element has two child elements: a title element followed by a body element. The content of the body element is an interleaving of two kinds of content. The first is "marked up text", which is described below. The second is a sequence of two elements: a customer element followed by a product element. The title element under the letter element also has "marked up text" as its content.
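
One way to read the body rule: if the marked-up-text part (text plus bold and italic elements) is set aside, the remaining element children of body must be exactly a customer followed by a product. The Python sketch below encodes that reading, assuming exactly one customer/product pair per body (an interpretation of "a sequence of two elements"). It looks only at child sequences, not at nested content, and the names letter_children_ok and body_children_ok are hypothetical.

```python
import xml.etree.ElementTree as ET

# Elements that belong to the "marked up text" part of body.
MARKUP = {"bold", "italic"}

def letter_children_ok(letter):
    """letter must contain exactly a title followed by a body."""
    return [child.tag for child in letter] == ["title", "body"]

def body_children_ok(body):
    """Drop bold/italic children (the interleaved marked up text); the
    remaining element children must be exactly customer, then product."""
    rest = [child.tag for child in body if child.tag not in MARKUP]
    return rest == ["customer", "product"]

valid = ET.fromstring(
    '<letter><title>New products</title><body>Dear <customer><title prefix="Miss"/>'
    "<name>Lin</name></customer>, about <product>SoCute</product>. "
    "<bold>Get yours now!</bold></body></letter>"
)
invalid_body = ET.fromstring(
    '<body>Dear <customer><title prefix="Miss"/><name>Lin</name></customer>, '
    "about <italic>SoCute</italic></body>"
)
print(letter_children_ok(valid), body_children_ok(valid[1]))  # True True
print(body_children_ok(invalid_body))  # False
```

The second case mirrors the invalid letter below, where the product element is missing.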

A customer element has two child elements: a title element followed by a name element. This title element has empty content and must have a prefix attribute whose value is one of Mr, Mrs, or Miss. The name element always has text (#PCDATA in XML DTD) as its content. Note that the title element under the customer element and the title element under the letter element have different content models. The product element has "marked up text" as its content.
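
The customer rules can likewise be sketched as a small Python checker, again assuming the standard xml.etree.ElementTree module; customer_problems and ALLOWED_PREFIXES are hypothetical names. "Empty content" is taken strictly here (no child elements and no text, not even whitespace), matching the meaning of EMPTY in an XML DTD.

```python
import xml.etree.ElementTree as ET

ALLOWED_PREFIXES = {"Mr", "Mrs", "Miss"}

def customer_problems(customer):
    """Hypothetical checker for the customer content model; returns messages."""
    tags = [child.tag for child in customer]
    if tags != ["title", "name"]:
        return [f"customer children must be title, name; got {tags}"]
    title, name = customer[0], customer[1]
    problems = []
    # This title must be empty: no child elements and no text at all.
    if len(title) or title.text:
        problems.append("title must be empty")
    # prefix is required and limited to an enumerated set of values.
    if title.get("prefix") not in ALLOWED_PREFIXES:
        problems.append("title prefix must be Mr, Mrs, or Miss")
    # name holds text only (#PCDATA), so no child elements.
    if len(name):
        problems.append("name must contain text only")
    return problems

ok = ET.fromstring('<customer><title prefix="Miss"/><name>Lin</name></customer>')
bad = ET.fromstring('<customer><title>Miss</title><name>Lin</name></customer>')
print(customer_problems(ok))   # []
print(customer_problems(bad))  # ['title must be empty', 'title prefix must be Mr, Mrs, or Miss']
```

The bad case is the customer from the invalid letter below, where the prefix was written as content instead of as an attribute.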

A "marked up text" is an interleaving of text with any number of bold or italic elements. Both bold and italic elements have "marked up text" as their content as well. However, a bold element must not have another bold element as a child or descendant. Likewise, an italic element must not have another italic element as a child or descendant.
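
The no-self-nesting rule depends on an element's ancestors, not just its parent. The Python sketch below (standard xml.etree.ElementTree; nesting_errors and SELF_EXCLUDING are hypothetical names) walks the tree while carrying the set of bold/italic ancestors currently open, and reports any element that reappears inside itself.

```python
import xml.etree.ElementTree as ET

# Each of these may not appear inside another element of the same name.
SELF_EXCLUDING = {"bold", "italic"}

def nesting_errors(elem, open_tags=frozenset()):
    """Return tag names that occur below an ancestor with the same name.

    Walks the descendants of elem depth-first, carrying the set of
    bold/italic ancestors that are currently open. elem itself is not
    counted, so call this on an enclosing element such as letter.
    """
    errors = []
    for child in elem:
        if child.tag in open_tags:
            errors.append(child.tag)
        inner = open_tags | {child.tag} if child.tag in SELF_EXCLUDING else open_tags
        errors.extend(nesting_errors(child, inner))
    return errors

ok = ET.fromstring("<title>the <italic><bold>unbelievable</bold> price</italic></title>")
bad = ET.fromstring("<title><italic><bold><italic>un</italic>believable</bold></italic></title>")
print(nesting_errors(ok))   # []
print(nesting_errors(bad))  # ['italic']
```

Because the check needs context (a bold element inside an italic element may contain italic-free text only, while a top-level bold element may not), a schema language can capture it only if the same element name can have different content models in different contexts. An XML DTD allows just one declaration per element name.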

The following is a valid letter:

<?xml version="1.0" encoding="iso-8859-1"?>
<letter>
 <title>
  Exciting new products for the <bold>Christmas</bold> season!
 </title>
 <body>
  Dear <customer><title prefix="Miss"/><name>Lin</name></customer>, I am writing
  to you about the <product><italic>SoCute</italic></product>
  line of new notebooks. These notebooks start at the
  <italic><bold>unbelievable</bold> price of NT$ 5,000 each</italic>,
  and are ready to order. <bold>Get yours now!</bold>
  I wish you a most happy holiday.
 </body>
</letter>

The following letter is well-formed XML but is not a valid letter:

<?xml version="1.0" encoding="iso-8859-1"?>
<letter>
 <title>
  Exciting new products for the <bold><bold>Christmas</bold></bold> season!
 </title>          <!-- bold inside bold ^^^ -->
 <body>            <!-- vvv title shall have empty content -->
  Dear <customer><title>Miss</title><name>Lin</name></customer>, I am writing
  to you about the <italic>SoCute</italic>  <!-- Missing product element -->
  line of new notebooks. These notebooks start at the
  <italic><bold><italic>un</italic>believable</bold> <!-- italic inside italic -->
  price of NT$ 5,000 each</italic>,
  and are ready to order. <bold>Get yours now!</bold>
  I wish you a most happy holiday.
 </body>
</letter>

Note that XML DTD (and perhaps even XML Schema) may not be expressive enough to describe some of the above constraints. In that case, design your schema so that the set of documents it accepts as valid includes all the valid documents (as defined above), while keeping that set as small as possible.

You may use the following online services to validate your schema documents (and the document instances):

  1. XML DTD: http://validator.w3.org/
  2. XML Schema: http://www.w3.org/2001/03/webdata/xsv, or http://tools.decisionsoft.com/schemaValidate/
  3. Relax NG: http://validator.nu/

You can also install these or other validators on your own PC as standalone applications to help you write schema documents and validate document instances.

Name your schema documents letter.dtd, letter.xsd, and letter.rng for XML DTD, XML Schema, and Relax NG, respectively. Also prepare an HTML document, readme.html, explaining your schema designs.

Hint: It is probably easier to start with Relax NG, then translate what you have into XML Schema and XML DTD.