Chapter 8 Working with XML Documents
XML (Extensible Markup Language) is a framework for creating markup languages. It was designed specifically for use on the Web.
Here is an example XML document:
<?xml version="1.0"?>
<Example>
  <Computer>
    <Name>Toshiba</Name>
    <Processor>Pentium</Processor>
    <Memory units="MB">96</Memory>
  </Computer>
</Example>
Any complete XML document starts with an XML declaration (the first line in the example above).
The declaration can also carry the optional encoding and standalone attributes, which are described in any XML book.
Every XML document must be well-formed. A document can additionally be valid.
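As a rough illustration of how an application might read the example document, the following sketch parses it with Python's standard xml.etree.ElementTree module. The choice of Python and the variable names are assumptions made for illustration; they are not part of the product described in this chapter.

```python
import xml.etree.ElementTree as ET

# The example document from this chapter, starting with the XML declaration.
DOCUMENT = """<?xml version="1.0"?>
<Example>
  <Computer>
    <Name>Toshiba</Name>
    <Processor>Pentium</Processor>
    <Memory units="MB">96</Memory>
  </Computer>
</Example>"""

root = ET.fromstring(DOCUMENT)          # root is the <Example> element
computer = root.find("Computer")        # first <Computer> child
name = computer.findtext("Name")        # text content of <Name>
memory = computer.find("Memory")

# Element content comes from .text, attribute values from .get().
summary = f"{name}: {memory.text} {memory.get('units')}"
print(summary)
```

Element content and attribute values are read through different calls, which mirrors the distinction between the Memory element's data (96) and its units attribute (MB).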
To define a set of tags for use in a particular application, XML uses a document type definition (DTD), which can be kept in a separate file or embedded in the document itself. A DTD states what tags are allowed in an XML document and defines rules for how those tags can be used in relation to each other. It defines the elements that are allowed in the language, the attributes each element can have, and the type of information each element can hold. Documents can be verified against a DTD to ensure that they follow all the rules of the language. A document that satisfies a DTD is said to be valid.
The DTD for documents like the example given above might look something like this:
<!DOCTYPE Example [
  <!ELEMENT Example (Computer)>
  <!ELEMENT Computer (Name, Processor, Memory)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Processor (#PCDATA)>
  <!ELEMENT Memory (#PCDATA)>
  <!ATTLIST Memory units CDATA #REQUIRED>
]>
If a document uses a DTD, the document type declaration must follow the XML declaration and come before the root element.
The second way to specify XML syntax is to assume that a document is using its language properly. XML provides a set of generic syntax rules that must be satisfied, and as long as a document satisfies these rules, it is said to be well-formed. All valid documents must be well-formed.
Processing well-formed documents is generally faster than processing valid documents because the parser does not have to verify the document against the DTD. When valid documents are transmitted, the DTD must also be transmitted if the receiver does not already possess it. Well-formed documents, on the other hand, can be sent without any other information.
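The difference between well-formed and valid can be sketched with a non-validating parser. The example below assumes Python's standard xml.etree.ElementTree module, which checks well-formedness only. The helper name is_well_formed and both sample documents are hypothetical; the first document embeds a DTD along the lines of the one above but omits the required units attribute.

```python
import xml.etree.ElementTree as ET

# Memory lacks the units attribute that the DTD marks as #REQUIRED,
# so this document is well-formed but not valid.
INVALID_BUT_WELL_FORMED = """<?xml version="1.0"?>
<!DOCTYPE Example [
  <!ELEMENT Example (Computer)>
  <!ELEMENT Computer (Name, Processor, Memory)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Processor (#PCDATA)>
  <!ELEMENT Memory (#PCDATA)>
  <!ATTLIST Memory units CDATA #REQUIRED>
]>
<Example>
  <Computer>
    <Name>Toshiba</Name>
    <Processor>Pentium</Processor>
    <Memory>96</Memory>
  </Computer>
</Example>"""

# The <Computer> element is never closed, so this breaks the generic rules.
NOT_WELL_FORMED = "<Example><Computer></Example>"


def is_well_formed(text):
    """Return True if text satisfies XML's generic syntax rules.

    The standard-library parser does not validate against the DTD,
    so a True result says nothing about validity.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(INVALID_BUT_WELL_FORMED))
print(is_well_formed(NOT_WELL_FORMED))
```

Because the parser skips validation, it accepts the first document even though a validating parser would reject it. Checking validity requires a validating parser, which is outside the scope of this sketch.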
XML documents should conform to a DTD if they are going to be used by more than one application. If they are not valid, there is no way to guarantee that various applications will be able to understand each other.
For simplicity, most examples in this chapter use well-formed documents with no DTD.
XML imposes a few restrictions that HTML does not, and these make XML simpler to parse. Unlike in HTML, you cannot omit tags. This guarantees that parsers know where elements end. The following example is acceptable HTML, but not XML:
<table>
  <tr>
    <td>Dog</td>
    <td>Cat
    <td>Mouse
</table>
To make this well-formed XML, you need to add all the missing end tags:
<table>
  <tr>
    <td>Dog</td>
    <td>Cat</td>
    <td>Mouse</td>
  </tr>
</table>
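To see how an XML parser reacts to omitted end tags, the following sketch feeds both versions of the table to Python's standard xml.etree.ElementTree module. The helper first_error is a hypothetical name introduced for this illustration.

```python
import xml.etree.ElementTree as ET

# The HTML-style fragment with omitted end tags, and the corrected version.
HTML_STYLE = "<table> <tr> <td>Dog</td> <td>Cat <td>Mouse </table>"
XML_STYLE = (
    "<table> <tr> <td>Dog</td> <td>Cat</td> <td>Mouse</td> </tr> </table>"
)


def first_error(markup):
    """Return the (line, column) of the first syntax error, or None."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return err.position
    return None


print(first_error(HTML_STYLE))   # a position tuple: the fragment is rejected
print(first_error(XML_STYLE))    # None: the corrected fragment parses

# With every end tag present, the parser knows exactly where each cell ends.
cells = [td.text for td in ET.fromstring(XML_STYLE).iter("td")]
print(cells)
```

The parser stops at the first point where a closing tag does not match the most recently opened element, which is why the HTML-style fragment is rejected outright instead of being repaired.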
Empty elements cannot be represented in XML the same way they are in HTML. An empty element is one that is not used to mark up data, so in HTML it has no end tag. XML provides two ways to write an empty element. The first is to place an end tag immediately after the start tag. For example:
<img href="picture.jpg"></img>
The second method is to place a slash character at the end of the start tag:
<img href="picture.jpg"/>
This tells a parser that the element consists only of one tag.
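The two notations are interchangeable as far as a parser is concerned. The sketch below, which assumes Python's standard xml.etree.ElementTree module and a hypothetical helper named describe, parses both forms and compares the results.

```python
import xml.etree.ElementTree as ET

# The same empty element written in the two notations shown above.
with_end_tag = ET.fromstring('<img href="picture.jpg"></img>')
self_closing = ET.fromstring('<img href="picture.jpg"/>')


def describe(elem):
    """Summarize an element as (tag, attributes, text, number of children)."""
    return (elem.tag, dict(elem.attrib), elem.text, len(elem))


print(describe(with_end_tag))
print(describe(self_closing))
print(describe(with_end_tag) == describe(self_closing))
```

Both forms yield an element with the same name and attributes, no text, and no children, so the choice between them is a matter of style.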
XML is case sensitive, which allows it to be used with non-Latin alphabets. You must ensure that letter case matches in start and end tags: <MyTag> and </Mytag> belong to two different elements.
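A short sketch of case sensitivity in practice, again assuming Python's standard xml.etree.ElementTree module. The Root wrapper element and the helper parses are hypothetical names added for the illustration.

```python
import xml.etree.ElementTree as ET

# MyTag and Mytag differ only in letter case, so they are distinct elements.
root = ET.fromstring("<Root><MyTag>a</MyTag><Mytag>b</Mytag></Root>")
child_tags = [child.tag for child in root]
exact_matches = [elem.text for elem in root.findall("MyTag")]


def parses(markup):
    """Return True if markup is accepted by the XML parser."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# A start tag and an end tag that differ in case do not match each other.
mismatched_parses = parses("<MyTag>a</Mytag>")

print(child_tags)
print(exact_matches)
print(mismatched_parses)
```

A search for MyTag finds only the element spelled exactly that way, and the mismatched pair is rejected as a syntax error.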
White space in the content of an XML element is passed through by parsers unchanged.
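The following sketch shows white space surviving a parse. It assumes Python's standard xml.etree.ElementTree module; the padded Name element is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Leading spaces, a newline, and trailing spaces inside the element content.
elem = ET.fromstring("<Name>  Toshiba\n  </Name>")

raw = elem.text          # exactly what appeared between the tags
trimmed = raw.strip()    # trimming is left to the application

print(repr(raw))
print(repr(trimmed))
```

Because the parser hands the text over untouched, any trimming or normalizing is a decision for the application.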
All XML elements must be properly nested. All child elements must be closed before their parent elements close.
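The nesting rule amounts to a last-opened, first-closed discipline, which can be sketched with a stack. The code below is a deliberately simplified illustration in Python, not a real parser: the function name properly_nested and the regular expression are assumptions, and comments, CDATA sections, and attribute values containing angle brackets are ignored.

```python
import re

# Matches a start tag, an end tag, or an empty-element tag and captures
# the leading slash, the element name, and the trailing slash.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.:-]*)[^<>]*?(/?)>")


def properly_nested(markup):
    """Return True if every element closes before its parent does."""
    open_tags = []                      # stack of currently open elements
    for match in TAG.finditer(markup):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue                    # <name/> opens and closes at once
        if closing:
            # An end tag must match the most recently opened element.
            if not open_tags or open_tags.pop() != name:
                return False
        else:
            open_tags.append(name)
    return not open_tags                # nothing may remain open


print(properly_nested("<b><i>text</i></b>"))
print(properly_nested("<b><i>text</b></i>"))
```

In the second call the b element closes while its child i is still open, so the check fails at that end tag.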
Copyright © 2001 Sybase, Inc. All rights reserved.