Chapter 8 Working with XML Documents


Working with text

This section describes how to access and work with the text contained in XML documents.

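In an XML document, text occurs as the value of an attribute, in text nodes that are children of elements, and in CDATA sections.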

When an element or attribute contains only text, you can read that text through the nodeValue property. For an element, use the nodeValue of its text child. For an attribute, use the nodeValue of the attribute node itself.

If an element contains text but no child elements or other non-text content, its only child is a text node, and the nodeValue of that text node is a string. This example writes out the text of an element:

document.writeln( elemnode.firstChild.nodeValue )

For an attribute node, this example writes out the value of the node:

document.writeln( attnode.nodeValue )
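Reading elemnode.firstChild.nodeValue fails when the element is empty (firstChild is null) and gives a misleading result when the first child is not text. The following JavaScript sketch wraps both cases in one guarded helper. The function getSimpleText and the plain-object node stand-ins are hypothetical; they only assume nodes expose nodeType, nodeValue, firstChild, and childNodes (with length and item()) as in the examples above, so the sketch runs outside Dynamo.

```javascript
// Node type constants from the W3C DOM Level 1 Node interface.
var ELEMENT_NODE = 1, ATTRIBUTE_NODE = 2, TEXT_NODE = 3;

// Hypothetical helper: returns the value of an attribute, or the text of an
// element whose only child is a single text node; returns null otherwise.
function getSimpleText(node) {
  if (node.nodeType == ATTRIBUTE_NODE) {
    return node.nodeValue;              // an attribute's value is its nodeValue
  }
  if (node.nodeType == ELEMENT_NODE &&
      node.childNodes.length == 1 &&
      node.firstChild.nodeType == TEXT_NODE) {
    return node.firstChild.nodeValue;   // the single text child holds the text
  }
  return null;                          // empty, mixed, or non-text content
}

// Plain-object stand-ins for DOM nodes (not Dynamo objects).
function makeNodeList(nodes) {
  return { length: nodes.length, item: function (i) { return nodes[i]; } };
}
function makeNode(type, value, children) {
  return {
    nodeType: type,
    nodeValue: value,
    firstChild: children.length > 0 ? children[0] : null,
    childNodes: makeNodeList(children)
  };
}

var elemnode = makeNode(ELEMENT_NODE, null, [makeNode(TEXT_NODE, "Hello", [])]);
var attnode = makeNode(ATTRIBUTE_NODE, "en", []);
var emptynode = makeNode(ELEMENT_NODE, null, []);

console.log(getSimpleText(elemnode));  // Hello
console.log(getSimpleText(attnode));   // en
console.log(getSimpleText(emptynode)); // null
```

Returning null rather than throwing lets the caller fall back to one of the multi-node techniques described later in this section.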

Some elements contain several text nodes among their children. Consider the following example:

<Names>Sammy &amp; Rosie</Names>

The first child is a text node with value "Sammy " (including the trailing space). The second child is an entity reference node representing the ampersand character. The third child is a text node with value " Rosie".

For more information, see "Working with entities".
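To recover the full string "Sammy & Rosie", the children have to be joined back together. This JavaScript sketch shows one way to do that. The function collectText and the plain-object nodes are hypothetical, and the sketch assumes, as in the W3C DOM, that an entity reference node has nodeType 5 and carries the entity name (here "amp") in nodeName. Only the five entities predefined by XML are expanded; other references are skipped.

```javascript
var TEXT_NODE = 3, ENTITY_REFERENCE_NODE = 5;

// Replacement text of the five entities predefined by XML 1.0.
var PREDEFINED_ENTITIES = { amp: "&", lt: "<", gt: ">", quot: "\"", apos: "'" };

// Hypothetical helper: joins an element's text children into one string,
// expanding references to predefined entities and skipping anything else.
function collectText(elem) {
  var result = "", child, i;
  for (i = 0; i < elem.childNodes.length; i++) {
    child = elem.childNodes.item(i);
    if (child.nodeType == TEXT_NODE) {
      result += child.nodeValue;
    } else if (child.nodeType == ENTITY_REFERENCE_NODE &&
               PREDEFINED_ENTITIES.hasOwnProperty(child.nodeName)) {
      result += PREDEFINED_ENTITIES[child.nodeName];
    }
  }
  return result;
}

// Plain-object stand-in for a DOM NodeList (not a Dynamo object).
function makeNodeList(nodes) {
  return { length: nodes.length, item: function (i) { return nodes[i]; } };
}

// Models <Names>Sammy &amp; Rosie</Names> as three children.
var names = {
  nodeType: 1,
  nodeName: "Names",
  childNodes: makeNodeList([
    { nodeType: TEXT_NODE, nodeName: "#text", nodeValue: "Sammy " },
    { nodeType: ENTITY_REFERENCE_NODE, nodeName: "amp", nodeValue: null },
    { nodeType: TEXT_NODE, nodeName: "#text", nodeValue: " Rosie" }
  ])
};

console.log(collectText(names)); // Sammy & Rosie
```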

Obtaining all the text inside an element

To obtain all the text inside an element, including the text of its descendant elements, you can call the getElementsByTagName method of that element with the special value "*", which matches all tags. The following function demonstrates this technique.

function listTextOfAllElements( rootelement ){
  var elemlist, elem, child, i, j ;
  // "*" matches every descendant element, but not rootelement itself
  elemlist = rootelement.getElementsByTagName( "*" );
  // Start at -1 so the first pass handles rootelement's own text children
  for( i = -1 ; i < elemlist.length ; i ++ ){
    elem = ( i == -1 ) ? rootelement : elemlist.item(i);
    for( j = 0 ; j < elem.childNodes.length ; j ++ ){
      child = elem.childNodes.item(j);
      if( child.nodeType == 3 ){ // 3 is a text node
        document.writeln( child.nodeValue );
      }
    }
  }
}
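Because the function above visits one element at a time, text in mixed content is grouped by element rather than returned in document order: for <p>one <b>two</b> three</p> it writes "one " and " three" before "two". When order matters, a depth-first walk over childNodes is a common alternative (a swapped-in technique, not the one described above). In this JavaScript sketch, collectTextInOrder and the plain-object nodes are hypothetical and assume only the nodeType, nodeValue, and childNodes properties used in this section.

```javascript
var ELEMENT_NODE = 1, TEXT_NODE = 3;

// Hypothetical alternative: walks the tree depth-first so text is collected in
// document order, including text that sits directly inside the starting node.
function collectTextInOrder(node, out) {
  var child, i;
  for (i = 0; i < node.childNodes.length; i++) {
    child = node.childNodes.item(i);
    if (child.nodeType == TEXT_NODE) {
      out.push(child.nodeValue);
    } else if (child.nodeType == ELEMENT_NODE) {
      collectTextInOrder(child, out);   // descend into the child element
    }
  }
  return out;
}

// Plain-object stand-ins for DOM nodes (not Dynamo objects).
function makeNodeList(nodes) {
  return { length: nodes.length, item: function (i) { return nodes[i]; } };
}
function text(value) {
  return { nodeType: TEXT_NODE, nodeValue: value, childNodes: makeNodeList([]) };
}
function element(children) {
  return { nodeType: ELEMENT_NODE, nodeValue: null, childNodes: makeNodeList(children) };
}

// Models <p>one <b>two</b> three</p>
var p = element([text("one "), element([text("two")]), text(" three")]);

console.log(collectTextInOrder(p, []).join("")); // one two three
```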

Working with CDATA sections

CDATA sections provide a way to include blocks of text in XML documents even if the text contains characters that would otherwise be recognized as markup. A CDATA section starts with <![CDATA[ and ends with ]]>. All characters between these markers, including angle brackets and ampersands, are treated as character data; the section ends at the first occurrence of ]]>.
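The rule that everything up to the first ]]> is plain text can be illustrated with a small string scanner. This JavaScript sketch is not how Dynamo's XML parser works; readCdataSection is a hypothetical name, and it operates on raw XML text rather than on parsed nodes.

```javascript
// Hypothetical helper: given raw XML text and the index where "<![CDATA[" starts,
// returns the section's character data and the index just past "]]>".
function readCdataSection(xml, start) {
  var OPEN = "<![CDATA[", CLOSE = "]]>";
  if (xml.substring(start, start + OPEN.length) !== OPEN) {
    throw new Error("no CDATA section at index " + start);
  }
  var contentStart = start + OPEN.length;
  // Markup characters such as <, > and & are not special here; only "]]>" is.
  var end = xml.indexOf(CLOSE, contentStart);
  if (end < 0) {
    throw new Error("unterminated CDATA section");
  }
  return { text: xml.substring(contentStart, end), next: end + CLOSE.length };
}

var raw = "<code><![CDATA[if (a < b && b > c) { go(); }]]></code>";
var section = readCdataSection(raw, raw.indexOf("<![CDATA["));

console.log(section.text); // if (a < b && b > c) { go(); }
```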

Here are some examples of CDATA sections:

The text content of a CDATA section is the nodeValue of the CDATA section node. For example, the following fragment writes out the content of a child node if it is a CDATA section node:

if( child.nodeType == 4 ){ // 4 is a CDATA section node
    document.writeln( child.nodeValue ) ;
}
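Because text nodes (type 3) and CDATA section nodes (type 4) both keep their content in nodeValue, an element that mixes the two can be read with a single loop. In this JavaScript sketch, characterData and the plain-object nodes are hypothetical stand-ins that assume the same childNodes interface used in the fragments above.

```javascript
var TEXT_NODE = 3, CDATA_SECTION_NODE = 4;

// Hypothetical helper: concatenates the character data of an element's text
// and CDATA section children, in child order.
function characterData(elem) {
  var parts = [], child, i;
  for (i = 0; i < elem.childNodes.length; i++) {
    child = elem.childNodes.item(i);
    if (child.nodeType == TEXT_NODE || child.nodeType == CDATA_SECTION_NODE) {
      parts.push(child.nodeValue);
    }
  }
  return parts.join("");
}

// Plain-object stand-in for a DOM NodeList (not a Dynamo object).
function makeNodeList(nodes) {
  return { length: nodes.length, item: function (i) { return nodes[i]; } };
}

// Models <Expr>Test: <![CDATA[a < b]]></Expr>
var expr = {
  nodeType: 1,
  childNodes: makeNodeList([
    { nodeType: TEXT_NODE, nodeValue: "Test: " },
    { nodeType: CDATA_SECTION_NODE, nodeValue: "a < b" }
  ])
};

console.log(characterData(expr)); // Test: a < b
```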

Escaping text with xmlEscape

Dynamo includes the xmlEscape function to assist with preparing text for use in XML documents. The prototype is as follows:

string xmlEscape( input_string [, use_CDATA  ] )

This function encodes the '&', '<', and '>' characters in a string and returns the encoded string. The input_string parameter is the string to encode. The optional use_CDATA parameter specifies whether to encode the characters by wrapping the string in a CDATA section; if it is not provided, it defaults to false. If use_CDATA is false, the characters are replaced with the entity references &amp;, &lt;, and &gt;.

Examples

document.writeln( xmlEscape( "<MyTag>Hello!</MyTag>" ));

This script produces this output:

&lt;MyTag&gt;Hello!&lt;/MyTag&gt; 

This script:

document.writeln( xmlEscape( "Calvin & Hobbes", true ) ); 

produces this output:

<![CDATA[Calvin & Hobbes]]>
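Outside Dynamo, the same two encodings can be reproduced in plain JavaScript. The following sketch is a hypothetical re-creation, not Dynamo's implementation: xmlEscapeSketch is an illustrative name, and its handling of a literal ]]> in CDATA mode (splitting it across two sections) is an assumption, since the description above does not say what xmlEscape does with that sequence.

```javascript
// Hypothetical re-creation of xmlEscape's two modes; not the Dynamo code.
function xmlEscapeSketch(input, useCdata) {
  if (useCdata) {
    // Assumption: "]]>" would end the section early, so split it so that
    // "]]" closes one section and ">" opens the next.
    return "<![CDATA[" + input.split("]]>").join("]]]]><![CDATA[>") + "]]>";
  }
  // Replace "&" first so the "&" in "&lt;" and "&gt;" is not escaped again.
  return input.replace(/&/g, "&amp;")
              .replace(/</g, "&lt;")
              .replace(/>/g, "&gt;");
}

console.log(xmlEscapeSketch("<MyTag>Hello!</MyTag>"));
// &lt;MyTag&gt;Hello!&lt;/MyTag&gt;
console.log(xmlEscapeSketch("Calvin & Hobbes", true));
// <![CDATA[Calvin & Hobbes]]>
```

Omitting the second argument leaves useCdata undefined, which behaves like false, matching the documented default.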

 


Copyright © 2001 Sybase, Inc. All rights reserved.