Title: Creating and Reading XML documents using .NET
1Creating and Reading XML documents using .NET
- Marios Tziakouris
- University of Cyprus
- EPL602
- Fall 2004
2XML Documents Can be Read Programmatically
- The .NET Framework includes many classes that help you iterate through and navigate XML documents programmatically.
- These classes are found in the System.Xml namespace.
3Accessing XML Content
- XML documents can be accessed in one of two ways: with a push model or with a pull model.
- The pull model loads the entire XML document into memory, and works with the document only once it has been completely loaded.
- The push model accesses only small pieces of the XML document, as they are needed.
4Comparing and Contrasting Push and Pull Approaches
5How to use the Two Methods
- The .NET Framework provides developers with both methods.
- Pull method: use the DOM classes in the .NET Framework.
- Push method: use the XmlReader and XmlWriter classes.
6Using the Pull Method
- The System.Xml namespace contains a number of classes for working with XML documents in the DOM paradigm:
- XmlDocument: represents an XML document.
- XmlElement: represents an individual element in the DOM.
- XmlAttribute: represents an attribute.
- XmlText: represents text content.
7Using the Push Method
- The XmlReader reads one node at a time from a specified XML source. The XmlReader can only read in a FORWARD direction.
- The XmlReader class is abstract and cannot be used directly; one of its derived classes must be used instead:
- XmlNodeReader: reads one node at a time from an XML DOM.
- XmlTextReader: reads one node at a time from an XML source, such as a file with XML content.
- XmlValidatingReader: a reader that performs DTD or schema validation.
8Iterating through an XML Document using
XmlTextReader
- To iterate through the contents of an XML document with the XmlTextReader, we need to:
- Specify the XML document to iterate through when creating the XmlTextReader.
- Call the Read() method, which reads in the next node and returns false once there are no more nodes.
- Access the properties of the XmlTextReader to determine the name, value, and other information about the node just read.
9Iterating through an XML Document using
XmlTextReader
- We can programmatically read through the contents of an XML file like so:
// create an XmlTextReader to read the specified XML file
XmlTextReader reader = new XmlTextReader(filepath);

// now, display the information of each node in the TextBox
while (reader.Read())
{
    // access the properties of the XmlTextReader class,
    // like reader.Name, reader.NodeType, reader.Value, etc.
}

// close the XmlTextReader
reader.Close();
10What is a Node?
- Recall that the XmlReader classes read XML nodes.
What constitutes a node? Can you identify the
nodes in the following XML fragment?
<?xml version="1.0" encoding="utf-8" ?>
<books>
  <book price="34.95">
    <title>Animal Farm</title>
    <authors>
      <author>Orwell</author>
    </authors>
  </book>
</books>
11What is a Node?
<?xml version="1.0" encoding="utf-8" ?>
<books>
  <book price="34.95">
    <title>Animal Farm</title>
    <authors>
      <author>Orwell</author>
    </authors>
  </book>
</books>
The whitespace between each element (if present) is also considered a node! (You can, however, set the XmlTextReader's WhitespaceHandling property to specify whether the reader should return whitespace nodes or not.)
12What is a Node?
<?xml version="1.0" encoding="utf-8" ?>
<books>
  <book price="34.95">
    <title>Animal Farm</title>
    <authors>
      <author>Orwell</author>
    </authors>
  </book>
</books>
Notice that the attributes of an element are not read as separate nodes by Read()...
13Creating a Program to View the Content Read by an
XmlTextReader
- We can create a program that lets the user select an XML file; the contents of the file are then read by an XmlTextReader, and each node's name, type, and value is displayed. (Run demo!)
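The demo's code is not reproduced in these slides. The following is a minimal C# sketch of the same idea, not the demo itself: the names NodeDump and Describe are illustrative assumptions, and the XML comes from a string (via StringReader) rather than from a user-selected file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NodeDump
{
    // Reads every node from the XML text and returns one
    // "NodeType|Name|Value" line per node. (Hypothetical helper.)
    public static List<string> Describe(string xml, WhitespaceHandling whitespace)
    {
        List<string> lines = new List<string>();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        reader.WhitespaceHandling = whitespace;
        try
        {
            while (reader.Read())
            {
                lines.Add(reader.NodeType + "|" + reader.Name + "|" + reader.Value);
            }
        }
        finally
        {
            reader.Close();
        }
        return lines;
    }

    public static void Main()
    {
        // Made-up sample; the line breaks produce whitespace nodes.
        string xml = "<books>\n  <book price=\"34.95\"><title>Animal Farm</title></book>\n</books>";
        foreach (string line in Describe(xml, WhitespaceHandling.All))
        {
            Console.WriteLine(line);
        }
    }
}
```

Passing WhitespaceHandling.None instead of WhitespaceHandling.All should drop the whitespace nodes from the listing, which makes the effect of that property easy to see.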
14Reading the Attributes
- As we saw in the demo, the attributes are not read as separate nodes.
- We can determine whether or not a given node has attributes by checking the HasAttributes property.
- In order to programmatically access the attributes of a node, we must use the MoveToNextAttribute() method of the XmlTextReader.
15Reading the Attributes
// C#
while (reader.Read())
{
    if (reader.HasAttributes)
        while (reader.MoveToNextAttribute())
        {
            // Access the attribute name/value via
            // reader.Name/reader.Value
        }
}

' VB.NET
While reader.Read()
    If reader.HasAttributes Then
        While reader.MoveToNextAttribute()
            ' Access the attribute name/value via
            ' reader.Name/reader.Value
        End While
    End If
End While
16The XmlTextReader Properties and Methods
- The properties and methods of the XmlTextReader are listed in Visual Studio .NET and on MSDN.
- Some other methods include:
- ReadInnerXml(): returns a string with the complete content (including XML markup) of the current node (child nodes, text content, etc.).
- ReadOuterXml(): returns a string containing the current node's own XML markup along with the XML markup of its content.
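The difference between the two methods is easiest to see side by side. The sketch below is an illustration under assumptions: InnerOuterDemo and its Read helper are made-up names, and the sample XML is a shortened version of the books fragment.

```csharp
using System;
using System.IO;
using System.Xml;

public static class InnerOuterDemo
{
    // Advances to the first element named elementName and returns either
    // its inner or its outer XML. Returns null if no such element exists.
    // (Hypothetical helper, not part of the .NET Framework.)
    public static string Read(string xml, string elementName, bool outer)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    return outer ? reader.ReadOuterXml() : reader.ReadInnerXml();
                }
            }
            return null;
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        string xml = "<books><book price=\"34.95\"><title>Animal Farm</title></book></books>";
        Console.WriteLine(Read(xml, "book", false)); // children of <book> only
        Console.WriteLine(Read(xml, "book", true));  // <book> itself plus its children
    }
}
```

Both calls also move the reader past the element they return, so they are best used when the element's content is wanted as one string rather than node by node.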
17The XmlTextReader Properties and Methods
- When reading an XML document, the XmlTextReader class will throw an XmlException if there is an error in parsing the XML.
- An error can occur if, for example, the XML is malformed (that is, it is not well-formed).
- Run the XmlException demo.
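The XmlException demo is not included in the slides. As a rough sketch of the behaviour described above (WellFormedCheck and FirstError are assumed names, not the demo's), a well-formedness check can read the whole document and catch the exception:

```csharp
using System;
using System.IO;
using System.Xml;

public static class WellFormedCheck
{
    // Returns null when the XML parses cleanly; otherwise returns a
    // description of the first parsing error. (Hypothetical helper.)
    public static string FirstError(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                // Nothing to do: we only care whether reading succeeds.
            }
            return null;
        }
        catch (XmlException ex)
        {
            return "Line " + ex.LineNumber + ", position " + ex.LinePosition + ": " + ex.Message;
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        // <b> is never closed, so this fragment is not well-formed.
        Console.WriteLine(FirstError("<a><b></a>"));
        Console.WriteLine(FirstError("<a><b/></a>") == null);
    }
}
```

Because the reader is forward-only, the error surfaces only when Read() reaches the offending markup; nodes before that point are read normally.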
18Using the DOM to Iterate through an XML Document
- In contrast to the push method (XmlReader/XmlWriter), the .NET Framework also offers a pull method.
- Recall that the pull method reads the entire XML document into memory and then works with it from there.
- For this model, XML documents are represented in the Document Object Model (DOM).
19What is the DOM?
- DOM stands for Document Object Model; it is a model that can be used to describe an XML document.
- The DOM expresses the XML document as a hierarchy of nodes, where each element can have zero or more child elements.
- The text content and attributes of an element are expressed as nodes beneath it as well. (In the .NET classes, attributes are reached through an element's Attributes collection rather than its ChildNodes.)
20Example XML File
<?xml version="1.0" encoding="UTF-8" ?>
<books>
  <book price="34.95">
    <title>TYASP 3.0</title>
    <authors>
      <author>Mitchell</author>
    </authors>
  </book>
  <book price="29.95">
    <title>ASP.NET Tips</title>
    <authors>
      <author>Mitchell</author>
      <author>Walther</author>
      <author>Seven</author>
    </authors>
  </book>
</books>
21The DOM View of the XML Document
22The DOM Classes - XmlNode
- There are a number of classes in the System.Xml namespace that represent the DOM.
- Each box in the DOM model is represented in the .NET Framework by the XmlNode class.
- This means that elements, attributes, and text values are all represented by the XmlNode class.
23Extending the XmlNode Class
- There are a number of classes that are derived from the XmlNode class:
- XmlAttribute
- XmlElement
- XmlDocument
- And so on
24The XmlNode Properties
- The XmlNode class includes many properties, the most important ones being:
- Name: the name of the node. For elements and attributes, the name is the name of the element or attribute. For text content, the name is #text.
- Value: the value of the node. For elements, there is no value. For attributes, it is the value of the attribute; for text nodes, it is the text in the node.
- NodeType: indicates the type of the node (element, text, attribute, etc.).
25More XmlNode Properties
- InnerXml: the XML markup of the node's children, as a string.
- OuterXml: the XML markup of the node itself and its children, as a string.
- InnerText: the concatenated text values of the node and all of its child nodes.
- HasChildNodes: a Boolean indicating whether the node has any children.
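To make these properties concrete, here is a small C# sketch run against a made-up one-book document. NodePropertiesDemo and LoadSample are assumed names, and LoadXml (which parses a string) is used only for brevity; the slides themselves load documents from a file.

```csharp
using System;
using System.Xml;

public static class NodePropertiesDemo
{
    // Builds a small sample document from a string. (Hypothetical helper.)
    public static XmlDocument LoadSample()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<book price=\"34.95\"><title>Animal Farm</title></book>");
        return doc;
    }

    public static void Main()
    {
        XmlNode book = LoadSample().DocumentElement;
        XmlNode text = book.FirstChild.FirstChild; // the text node inside <title>

        Console.WriteLine(book.Name);          // name of the element
        Console.WriteLine(book.Value == null); // elements have no value
        Console.WriteLine(book.InnerXml);      // markup of the children only
        Console.WriteLine(book.OuterXml);      // markup of the node and its children
        Console.WriteLine(book.InnerText);     // text of all descendants
        Console.WriteLine(book.HasChildNodes);
        Console.WriteLine(text.Name + " / " + text.NodeType + " / " + text.Value);
    }
}
```

Note that the price attribute shows up in OuterXml but not in InnerXml or InnerText, since it belongs to the element's own start tag.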
26The XmlNodeList Class
- The XmlNodeList class represents an arbitrary collection of XmlNodes.
- For example, the XmlNode class has a ChildNodes property, which returns an XmlNodeList instance. This instance is a collection of nodes representing the node's children in the DOM.
27Loading an XML Document into a DOM Representation
- The XmlDocument's Load() method has four variations:
- Load(Stream)
- Load(string)
- Load(TextReader)
- Load(XmlReader), which also accepts an XmlTextReader, since that class derives from XmlReader
- In the Load(string) variation, the input string is a file path (or URL) to the XML file to load into the DOM representation.
28The XmlDocument Properties
- The XmlDocument class is derived from the XmlNode class, meaning it has all of the properties and methods available to the XmlNode class.
- Once an XML file has been loaded into an XmlDocument instance, we can access the root element through the DocumentElement property.
29The XmlElement and XmlAttribute Classes
- The XmlElement and XmlAttribute classes are also derived from the XmlNode class.
- They represent, respectively, an element and an attribute.
30Example
- The following loads an XML document and stores the name of the root element.
Dim xmlDoc As New XmlDocument()
xmlDoc.Load(filepath)

Dim rootElementName As String
rootElementName = xmlDoc.DocumentElement.Name
31Example
- Iterating through the root element's children:
Dim xmlDoc As New XmlDocument()
xmlDoc.Load(filepath)

Dim n As XmlNode
For Each n In xmlDoc.DocumentElement.ChildNodes
    ' Display the name of the node using n.Name
Next
32An Example of Iterating through an XML Document
- Let's create an application that displays an XML document in a TreeView control.
- Each node in the TreeView represents a node in the DOM.
33An Example of Iterating through an XML Document
- We can recursively iterate through the DOM, ensuring that we'll visit each node.
- View the application code...
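The application code is not part of these slides. The recursion itself can be sketched in a few lines of C#; this version collects indented labels into a list instead of filling a TreeView, and DomWalker and Walk are assumed names rather than the application's.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class DomWalker
{
    // Visits node and then each of its children, depth first, adding one
    // indented label per node to output. (Hypothetical helper.)
    public static void Walk(XmlNode node, int depth, List<string> output)
    {
        // Text nodes are labelled with their text, other nodes with their name.
        string label = node.NodeType == XmlNodeType.Text ? node.Value : node.Name;
        output.Add(new string(' ', depth * 2) + label);

        foreach (XmlNode child in node.ChildNodes)
        {
            Walk(child, depth + 1, output);
        }
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        // Made-up sample; LoadXml parses a string instead of a file.
        doc.LoadXml("<books><book price=\"34.95\"><title>Animal Farm</title></book></books>");

        List<string> lines = new List<string>();
        Walk(doc.DocumentElement, 0, lines);
        foreach (string line in lines)
        {
            Console.WriteLine(line);
        }
    }
}
```

ChildNodes does not include attributes, so a viewer that should also show them would need to loop over each element's Attributes collection separately.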
34Navigating through an XML Document
- So far, all we have seen is how to iterate through an XML document, one node at a time.
- With the pull method (DOM), however, we can navigate through the document as well.
- For example, we might want to access just the elements in the document that have a certain name (such as elements with the name <author>).
35Accessing Elements with a Certain Name
- The XmlDocument class contains a
GetElementsByTagName() method, which returns an
XmlNodeList containing elements that have the
specified tag name.
Dim xmlDoc As New XmlDocument()
xmlDoc.Load(filepath)

Dim n As XmlNode
For Each n In xmlDoc.GetElementsByTagName("author")
    ' Display n.InnerText (n.Value is Nothing for an element node)
Next
36Navigating through an XML Document
- However, what if we want to access nodes based on more complex criteria, such as "Access all <book> elements with a price attribute value less than 30," or "Access the names of the authors who have written more than one book"?
- To accomplish this we need something more powerful: enter XPath!
37A Quick Examination of XPath
- XPath is used to address particular sections of an XML document.
- XPath is so named because its syntax is similar to the syntax for a file path. For example, in our books XML document, we could use the following XPath expression to access all of the author elements:
- /books/book/authors/author
38Navigating through the DOM using XPath
- The XmlNode class contains two methods for navigating the DOM:
- SelectSingleNode(string)
- SelectNodes(string)
- The string input parameter for both of these methods is an XPath expression.
- SelectSingleNode() returns at most one node, the first node to match the XPath expression.
- SelectNodes() returns all of the nodes that match the XPath expression.
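A short C# sketch contrasting the two methods, using the books document from the earlier example slide held in a string. XPathDemo, SampleXml and LoadSample are assumed names introduced only for this illustration.

```csharp
using System;
using System.Xml;

public static class XPathDemo
{
    // The two-book sample document, inlined as a string for the sketch.
    public const string SampleXml =
        "<books>" +
        "<book price=\"34.95\"><title>TYASP 3.0</title>" +
        "<authors><author>Mitchell</author></authors></book>" +
        "<book price=\"29.95\"><title>ASP.NET Tips</title>" +
        "<authors><author>Mitchell</author><author>Walther</author>" +
        "<author>Seven</author></authors></book>" +
        "</books>";

    public static XmlDocument LoadSample()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        return doc;
    }

    public static void Main()
    {
        XmlDocument doc = LoadSample();

        // At most one node: the first <title> in document order.
        XmlNode firstTitle = doc.SelectSingleNode("/books/book/title");
        Console.WriteLine(firstTitle.InnerText);

        // Every match: all <author> elements across both books.
        XmlNodeList authors = doc.SelectNodes("/books/book/authors/author");
        Console.WriteLine(authors.Count);

        // No match: SelectSingleNode returns null rather than throwing.
        XmlNode missing = doc.SelectSingleNode("/books/magazine");
        Console.WriteLine(missing == null);
    }
}
```

Since SelectSingleNode() returns null when nothing matches, callers should check the result before using it.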
39An Example
- The following code displays the titles of books
whose price is less than 30.00.
Dim xmlDoc As New XmlDocument()
xmlDoc.Load(filepath)

Dim n As XmlNode
For Each n In _
  xmlDoc.SelectNodes("/books/book[@price < 30]/title/text()")
    ' Display n.Value
Next
40More on XPath
- There are many more features and much more functionality available with XPath, which we'll not examine.
- For a good tutorial on XPath, see http://www.w3schools.com/xpath/default.asp.
41Summary
- In this presentation, we saw how to programmatically iterate through XML documents.
- We examined the differences between the push and pull methods. The pull method uses the DOM, while the push method uses XmlTextReaders and XmlTextWriters.
42Summary
- We briefly studied the usage of XPath, a technology designed to allow for XML document navigation.
- We saw how to use the SelectSingleNode() and SelectNodes() methods of the XmlNode class to navigate an XML document.
- XML document navigation is only possible in the DOM world.
43Creating XML Documents
- Recall that XML documents can be read using both the push and pull methods.
- The pull model loads the entire XML document into memory before working with it (DOM).
- The push model loads only the needed portions of the XML document (XmlReader classes).
44Creating XML Documents
- When creating XML documents, you can use either a push or pull methodology.
- Pull: use the DOM.
- Push: use the XmlTextWriter class.
- We will examine both approaches in this section.
45Creating XML Documents with the XmlTextWriter
Class
- The XmlTextWriter class:
- "Represents a writer that provides a fast, non-cached, forward-only way of generating streams or files containing XML data" (Microsoft documentation).
- Has methods that allow for the creation of elements, attributes, text content, XML comments, and so on.
46The XmlTextWriter Class
- The XmlTextWriter can output to a file or stream. This is specified in the class's constructor, which has three forms:
- The first form accepts a TextWriter.
- The second accepts a Stream and an Encoding.
- The third accepts a file path and an Encoding. (If the file exists, it will be overwritten with the new content.)
47The XmlTextWriter Class
- Important methods include:
- WriteStartDocument(): outputs the <?xml ... ?> declaration.
- WriteEndDocument(): signals the completion of writing. (Closes any open elements or attributes and puts the writer back in its start state.)
- Flush(): flushes the output.
- Close(): closes the stream, flushing the output.
48The XmlTextWriter Class
- More important methods include:
- WriteStartElement(string): creates the start of an element with a specified name.
- WriteEndElement(): ends the element opened by the most recent unclosed WriteStartElement(string) call.
- WriteString(string): writes text content for an element.
- WriteAttributeString(string, string): writes an attribute with a specified name and value.
49Examples
- View the XmlTextWriter Demo.
- Note that the XmlTextWriter has a Formatting property that can be set to either:
- Formatting.None (the default)
- Formatting.Indented
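The XmlTextWriter demo is not reproduced here. The following C# sketch, which is an illustration and not the demo, writes a one-book document with the methods listed on the previous slides. BooksWriter and Build are assumed names, and the output goes to a StringWriter (the TextWriter form of the constructor) so that the result can be inspected as a string.

```csharp
using System;
using System.IO;
using System.Xml;

public static class BooksWriter
{
    // Writes a small books document and returns it as a string.
    // (Hypothetical helper.)
    public static string Build(Formatting formatting)
    {
        StringWriter output = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(output);
        writer.Formatting = formatting;

        writer.WriteStartDocument();                   // the XML declaration
        writer.WriteStartElement("books");
        writer.WriteStartElement("book");
        writer.WriteAttributeString("price", "34.95"); // attribute on <book>
        writer.WriteStartElement("title");
        writer.WriteString("Animal Farm");             // text content of <title>
        writer.WriteEndElement();                      // closes <title>
        writer.WriteEndElement();                      // closes <book>
        writer.WriteEndElement();                      // closes <books>
        writer.WriteEndDocument();
        writer.Close();                                // flushes and closes the output

        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build(Formatting.None));
        Console.WriteLine(Build(Formatting.Indented));
    }
}
```

Attributes must be written right after the start of their element and before any content. Each WriteEndElement() closes whichever element is currently innermost, which is why the calls mirror the nesting in reverse.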
50Creating XML Documents with the DOM
- The DOM classes can be used to create an XML document.
- To start, you need to create a new XmlDocument instance.
- To create new elements, use the CreateElement(string) method to create a new element with a specified name.
- To add an element to an existing element, use the AppendChild(element) method.
51Creating XML Documents with the DOM
- To create new attributes, use the CreateAttribute(string) method.
- To add an attribute to an element e, use e.Attributes.Append(attribute).
- To create text content, use the CreateTextNode(string) method. Text content, like elements, can be added via the AppendChild(node) method.
52Creating XML Documents with the DOM
- The content of the DOM can be saved by calling the Save() method, which can save the results to:
- A specified file path
- A TextWriter
- A Stream
- An XmlWriter
53Examples
- View the DOMExample-VB demo. It is semantically identical to our earlier demo.
- Note that to apply indentation, the PreserveWhitespace property should be set to False before the Save() method is called.
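The DOMExample-VB demo is not included in the slides. As a rough C# equivalent of the steps described above (DomBuilder and Build are assumed names, and the one-book content is made up), the same document can be assembled node by node:

```csharp
using System;
using System.Xml;

public static class DomBuilder
{
    // Builds <books><book price="34.95"><title>Animal Farm</title></book></books>
    // using only the DOM creation methods. (Hypothetical helper.)
    public static XmlDocument Build()
    {
        XmlDocument doc = new XmlDocument();

        XmlElement books = doc.CreateElement("books");
        doc.AppendChild(books);                  // becomes the root element

        XmlElement book = doc.CreateElement("book");
        XmlAttribute price = doc.CreateAttribute("price");
        price.Value = "34.95";
        book.Attributes.Append(price);           // attach the attribute to <book>

        XmlElement title = doc.CreateElement("title");
        title.AppendChild(doc.CreateTextNode("Animal Farm"));

        book.AppendChild(title);
        books.AppendChild(book);
        return doc;
    }

    public static void Main()
    {
        XmlDocument doc = Build();
        doc.PreserveWhitespace = false;          // lets Save() indent the output
        doc.Save(Console.Out);                   // the TextWriter form of Save()
    }
}
```

Every node is created by the document it will live in (doc.CreateElement and so on) and only becomes part of the tree once it is appended.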
54Editing XML Documents with the DOM
- In addition to creating new XML documents, the DOM can be used to edit existing XML documents.
- To accomplish this, we perform the following steps:
- Load the XML document to edit via the XmlDocument class's Load() method.
- Programmatically edit the contents of the DOM.
- Save the changes using the Save() method.
55Example
- For example, imagine we wanted to find all instances of a particular string in just the text content of an XML document, and replace it with some other string.
- We could load the XML document into the DOM, access all of its text nodes via an appropriate XPath statement, and then perform a find and replace on each one as needed.
- Finally, we could then save the changes back to the original XML file.
56Example
- For example, imagine we wanted to replace the word "book" with "collection of words" in the text content of the following XML document:
<book>
  <title>The Greatest Book</title>
  <year>1998</year>
  <authors>
    <author>Smith</author>
    <author>Book</author>
  </authors>
</book>
57Example
<book>
  <title>The Greatest Collection of Words</title>
  <year>1998</year>
  <authors>
    <author>Smith</author>
    <author>Collection of Words</author>
  </authors>
</book>
58Example
- Note that we would not want this XML document to become the following:
<collection of words>
  <title>The Greatest Collection of Words</title>
  <year>1998</year>
  <authors>
    <author>Smith</author>
    <author>Collection of Words</author>
  </authors>
</collection of words>
We want to replace only text content, not element names as well!
59Example
- Run the FindAndReplace-CSharp demo.
- Note the XPath expression used to access all text nodes: //text()
- Realize that we could also have recursively iterated through the DOM searching for text nodes (checking each XmlNode's NodeType property), but the XPath approach is much cleaner and simpler.
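The FindAndReplace-CSharp demo is not reproduced in these slides. The sketch below shows one way the //text() approach could look. It makes several assumptions: TextReplacer and ReplaceInText are made-up names, the match is case-sensitive (the demo may behave differently), and the document is parsed from a string instead of being loaded from and saved back to a file.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class TextReplacer
{
    // Replaces find with replaceWith in every text node of doc.
    // Element and attribute names are never touched. (Hypothetical helper.)
    public static void ReplaceInText(XmlDocument doc, string find, string replaceWith)
    {
        // Copy the matches first so the edits below cannot disturb
        // the XPath result while it is being enumerated.
        List<XmlNode> textNodes = new List<XmlNode>();
        foreach (XmlNode node in doc.SelectNodes("//text()"))
        {
            textNodes.Add(node);
        }

        foreach (XmlNode node in textNodes)
        {
            node.Value = node.Value.Replace(find, replaceWith);
        }
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<book><title>The Greatest Book</title>" +
                    "<authors><author>Smith</author><author>Book</author></authors></book>");

        ReplaceInText(doc, "Book", "Collection of Words");
        Console.WriteLine(doc.OuterXml);
    }
}
```

With a real file, the same method would sit between a call to Load() and a call to Save() on the same path.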
60Removing Nodes in a DOM
- Elements and text nodes can be removed from the DOM via the RemoveChild(node) method, called on the parent of the node to remove.
- Attributes can be removed with the Attributes.Remove(attribute) method.
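A brief C# sketch of both kinds of removal, using a made-up one-book document. NodeRemoval and StripBook are assumed names introduced for the illustration.

```csharp
using System;
using System.Xml;

public static class NodeRemoval
{
    // Removes the <title> child and the price attribute from the first
    // <book> element and returns that element. (Hypothetical helper.)
    public static XmlNode StripBook(XmlDocument doc)
    {
        XmlNode book = doc.SelectSingleNode("/books/book");

        // RemoveChild is called on the parent of the node being removed.
        XmlNode title = book.SelectSingleNode("title");
        book.RemoveChild(title);

        // Attributes live in their own collection, so they are removed there.
        XmlAttribute price = book.Attributes["price"];
        book.Attributes.Remove(price);

        return book;
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<books><book price=\"34.95\"><title>Animal Farm</title></book></books>");

        XmlNode book = StripBook(doc);
        Console.WriteLine(book.HasChildNodes);    // no children remain
        Console.WriteLine(book.Attributes.Count); // no attributes remain
    }
}
```

The sketch assumes that both the title element and the price attribute exist; production code would check the results of SelectSingleNode() and the Attributes indexer for null first.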
61Summary
- With the XmlTextWriter, you can create an XML document from scratch.
- Using the DOM, you can create an XML document from scratch as well as edit existing documents.
62Questions?