The System.Xml namespace comprises the following namespaces and core classes:
System.Xml.*
XmlReader and XmlWriter
High-performance, forward-only cursors for reading or writing an XML stream
XmlDocument
Represents an XML document in a W3C-style DOM
System.Xml.XPath
Infrastructure and API (XPathNavigator) for XPath, a string-based language for querying XML
System.Xml.Schema
Infrastructure and API for (W3C) XSD schemas
System.Xml.Xsl
Infrastructure and API (XslCompiledTransform) for performing (W3C) XSLT transformations of XML
System.Xml.Serialization
Supports the serialization of classes to and from XML (see Chapter 16)
System.Xml.Linq
Modern, simplified, LINQ-centric version of XmlDocument (see Chapter 10)
W3C is an abbreviation for World Wide Web Consortium, where the XML standards are defined.
XmlConvert, the static class for parsing and formatting XML strings, is covered in Chapter 6.
XmlReader is a high-performance class for reading an XML stream in a low-level, forward-only manner.
Consider the following XML file:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<customer id="123" status="archived">
  <firstname>Jim</firstname>
  <lastname>Bo</lastname>
</customer>
To instantiate an XmlReader, you call the static XmlReader.Create method, passing in a Stream, a TextReader, or a URI string. For example:
using (XmlReader reader = XmlReader.Create ("customer.xml")) ...
To construct an XmlReader that reads from a string:
XmlReader reader = XmlReader.Create ( new System.IO.StringReader (myString));
You can also pass in an XmlReaderSettings object to control parsing and validation options. The following three properties on XmlReaderSettings are particularly useful for skipping over superfluous content:
bool IgnoreComments                  // Skip over comment nodes?
bool IgnoreProcessingInstructions    // Skip over processing instructions?
bool IgnoreWhitespace                // Skip over whitespace?
In the following example, we instruct the reader not to emit whitespace nodes, which are a distraction in typical scenarios:
XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreWhitespace = true;

using (XmlReader reader = XmlReader.Create ("customer.xml", settings))
  ...
Another useful property on XmlReaderSettings is ConformanceLevel. Its default value of Document instructs the reader to assume a valid XML document with a single root node. This is a problem if you want to read just an inner portion of XML, containing multiple nodes:
<firstname>Jim</firstname>
<lastname>Bo</lastname>
To read this without throwing an exception, you must set ConformanceLevel to Fragment.
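The fragment case can be sketched as follows. This is a minimal illustration rather than code from the chapter: the FragmentDemo class and its ReadPairs method are hypothetical names, and the fragment is supplied as a string instead of a file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class FragmentDemo
{
  // Hypothetical helper: returns "name=value" for each top-level element
  // of an XML fragment that has no single root node.
  public static List<string> ReadPairs (string fragment)
  {
    var settings = new XmlReaderSettings
    {
      ConformanceLevel = ConformanceLevel.Fragment,   // permit multiple root nodes
      IgnoreWhitespace = true
    };
    var result = new List<string>();
    using (XmlReader reader = XmlReader.Create (new StringReader (fragment), settings))
    {
      reader.MoveToContent();                         // position on the first element
      while (reader.NodeType == XmlNodeType.Element)
      {
        string name = reader.Name;
        // Reads the text content and advances past the end element
        result.Add (name + "=" + reader.ReadElementContentAsString());
      }
    }
    return result;
  }

  static void Main()
  {
    foreach (string pair in ReadPairs ("<firstname>Jim</firstname> <lastname>Bo</lastname>"))
      Console.WriteLine (pair);
  }
}
```

With ConformanceLevel left at its default of Document, the same input would cause an XmlException when the reader reaches the second root element.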
XmlReaderSettings also has a property called CloseInput, which indicates whether to close the underlying stream when the reader is closed (there’s an analogous property on XmlWriterSettings called CloseOutput). The default value for CloseInput and CloseOutput is false.
The units of an XML stream are XML nodes. The reader traverses the stream in textual (depth-first) order. The Depth property of the reader returns the current depth of the cursor.
The most primitive way to read from an XmlReader is to call Read. It advances to the next node in the XML stream, rather like MoveNext in IEnumerator. The first call to Read positions the cursor at the first node. When Read returns false, it means the cursor has advanced past the last node, at which point the XmlReader should be closed and abandoned.
In this example, we read every node in the XML stream, outputting each node type as we go:
XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreWhitespace = true;

using (XmlReader reader = XmlReader.Create ("customer.xml", settings))
  while (reader.Read())
  {
    Console.Write (new string (' ', reader.Depth * 2));   // Write indentation
    Console.WriteLine (reader.NodeType);
  }
The output is as follows:
XmlDeclaration
Element
  Element
    Text
  EndElement
  Element
    Text
  EndElement
EndElement
Attributes are not included in Read-based traversal (see the section Reading Attributes, later in this chapter).
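NodeType is of type XmlNodeType, which is an enum with these members:
None | Comment | Document |
XmlDeclaration | Entity | DocumentType |
Element | EndEntity | DocumentFragment |
EndElement | EntityReference | Notation |
Text | ProcessingInstruction | Whitespace |
Attribute | CDATA | SignificantWhitespace |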
Two string properties on XmlReader provide access to a node’s content: Name and Value. Depending on the node type, either Name or Value (or both) is populated:
XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreWhitespace = true;
settings.DtdProcessing = DtdProcessing.Parse;    // Must set this to read DTDs

using (XmlReader r = XmlReader.Create ("customer.xml", settings))
  while (r.Read())
  {
    Console.Write (r.NodeType.ToString().PadRight (17, '-'));
    Console.Write ("> ".PadRight (r.Depth * 3));

    switch (r.NodeType)
    {
      case XmlNodeType.Element:
      case XmlNodeType.EndElement:
        Console.WriteLine (r.Name); break;

      case XmlNodeType.Text:
      case XmlNodeType.CDATA:
      case XmlNodeType.Comment:
      case XmlNodeType.XmlDeclaration:
        Console.WriteLine (r.Value); break;

      case XmlNodeType.DocumentType:
        Console.WriteLine (r.Name + " - " + r.Value); break;

      default: break;
    }
  }
To demonstrate this, we’ll expand our XML file to include a document type, entity, CDATA, and comment:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE customer [ <!ENTITY tc "Top Customer"> ]>
<customer id="123" status="archived">
  <firstname>Jim</firstname>
  <lastname>Bo</lastname>
  <quote><![CDATA[C#'s operators include: < > &]]></quote>
  <notes>Jim Bo is a &tc;</notes>
  <!-- That wasn't so bad! -->
</customer>
An entity is like a macro; a CDATA is like a verbatim string (@"...") in C#. Here’s the result:
XmlDeclaration---> version="1.0" encoding="utf-8"
DocumentType-----> customer - <!ENTITY tc "Top Customer">
Element----------> customer
Element---------->  firstname
Text------------->     Jim
EndElement------->  firstname
Element---------->  lastname
Text------------->     Bo
EndElement------->  lastname
Element---------->  quote
CDATA------------>     C#'s operators include: < > &
EndElement------->  quote
Element---------->  notes
Text------------->     Jim Bo is a Top Customer
EndElement------->  notes
Comment---------->  That wasn't so bad!
EndElement-------> customer
XmlReader automatically resolves entities, so in our example, the entity reference &tc; expands into Top Customer.
Often, you already know the structure of the XML document that you’re reading. To help with this, XmlReader provides a range of methods that read while presuming a particular structure. This simplifies your code and performs some validation at the same time.
XmlReader throws an XmlException if any validation fails. XmlException has LineNumber and LinePosition properties indicating where the error occurred; logging this information is essential if the XML file is large!
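As a rough sketch of how that location information might be captured, the following reads a deliberately malformed string and reports where parsing failed. XmlErrorDemo and FindError are illustrative names, not part of the framework.

```csharp
using System;
using System.IO;
using System.Xml;

static class XmlErrorDemo
{
  // Hypothetical helper: returns null if the XML parses cleanly,
  // otherwise "line:position" taken from the XmlException.
  public static string FindError (string xml)
  {
    try
    {
      using (XmlReader reader = XmlReader.Create (new StringReader (xml)))
        while (reader.Read()) { }        // drain the stream to force a full parse
      return null;
    }
    catch (XmlException ex)
    {
      return ex.LineNumber + ":" + ex.LinePosition;
    }
  }

  static void Main()
  {
    // The end tag on line 2 does not match its start tag
    string bad = "<customer>\n  <firstname>Jim</lastname>\n</customer>";
    Console.WriteLine ("Parse error at " + FindError (bad));
  }
}
```

In production code you would more likely write ex.Message together with the line and position to a log rather than returning a string.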
ReadStartElement verifies that the current NodeType is Element, and then calls Read. If you specify a name, it verifies that it matches that of the current element.
ReadEndElement verifies that the current NodeType is EndElement, and then calls Read.
For instance, we could read this:
<firstname>Jim</firstname>
as follows:
reader.ReadStartElement ("firstname");
Console.WriteLine (reader.Value);
reader.Read();                      // Move from the text node to the end element
reader.ReadEndElement();
The ReadElementContentAsString method does all of this in one hit. It reads a start element, a text node, and an end element, returning the content as a string:
string firstName = reader.ReadElementContentAsString ("firstname", "");
The second argument refers to the namespace, which is blank in this example. There are also typed versions of this method, such as ReadElementContentAsInt, which parse the result. Returning to our original XML document:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<customer id="123" status="archived">
  <firstname>Jim</firstname>
  <lastname>Bo</lastname>
  <creditlimit>500.00</creditlimit>    <!-- OK, we sneaked this in! -->
</customer>
We could read it in as follows:
XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreWhitespace = true;

using (XmlReader r = XmlReader.Create ("customer.xml", settings))
{
  r.MoveToContent();                // Skip over the XML declaration
  r.ReadStartElement ("customer");
  string firstName    = r.ReadElementContentAsString ("firstname", "");
  string lastName     = r.ReadElementContentAsString ("lastname", "");
  decimal creditLimit = r.ReadElementContentAsDecimal ("creditlimit", "");

  r.MoveToContent();      // Skip over that pesky comment
  r.ReadEndElement();     // Read the closing customer tag
}
The MoveToContent method is really useful. It skips over all the fluff: XML declarations, whitespace, comments, and processing instructions. You can also instruct the reader to do most of this automatically through the properties on XmlReaderSettings.
In the previous example, suppose that <lastname> was optional. The solution to this is straightforward:
r.ReadStartElement ("customer");
string firstName    = r.ReadElementContentAsString ("firstname", "");
string lastName     = r.Name == "lastname"
                      ? r.ReadElementContentAsString() : null;
decimal creditLimit = r.ReadElementContentAsDecimal ("creditlimit", "");
The examples in this section rely on elements appearing in the XML file in a set order. If you need to cope with elements appearing in any order, the easiest solution is to read that section of the XML into an X-DOM. We describe how to do this later in the section Patterns for Using XmlReader/XmlWriter.
The way that XmlReader handles empty elements presents a horrible trap. Consider the following element:
<customerList></customerList>
In XML, this is equivalent to:
<customerList/>
And yet, XmlReader treats the two differently. In the first case, the following code works as expected:
reader.ReadStartElement ("customerList");
reader.ReadEndElement();
In the second case, ReadEndElement throws an exception, because there is no separate “end element” as far as XmlReader is concerned. The workaround is to check for an empty element as follows:
bool isEmpty = reader.IsEmptyElement;
reader.ReadStartElement ("customerList");
if (!isEmpty) reader.ReadEndElement();
In reality, this is a nuisance only when the element in question may contain child elements (such as a customer list). With elements that wrap simple text (such as firstname), you can avoid the whole issue by calling a method such as ReadElementContentAsString. The ReadElementXXX methods handle both kinds of empty elements correctly.
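To see the two forms of empty element treated alike, consider the following small sketch. The note element and the EmptyElementDemo class are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

static class EmptyElementDemo
{
  // Hypothetical helper: reads the text of a single <note> element.
  public static string ReadNote (string xml)
  {
    using (XmlReader r = XmlReader.Create (new StringReader (xml)))
    {
      r.MoveToContent();
      // No IsEmptyElement check is needed with the ReadElementContentAsXXX methods
      return r.ReadElementContentAsString ("note", "");
    }
  }

  static void Main()
  {
    Console.WriteLine (ReadNote ("<note></note>") == ReadNote ("<note/>"));   // True
    Console.WriteLine (ReadNote ("<note>hello</note>"));                      // hello
  }
}
```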
Table 11-1 summarizes all ReadXXX methods in XmlReader. Most of these are designed to work with elements. The sample XML fragment shown in bold is the section read by the method described.
Table 11-1. Read methods
Members | Works on NodeType | Sample XML fragment | Input parameters | Data returned |
---|---|---|---|---|
The ReadContentAsXXX methods parse a text node into type XXX. Internally, the XmlConvert class performs the string-to-type conversion. The text node can be within an element or an attribute.
The ReadElementContentAsXXX methods are wrappers around corresponding ReadContentAsXXX methods. They apply to the element node, rather than the text node enclosed by the element.
The typed ReadXXX methods also include versions that read base 64 and BinHex formatted data into a byte array.
ReadInnerXml is typically applied to an element; it reads and returns the markup of all the element’s descendants, excluding the element’s own start and end tags. When applied to an attribute, it returns the value of the attribute.
ReadOuterXml is the same as ReadInnerXml, except it includes rather than excludes the element at the cursor position.
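The difference between the two is easiest to see side by side. The following is a small sketch with hypothetical names (InnerOuterDemo, Inner, Outer), reading from a string for convenience.

```csharp
using System;
using System.IO;
using System.Xml;

static class InnerOuterDemo
{
  const string Xml = "<customer><firstname>Jim</firstname></customer>";

  public static string Inner()
  {
    using (XmlReader r = XmlReader.Create (new StringReader (Xml)))
    {
      r.MoveToContent();            // position on <customer>
      return r.ReadInnerXml();      // content only, without the customer tags
    }
  }

  public static string Outer()
  {
    using (XmlReader r = XmlReader.Create (new StringReader (Xml)))
    {
      r.MoveToContent();
      return r.ReadOuterXml();      // the customer element and its content
    }
  }

  static void Main()
  {
    Console.WriteLine (Inner());    // <firstname>Jim</firstname>
    Console.WriteLine (Outer());    // <customer><firstname>Jim</firstname></customer>
  }
}
```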
ReadSubtree returns a proxy reader that provides a view over just the current element (and its descendants). The proxy reader must be closed before the original reader can be safely read again. At the point the proxy reader is closed, the cursor position of the original reader moves to the end of the subtree.
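A minimal sketch of the proxy-reader pattern follows. The XML, the SubtreeDemo class, and its Run method are assumptions made for the example; the point is only that the outer reader resumes after the subtree once the proxy has been disposed.

```csharp
using System;
using System.IO;
using System.Xml;

static class SubtreeDemo
{
  const string Xml =
    "<list><customer><firstname>Jim</firstname></customer><total>1</total></list>";

  // Hypothetical helper: walks the <customer> subtree with a proxy reader,
  // then reports the node on which the outer reader continues.
  public static string Run()
  {
    using (XmlReader r = XmlReader.Create (new StringReader (Xml)))
    {
      r.ReadToFollowing ("customer");
      int nodes = 0;
      using (XmlReader sub = r.ReadSubtree())   // view over <customer> only
        while (sub.Read()) nodes++;
      // The proxy is now closed; the outer reader sits at the end of the subtree
      r.Read();                                 // advance to whatever follows it
      return nodes + " nodes, then " + r.NodeType + " " + r.Name;
    }
  }

  static void Main()
  {
    Console.WriteLine (Run());
  }
}
```

Passing the proxy reader to a method is a convenient way to stop that method from reading beyond the element it is responsible for.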
ReadToDescendant moves the cursor to the start of the first descendant node with the specified name/namespace.
ReadToFollowing moves the cursor to the start of the first node, regardless of depth, with the specified name/namespace.
ReadToNextSibling moves the cursor to the start of the first sibling node with the specified name/namespace.
ReadString and ReadElementString behave like ReadContentAsString and ReadElementContentAsString, except that they throw an exception if there’s more than a single text node within the element. In general, these methods should be avoided, as they throw an exception if an element contains a comment.
XmlReader provides an indexer giving you direct (random) access to an element’s attributes, by name or position. Using the indexer is equivalent to calling GetAttribute.
Given the following XML fragment:
<customer id="123" status="archived"/>
we could read its attributes as follows:
Console.WriteLine (reader ["id"]);              // 123
Console.WriteLine (reader ["status"]);          // archived
Console.WriteLine (reader ["bogus"] == null);   // True
The XmlReader must be positioned on a start element in order to read attributes. After calling ReadStartElement, the attributes are gone forever!
Although attribute order is semantically irrelevant, you can access attributes by their ordinal position. We could rewrite the preceding example as follows:
Console.WriteLine (reader [0]);            // 123
Console.WriteLine (reader [1]);            // archived
The indexer also lets you specify the attribute’s namespace—if it has one.
AttributeCount returns the number of attributes for the current node.
To explicitly traverse attribute nodes, you must make a special diversion from the normal path of just calling Read. A good reason to do so is if you want to parse attribute values into other types, via the ReadContentAsXXX methods.
The diversion must begin from a start element. To make the job easier, the forward-only rule is relaxed during attribute traversal: you can jump to any attribute (forward or backward) by calling MoveToAttribute.
MoveToElement returns you to the start element from anyplace within the attribute node diversion.
Returning to our previous example:
<customer id="123" status="archived"/>
we can do this:
reader.MoveToAttribute ("status");
string status = reader.ReadContentAsString();

reader.MoveToAttribute ("id");
int id = reader.ReadContentAsInt();
MoveToAttribute returns false if the specified attribute doesn’t exist.
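The following sketch pulls these pieces together: a typed attribute read guarded by the return value of MoveToAttribute, followed by MoveToElement to get back to the element. AttributeDemo and Describe are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;

static class AttributeDemo
{
  // Hypothetical helper: reads the optional id attribute as an int, probes for a
  // missing attribute, then returns to the element and reports the reader's state.
  public static string Describe (string xml)
  {
    using (XmlReader r = XmlReader.Create (new StringReader (xml)))
    {
      r.MoveToContent();                                   // start element
      int? id = null;
      if (r.MoveToAttribute ("id")) id = r.ReadContentAsInt();
      bool hasBogus = r.MoveToAttribute ("bogus");         // false; position unchanged
      r.MoveToElement();                                   // back to the start element
      return id + " " + hasBogus + " " + r.NodeType + " " + r.AttributeCount;
    }
  }

  static void Main()
  {
    Console.WriteLine (Describe ("<customer id=\"123\" status=\"archived\"/>"));
    // 123 False Element 2
  }
}
```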
You can also traverse each attribute in sequence by calling the MoveToFirstAttribute and then the MoveToNextAttribute methods:
if (reader.MoveToFirstAttribute())
  do
  {
    Console.WriteLine (reader.Name + "=" + reader.Value);
  }
  while (reader.MoveToNextAttribute());

// OUTPUT:
id=123
status=archived
XmlReader provides two parallel systems for referring to element and attribute names:
Name
NamespaceURI and LocalName
Whenever you read an element’s Name property or call a method that accepts a single name argument, you’re using the first system. This works well if no namespaces or prefixes are present; otherwise, it acts in a crude and literal manner. Namespaces are ignored, and prefixes are included exactly as they were written. For example:
Sample fragment | Name |
---|---|
<customer ...> | customer |
<customer xmlns='blah' ...> | customer |
<x:customer ...> | x:customer |
The following code works with the first two cases:
reader.ReadStartElement ("customer");
The following is required to handle the third case:
reader.ReadStartElement ("x:customer");
The second system works through two namespace-aware properties: NamespaceURI and LocalName. These properties take into account prefixes and default namespaces defined by parent elements. Prefixes are automatically expanded. This means that NamespaceURI always reflects the semantically correct namespace for the current element, and LocalName is always free of prefixes.
When you pass two name arguments into a method such as ReadStartElement, you’re using this same system. For example, consider the following XML:
<customer xmlns="DefaultNamespace" xmlns:other="OtherNamespace">
  <address>
    <other:city>
    ...
We could read this as follows:
reader.ReadStartElement ("customer", "DefaultNamespace");
reader.ReadStartElement ("address",  "DefaultNamespace");
reader.ReadStartElement ("city",     "OtherNamespace");
Abstracting away prefixes is usually exactly what you want. If necessary, you can see what prefix was used through the Prefix property and convert it into a namespace by calling LookupNamespace.
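As an illustration of the two naming systems on one node, this sketch positions the reader on the prefixed city element and prints every name-related property. The element content (Perth) and the NamespaceDemo class are invented for the example.

```csharp
using System;
using System.IO;
using System.Xml;

static class NamespaceDemo
{
  const string Xml =
    "<customer xmlns='DefaultNamespace' xmlns:other='OtherNamespace'>" +
      "<address><other:city>Perth</other:city></address>" +
    "</customer>";

  // Hypothetical helper: Name|LocalName|Prefix|NamespaceURI|LookupNamespace(Prefix)
  public static string DescribeCity()
  {
    using (XmlReader r = XmlReader.Create (new StringReader (Xml)))
    {
      r.ReadToFollowing ("city", "OtherNamespace");   // namespace-aware lookup
      return string.Join ("|", r.Name, r.LocalName, r.Prefix,
                               r.NamespaceURI, r.LookupNamespace (r.Prefix));
    }
  }

  static void Main()
  {
    Console.WriteLine (DescribeCity());
    // other:city|city|other|OtherNamespace|OtherNamespace
  }
}
```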
XmlWriter is a forward-only writer of an XML stream. The design of XmlWriter is symmetrical to XmlReader.
As with XmlReader, you construct an XmlWriter by calling Create with an optional settings object. In the following example, we enable indenting to make the output more human-readable, and then write a simple XML file:
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;

using (XmlWriter writer = XmlWriter.Create (@"..\..\foo.xml", settings))
{
  writer.WriteStartElement ("customer");
  writer.WriteElementString ("firstname", "Jim");
  writer.WriteElementString ("lastname", "Bo");
  writer.WriteEndElement();
}
This produces the following document (much the same as the file we read in the first example of XmlReader, minus the attributes):
<?xml version="1.0" encoding="utf-8" ?>
<customer>
  <firstname>Jim</firstname>
  <lastname>Bo</lastname>
</customer>
XmlWriter automatically writes the declaration at the top unless you indicate otherwise in XmlWriterSettings, by setting OmitXmlDeclaration to true or ConformanceLevel to Fragment. The latter also permits writing multiple root nodes, something that otherwise throws an exception.
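Here is a small sketch of writing a fragment with two root nodes. It writes into a StringBuilder rather than a file so that the result is easy to inspect; FragmentWriterDemo and Write are illustrative names.

```csharp
using System;
using System.Text;
using System.Xml;

static class FragmentWriterDemo
{
  // Hypothetical helper: writes two root-level elements and returns the markup.
  public static string Write()
  {
    var sb = new StringBuilder();
    var settings = new XmlWriterSettings { ConformanceLevel = ConformanceLevel.Fragment };
    using (XmlWriter w = XmlWriter.Create (sb, settings))
    {
      w.WriteElementString ("firstname", "Jim");
      w.WriteElementString ("lastname", "Bo");    // second root node: fine in a fragment
    }
    return sb.ToString();
  }

  static void Main()
  {
    Console.WriteLine (Write());   // <firstname>Jim</firstname><lastname>Bo</lastname>
  }
}
```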
The WriteValue method writes a single text node. It accepts both string and nonstring types such as bool and DateTime, internally calling XmlConvert to perform XML-compliant string conversions:
writer.WriteStartElement ("birthdate");
writer.WriteValue (DateTime.Now);
writer.WriteEndElement();
In contrast, if we call:
writer.WriteElementString ("birthdate", DateTime.Now.ToString());
the result would be formatted according to the current culture, making it both non-XML-compliant and vulnerable to incorrect parsing.
WriteString is equivalent to calling WriteValue with a string.
XmlWriter automatically escapes characters that would otherwise be illegal within an attribute or element, such as & < >, and extended Unicode characters.
You can write attributes immediately after writing a start element:
writer.WriteStartElement ("customer");
writer.WriteAttributeString ("id", "1");
writer.WriteAttributeString ("status", "archived");
To write nonstring values, call WriteStartAttribute, WriteValue, and then WriteEndAttribute.
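The three-call sequence for nonstring attribute values might look like the following sketch. The archived attribute and the TypedAttributeDemo class are assumptions made for illustration.

```csharp
using System;
using System.Text;
using System.Xml;

static class TypedAttributeDemo
{
  // Hypothetical helper: writes an int and a bool attribute via WriteValue.
  public static string Write()
  {
    var sb = new StringBuilder();
    var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
    using (XmlWriter w = XmlWriter.Create (sb, settings))
    {
      w.WriteStartElement ("customer");

      w.WriteStartAttribute ("id");
      w.WriteValue (123);             // converted through XmlConvert
      w.WriteEndAttribute();

      w.WriteStartAttribute ("archived");
      w.WriteValue (true);            // written as lowercase "true"
      w.WriteEndAttribute();

      w.WriteEndElement();
    }
    return sb.ToString();
  }

  static void Main()
  {
    Console.WriteLine (Write());
  }
}
```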
XmlWriter also defines the following methods for writing other kinds of nodes:
WriteBase64       // for binary data
WriteBinHex       // for binary data
WriteCData
WriteComment
WriteDocType
WriteEntityRef
WriteProcessingInstruction
WriteRaw
WriteWhitespace
WriteRaw directly injects a string into the output stream. There is also a WriteNode method that accepts an XmlReader, echoing everything from the given XmlReader.
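A short sketch of WriteNode used as a copy operation follows. CopyDemo and Copy are hypothetical names, and the input is a string purely for convenience.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class CopyDemo
{
  // Hypothetical helper: echoes everything the reader produces into a writer.
  public static string Copy (string xml)
  {
    var sb = new StringBuilder();
    var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
    using (XmlReader r = XmlReader.Create (new StringReader (xml)))
    using (XmlWriter w = XmlWriter.Create (sb, settings))
      w.WriteNode (r, true);    // second argument: also copy default attributes
    return sb.ToString();
  }

  static void Main()
  {
    Console.WriteLine (Copy ("<a><b>1</b></a>"));   // <a><b>1</b></a>
  }
}
```

Because the reader has not yet been advanced when WriteNode is called, the whole stream is copied; calling it on a reader positioned at an element copies just that element and its content.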
The overloads for the Write* methods allow you to associate an element or attribute with a namespace. Let’s rewrite the contents of the XML file in our previous example. This time we will associate all the elements with the http://oreilly.com namespace, declaring the prefix o at the customer element:
writer.WriteStartElement ("o", "customer", "http://oreilly.com");
writer.WriteElementString ("o", "firstname", "http://oreilly.com", "Jim");
writer.WriteElementString ("o", "lastname", "http://oreilly.com", "Bo");
writer.WriteEndElement();
The output is now as follows:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<o:customer xmlns:o='http://oreilly.com'>
  <o:firstname>Jim</o:firstname>
  <o:lastname>Bo</o:lastname>
</o:customer>
Notice how for brevity XmlWriter omits the child elements’ namespace declarations when they are already declared by the parent element.
Consider the following classes:
public class Contacts
{
  public IList<Customer> Customers = new List<Customer>();
  public IList<Supplier> Suppliers = new List<Supplier>();
}

public class Customer { public string FirstName, LastName; }
public class Supplier { public string Name;                }
Suppose you want to use XmlReader and XmlWriter to serialize a Contacts object to XML as in the following:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<contacts>
   <customer id="1">
      <firstname>Jay</firstname>
      <lastname>Dee</lastname>
   </customer>
   <customer>                     <!-- we'll assume id is optional -->
      <firstname>Kay</firstname>
      <lastname>Gee</lastname>
   </customer>
   <supplier>
      <name>X Technologies Ltd</name>
   </supplier>
</contacts>
The best approach is not to write one big method, but to encapsulate XML functionality in the Customer and Supplier types themselves by writing ReadXml and WriteXml methods on these types. The pattern in doing so is straightforward:
ReadXml and WriteXml leave the reader/writer at the same depth when they exit.
ReadXml reads the outer element, whereas WriteXml writes only its inner content.
Here’s how we would write the Customer type:
public class Customer
{
  public const string XmlName = "customer";
  public int? ID;
  public string FirstName, LastName;

  public Customer () { }
  public Customer (XmlReader r) { ReadXml (r); }

  public void ReadXml (XmlReader r)
  {
    if (r.MoveToAttribute ("id")) ID = r.ReadContentAsInt();
    r.ReadStartElement();
    FirstName = r.ReadElementContentAsString ("firstname", "");
    LastName = r.ReadElementContentAsString ("lastname", "");
    r.ReadEndElement();
  }

  public void WriteXml (XmlWriter w)
  {
    if (ID.HasValue) w.WriteAttributeString ("id", "", ID.ToString());
    w.WriteElementString ("firstname", FirstName);
    w.WriteElementString ("lastname", LastName);
  }
}
Notice that ReadXml reads the outer start and end element nodes. If its caller did this job instead, Customer couldn’t read its own attributes. The reason for not making WriteXml symmetrical in this regard is twofold:
The caller might need to choose how the outer element is named.
The caller might need to write extra XML attributes, such as the element’s subtype (which could then be used to decide which class to instantiate when reading back the element).
Another benefit of following this pattern is that it makes your implementation compatible with IXmlSerializable (see Chapter 16).
The Supplier class is analogous:
public class Supplier
{
  public const string XmlName = "supplier";
  public string Name;

  public Supplier () { }
  public Supplier (XmlReader r) { ReadXml (r); }

  public void ReadXml (XmlReader r)
  {
    r.ReadStartElement();
    Name = r.ReadElementContentAsString ("name", "");
    r.ReadEndElement();
  }

  public void WriteXml (XmlWriter w)
  {
    w.WriteElementString ("name", Name);
  }
}
With the Contacts class, we must enumerate the contacts element in ReadXml, checking whether each subelement is a customer or a supplier. We also have to code around the empty element trap:
public void ReadXml (XmlReader r)
{
  bool isEmpty = r.IsEmptyElement;           // This ensures we don't get
  r.ReadStartElement();                      // snookered by an empty
  if (isEmpty) return;                       // <contacts/> element!
  while (r.NodeType == XmlNodeType.Element)
  {
    if (r.Name == Customer.XmlName)      Customers.Add (new Customer (r));
    else if (r.Name == Supplier.XmlName) Suppliers.Add (new Supplier (r));
    else
      throw new XmlException ("Unexpected node: " + r.Name);
  }
  r.ReadEndElement();
}

public void WriteXml (XmlWriter w)
{
  foreach (Customer c in Customers)
  {
    w.WriteStartElement (Customer.XmlName);
    c.WriteXml (w);
    w.WriteEndElement();
  }
  foreach (Supplier s in Suppliers)
  {
    w.WriteStartElement (Supplier.XmlName);
    s.WriteXml (w);
    w.WriteEndElement();
  }
}
You can fly in an X-DOM at any point in the XML tree where XmlReader or XmlWriter becomes too cumbersome. Using the X-DOM to handle inner elements is an excellent way to combine X-DOM’s ease of use with the low-memory footprint of XmlReader and XmlWriter.
To read the current element into an X-DOM, you call XNode.ReadFrom, passing in the XmlReader. Unlike XElement.Load, this method is not “greedy” in that it doesn’t expect to see a whole document. Instead, it reads just to the end of the current subtree.
For instance, suppose we have an XML logfile structured as follows:
<log>
  <logentry id="1">
    <date>...</date>
    <source>...</source>
    ...
  </logentry>
  ...
</log>
If there were 1 million logentry elements, reading the whole thing into an X-DOM would waste memory. A better solution is to traverse each logentry with an XmlReader, and then use XElement to process the elements individually:
XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreWhitespace = true;

using (XmlReader r = XmlReader.Create ("logfile.xml", settings))
{
  r.ReadStartElement ("log");
  while (r.Name == "logentry")
  {
    XElement logEntry = (XElement) XNode.ReadFrom (r);
    int id = (int) logEntry.Attribute ("id");
    DateTime date = (DateTime) logEntry.Element ("date");
    string source = (string) logEntry.Element ("source");
    ...
  }
  r.ReadEndElement();
}
If you follow the pattern described in the previous section, you can slot an XElement into a custom type’s ReadXml or WriteXml method without the caller ever knowing you’ve cheated! For instance, we could rewrite Customer’s ReadXml method as follows:
public void ReadXml (XmlReader r)
{
  XElement x = (XElement) XNode.ReadFrom (r);
  ID = (int?) x.Attribute ("id");              // null if the attribute is absent
  FirstName = (string) x.Element ("firstname");
  LastName = (string) x.Element ("lastname");
}
XElement collaborates with XmlReader to ensure that namespaces are kept intact and prefixes are properly expanded, even if defined at an outer level. So, if our XML file read like this:
<log xmlns="http://loggingspace">
  <logentry id="1">
  ...
the XElements we constructed at the logentry level would correctly inherit the outer namespace.
You can use an XElement just to write inner elements to an XmlWriter. The following code writes 1 million logentry elements to an XML file using XElement, without storing the whole thing in memory:
using (XmlWriter w = XmlWriter.Create ("log.xml"))
{
  w.WriteStartElement ("log");
  for (int i = 0; i < 1000000; i++)
  {
    XElement e = new XElement ("logentry",
                   new XAttribute ("id", i),
                   new XElement ("date", DateTime.Today.AddDays (-1)),
                   new XElement ("source", "test"));
    e.WriteTo (w);
  }
  w.WriteEndElement ();
}
Using an XElement incurs minimal execution overhead. If we amend this example to use XmlWriter throughout, there’s no measurable difference in execution time.
XmlDocument is an in-memory representation of an XML document. Its object model and the methods that its types expose conform to a pattern defined by the W3C. So, if you’re familiar with another W3C-compliant XML DOM (e.g., in Java), you’ll be at home with XmlDocument. When compared to the X-DOM, however, the W3C model is much “clunkier.”
The base type for all objects in an XmlDocument tree is XmlNode. The following types derive from XmlNode:
XmlNode
  XmlDocument
  XmlDocumentFragment
  XmlEntity
  XmlNotation
  XmlLinkedNode
XmlLinkedNode exposes NextSibling and PreviousSibling properties and is an abstract base for the following subtypes:
XmlLinkedNode
  XmlCharacterData
  XmlDeclaration
  XmlDocumentType
  XmlElement
  XmlEntityReference
  XmlProcessingInstruction
To load an XmlDocument from an existing source, you instantiate an XmlDocument and then call Load or LoadXml:
Load accepts a filename, Stream, TextReader, or XmlReader.
LoadXml accepts a literal XML string.
To save a document, call Save with a filename, Stream, TextWriter, or XmlWriter:
XmlDocument doc = new XmlDocument();
doc.Load ("customer1.xml");
doc.Save ("customer2.xml");
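The other overloads work the same way. As a sketch that avoids touching the file system, the following loads from a literal string, saves to a TextWriter, and reloads from a TextReader; LoadSaveDemo and RoundTrips are illustrative names.

```csharp
using System;
using System.IO;
using System.Xml;

static class LoadSaveDemo
{
  // Hypothetical helper: true if the document survives a save/load round trip.
  public static bool RoundTrips (string xml)
  {
    var original = new XmlDocument();
    original.LoadXml (xml);                              // literal XML string

    var buffer = new StringWriter();
    original.Save (buffer);                              // Save to a TextWriter

    var copy = new XmlDocument();
    copy.Load (new StringReader (buffer.ToString()));    // Load from a TextReader

    return original.DocumentElement.OuterXml == copy.DocumentElement.OuterXml;
  }

  static void Main()
  {
    Console.WriteLine (RoundTrips ("<customer id='123'><firstname>Jim</firstname></customer>"));
  }
}
```

Save indents its output, but with PreserveWhitespace at its default of false the indentation is discarded again on Load, which is why the two element trees compare equal here.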
To illustrate traversing an XmlDocument, we’ll use the following XML file:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<customer id="123" status="archived">
  <firstname>Jim</firstname>
  <lastname>Bo</lastname>
</customer>
The ChildNodes property (defined in XmlNode) allows you to descend into the tree structure. This returns an indexable collection:
XmlDocument doc = new XmlDocument();
doc.Load ("customer.xml");

Console.WriteLine (doc.DocumentElement.ChildNodes[0].InnerText);   // Jim
Console.WriteLine (doc.DocumentElement.ChildNodes[1].InnerText);   // Bo
With the ParentNode property, you can ascend back up the tree:
Console.WriteLine (
  doc.DocumentElement.ChildNodes[1].ParentNode.Name);      // customer
The following properties also help traverse the document (all of which return null if the node does not exist):
FirstChild
LastChild
NextSibling
PreviousSibling
The following two statements both output firstname:
Console.WriteLine (doc.DocumentElement.FirstChild.Name);
Console.WriteLine (doc.DocumentElement.LastChild.PreviousSibling.Name);
XmlNode exposes an Attributes property for accessing attributes either by name (and namespace) or by ordinal position. For example:
Console.WriteLine (doc.DocumentElement.Attributes ["id"].Value);
The InnerText property represents the concatenation of all child text nodes. The following two lines both output Jim, since the firstname element contains only a single text node:
Console.WriteLine (doc.DocumentElement.ChildNodes[0].InnerText);
Console.WriteLine (doc.DocumentElement.ChildNodes[0].FirstChild.Value);
Setting the InnerText property replaces all child nodes with a single text node. Be careful when setting InnerText not to accidentally wipe over element nodes. For example:
doc.DocumentElement.InnerText = "Jo";                  // wrong: wipes firstname and lastname
doc.DocumentElement.ChildNodes[0].InnerText = "Jo";    // right: replaces only firstname's text
The InnerXml property represents the XML fragment within the current node. You typically use InnerXml on elements:
Console.WriteLine (doc.DocumentElement.InnerXml);

// OUTPUT:
<firstname>Jim</firstname><lastname>Bo</lastname>
InnerXml throws an exception if the node type cannot have children.
To create and add new nodes:
1. Call one of the CreateXXX methods on the XmlDocument, such as CreateElement.
2. Add the new node into the tree by calling AppendChild, PrependChild, InsertBefore, or InsertAfter on the desired parent node.
Creating nodes requires that you first have an XmlDocument; you cannot simply instantiate an XmlElement on its own like with the X-DOM. Nodes rely on a host XmlDocument for sustenance.
For example:
XmlDocument doc = new XmlDocument();
XmlElement customer = doc.CreateElement ("customer");
doc.AppendChild (customer);
The following creates a document matching the XML we started with earlier in this chapter in the section XmlReader:
XmlDocument doc = new XmlDocument ();
doc.AppendChild (doc.CreateXmlDeclaration ("1.0", null, "yes"));

XmlAttribute id     = doc.CreateAttribute ("id");
XmlAttribute status = doc.CreateAttribute ("status");
id.Value     = "123";
status.Value = "archived";

XmlElement firstname = doc.CreateElement ("firstname");
XmlElement lastname  = doc.CreateElement ("lastname");
firstname.AppendChild (doc.CreateTextNode ("Jim"));
lastname.AppendChild (doc.CreateTextNode ("Bo"));

XmlElement customer = doc.CreateElement ("customer");
customer.Attributes.Append (id);
customer.Attributes.Append (status);
customer.AppendChild (firstname);
customer.AppendChild (lastname);

doc.AppendChild (customer);
You can attach nodes to their parents in any order. In the previous example, appending customer to the document before appending its children would give the same result. The relative order of sibling AppendChild calls does matter, though, because each call adds the node after the parent’s existing children.
To remove a node, you call RemoveChild, ReplaceChild, or RemoveAll.
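The following sketch shows RemoveChild and ReplaceChild on a small document. The surname element and the RemoveDemo class are invented for the example.

```csharp
using System;
using System.Xml;

static class RemoveDemo
{
  // Hypothetical helper: removes firstname, then swaps lastname for a surname element.
  public static string Run()
  {
    var doc = new XmlDocument();
    doc.LoadXml ("<customer><firstname>Jim</firstname><lastname>Bo</lastname></customer>");
    XmlElement root = doc.DocumentElement;

    root.RemoveChild (root.FirstChild);              // removes <firstname>

    XmlElement surname = doc.CreateElement ("surname");
    surname.InnerText = "Bo";
    root.ReplaceChild (surname, root.FirstChild);    // (newChild, oldChild): replaces <lastname>

    return doc.OuterXml;
  }

  static void Main()
  {
    Console.WriteLine (Run());   // <customer><surname>Bo</surname></customer>
  }
}
```

Calling root.RemoveAll() instead would strip every child node and attribute from the element in one step.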
See Chapter 10 for an introduction to XML namespaces and prefixes.
The CreateElement and CreateAttribute methods are overloaded to let you specify a namespace and prefix:
CreateXXX (string name);
CreateXXX (string name, string namespaceURI);
CreateXXX (string prefix, string localName, string namespaceURI);
The name parameter refers to either a local name (i.e., no prefix) or a name qualified with a prefix. The namespaceURI parameter is used if and only if you are declaring (rather than merely referring to) a namespace.
Here is an example of declaring a namespace with a prefix while creating an element:
XmlElement customer = doc.CreateElement ("o", "customer", "http://oreilly.com");
Here is an example of referring to a namespace with a prefix while creating an element:
XmlElement customer = doc.CreateElement ("o:firstname");
In the next section, we will explain how to deal with namespaces when writing XPath queries.
XPath is the W3C standard for XML querying. In the .NET Framework, XPath can query an XmlDocument rather like LINQ queries an X-DOM. XPath has a wider scope, though, in that it’s also used by other XML technologies, such as XML schema, XSLT, and XAML.
XPath queries are expressed in terms of the XPath 2.0 Data Model. Both the DOM and the XPath Data Model represent an XML document as a tree. The difference is that the XPath Data Model is purely data-centric, abstracting away the formatting aspects of XML text. For example, CDATA sections are not required in the XPath Data Model, since the only reason CDATA sections exist is to enable text to contain markup character sequences. The XPath specification is at http://www.w3.org/tr/xpath20/.
The examples in this section all use the following XML file:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<customers>
  <customer id="123" status="archived">
    <firstname>Jim</firstname>
    <lastname>Bo</lastname>
  </customer>
  <customer>
    <firstname>Thomas</firstname>
    <lastname>Jefferson</lastname>
  </customer>
</customers>
You can write XPath queries within code in the following ways:
Call one of the SelectXXX methods on an XmlDocument or XmlNode.
Spawn an XPathNavigator from either:
  An XmlDocument
  An XPathDocument
Call an XPathXXX extension method on an XNode.
The SelectXXX methods accept an XPath query string. For example, the following finds the customer node of an XmlDocument whose firstname is Jim:
XmlDocument doc = new XmlDocument();
doc.Load ("customers.xml");
XmlNode n = doc.SelectSingleNode ("customers/customer[firstname='Jim']");
Console.WriteLine (n.InnerText); // JimBo
The SelectXXX methods delegate their implementation to XPathNavigator, which you can also use directly, over either an XmlDocument or a read-only XPathDocument.
You can also execute XPath queries over an X-DOM, via extension methods defined in System.Xml.XPath:
XDocument doc = XDocument.Load (@"customers.xml");
XElement e = doc.XPathSelectElement ("customers/customer[firstname='Jim']");
Console.WriteLine (e.Value); // JimBo
The extension methods for use with XNodes are:
CreateNavigator
XPathEvaluate
XPathSelectElement
XPathSelectElements
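A minimal sketch of two of these extension methods follows, using the sample customers document parsed from a string so that the example stands alone. XPathSelectElements returns every match, and XPathEvaluate returns an object whose runtime type depends on the expression (a double for count(), for instance).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

XDocument doc = XDocument.Parse (
  @"<customers>
      <customer id='123' status='archived'>
        <firstname>Jim</firstname><lastname>Bo</lastname>
      </customer>
      <customer>
        <firstname>Thomas</firstname><lastname>Jefferson</lastname>
      </customer>
    </customers>");

// XPathSelectElements yields all matching elements, in document order.
string[] lastnames = doc.XPathSelectElements ("//lastname")
                        .Select (el => el.Value)
                        .ToArray();

// XPathEvaluate returns object; a numeric expression comes back as a double.
double customerCount = (double) doc.XPathEvaluate ("count(customers/customer)");

Console.WriteLine (string.Join (", ", lastnames));   // Bo, Jefferson
Console.WriteLine (customerCount);                   // 2
```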
The XPath specification is huge. However, you can get by knowing just a few operators (see Table 11-2), just as you can play a lot of songs knowing just three chords.
Table 11-2. Common XPath operators
Operator | Description |
---|---|
/ | Children |
// | Recursively children |
. | Current node (usually implied) |
.. | Parent node |
* | Wildcard |
@ | Attribute |
[] | Filter |
: | Namespace separator |
To find the customers node:
XmlNode node = doc.SelectSingleNode ("customers");
The / symbol queries child nodes. To select the first customer node (SelectSingleNode returns only the first match; use SelectNodes to get them all):
XmlNode node = doc.SelectSingleNode ("customers/customer");
The // operator includes all descendant nodes, regardless of nesting level. To select all lastname nodes:
XmlNodeList nodes = doc.SelectNodes ("//lastname");
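The .. operator selects parent nodes. Like any other step in a path, it must be separated from its neighbors by /. This example is a little silly because we’re starting from the root anyway, but it serves to illustrate the functionality (it navigates down to the customer nodes and back up to their parent, customers):
XmlNodeList nodes = doc.SelectNodes ("customers/customer/..");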
The * operator selects nodes regardless of name. The following selects the child nodes of customer, regardless of name:
XmlNodeList nodes = doc.SelectNodes ("customers/customer/*");
The @ operator selects attributes. * can be used as a wildcard. Here is how to select the id attribute:
XmlNode node = doc.SelectSingleNode ("customers/customer/@id");
The [] operator filters a selection, in conjunction with the operators =, !=, <, >, not(), and, and or. In this example, we filter on firstname:
XmlNode n = doc.SelectSingleNode ("customers/customer[firstname='Jim']");
The : operator qualifies a namespace. Had the customers element been qualified with the namespace prefix x, we would access it as follows:
XmlNode node = doc.SelectSingleNode ("x:customers");
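The operators above can be combined freely. The following sketch runs several queries against the sample customers document (loaded from a string here so that it is self-contained) and records what each one returns. The specific queries are illustrative choices rather than anything prescribed by the text.

```csharp
using System;
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.LoadXml (
  @"<customers>
      <customer id='123' status='archived'>
        <firstname>Jim</firstname><lastname>Bo</lastname>
      </customer>
      <customer>
        <firstname>Thomas</firstname><lastname>Jefferson</lastname>
      </customer>
    </customers>");

int customers  = doc.SelectNodes ("customers/customer").Count;      // 2
int lastnames  = doc.SelectNodes ("//lastname").Count;              // 2
int children   = doc.SelectNodes ("customers/customer/*").Count;    // 4
int attributes = doc.SelectNodes ("customers/customer/@*").Count;   // 2 (id, status)

// .. steps back up to the parent of the customer nodes.
string parent = doc.SelectSingleNode ("customers/customer/..").Name;   // customers

// Filters can test attributes as well as child elements.
string withId = doc.SelectSingleNode ("//customer[@id='123']/lastname").InnerText;   // Bo
string noId   = doc.SelectSingleNode ("//customer[not(@id)]/firstname").InnerText;   // Thomas

Console.WriteLine ($"{customers} {lastnames} {children} {attributes} {parent} {withId} {noId}");
```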
XPathNavigator is a cursor over the XPath Data Model representation of an XML document. It is loaded with primitive methods that move the cursor around the tree (e.g., move to parent, move to first child, etc.). The XPathNavigator’s Select* methods take an XPath string to express more complex navigations or queries that return multiple nodes.
Spawn instances of XPathNavigator from an XmlDocument, an XPathDocument, or another XPathNavigator. Here is an example of spawning an XPathNavigator from an XmlDocument:
XPathNavigator nav = doc.CreateNavigator();
XPathNavigator jim = nav.SelectSingleNode
(
  "customers/customer[firstname='Jim']"
);
Console.WriteLine (jim.Value);   // JimBo
In the XPath Data Model, the value of an element node is the concatenation of its descendant text nodes, equivalent to XmlDocument’s InnerText property.
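The primitive cursor-moving methods can be sketched as follows. This example walks a cut-down, single-customer version of the sample document by hand, without any XPath strings. Each MoveTo* call returns a bool indicating whether the move succeeded, which the sketch ignores for brevity.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

XmlDocument doc = new XmlDocument();
doc.LoadXml ("<customers><customer id='123'><firstname>Jim</firstname><lastname>Bo</lastname></customer></customers>");

XPathNavigator nav = doc.CreateNavigator();   // Starts on the document root
nav.MoveToFirstChild();                       // Now on <customers>
nav.MoveToFirstChild();                       // Now on <customer>
string id = nav.GetAttribute ("id", "");      // Attribute lookup by local name and namespace

nav.MoveToFirstChild();                       // Now on <firstname>
string first = nav.Value;
nav.MoveToNext();                             // Next sibling: <lastname>
string last = nav.Value;

nav.MoveToParent();                           // Back on <customer>
string whole = nav.Value;                     // Concatenated text of all descendants

Console.WriteLine (id);      // 123
Console.WriteLine (first);   // Jim
Console.WriteLine (last);    // Bo
Console.WriteLine (whole);   // JimBo
```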
The SelectSingleNode method returns a single XPathNavigator. The Select method returns an XPathNodeIterator, which simply iterates over multiple XPathNavigators. For example:
XPathNavigator nav = doc.CreateNavigator();
string xPath = "customers/customer/firstname/text()";
foreach (XPathNavigator navC in nav.Select (xPath))
  Console.WriteLine (navC.Value);

OUTPUT:
Jim
Thomas
To perform faster queries, you can compile an XPath query into an XPathExpression. You then pass the compiled expression to a Select* method, instead of a string. For example:
XPathNavigator nav = doc.CreateNavigator();
XPathExpression expr = nav.Compile ("customers/customer/firstname");
foreach (XPathNavigator a in nav.Select (expr))
  Console.WriteLine (a.Value);

OUTPUT:
Jim
Thomas
Querying elements and attributes that contain namespaces requires some extra unintuitive steps. Consider the following XML file:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<o:customers xmlns:o='http://oreilly.com'>
  <o:customer id="123" status="archived">
    <firstname>Jim</firstname>
    <lastname>Bo</lastname>
  </o:customer>
  <o:customer>
    <firstname>Thomas</firstname>
    <lastname>Jefferson</lastname>
  </o:customer>
</o:customers>
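The following query will fail, despite qualifying the nodes with the prefix o. The query has no way to resolve that prefix to a namespace, so SelectSingleNode throws an XPathException:
XmlDocument doc = new XmlDocument();
doc.Load ("customers.xml");
XmlNode n = doc.SelectSingleNode ("o:customers/o:customer");   // Throws
Console.WriteLine (n.InnerText);   // Intended result: JimBo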
To make this query work, you must first create an XmlNamespaceManager instance as follows:
XmlNamespaceManager xnm = new XmlNamespaceManager (doc.NameTable);
You can treat NameTable as a black box (XmlNamespaceManager uses it internally to cache and reuse strings). Once we create the namespace manager, we can add prefix/namespace pairs to it as follows:
xnm.AddNamespace ("o", "http://oreilly.com");
The Select* methods on XmlDocument and XPathNavigator have overloads that accept an XmlNamespaceManager. We can successfully rewrite the previous query as follows:
XmlNode n = doc.SelectSingleNode ("o:customers/o:customer", xnm);
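Putting the steps together, here is a self-contained sketch that loads a cut-down version of the namespaced document from a string. It deliberately registers the prefix x rather than o, on the understanding that the prefix in the query is resolved through the namespace manager: it is the namespace URI, not the prefix text, that has to match the document.

```csharp
using System;
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.LoadXml (
  @"<o:customers xmlns:o='http://oreilly.com'>
      <o:customer id='123'><firstname>Jim</firstname><lastname>Bo</lastname></o:customer>
    </o:customers>");

XmlNamespaceManager xnm = new XmlNamespaceManager (doc.NameTable);

// The prefix "x" is an arbitrary choice for this query; the URI must
// match the one declared in the document.
xnm.AddNamespace ("x", "http://oreilly.com");

XmlNode n = doc.SelectSingleNode ("x:customers/x:customer", xnm);
string text = n.InnerText;

// firstname and lastname carry no prefix and belong to no namespace,
// so they are queried without one.
string first = n.SelectSingleNode ("firstname").InnerText;

Console.WriteLine (text);    // JimBo
Console.WriteLine (first);   // Jim
```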
XPathDocument is used for read-only XML documents that conform to the W3C XPath Data Model. An XPathNavigator backed by an XPathDocument is faster than one backed by an XmlDocument, but it cannot make changes to the underlying document:
XPathDocument doc = new XPathDocument ("customers.xml");
XPathNavigator nav = doc.CreateNavigator();
foreach (XPathNavigator a in nav.Select ("customers/customer/firstname"))
  Console.WriteLine (a.Value);

OUTPUT:
Jim
Thomas
The content of a particular XML document is nearly always
domain-specific, such as a Microsoft Word document, an application
configuration document, or a web service. For each domain, the XML file
conforms to a particular pattern. There are several standards for
describing the schema of such a pattern, to standardize and automate the
interpretation and validation of XML documents. The most widely accepted
standard is XSD, short for XML Schema Definition. Its precursors, DTD and XDR, are also supported by System.Xml.
Consider the following XML document:
<?xml version="1.0"?>
<customers>
  <customer id="1" status="active">
    <firstname>Jim</firstname>
    <lastname>Bo</lastname>
  </customer>
  <customer id="1" status="archived">
    <firstname>Thomas</firstname>
    <lastname>Jefferson</lastname>
  </customer>
</customers>
We can write an XSD for this document as follows:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified"
           elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="customers">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" name="customer">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="firstname" type="xs:string" />
              <xs:element name="lastname" type="xs:string" />
            </xs:sequence>
            <xs:attribute name="id" type="xs:int" use="required" />
            <xs:attribute name="status" type="xs:string" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
As you can see, XSD documents are themselves written in XML. Furthermore, an XSD document is describable with XSD—you can find that definition at http://www.w3.org/2001/xmlschema.xsd.
You can validate an XML file or document against one or more schemas before reading or processing it. There are a number of reasons to do so:
You can get away with less error checking and exception handling.
Schema validation picks up errors you might otherwise overlook.
Error messages are detailed and informative.
To perform validation, plug a schema into an XmlReader, an XmlDocument, or an X-DOM object, and then read or load the XML as you would normally. Schema validation happens automatically as content is read, so the input stream is not read twice.
Here’s how to plug a schema from the file customers.xsd into an XmlReader:
XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Add (null, "customers.xsd");

using (XmlReader r = XmlReader.Create ("customers.xml", settings))
  ...
If the schema is inline, set the following flag instead of adding to Schemas:
settings.ValidationFlags |= XmlSchemaValidationFlags.ProcessInlineSchema;
You then Read as you would normally. If schema validation fails at any point, an XmlSchemaValidationException is thrown.
Calling Read on its own validates both elements and attributes: you don’t need to navigate to each individual attribute for it to be validated.
If you want only to validate the document, you can do this:
using (XmlReader r = XmlReader.Create ("customers.xml", settings))
  try { while (r.Read()) ; }
  catch (XmlSchemaValidationException ex)
  {
    ...
  }
XmlSchemaValidationException has properties for the error Message, LineNumber, and LinePosition. In this case, it only tells you about the first error in the document. If you want to report on all errors in the document, you instead must handle the ValidationEventHandler event:
XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Add (null, "customers.xsd");
settings.ValidationEventHandler += ValidationHandler;
using (XmlReader r = XmlReader.Create ("customers.xml", settings))
while (r.Read()) ;
When you handle this event, schema errors no longer throw exceptions. Instead, they fire your event handler:
static void ValidationHandler (object sender, ValidationEventArgs e)
{
  Console.WriteLine ("Error: " + e.Exception.Message);
}
The Exception property of ValidationEventArgs contains the XmlSchemaValidationException that would have otherwise been thrown.
The System.Xml namespace also contains a class called XmlValidatingReader. This was used to perform schema validation prior to Framework 2.0, and it is now deprecated.
To validate an XML file or stream while reading into an X-DOM or XmlDocument, you create an XmlReader, plug in the schemas, and then use the reader to load the DOM:
XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Add (null, "customers.xsd");

XDocument doc;
using (XmlReader r = XmlReader.Create ("customers.xml", settings))
  try { doc = XDocument.Load (r); }
  catch (XmlSchemaValidationException ex) { ... }

XmlDocument xmlDoc = new XmlDocument();
using (XmlReader r = XmlReader.Create ("customers.xml", settings))
  try { xmlDoc.Load (r); }
  catch (XmlSchemaValidationException ex) { ... }
You can also validate an XDocument or XElement that’s already in memory, by calling extension methods in System.Xml.Schema. These methods accept an XmlSchemaSet (a collection of schemas) and a validation event handler:
XDocument doc = XDocument.Load (@"customers.xml");
XmlSchemaSet set = new XmlSchemaSet ();
set.Add (null, @"customers.xsd");
StringBuilder errors = new StringBuilder ();
doc.Validate (set, (sender, args) => { errors.AppendLine (args.Exception.Message); });
Console.WriteLine (errors.ToString());
To validate an XmlDocument already in memory, add the schema(s) to the XmlDocument’s Schemas collection and then call the document’s Validate method, passing in a ValidationEventHandler to process the errors.
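The steps just described might look like the following sketch. To keep it self-contained, the schema and the document are supplied as strings; the schema is a cut-down, single-customer variant of customers.xsd invented for this example, and the document is deliberately invalid (its id is not an integer).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical, simplified schema: one customer element with a required int id.
string xsd =
  @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
      <xs:element name='customer'>
        <xs:complexType>
          <xs:sequence>
            <xs:element name='firstname' type='xs:string' />
            <xs:element name='lastname' type='xs:string' />
          </xs:sequence>
          <xs:attribute name='id' type='xs:int' use='required' />
        </xs:complexType>
      </xs:element>
    </xs:schema>";

XmlDocument doc = new XmlDocument();
doc.LoadXml ("<customer id='abc'><firstname>Jim</firstname><lastname>Bo</lastname></customer>");

// Add the schema to the document's Schemas collection.
using (XmlReader schemaReader = XmlReader.Create (new StringReader (xsd)))
  doc.Schemas.Add (null, schemaReader);

// Validate calls the handler once per problem instead of throwing.
List<string> errors = new List<string>();
doc.Validate ((sender, e) => errors.Add (e.Severity + ": " + e.Message));

foreach (string error in errors)
  Console.WriteLine (error);
```

Collecting messages into a list, rather than writing them straight to the console, makes it easy to decide afterward whether to reject the document or merely log the problems.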
XSLT stands for Extensible Stylesheet Language Transformations. It is an XML language that describes how to transform one XML language into another. The quintessential example of such a transformation is transforming an XML document (that typically describes data) into an XHTML document (that describes a formatted document).
Consider the following XML file:
<customer> <firstname>Jim</firstname> <lastname>Bo</lastname> </customer>
The following XSLT file describes such a transformation:
<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:template match="/"> <html> <p><xsl:value-of select="//firstname"/></p> <p><xsl:value-of select="//lastname"/></p> </html> </xsl:template> </xsl:stylesheet>
The output is as follows:
<html> <p>Jim</p> <p>Bo</p> </html>
The System.Xml.Xsl.XslCompiledTransform class efficiently performs XSLT transforms. It renders XslTransform obsolete. XslCompiledTransform works very simply:
XslCompiledTransform transform = new XslCompiledTransform();
transform.Load ("test.xslt");
transform.Transform ("input.xml", "output.xml");
Generally, it’s more useful to use the overload of Transform that accepts an XmlWriter rather than an output file, so you can control the formatting.
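A sketch of that approach follows, assuming the stylesheet and input shown earlier are available as strings rather than files. The XmlWriterSettings values (indentation on, XML declaration omitted) are illustrative choices, not requirements.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

string xslt =
  @"<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
      <xsl:template match='/'>
        <html>
          <p><xsl:value-of select='//firstname'/></p>
          <p><xsl:value-of select='//lastname'/></p>
        </html>
      </xsl:template>
    </xsl:stylesheet>";
string xml = "<customer><firstname>Jim</firstname><lastname>Bo</lastname></customer>";

XslCompiledTransform transform = new XslCompiledTransform();
using (XmlReader stylesheet = XmlReader.Create (new StringReader (xslt)))
  transform.Load (stylesheet);

// The writer's settings, not the file system, now govern the output format.
XmlWriterSettings settings = new XmlWriterSettings
{
  Indent = true,
  OmitXmlDeclaration = true
};
StringWriter output = new StringWriter();

using (XmlReader input = XmlReader.Create (new StringReader (xml)))
using (XmlWriter writer = XmlWriter.Create (output, settings))
  transform.Transform (input, writer);

// The writer has been disposed (and so flushed) by this point.
string result = output.ToString();
Console.WriteLine (result);
```

Writing to an XmlWriter also means the result can go to any destination a writer can wrap, such as a stream or a StringBuilder, without a temporary file.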