XML (the eXtensible Markup Language) provides an industry-standard method for encoding structured information. It defines syntactic and structural rules that enable software applications to process XML files even when they don’t understand all of the data.
XML specifications are defined and maintained by the World Wide Web Consortium (W3C). The latest version is XML 1.1 (Second Edition). However, XML 1.0 (currently in its fifth edition) is the most popular version, and is supported by all XML parsers. W3C states that:
You are encouraged to create or generate XML 1.0 documents if you do not need the new features in XML 1.1; XML Parsers are expected to understand both XML 1.0 and XML 1.1 (see http://www.w3.org/xml/core/#publications/).
This chapter will introduce XML 1.0 only, and in fact, will focus on just the most commonly used XML features. We’ll introduce you to the XDocument and XElement classes first, and you’ll learn how to create and manipulate XML documents.
Of course, once you have a large document, you’ll want to be able to find specific information within it, and we’ll show you two different ways to do that, using LINQ. The .NET Framework also allows you to serialize your objects as XML, and deserialize them at their destination. We’ll cover those methods at the end of the chapter.
XML is a markup language, not unlike HTML, except that it is extensible—that is, applications that use XML can (and do) create new kinds of elements and attributes.
In XML, a document is a hierarchy of elements. An element is typically defined by a pair of tags, called the start and end tags. In the following example, FirstName is an element:
<FirstName>Orlando</FirstName>
A start tag contains the element name surrounded by a pair of angle brackets:
<FirstName>
An end tag is similar, except that the element name is preceded by a forward slash:
</FirstName>
An element may contain content between its start and end tags. In this example, the element contains text, but content can also contain child elements. For example, this Customer element has three child elements:
<Customer>
  <FirstName>Orlando</FirstName>
  <LastName>Gee</LastName>
  <EmailAddress>[email protected]</EmailAddress>
</Customer>
The top-level element in an XML document is called its root element. Every document has exactly one root element.
An element does not have to contain content, but every element (except for the root element) has exactly one parent element. Elements with the same parent element are called sibling elements.
In this example, Customers (plural) is the root. The children of the root element, Customers, are the five Customer (singular) elements:
<Customers>
  <Customer>
    <FirstName>Orlando</FirstName>
    <LastName>Gee</LastName>
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
  <Customer>
    <FirstName>Keith</FirstName>
    <LastName>Harris</LastName>
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
  <Customer>
    <FirstName>Donna</FirstName>
    <LastName>Carreras</LastName>
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
  <Customer>
    <FirstName>Janet</FirstName>
    <LastName>Gates</LastName>
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
  <Customer>
    <FirstName>Lucy</FirstName>
    <LastName>Harrington</LastName>
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
</Customers>
Each Customer has one parent (Customers) and three children (FirstName, LastName, and EmailAddress). Each of these, in turn, has one parent (Customer) and zero children.
When an element has no content—no child elements and no text—you can optionally use a more compact representation, where you write just a single tag, with a slash just before the closing angle bracket. For example, this:
<Customers/>
means exactly the same as this:
<Customers></Customers>
This empty element tag syntax is the only syntax in which an element is represented by just a single tag. Unless you are using this form, it is illegal to omit the closing tag.
XHTML is a reformulation of HTML that follows the stricter well-formedness rules of XML. The two most important XML rules that make XHTML different from plain HTML follow:
No elements may overlap, though they may nest. So this is legal, because the elements are nested:
<element1> <element2> ... </element2> </element1>
You may not write:
<element1> <element2> ... </element1> </element2>
because in the latter case, element2 overlaps element1 rather than being neatly nested within it. (Browsers typically tolerate this in ordinary HTML.)
Every element must be closed, which means that for each opened element, you must have a closing tag (or the element tag must be self-closing). So while plain old HTML permits:
<br>
in XHTML we must either write this:
<br></br>
or use the empty element tag form:
<br />
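One way to see these two rules in action is to hand some markup to an XML parser and check whether it is accepted. The following is a minimal sketch, assuming the XDocument.Parse method from the LINQ to XML classes covered later in this chapter; the WellFormedness class and IsWellFormed helper are names invented for this illustration.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class WellFormedness
{
    // Hypothetical helper: returns true if the text parses as XML.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            // The parser rejects markup that breaks the XML rules.
            return false;
        }
    }

    public static void Main()
    {
        // Properly nested elements.
        Console.WriteLine(IsWellFormed("<element1><element2>text</element2></element1>"));
        // Overlapping elements.
        Console.WriteLine(IsWellFormed("<element1><element2>text</element1></element2>"));
        // An element that is never closed.
        Console.WriteLine(IsWellFormed("<p><br></p>"));
        // The same element written with the empty element tag form.
        Console.WriteLine(IsWellFormed("<p><br /></p>"));
    }
}
```

Run as written, this should print True, False, False, and True: only the nested and the properly closed forms get past the parser.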
The key point of XML is to provide an extensible markup language. Here’s an incredibly short pop-history lesson: HTML was derived from the Standard Generalized Markup Language (SGML). HTML has many wonderful attributes (if you’ll pardon the pun), but if you want to add a new element to HTML, you have two choices: apply to the W3C and wait, or strike out on your own and be “nonstandard.”
There was a strong need for the ability for two organizations to get together and specify tags that they could use for data exchange. Hey! Presto! XML was born as a more general-purpose markup language that allows users to define their own tags. This is the critical distinction of XML.
Because XML documents are structured text documents, you can create them using a text editor and process them using string manipulation functions. To paraphrase David Platt, you can also have an appendectomy through your mouth, but it takes longer and hurts more.
To make the job easier, .NET implements classes and utilities that provide XML functionality. There are several to choose from. There are the streaming XML APIs (built around XmlReader and XmlWriter), which never attempt to hold the whole document in memory. You work one element at a time, and while that enables you to handle very large documents without using much memory, it can be tricky to code for. So there are simpler APIs that let you build an object model that represents an XML document. Even here, you have a choice. One set of XML APIs is based on the XML Document Object Model (DOM), a standard API implemented in many programming systems, not just .NET. However, the DOM is surprisingly cumbersome to work with, so .NET 3.5 introduced a set of APIs that are easier to use from .NET. These are designed to work well with LINQ, and so they’re often referred to as LINQ to XML. These are now the preferred XML API if you don’t need streaming. (Silverlight doesn’t even offer the XML DOM APIs, so LINQ to XML is your only nonstreaming option there.)
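To give a feel for what "one element at a time" means in the streaming model, here is a rough sketch that pulls the first names out of a customer document with XmlReader. The StreamingSketch class, the ReadFirstNames helper, and the two-customer sample string are assumptions made for this illustration; they do not appear elsewhere in the chapter.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class StreamingSketch
{
    // Hypothetical helper: collects the text of every FirstName element,
    // visiting one node at a time instead of loading the whole document.
    public static List<string> ReadFirstNames(TextReader source)
    {
        var names = new List<string>();
        string currentElement = null;

        using (XmlReader reader = XmlReader.Create(source))
        {
            // Read() advances to the next node and returns false at the end.
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        // Remember which element we are inside.
                        currentElement = reader.Name;
                        break;
                    case XmlNodeType.Text:
                        if (currentElement == "FirstName")
                        {
                            names.Add(reader.Value);
                        }
                        break;
                    case XmlNodeType.EndElement:
                        currentElement = null;
                        break;
                }
            }
        }
        return names;
    }

    public static void Main()
    {
        string xml =
            "<Customers>" +
            "<Customer><FirstName>Orlando</FirstName><LastName>Gee</LastName></Customer>" +
            "<Customer><FirstName>Keith</FirstName><LastName>Harris</LastName></Customer>" +
            "</Customers>";

        foreach (string name in ReadFirstNames(new StringReader(xml)))
        {
            Console.WriteLine(name);
        }
    }
}
```

Notice that the code has to track its own position in the document (the currentElement variable). That bookkeeping is the kind of thing that makes streaming code trickier than working with an object model.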
Despite the name, it’s not strictly necessary to use LINQ when using the LINQ to XML classes—Example 12-1 uses this API to write a list of customers to an XML document.
Example 12-1. Creating an XML document
using System;
using System.Collections.Generic;
using System.Xml.Linq;

namespace Programming_CSharp
{
    // Simple customer class
    public class Customer
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public string EmailAddress { get; set; }
    }

    // Main program
    public class Tester
    {
        static void Main()
        {
            List<Customer> customers = CreateCustomerList();

            var customerXml = new XDocument();
            var rootElem = new XElement("Customers");
            customerXml.Add(rootElem);

            foreach (Customer customer in customers)
            {
                // Create new element representing the customer object.
                var customerElem = new XElement("Customer");

                // Add element representing the FirstName property
                // to the customer element.
                var firstNameElem = new XElement("FirstName", customer.FirstName);
                customerElem.Add(firstNameElem);

                // Add element representing the LastName property
                // to the customer element.
                var lastNameElem = new XElement("LastName", customer.LastName);
                customerElem.Add(lastNameElem);

                // Add element representing the EmailAddress property
                // to the customer element.
                var emailAddress = new XElement("EmailAddress", customer.EmailAddress);
                customerElem.Add(emailAddress);

                // Finally add the customer element to the XML document
                rootElem.Add(customerElem);
            }

            Console.WriteLine(customerXml.ToString());
            Console.Read();
        }

        // Create a customer list with sample data
        private static List<Customer> CreateCustomerList()
        {
            List<Customer> customers = new List<Customer>
            {
                new Customer { FirstName = "Orlando", LastName = "Gee", EmailAddress = "[email protected]"},
                new Customer { FirstName = "Keith", LastName = "Harris", EmailAddress = "[email protected]" },
                new Customer { FirstName = "Donna", LastName = "Carreras", EmailAddress = "[email protected]" },
                new Customer { FirstName = "Janet", LastName = "Gates", EmailAddress = "[email protected]" },
                new Customer { FirstName = "Lucy", LastName = "Harrington", EmailAddress = "[email protected]" }
            };
            return customers;
        }
    }
}
The program will produce this output:
<Customers>
  <Customer>
    <FirstName>Orlando</FirstName>
    <LastName>Gee</LastName>
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
  <Customer>
    <FirstName>Keith</FirstName>
    <LastName>Harris</LastName>
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
  <Customer>
    <FirstName>Donna</FirstName>
    <LastName>Carreras</LastName>
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
  <Customer>
    <FirstName>Janet</FirstName>
    <LastName>Gates</LastName>
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
  <Customer>
    <FirstName>Lucy</FirstName>
    <LastName>Harrington</LastName>
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
</Customers>
As it happens, this example would have needed less code if we had used LINQ, but for this first example, we wanted to keep things simple. We’ll show the LINQ version shortly.
In .NET, the System.Xml.Linq namespace contains the LINQ to XML classes we can use to create and process XML documents.
The Customer class and the CreateCustomerList function in the main Tester class contain straightforward code to give us some data to work with, so we will not go over them. The main attraction in this example is the XML creation in the Main function. First, we create a new XML document object:
var customerXml = new XDocument();
Next, we create the root element and add it to the document:
var rootElem = new XElement("Customers");
customerXml.Add(rootElem);
After these two operations, the customerXml object represents an XML document containing an empty element, which might look either like this:
<Customers></Customers>
or like this:
<Customers />
LINQ to XML tends to use the empty element tag form where possible, so if you were to call ToString() on customerXml at this point, it would produce that second version.
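The two textual forms mean the same thing to an XML consumer, but LINQ to XML does remember which one it is holding. The sketch below, with class and method names invented for the illustration, contrasts an element that has no content at all with one whose content is an empty string.

```csharp
using System;
using System.Xml.Linq;

public static class EmptyElementSketch
{
    // An element constructed with no content at all.
    public static XElement WithoutContent()
    {
        return new XElement("Customers");
    }

    // An element whose content is a zero-length string.
    public static XElement WithEmptyText()
    {
        return new XElement("Customers", string.Empty);
    }

    public static void Main()
    {
        XElement noContent = WithoutContent();
        Console.WriteLine(noContent);          // <Customers />
        Console.WriteLine(noContent.IsEmpty);  // True

        XElement emptyText = WithEmptyText();
        Console.WriteLine(emptyText);          // <Customers></Customers>
        Console.WriteLine(emptyText.IsEmpty);  // False
    }
}
```

The IsEmpty property reports whether the element would be written with the single-tag form; an element with zero-length text content is written with separate start and end tags.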
Of course, you may already have an XML document, and you may want to turn that into an XDocument object. Example 12-2 shows how to load a string into a new XDocument.
Example 12-2. Loading XML from a string
XDocument doc = XDocument.Parse("<Customers><Customer /></Customers>");
There’s also a Load method, which has several overloads. You can pass in a URL, in which case it will fetch the XML from there and then parse it. You can also pass in a Stream or a TextReader, the abstract types from the System.IO namespace that represent a stream of bytes (such as a file), or a source of text (such as a file of some known character encoding).
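As a rough sketch of the TextReader and Stream overloads, the code below loads the same small document both ways. To keep it self-contained it uses an in-memory StringReader and MemoryStream rather than real files; the LoadSketch class and its method names are made up for this example, and the Stream overload assumes .NET 4 or later.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class LoadSketch
{
    const string Xml = "<Customers><Customer /></Customers>";

    // Load from a TextReader: the source is already a sequence of characters.
    public static XDocument FromTextReader()
    {
        using (TextReader reader = new StringReader(Xml))
        {
            return XDocument.Load(reader);
        }
    }

    // Load from a Stream: the source is raw bytes, here UTF-8 encoded.
    public static XDocument FromStream()
    {
        byte[] bytes = Encoding.UTF8.GetBytes(Xml);
        using (Stream stream = new MemoryStream(bytes))
        {
            return XDocument.Load(stream);
        }
    }

    public static void Main()
    {
        // Both documents have the same root element.
        Console.WriteLine(FromTextReader().Root.Name);
        Console.WriteLine(FromStream().Root.Name);
    }
}
```

In a real program the StringReader would more likely be a StreamReader over a file, and the MemoryStream a FileStream or a network stream.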
With the root element in hand, you can add each customer as a child node:
foreach (Customer customer in customers)
{
    // Create new element representing the customer object.
    var customerElem = new XElement("Customer");
In this example, we make each property of the customer object a child element of the customer element:
    // Add element representing the FirstName property to the Customer element.
    var firstNameElem = new XElement("FirstName", customer.FirstName);
    customerElem.Add(firstNameElem);
This adds the FirstName child element. We’re passing the customer’s first name as the second constructor argument, which will make that the content of the element. The result will look like this:
<FirstName>Orlando</FirstName>
The other two properties, LastName and EmailAddress, are added to the customer element in exactly the same way. Here’s an example of the complete customer element:
<Customer>
  <FirstName>Orlando</FirstName>
  <LastName>Gee</LastName>
  <EmailAddress>[email protected]</EmailAddress>
</Customer>
Finally, the newly created customer element is added to the XML document as a child of the root element:
    // Finally add the customer element to the XML document
    rootElem.Add(customerElem);
}
Once all customer elements are created, this example prints the XML document:
Console.WriteLine(customerXml.ToString());
When you call ToString() on any of the LINQ to XML objects (whether they represent the whole document, as in this case, or just some fragment of a document such as an XElement), it produces the XML text, and it formats it with indentation, making it easy to read. There are ways to produce more compact representations. If you’re sending the XML across a network to another computer, size may be more important than readability. To see a terser representation, we could do this:
Console.WriteLine(customerXml.ToString(SaveOptions.DisableFormatting));
That will print the XML as one long line, with no whitespace between the elements.
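To compare the two forms side by side, here is a small sketch that builds a one-customer fragment and prints it both ways. The FormattingSketch class and BuildSample method are names introduced only for this example.

```csharp
using System;
using System.Xml.Linq;

public static class FormattingSketch
{
    // Builds a small fragment with the same shape as the chapter's data.
    public static XElement BuildSample()
    {
        return new XElement("Customers",
            new XElement("Customer",
                new XElement("FirstName", "Orlando"),
                new XElement("LastName", "Gee")));
    }

    public static void Main()
    {
        XElement customers = BuildSample();

        // Default: indented, one element per line.
        Console.WriteLine(customers.ToString());

        // Compact: no line breaks or indentation between elements.
        Console.WriteLine(customers.ToString(SaveOptions.DisableFormatting));
    }
}
```

The second call writes the whole fragment on a single line, with each tag immediately followed by the next. Text inside an element, such as a name containing a space, is left untouched; only the whitespace used for layout goes away.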
An XML element may have a set of attributes, which store additional information about the element. An attribute is a key/value pair contained in the start tag of an XML element:
<Customer FirstName="Orlando" LastName="Gee"></Customer>
If you’re using an empty element tag, the attributes appear in the one and only tag:
<Customer FirstName="Orlando" LastName="Gee" />
The next example demonstrates how you can mix the use of child elements and attributes. It creates customer elements with the customer’s name stored in attributes and the email address stored as a child element:
<Customer FirstName="Orlando" LastName="Gee">
  <EmailAddress>[email protected]</EmailAddress>
</Customer>
The only difference between this and Example 12-1 is that we create XAttribute objects for the FirstName and LastName properties instead of XElement objects:
// Add an attribute representing the FirstName property
// to the customer element.
var firstNameAttr = new XAttribute("FirstName", customer.FirstName);
customerElem.Add(firstNameAttr);

// Add an attribute representing the LastName property
// to the customer element.
var lastNameAttr = new XAttribute("LastName", customer.LastName);
customerElem.Add(lastNameAttr);
As with elements, we just add the attribute to the parent element. Example 12-3 shows the complete sample code and output.
Example 12-3. Creating an XML document containing elements and attributes
using System;
using System.Collections.Generic;
using System.Xml.Linq;

namespace Programming_CSharp
{
    // Simple customer class
    public class Customer
    {
        // Same as in Example 12-1
    }

    // Main program
    public class Tester
    {
        static void Main()
        {
            List<Customer> customers = CreateCustomerList();

            var customerXml = new XDocument();
            var rootElem = new XElement("Customers");
            customerXml.Add(rootElem);

            foreach (Customer customer in customers)
            {
                // Create new element representing the customer object.
                var customerElem = new XElement("Customer");

                // Add an attribute representing the FirstName property
                // to the customer element.
                var firstNameAttr = new XAttribute("FirstName", customer.FirstName);
                customerElem.Add(firstNameAttr);

                // Add an attribute representing the LastName property
                // to the customer element.
                var lastNameAttr = new XAttribute("LastName", customer.LastName);
                customerElem.Add(lastNameAttr);

                // Add element representing the EmailAddress property
                // to the customer element.
                var emailAddress = new XElement("EmailAddress", customer.EmailAddress);
                customerElem.Add(emailAddress);

                // Finally add the customer element to the XML document
                rootElem.Add(customerElem);
            }

            Console.WriteLine(customerXml.ToString());
            Console.Read();
        }

        // Create a customer list with sample data
        private static List<Customer> CreateCustomerList()
        {
            List<Customer> customers = new List<Customer>
            {
                new Customer { FirstName = "Orlando", LastName = "Gee", EmailAddress = "[email protected]"},
                new Customer { FirstName = "Keith", LastName = "Harris", EmailAddress = "[email protected]" },
                new Customer { FirstName = "Donna", LastName = "Carreras", EmailAddress = "[email protected]" },
                new Customer { FirstName = "Janet", LastName = "Gates", EmailAddress = "[email protected]" },
                new Customer { FirstName = "Lucy", LastName = "Harrington", EmailAddress = "[email protected]" }
            };
            return customers;
        }
    }
}

Output:
<Customers>
  <Customer FirstName="Orlando" LastName="Gee">
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
  <Customer FirstName="Keith" LastName="Harris">
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
  <Customer FirstName="Donna" LastName="Carreras">
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
  <Customer FirstName="Janet" LastName="Gates">
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
  <Customer FirstName="Lucy" LastName="Harrington">
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
</Customers>
While it’s often convenient to be able to create and add elements and attributes one step at a time, these classes offer constructors that allow us to do more work in a single step. If we know exactly what we want to put in an element, this can lead to neater looking code. For example, we can replace the foreach loop with the code in Example 12-4.
Example 12-4. Constructing an XElement all at once
foreach (Customer customer in customers)
{
    // Create new element representing the customer object.
    var customerElem = new XElement("Customer",
        new XAttribute("FirstName", customer.FirstName),
        new XAttribute("LastName", customer.LastName),
        new XElement("EmailAddress", customer.EmailAddress)
    );

    // Finally add the customer element to the XML document
    rootElem.Add(customerElem);
}
The only difference is that we’re passing all the XAttribute and XElement objects to the containing XElement constructor, rather than passing them to Add one at a time. As well as being more compact, it’s pretty easy to see how this code relates to the structure of the XML element being produced. We can also use this technique in conjunction with LINQ.
We’ve seen several examples that construct an XElement, passing the name as the first argument, and the content as the second. We’ve passed strings, child elements, and attributes, but we can also provide an implementation of IEnumerable<T>. So if we add a using System.Linq; directive to the top of our file, we could use a LINQ query as the second constructor argument as Example 12-5 shows.
Example 12-5. Generating XML elements with LINQ
var customerXml = new XDocument(new XElement("Customers",
    from customer in customers
    select new XElement("Customer",
        new XAttribute("FirstName", customer.FirstName),
        new XAttribute("LastName", customer.LastName),
        new XElement("EmailAddress", customer.EmailAddress)
    )));
This generates the whole of the XML document in a single statement. So the work that took 25 lines of code in Example 12-1 comes down to just seven. Example 12-6 shows the whole example, with its much simplified Main method.
Example 12-6. Building XML with LINQ
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

namespace Programming_CSharp
{
    // Simple customer class
    public class Customer
    {
        // Same as in Example 12-1
    }

    // Main program
    public class Tester
    {
        static void Main()
        {
            List<Customer> customers = CreateCustomerList();

            var customerXml = new XDocument(new XElement("Customers",
                from customer in customers
                select new XElement("Customer",
                    new XAttribute("FirstName", customer.FirstName),
                    new XAttribute("LastName", customer.LastName),
                    new XElement("EmailAddress", customer.EmailAddress)
                )));

            Console.WriteLine(customerXml.ToString());
            Console.Read();
        }

        // Create a customer list with sample data
        private static List<Customer> CreateCustomerList()
        {
            List<Customer> customers = new List<Customer>
            {
                new Customer { FirstName = "Orlando", LastName = "Gee", EmailAddress = "[email protected]"},
                new Customer { FirstName = "Keith", LastName = "Harris", EmailAddress = "[email protected]" },
                new Customer { FirstName = "Donna", LastName = "Carreras", EmailAddress = "[email protected]" },
                new Customer { FirstName = "Janet", LastName = "Gates", EmailAddress = "[email protected]" },
                new Customer { FirstName = "Lucy", LastName = "Harrington", EmailAddress = "[email protected]" }
            };
            return customers;
        }
    }
}
We’re not really doing anything special here—this LINQ query is just relying on plain old LINQ to Objects—the same techniques we already saw in Chapter 8. But this is only half the story. LINQ to XML is not just about creating XML. It also supports reading XML.
Being able to create XML documents to store data to be processed or exchanged is great, but it would not be of much use if you could not find information in them easily. LINQ to XML lets you use the standard LINQ operators to search for information in XML documents.
We’ll need an example document to search through. Here’s the document from Example 12-3, reproduced here for convenience (the code in Example 12-7 builds a document with the same structure, but fills it with a different set of customers):
<Customers>
  <Customer FirstName="Orlando" LastName="Gee">
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
  <Customer FirstName="Keith" LastName="Harris">
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
  <Customer FirstName="Donna" LastName="Carreras">
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
  <Customer FirstName="Janet" LastName="Gates">
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
  <Customer FirstName="Lucy" LastName="Harrington">
    <EmailAddress>[email protected]</EmailAddress>
  </Customer>
</Customers>
Example 12-7 lists the code for the example.
Example 12-7. Searching an XML document using LINQ
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

namespace Programming_CSharp
{
    public class Customer
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public string EmailAddress { get; set; }
    }

    public class Tester
    {
        private static XDocument CreateCustomerListXml()
        {
            List<Customer> customers = CreateCustomerList();
            var customerXml = new XDocument(new XElement("Customers",
                from customer in customers
                select new XElement("Customer",
                    new XAttribute("FirstName", customer.FirstName),
                    new XAttribute("LastName", customer.LastName),
                    new XElement("EmailAddress", customer.EmailAddress)
                )));
            return customerXml;
        }

        private static List<Customer> CreateCustomerList()
        {
            List<Customer> customers = new List<Customer>
            {
                new Customer {FirstName = "Douglas", LastName = "Adams", EmailAddress = "[email protected]"},
                new Customer {FirstName = "Richard", LastName = "Dawkins", EmailAddress = "[email protected]"},
                new Customer {FirstName = "Kenji", LastName = "Yoshino", EmailAddress = "[email protected]"},
                new Customer {FirstName = "Ian", LastName = "McEwan", EmailAddress = "[email protected]"},
                new Customer {FirstName = "Neal", LastName = "Stephenson", EmailAddress = "[email protected]"},
                new Customer {FirstName = "Randy", LastName = "Shilts", EmailAddress = "[email protected]"},
                new Customer {FirstName = "Michelangelo", LastName = "Signorile ", EmailAddress = "[email protected]"},
                new Customer {FirstName = "Larry", LastName = "Kramer", EmailAddress = "[email protected]"},
                new Customer {FirstName = "Jennifer", LastName = "Baumgardner", EmailAddress = "[email protected]"}
            };
            return customers;
        }

        static void Main()
        {
            XDocument customerXml = CreateCustomerListXml();

            Console.WriteLine("Search for single element...");
            var query =
                from customer in customerXml.Element("Customers").Elements("Customer")
                where customer.Attribute("FirstName").Value == "Douglas"
                select customer;
            XElement oneCustomer = query.SingleOrDefault();
            if (oneCustomer != null)
            {
                Console.WriteLine(oneCustomer);
            }
            else
            {
                Console.WriteLine("Not found");
            }

            Console.WriteLine(" Search using descendant axis... ");
            query = from customer in customerXml.Descendants("Customer")
                    where customer.Attribute("FirstName").Value == "Douglas"
                    select customer;
            oneCustomer = query.SingleOrDefault();
            if (oneCustomer != null)
            {
                Console.WriteLine(oneCustomer);
            }
            else
            {
                Console.WriteLine("Not found");
            }

            Console.WriteLine(" Search using element values... ");
            query = from emailAddress in customerXml.Descendants("EmailAddress")
                    where emailAddress.Value == "[email protected]"
                    select emailAddress;
            XElement oneEmail = query.SingleOrDefault();
            if (oneEmail != null)
            {
                Console.WriteLine(oneEmail);
            }
            else
            {
                Console.WriteLine("Not found");
            }

            Console.WriteLine(" Search using child element values... ");
            query = from customer in customerXml.Descendants("Customer")
                    where customer.Element("EmailAddress").Value == "[email protected]"
                    select customer;
            oneCustomer = query.SingleOrDefault();
            if (oneCustomer != null)
            {
                Console.WriteLine(oneCustomer);
            }
            else
            {
                Console.WriteLine("Not found");
            }
        } // end main
    } // end class
} // end namespace

Output:
Search for single element...
<Customer FirstName="Douglas" LastName="Adams">
  <EmailAddress>[email protected]</EmailAddress>
</Customer>
Search using descendant axis...
<Customer FirstName="Douglas" LastName="Adams">
  <EmailAddress>[email protected]</EmailAddress>
</Customer>
Search using element values...
<EmailAddress>[email protected]</EmailAddress>
Search using child element values...
<Customer FirstName="Douglas" LastName="Adams">
  <EmailAddress>[email protected]</EmailAddress>
</Customer>
This example refactors Example 12-3 by extracting the creation of the sample customer list XML document into the CreateCustomerListXml() method. You can now simply call this function in the Main() function to create the XML document.
The first search in Example 12-7 is to find a customer whose first name is “Douglas”:
var query =
    from customer in customerXml.Element("Customers").Elements("Customer")
    where customer.Attribute("FirstName").Value == "Douglas"
    select customer;
XElement oneCustomer = query.SingleOrDefault();
if (oneCustomer != null)
{
    Console.WriteLine(oneCustomer);
}
else
{
    Console.WriteLine("Not found");
}
In general, you will have some ideas about the structure of XML documents you are going to process; otherwise, it will be difficult to find the information you want. Here we know the node we are looking for sits just one level below the root element. So the source of the LINQ query (the part after the in keyword) fetches the root Customers element using the singular Element method, and then asks for all of its children called Customer by using the plural Elements method:
from customer in customerXml.Element("Customers").Elements("Customer")
We specify the search conditions with a where clause, as we would do in any LINQ query. In this case, we want to search on the value of the FirstName attribute:
where customer.Attribute("FirstName").Value == "Douglas"
The select clause is trivial: we just want the query to return all matching elements. Finally, we execute the query using the standard LINQ SingleOrDefault operator, which, as you may recall, returns the one result of the query, unless it failed to match anything, in which case it will return null. (And if there are multiple matches, it throws an exception.) We therefore test the result against null before attempting to use it:
if (oneCustomer != null)
{
    Console.WriteLine(oneCustomer);
}
else
{
    Console.WriteLine("Not found");
}
In this example, the query finds a match, and the resultant element is displayed.
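The where clause in that query takes it for granted that every Customer element has a FirstName attribute. The Attribute method returns null when the attribute is absent, so reading its Value property would throw a NullReferenceException for such an element. One possible variation, which Example 12-7 does not use, relies on the explicit conversion to string that XAttribute defines; it yields null for a missing attribute instead of failing. The SafeSearchSketch class and FindByFirstName helper below are hypothetical names for this sketch.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SafeSearchSketch
{
    // Hypothetical helper: finds the single customer with the given first name,
    // or returns null if there is no such customer.
    public static XElement FindByFirstName(XDocument doc, string firstName)
    {
        var query =
            from customer in doc.Element("Customers").Elements("Customer")
            // The cast produces null, rather than throwing,
            // when the attribute is missing.
            where (string)customer.Attribute("FirstName") == firstName
            select customer;
        return query.SingleOrDefault();
    }

    public static void Main()
    {
        // The first customer deliberately has no FirstName attribute.
        XDocument doc = XDocument.Parse(
            "<Customers>" +
            "<Customer LastName=\"Unknown\" />" +
            "<Customer FirstName=\"Douglas\" LastName=\"Adams\" />" +
            "</Customers>");

        XElement found = FindByFirstName(doc, "Douglas");
        Console.WriteLine(found != null ? found.ToString() : "Not found");
    }
}
```

Whether a missing attribute should be skipped quietly or treated as an error depends on how much you trust the structure of the incoming document.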
In practice, you don’t always know exactly where the information you require will be in the XML document when you write the code. For these cases, LINQ to XML provides the ability to search in different ways—if you are familiar with the XPath query language[25] for XML, this is equivalent to the XPath concept of a search axis. This specifies the relationship between the element you’re starting from and the search target nodes.
The Element and Elements methods we used earlier only ever search one level: they look in the children of whatever object you call them on. But we can instead use the Descendants method to look not just in the children, but also in their children’s children, and so on. So the source for the next query in Example 12-7 looks for all elements called Customer anywhere in the document. This is more compact, but also less precise.
query = from customer in customerXml.Descendants("Customer")
Other methods available for querying along different axes include Parent, Ancestors, ElementsAfterSelf, ElementsBeforeSelf, and Attributes. The first two look up the tree and are similar to Elements and Descendants, in that Parent looks up just one level, while Ancestors will search up through the document all the way to the root. ElementsBeforeSelf and ElementsAfterSelf search for elements that have the same parent as the current item, and which appear either before or after it in the document. Attributes searches in an element’s attributes rather than its child elements. (If you are familiar with XPath, you will know that these correspond to the parent, ancestor, following-sibling, preceding-sibling, and attribute axes.)
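Here is a brief sketch that exercises each of those axes from a single starting element. The document uses placeholder example.com addresses, and the AxisSketch class with its BuildSample and FirstNames helpers is invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AxisSketch
{
    // Placeholder data with the same shape as the chapter's customer document.
    public static XDocument BuildSample()
    {
        return XDocument.Parse(
            "<Customers>" +
            "<Customer FirstName=\"Orlando\"><EmailAddress>orlando@example.com</EmailAddress></Customer>" +
            "<Customer FirstName=\"Keith\"><EmailAddress>keith@example.com</EmailAddress></Customer>" +
            "<Customer FirstName=\"Donna\"><EmailAddress>donna@example.com</EmailAddress></Customer>" +
            "</Customers>");
    }

    // Hypothetical helper: joins the FirstName attributes of some elements.
    public static string FirstNames(IEnumerable<XElement> elements)
    {
        return string.Join(",", elements.Select(e => (string)e.Attribute("FirstName")));
    }

    public static void Main()
    {
        // Start from the second Customer element and its EmailAddress child.
        XElement keith = BuildSample().Descendants("Customer").ElementAt(1);
        XElement email = keith.Element("EmailAddress");

        // Parent: one level up.
        Console.WriteLine(email.Parent.Name);

        // Ancestors: every level up to the root, nearest first.
        Console.WriteLine(string.Join(",", email.Ancestors().Select(e => e.Name.LocalName)));

        // Siblings that come before and after the current element.
        Console.WriteLine(FirstNames(keith.ElementsBeforeSelf()));
        Console.WriteLine(FirstNames(keith.ElementsAfterSelf()));

        // Attributes of the current element.
        foreach (XAttribute attr in keith.Attributes())
        {
            Console.WriteLine(attr.Name + "=" + attr.Value);
        }
    }
}
```

Starting from the middle customer, the parent of its EmailAddress is Customer, the ancestors are Customer and then Customers, the preceding sibling is Orlando, the following sibling is Donna, and the only attribute is FirstName.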
The first query in Example 12-7 included a where clause that looked for a particular attribute value on an element. You can, of course, use other criteria. The third query looks at the content of the element itself, using the Value property to extract the content as text:
where emailAddress.Value == "[email protected]"
You can get more ambitious, though, because the where clause can dig further into the structure of the XML. The fourth query’s where clause lets through only those elements whose child EmailAddress element has a particular value:
where customer.Element("EmailAddress").Value == "[email protected]"
So far, our code has constructed the objects representing the Customer XML elements by hand. As XML is becoming popular, especially with the increasingly widespread use of web services, it can be useful to automate this process. If you expect to work with XML elements that always have a particular structure, it can be convenient to serialize objects to or from XML. Working with conventional objects can be a lot easier than using lots of explicit XML code.
The .NET Framework provides a built-in serialization mechanism to reduce the coding efforts by application developers. The System.Xml.Serialization namespace defines the classes and utilities that implement methods required for serializing and deserializing objects. Example 12-8 illustrates this.
Example 12-8. Simple XML serialization and deserialization
using System;
using System.IO;
using System.Xml.Serialization;

namespace Programming_CSharp
{
    // Simple customer class
    public class Customer
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public string EmailAddress { get; set; }

        // Overrides the Object.ToString() to provide a
        // string representation of the object properties.
        public override string ToString()
        {
            return string.Format("{0} {1} Email: {2}",
                FirstName, LastName, EmailAddress);
        }
    }

    // Main program
    public class Tester
    {
        static void Main()
        {
            Customer c1 = new Customer
            {
                FirstName = "Orlando",
                LastName = "Gee",
                EmailAddress = "[email protected]"
            };

            XmlSerializer serializer = new XmlSerializer(typeof(Customer));
            StringWriter writer = new StringWriter();
            serializer.Serialize(writer, c1);
            string xml = writer.ToString();
            Console.WriteLine("Customer in XML: {0} ", xml);

            Customer c2 = serializer.Deserialize(new StringReader(xml)) as Customer;
            Console.WriteLine("Customer in Object: {0}", c2.ToString());

            Console.ReadKey();
        }
    }
}

Output:
Customer in XML:
<?xml version="1.0" encoding="utf-16"?>
<Customer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <FirstName>Orlando</FirstName>
  <LastName>Gee</LastName>
  <EmailAddress>[email protected]</EmailAddress>
</Customer>
Customer in Object: Orlando Gee Email: [email protected]
To serialize an object using .NET XML serialization, you need to create an XmlSerializer object:
XmlSerializer serializer = new XmlSerializer(typeof(Customer));
You must pass the type of the object to be serialized to the
XmlSerializer
constructor. If you don’t
know the object type at compile time, you can discover it at runtime by calling the object’s
GetType()
method:
XmlSerializer serializer = new XmlSerializer(c1.GetType());
You also need to decide where the serialized XML document should be
stored. In this example, you simply send it to a StringWriter
:
StringWriter writer = new StringWriter();
serializer.Serialize(writer, c1);
string xml = writer.ToString();
Console.WriteLine("Customer in XML: {0} ", xml);
The resultant XML string is then displayed on the console:
<?xml version="1.0" encoding="utf-16"?>
<Customer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <FirstName>Orlando</FirstName>
  <LastName>Gee</LastName>
  <EmailAddress>[email protected]</EmailAddress>
</Customer>
The first line is an XML declaration. This is to let the consumers (human users and software applications) of this document know that this is an XML file, the official version to which this file conforms, and the encoding format used. This is optional in XML, but this code always produces one.
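The encoding named in that declaration follows from the destination you chose. A StringWriter holds a .NET string, which is UTF-16, so the declaration says utf-16. The following is a minimal sketch of sending the same XML to a file as UTF-8 instead. The FileSerializationSketch class, its Save method, the customer.xml filename, and the example.com address are hypothetical names chosen for illustration, and the Customer class is a cut-down copy of the one in Example 12-8.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Cut-down copy of the Customer class from Example 12-8.
public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string EmailAddress { get; set; }
}

// Hypothetical helper class, not part of the .NET Framework.
public static class FileSerializationSketch
{
    // Writes the customer to the given path as UTF-8 text.
    public static void Save(Customer customer, string path)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Customer));

        // StreamWriter is another TextWriter, so the same Serialize
        // overload used with StringWriter applies here.
        using (StreamWriter writer = new StreamWriter(path, false, Encoding.UTF8))
        {
            serializer.Serialize(writer, customer);
        }
    }

    public static void Main()
    {
        // Hypothetical location; any writable path would do.
        string path = Path.Combine(Path.GetTempPath(), "customer.xml");

        Customer c1 = new Customer
        {
            FirstName = "Orlando",
            LastName = "Gee",
            EmailAddress = "orlando@example.com" // placeholder address
        };

        Save(c1, path);
        Console.WriteLine(File.ReadAllText(path));
    }
}
```

The using block matters more here than it did with the StringWriter: disposing the StreamWriter flushes the text and closes the file.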
The root element here is the Customer
element, with each property represented
as a child element. The xmlns:xsi
and
xmlns:xsd
attributes declare namespace prefixes defined by the XML
Schema specification. They are optional, and don’t do anything useful in
this example, so we will not explain them further. If you are interested,
please read the XML Schema specification or other documentation, such as the MSDN
Library, for more details.
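If you would rather not emit the optional parts at all, the serializer can be asked to leave them out. The following is a sketch rather than a recommendation: it passes an XmlSerializerNamespaces object containing a single empty entry, which stands in for the default xsi and xsd declarations, and it writes through an XmlWriter whose settings omit the XML declaration. The CompactXmlSketch class and its ToCompactXml method are hypothetical names, and the Customer class is a cut-down copy of the one in Example 12-8.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Cut-down copy of the Customer class from Example 12-8.
public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string EmailAddress { get; set; }
}

// Hypothetical helper class, not part of the .NET Framework.
public static class CompactXmlSketch
{
    public static string ToCompactXml(Customer customer)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Customer));

        // An empty prefix mapped to an empty namespace takes the place
        // of the xmlns:xsi and xmlns:xsd declarations.
        XmlSerializerNamespaces namespaces = new XmlSerializerNamespaces();
        namespaces.Add("", "");

        // Leave out the <?xml ...?> line, but keep the indentation.
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;
        settings.Indent = true;

        StringWriter text = new StringWriter();
        using (XmlWriter xmlWriter = XmlWriter.Create(text, settings))
        {
            serializer.Serialize(xmlWriter, customer, namespaces);
        }
        return text.ToString();
    }

    public static void Main()
    {
        Customer c1 = new Customer
        {
            FirstName = "Orlando",
            LastName = "Gee",
            EmailAddress = "orlando@example.com" // placeholder address
        };
        Console.WriteLine(ToCompactXml(c1));
    }
}
```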
Aside from those optional parts, this XML representation of the
Customer
object is equivalent to the
one created in Example 12-1. However,
instead of writing numerous lines of code to deal with the XML specifics,
you need only three lines using .NET XML serialization classes.
Furthermore, it is just as easy to reconstruct an object from its XML form:
Customer c2 = serializer.Deserialize(new StringReader(xml)) as Customer;
Console.WriteLine("Customer in Object: {0}", c2.ToString());
All you need to do is call the XmlSerializer.Deserialize
method. It has several overloads, one of which takes a TextReader
instance as an input parameter.
Because StringReader
is derived from
TextReader
, you just pass an instance
of StringReader
to read from the XML
string. The Deserialize
method returns
an object, so it is necessary to cast it to the correct type.
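The conversion can fail in more than one way, so it is worth a moment's thought when the XML comes from outside your program. The as operator produces null if the object is not of the requested type, whereas a cast expression throws an InvalidCastException. Deserialize itself throws an InvalidOperationException when the text is not well-formed XML or when the root element is not the one the serializer expects. The following sketch wraps those cases in a hypothetical TryReadCustomer method; the DeserializeSketch class name is also an assumption, and the Customer class is a cut-down copy of the one in Example 12-8.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Cut-down copy of the Customer class from Example 12-8.
public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string EmailAddress { get; set; }
}

// Hypothetical helper class, not part of the .NET Framework.
public static class DeserializeSketch
{
    // Returns true and sets customer if the XML could be read as a
    // Customer; otherwise returns false and sets customer to null.
    public static bool TryReadCustomer(string xml, out Customer customer)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Customer));
        try
        {
            using (StringReader reader = new StringReader(xml))
            {
                customer = (Customer)serializer.Deserialize(reader);
            }
            return customer != null;
        }
        catch (InvalidOperationException)
        {
            // Malformed XML or an unexpected root element ends up here.
            customer = null;
            return false;
        }
    }

    public static void Main()
    {
        Customer c;
        string good = "<Customer><FirstName>Orlando</FirstName></Customer>";

        Console.WriteLine(TryReadCustomer(good, out c));
        Console.WriteLine(TryReadCustomer("<Order />", out c));
        Console.WriteLine(TryReadCustomer("this is not XML", out c));
    }
}
```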
Of course, there’s a price to pay. XML serialization is less flexible than working with the XML APIs directly—with serialization you decide exactly what XML elements and attributes you expect to see when you write the code. If you need to be able to adapt dynamically to elements whose names you only learn at runtime, you will need to stick with the XML-aware APIs.
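To make the contrast concrete, here is a minimal sketch of the dynamic approach using LINQ to XML. It collects whatever child elements the root happens to contain, without any class that names them in advance. The DynamicXmlSketch class, the ReadFields method, and the Nickname element with its value are hypothetical, invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class, not part of the .NET Framework.
public static class DynamicXmlSketch
{
    // Maps each child element's name to its text content.
    // ToDictionary throws if two children share a name, so this
    // sketch assumes the names are unique.
    public static Dictionary<string, string> ReadFields(string xml)
    {
        XElement root = XDocument.Parse(xml).Root;
        return root.Elements()
                   .ToDictionary(e => e.Name.LocalName, e => e.Value);
    }

    public static void Main()
    {
        // Nickname is not a property of any class in this chapter.
        string xml =
            "<Customer>" +
            "<FirstName>Orlando</FirstName>" +
            "<Nickname>Lando</Nickname>" +
            "</Customer>";

        foreach (KeyValuePair<string, string> field in ReadFields(xml))
        {
            Console.WriteLine("{0} = {1}", field.Key, field.Value);
        }
    }
}
```

An XmlSerializer built for the Customer class would not give you the Nickname value, because Customer has no property to hold it.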
By default, all public read/write properties are serialized as child elements. You can customize your classes by specifying the type of XML node you want for each of your public properties, as shown in Example 12-9.
Example 12-9. Customizing XML serialization with attributes
using System;
using System.IO;
using System.Xml.Serialization;

namespace Programming_CSharp
{
    // Simple customer class
    public class Customer
    {
        [XmlAttribute]
        public string FirstName { get; set; }

        [XmlIgnore]
        public string LastName { get; set; }

        public string EmailAddress { get; set; }

        // Overrides the Object.ToString() to provide a
        // string representation of the object properties.
        public override string ToString()
        {
            return string.Format("{0} {1} Email: {2}",
                FirstName, LastName, EmailAddress);
        }
    }

    // Main program
    public class Tester
    {
        static void Main()
        {
            Customer c1 = new Customer
            {
                FirstName = "Orlando",
                LastName = "Gee",
                EmailAddress = "[email protected]"
            };

            //XmlSerializer serializer = new XmlSerializer(c1.GetType());
            XmlSerializer serializer = new XmlSerializer(typeof(Customer));
            StringWriter writer = new StringWriter();
            serializer.Serialize(writer, c1);
            string xml = writer.ToString();
            Console.WriteLine("Customer in XML: {0} ", xml);

            Customer c2 = serializer.Deserialize(new StringReader(xml)) as Customer;
            Console.WriteLine("Customer in Object: {0}", c2.ToString());

            Console.ReadKey();
        }
    }
}

Output:
Customer in XML: <?xml version="1.0" encoding="utf-16"?>
<Customer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" FirstName="Orlando">
  <EmailAddress>[email protected]</EmailAddress>
</Customer>
Customer in Object: Orlando  Email: [email protected]
The only changes in this example are a couple of XML serialization
attributes added to the Customer
class. The first change specifies that you want to serialize the
FirstName
property as an attribute
of the Customer
element, which you do by adding the
XmlAttributeAttribute
to the
property:
[XmlAttribute]
public string FirstName { get; set; }
The other change tells XML serialization that you do
not want the LastName
property to be
serialized at all. You do this by adding the XmlIgnoreAttribute
to the property:
[XmlIgnore]
public string LastName { get; set; }
As you can
see from the sample output, the Customer
object is serialized without LastName
, exactly as we asked.
However, you have probably noticed that when the object is
deserialized, its LastName
property
is lost. Because it is not serialized, the XmlSerializer
is unable to assign it any
value. Therefore, its value is left as the default, which for a string is
null. (string.Format renders a null argument as an empty string, which is
why nothing appears between the first name and the email address in the
output.) So in practice, you would exclude from serialization only those
properties you don’t need, or that you can compute or retrieve in other
ways.
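As an illustration of a property that can be computed, the following sketch excludes a cached display name from the XML and rebuilds it after deserialization. The DisplayName property, the RefreshDisplayName method, and the IgnoreSketch class are hypothetical additions for this sketch and do not appear in Example 12-9.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }

    // Hypothetical cached value. It can always be rebuilt from the
    // two names, so there is no need to store it in the XML.
    [XmlIgnore]
    public string DisplayName { get; set; }

    public void RefreshDisplayName()
    {
        DisplayName = LastName + ", " + FirstName;
    }
}

// Hypothetical helper class, not part of the .NET Framework.
public static class IgnoreSketch
{
    // Serializes the customer to a string and reads it back.
    public static Customer RoundTrip(Customer original, out string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Customer));
        StringWriter writer = new StringWriter();
        serializer.Serialize(writer, original);
        xml = writer.ToString();
        return (Customer)serializer.Deserialize(new StringReader(xml));
    }

    public static void Main()
    {
        Customer c1 = new Customer { FirstName = "Orlando", LastName = "Gee" };
        c1.RefreshDisplayName();

        string xml;
        Customer c2 = RoundTrip(c1, out xml);

        // The ignored property was never written, so it is unset here.
        Console.WriteLine(c2.DisplayName == null);

        // Recompute it from the data that did survive the round trip.
        c2.RefreshDisplayName();
        Console.WriteLine(c2.DisplayName);
    }
}
```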
In this chapter, we saw how to use the LINQ to XML classes to build objects representing the structure of an XML document, which can then be converted into an XML document, and we saw how the same classes can be used to load XML from a string or file back into memory as objects. These classes support LINQ, both for building new XML documents and for searching for information in existing XML documents. And we also saw how XML serialization can hide some of the details of XML handling behind ordinary C# classes in situations where you know exactly what structure of XML to expect.
[25] XPath is supported by both LINQ to XML and the DOM APIs. (Unless you’re using Silverlight, in which case the DOM API is missing entirely, and the XPath support is absent from LINQ to XML.) So if you prefer that, you can use it instead, or you can use a mixture of LINQ and XPath.