Chapter 17. XML in Excel 2007

IN THIS CHAPTER

One of the features introduced in Office 2003 was the capability to better handle XML data. If you are working with a company that offers XML data and schemas, the opportunities are endless. Even if your company does not offer XML data, you can still test out these features using publicly available XML data. Later in this chapter, we look at using XML to retrieve data from Amazon.com.

Note that support for XML was only in certain editions of Excel 2003. You either needed the full boxed version of Excel 2003 or the Enterprise Edition of Excel 2003 to use the XML features. In Excel 2007, Microsoft has made the XML functionality available in every edition.

What Is XML?

If you have ever looked at the source code for a Web page, you are familiar with HTML tags. Near the top of a Web page’s HTML source, you will see a tag that defines the Web page title to appear in the blue bar at the top of the browser window. The tag might look something like this:

<TITLE>Excel Tips from MrExcel</TITLE>

A Web page can contain tags to identify titles, paragraphs, tables, rows within tables, and so on. This is the language of HTML.

XML is like HTML on steroids. With XML, you can define absolutely any fields in XML. The following XML file contains information about today’s orders received. There is no magic in creating an XML file; I actually typed this into Notepad:

<TodaysOrders>
    <SalesOrder>
        <Customer>BCA Co</Customer>
        <Address>123 North</Address>
        <City>Stow</City>
        <State>OH</State>
        <Zip>44224</Zip>
        <ItemSKU>23456</ItemSKU>
        <Quantity>500</Quantity>
        <UnitPrice>21.75</UnitPrice>
    </SalesOrder>
    <SalesOrder>
        <Customer>DEF Co</Customer>
        <Address>234 Carapace Lane</Address>
        <City>South Bend</City>
        <State>IN</State>
        <Zip>44685</Zip>
        <ItemSKU>34567</ItemSKU>
        <Quantity>20</Quantity>
        <UnitPrice>50.00</UnitPrice>
    </SalesOrder>
</TodaysOrders>

Simple XML Rules

If you are familiar with HTML, you need to aware of a few simple rules that differentiate XML from HTML:

  • Every data element has to begin and end with an identical tag. Tag names are case sensitive. <TagName>Data</Tagname> is not valid. <TagName>Data</TagName> is valid.

  • The XML file must begin and end with a root tag. There can be only one root tag in the file. In the example earlier, the root tag is <TodaysOrders>.

  • It is valid to have an empty tag. Put a slash at the end of the tag. If there is no zip code for an international order, for example, use <Zip/> to indicate that there is no data for this field for this record.

  • If you nest tags, the inner tag must be closed before you close the outer tag. This differs from HTML. In HTML, it is valid to code <b>XML is very <i>cool</b></i>. This is not valid in XML. <Item><a>data</a></Item> will work, but <Item><a>data</Item></a> will not.

For a more thorough discussion of XML, see Benoit Marchal’s XML by Example (978-0-7897-2504-2).

Universal File Format

For many years, CSV was considered the universal file format. Just about any application could produce data in comma-separated values, and just about any spreadsheet could read in CSV data. XML promises to become the universal file format of the future, and it is much more powerful.

You may deal with CSV data every day. As developers, we need to talk to the other developer who is generating the CSV data, and we have to understand that the fifth column contains a zip code, and the sixth column contains a product SKU. If the source system ever leaves out a field, we are likely to end up with wrong results. For years, CSV has been the universal format to get data from another system into a spreadsheet. With CSV, both developers must understand that the fifth column contains a zip code, and the sixth column contains a product SKU (see Figure 17.1).

Open a CSV file and you may have this confusing view of the data.

Figure 17.1. Open a CSV file and you may have this confusing view of the data.

XML as the New Universal File Format

XML is much more powerful. If you use Excel to open the simple XML file shown in Figure 17.1, Excel can offer headings and can even infer something called the schema of the data. A schema is a separate file that describes the columns and relationships inherent in the data. If the source system would happen to be missing a zip code for a record, we do not have to worry about the SKU moving over from Column F. Compare Figure 17.2 with the CSV example shown in Figure 17.1. Excel can present a much more meaningful view of the data.

Data can be written in XML format by any system and intelligently read into Excel. Unlike CSV data, the format of XML data is something Excel can understand.

Figure 17.2. Data can be written in XML format by any system and intelligently read into Excel. Unlike CSV data, the format of XML data is something Excel can understand.

If you insert new records into the XML list in Excel, you can save the file back out as just the XML data. Use Office Icon, Save As, Other Formats. In the Save As Type drop-down, choose XML Data (*.xml). Excel can natively read from and write to XML files, allowing you to share data between applications.

After you save the document as an XML table, Excel nicely formats the XML with an appropriate structure and tab characters for subelements. This allows the data to be viewed in a simple text editor such as Notepad (see Figure 17.3).

After you add records to the XML data in Excel and save, the resulting XML is neatly formatted.

Figure 17.3. After you add records to the XML data in Excel and save, the resulting XML is neatly formatted.

The Alphabet Soup of XML

Note that in the preceding example we had only an XML file, and Excel was able to accurately read the data, allow editing, and write the data back out for use by another application.

Two additional file types—schemas and transforms—enhance XML files.

Although the XML file contains the data and field names, an XML schema file defines data relationships and data validation requirements. For example, a zip code field could require five numeric digits. XML schemas are typically stored in XSD files.

XSL files are called transforms or solutions. A transform describes how the fields in the XML file should be mapped to your document. If your data contains 20 elements, you can define in the transform file that you want to see only particular elements in this spreadsheet. You can have many XSL files for a particular schema to enable many views of the same data.

If you are reading someone else’s XML data, XSD and XSL files have probably been provided for you. If you are reading your own XML data, Excel actually infers a data schema for you. After opening the XML file with TodaysOrders introduced at the start of this chapter, you can go to the immediate pane of the VB Editor and retrieve the XSD file. Type the following:

Print ActiveWorkbook.XmlMaps(1).Schemas(1).xml

Copy the result to a Notepad window and save as TodaysOrders.xsd. Figure 17.4 shows the inferred XSD file, although I have added line breaks and spacing to improve the readability.

Excel is able to infer a default schema for any XML file that it encounters. Having an XSD schema file is a requirement to use any higher-level XML features such as repurposing data.

Figure 17.4. Excel is able to infer a default schema for any XML file that it encounters. Having an XSD schema file is a requirement to use any higher-level XML features such as repurposing data.

The final step to fully use XML in Office 2007 is to create XSL transform files. There is no easy way to create this with the tools in Excel. For a discussion of the rather awkward process of manually creating an XSL file, see www.mrexcel.com/tip064.shtml.

Microsoft’s Use of XML as a File Type

Microsoft’s Use of XML as a File Type

Microsoft has been rapidly improving their support for XML:

  • In Excel 2002, you were first able to read from an XML file.

  • In Excel 2003, Excel could write XML files. This was also the first version in which you could opt to save Excel or Word files as XML rather than the usual binary XLS or DOC file type. In the previous edition of this book, we marveled at the ability for Excel to round-trip a file to XML and back to Excel without losing any formulas or formatting (see Figure 17.5). In Excel 2003, the XML file type would not support charts or VBA.

    In Excel 2003, most spreadsheet elements, except charts and VBA, were supported by the XML file type.

    Figure 17.5. In Excel 2003, most spreadsheet elements, except charts and VBA, were supported by the XML file type.

  • In Excel 2007, the default file type for Excel files is a zipped XML file. The next section describes how Excel 2007 stores workbooks with XML.

How Excel 2007 Stores Workbooks with XML

Excel 2007 introduces new file formats such as XLSX, XLSM, and XLSB. The last file format is still a proprietary binary format. However, the XLSX and XLSM file formats are 100 percent XML formats. Excel saves the workbook, including charts, SmartArt graphics, formulas, formatting, and numbers in various XML files. All the XML formats are then zipped into a single file and the extension is changed from .zip to .xlsk.

In addition, the XLSM format includes support for macros, modules, and userforms.

You can easily explore the inner workings of an Excel 2007 file by following these steps.

  1. Create an Excel file. Add formulas, formatting, a chart, and a bit of VBA macro code.

  2. Save the file as an XLSM file.

  3. Close the file.

  4. Using Windows Explorer, browse to the folder that contains the file.

  5. Rename the file to have a .zip extension. You may be warned that this might make the file unusable. This is okay.

  6. Open the file with WinZip or your favorite unzipping utility. You will see that the file is composed of many XML parts. The chart and the VBA will each get its own XML part. You will have an XML part for each worksheet. The calculation tree is now fully exposed and explorable in CalcChain.xml.

Figure 17.6 shows the original workbook, the files in the zip file and one of the XML parts. Excel stores shared strings here in order to make the file size smaller.

In Excel 2007, Microsoft stores most spreadsheet parts in plain text XML files, and then zips them into a single file.

Figure 17.6. In Excel 2007, Microsoft stores most spreadsheet parts in plain text XML files, and then zips them into a single file.

 

Next Steps

Although XML enables you to transfer data between unrelated applications, you already can transfer data programmatically between applications in the Microsoft Office suite. Chapter 18, “Automating Word,” looks at using Excel VBA to automate and control Microsoft Word.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset