Chapter 16. XML

XML (eXtensible Markup Language) is a general-purpose text file format that is popular for data interchange and data storage. It was developed by the World Wide Web Consortium (W3C) as a lightweight alternative to SGML (Standard Generalized Markup Language). The syntax is similar to HTML, but XML is a metalanguage and as such does not mandate specific tags, attributes, or entities. The XML-compliant version of HTML is called XHTML.

For the popular SVG (Scalable Vector Graphics) XML format, the QtSvg module provides classes that can load and render SVG images. For rendering documents that use the MathML (Mathematical Markup Language) XML format, the QtMmlWidget from Qt Solutions can be used.

For general XML processing, Qt provides the QtXml module, which is the subject of this chapter. The QtXml module offers three distinct APIs for reading XML documents:

  • QXmlStreamReader is a fast parser for reading well-formed XML.

  • DOM (Document Object Model) converts an XML document into a tree structure, which the application can then navigate.

  • SAX (Simple API for XML) reports “parsing events” directly to the application through virtual functions.

The QXmlStreamReader class is the fastest and easiest to use and offers an API that is consistent with the rest of Qt. It is ideal for writing one-pass parsers. DOM’s main benefit is that it lets us navigate a tree representation of the XML document in any order, allowing us to implement multi-pass parsing algorithms. Some applications even use the DOM tree as their primary data structure. SAX is provided mainly for historical reasons; using QXmlStreamReader usually leads to simpler and faster code.

For writing XML files, Qt also offers three options:

  • We can use a QXmlStreamWriter.

  • We can represent the data as a DOM tree in memory and ask the tree to write itself to a file.

  • We can generate the XML by hand.

Using QXmlStreamWriter is by far the easiest approach, and is more reliable than hand-generating XML. Using DOM to produce XML really makes sense only if a DOM tree is already used as the application’s primary data structure. All three approaches to reading and writing XML are shown in this chapter.

Reading XML with QXmlStreamReader

Using QXmlStreamReader is the fastest and easiest way to read XML in Qt. Because the parser works incrementally, it is particularly useful for finding all occurrences of a given tag in an XML document, for reading very large files that may not fit in memory, and for populating custom data structures to reflect an XML document’s contents.

The QXmlStreamReader parser works in terms of the tokens listed in Table 16.1. Each time the readNext() function is called, the next token is read and becomes the current token. The current token’s properties depend on the token’s type and are accessible using the getter functions listed in the table.

Table 16.1. The QXmlStreamReader’s tokens

Token Type             Example         Getter Functions
---------------------  --------------  ------------------------------------------------------------
StartDocument          N/A             isStandaloneDocument()
EndDocument            N/A             isStandaloneDocument()
StartElement           <item>          namespaceUri(), name(), attributes(), namespaceDeclarations()
EndElement             </item>         namespaceUri(), name()
Characters             AT&amp;T        text(), isWhitespace(), isCDATA()
Comment                <!-- fix -->    text()
DTD                    <!DOCTYPE ...>  text(), notationDeclarations(), entityDeclarations()
EntityReference        &trade;         name(), text()
ProcessingInstruction  <?alert?>       processingInstructionTarget(), processingInstructionData()
Invalid                >&<!            error(), errorString()

Consider the following XML document:

<doc>
    <quote>Einmal ist keinmal</quote>
</doc>

If we parse this document, each readNext() call will produce a new token, with extra information available using getter functions:

StartDocument
StartElement (name() == "doc")
StartElement (name() == "quote")
Characters (text() == "Einmal ist keinmal")
EndElement (name() == "quote")
EndElement (name() == "doc")
EndDocument

After each readNext() call, we can test for the current token’s type using isStartElement(), isCharacters(), and similar functions, or simply using tokenType().
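To make the token sequence above concrete, here is a minimal sketch of a tokenizer written in plain standard C++. It is not QXmlStreamReader and does not use Qt: the names Token, TokenType, and tokenize() are invented for illustration, and the sketch assumes well-formed input whose tags carry no attributes. Like the simplified listing above, it drops whitespace-only text and does not emit document start or end tokens.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token model; only three of the token types are covered.
enum class TokenType { StartElement, EndElement, Characters };

struct Token {
    TokenType type;
    std::string value;  // tag name, or the text for Characters tokens
};

// Splits a tiny XML fragment into tokens. Assumes well-formed input and
// tags without attributes; whitespace-only text between tags is dropped.
std::vector<Token> tokenize(const std::string &xml)
{
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            std::size_t close = xml.find('>', pos);
            assert(close != std::string::npos);  // well-formed input only
            bool isEnd = (pos + 1 < close && xml[pos + 1] == '/');
            std::size_t nameStart = pos + (isEnd ? 2 : 1);
            tokens.push_back({isEnd ? TokenType::EndElement
                                    : TokenType::StartElement,
                              xml.substr(nameStart, close - nameStart)});
            pos = close + 1;
        } else {
            std::size_t open = xml.find('<', pos);
            if (open == std::string::npos)
                open = xml.size();
            std::string text = xml.substr(pos, open - pos);
            bool blank = true;
            for (unsigned char c : text) {
                if (!std::isspace(c)) {
                    blank = false;
                    break;
                }
            }
            if (!blank)
                tokens.push_back({TokenType::Characters, text});
            pos = open;
        }
    }
    return tokens;
}
```

Calling tokenize() on the <doc> example yields five tokens, one for each line of the listing between StartDocument and EndDocument.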

We will review an example that shows how to use QXmlStreamReader to parse an ad hoc XML file format and render its contents in a QTreeWidget. The format we will parse is that of a book index, with index entries and sub-entries. Here’s the book index file that is displayed in the QTreeWidget in Figure 16.2:

<?xml version="1.0"?>
<bookindex>
    <entry term="sidebearings">
        <page>10</page>
        <page>34-35</page>
        <page>307-308</page>
    </entry>
    <entry term="subtraction">
        <entry term="of pictures">
            <page>115</page>
            <page>244</page>
        </entry>
        <entry term="of vectors">
            <page>9</page>
        </entry>
    </entry>
</bookindex>

Figure 16.2. The XML Stream Reader application

We will begin by looking at an extract from the application’s main() function, to see how the XML reader is used in context, and then we will look at the reader’s implementation.

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);
    QStringList args = QApplication::arguments();
    ...
    QTreeWidget treeWidget;
    ...
    XmlStreamReader reader(&treeWidget);
    for (int i = 1; i < args.count(); ++i)
        reader.readFile(args[i]);
    return app.exec();
}

The application shown in Figure 16.2 begins by creating a QTreeWidget. It then creates an XmlStreamReader, passing it the tree widget and asking it to parse each file specified on the command line.

class XmlStreamReader
{
public:
    XmlStreamReader(QTreeWidget *tree);

    bool readFile(const QString &fileName);

private:
    void readBookindexElement();
    void readEntryElement(QTreeWidgetItem *parent);
    void readPageElement(QTreeWidgetItem *parent);
    void skipUnknownElement();

    QTreeWidget *treeWidget;
    QXmlStreamReader reader;
};

The XmlStreamReader class provides two public functions: the constructor and readFile(). The class uses a QXmlStreamReader instance to parse the XML file, and populates the QTreeWidget to reflect the XML data that is read. The parsing is done using recursive descent:

  • readBookindexElement() parses a <bookindex>...</bookindex> element that contains zero or more <entry> elements.

  • readEntryElement() parses an <entry>...</entry> element that contains zero or more <page> elements and zero or more <entry> elements nested to any depth.

  • readPageElement() parses a <page>...</page> element.

  • skipUnknownElement() skips an unrecognized element.

We will now look at the XmlStreamReader class’s implementation, beginning with the constructor.

XmlStreamReader::XmlStreamReader(QTreeWidget *tree)
{
    treeWidget = tree;
}

The constructor is used only to establish which QTreeWidget the reader should use. All the action takes place in the readFile() function (called from main()), which we will look at in three parts.

bool XmlStreamReader::readFile(const QString &fileName)
{
    QFile file(fileName);
    if (!file.open(QFile::ReadOnly | QFile::Text)) {
        std::cerr << "Error: Cannot read file " << qPrintable(fileName)
                  << ": " << qPrintable(file.errorString())
                  << std::endl;
        return false;
    }
    reader.setDevice(&file);

The readFile() function begins by trying to open the file. If it fails, it outputs an error message and returns false. If the file is opened successfully, it is set as the QXmlStreamReader’s input device.

    reader.readNext();
    while (!reader.atEnd()) {
        if (reader.isStartElement()) {
            if (reader.name() == "bookindex") {
                readBookindexElement();
            } else {
                reader.raiseError(QObject::tr("Not a bookindex file"));
            }
        } else {
            reader.readNext();
        }
    }

The QXmlStreamReader’s readNext() function reads the next token from the input stream. If a token is successfully read and the end of the XML file has not been reached, the function enters the while loop. Because of the structure of the index files, we know that inside this loop there are just three possibilities: A <bookindex> start tag has just been read, another start tag has been read (in which case the file is not a book index), or some other token has been read.

If we have the correct start tag, we call readBookindexElement() to continue processing. Otherwise, we call QXmlStreamReader::raiseError() with an error message. The next time atEnd() is called (in the while loop condition), it will return true. This ensures that parsing stops as soon as possible after an error has been encountered. The error can be queried later by calling error() and errorString() on the QXmlStreamReader. An alternative would have been to return right away when we detect an error in the book index file. Using raiseError() is usually more convenient, because it lets us use the same error-reporting mechanism for low-level XML parsing errors, which are raised automatically when QXmlStreamReader runs into invalid XML, and for application-specific errors.
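The way a raised error ends the read loop can be modeled in a few lines of plain C++. The following sketch is not Qt code: TokenCursor and consume() are hypothetical names, and the cursor walks a vector of tag names rather than real XML. It only assumes the behavior described above, namely that atEnd() reports true once an error has been raised.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stream reader with a "sticky" error state.
class TokenCursor
{
public:
    explicit TokenCursor(std::vector<std::string> tokens)
        : tokens_(std::move(tokens)) {}

    // True once the input is exhausted or an error has been raised.
    bool atEnd() const { return hasError() || index_ >= tokens_.size(); }

    const std::string &current() const
    {
        assert(index_ < tokens_.size());
        return tokens_[index_];
    }

    void readNext() { if (!atEnd()) ++index_; }
    void raiseError(const std::string &message) { error_ = message; }
    bool hasError() const { return !error_.empty(); }
    const std::string &errorString() const { return error_; }

private:
    std::vector<std::string> tokens_;
    std::size_t index_ = 0;
    std::string error_;
};

// Consumes tokens until the cursor reports atEnd(); raises an error if
// the first token is not the expected root tag. Returns how many tokens
// were consumed before the loop stopped.
int consume(TokenCursor &cursor, const std::string &expectedRoot)
{
    int consumed = 0;
    while (!cursor.atEnd()) {
        if (consumed == 0 && cursor.current() != expectedRoot) {
            cursor.raiseError("Not a " + expectedRoot + " file");
        } else {
            cursor.readNext();
            ++consumed;
        }
    }
    return consumed;
}
```

Because the loop condition is the only exit, the caller checks hasError() once after the loop, whether the error came from the application or (in a real parser) from malformed input.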

    file.close();
    if (reader.hasError()) {
        std::cerr << "Error: Failed to parse file "
                  << qPrintable(fileName) << ": "
                  << qPrintable(reader.errorString()) << std::endl;
        return false;
    } else if (file.error() != QFile::NoError) {
        std::cerr << "Error: Cannot read file " << qPrintable(fileName)
                  << ": " << qPrintable(file.errorString())
                  << std::endl;
        return false;
    }
    return true;
}

Once the processing has finished, the file is closed. If there was a parser error or a file error, the function outputs an error message and returns false; otherwise, it returns true to report a successful parse.

void XmlStreamReader::readBookindexElement()
{
    reader.readNext();
    while (!reader.atEnd()) {
        if (reader.isEndElement()) {
            reader.readNext();
            break;
        }

        if (reader.isStartElement()) {
            if (reader.name() == "entry") {
                readEntryElement(treeWidget->invisibleRootItem());
            } else {
                skipUnknownElement();
            }
        } else {
            reader.readNext();
        }
    }
}

The readBookindexElement() function is responsible for reading the main part of the file. It starts by skipping the current token (which at this point can be only a <bookindex> start tag) and then loops over the input.

If an end tag is read, it can be only the </bookindex> tag, since otherwise, QXmlStreamReader would have reported an error (NotWellFormedError). In that case, we skip the tag and break out of the loop. Otherwise, we should have a top-level index <entry> start tag. If this is the case, we call readEntryElement() to process the entry’s data; if not, we call skipUnknownElement(). Using skipUnknownElement() rather than calling raiseError() means that if we extend the book index format in the future to include new tags, this reader will continue to work, since it will simply ignore the tags it does not recognize.

The readEntryElement() function takes a QTreeWidgetItem * argument that identifies a parent item. We pass QTreeWidget::invisibleRootItem() as the parent to make the new items root items. In readEntryElement(), we will call readEntryElement() recursively, with a different parent.

void XmlStreamReader::readEntryElement(QTreeWidgetItem *parent)
{
    QTreeWidgetItem *item = new QTreeWidgetItem(parent);
    item->setText(0, reader.attributes().value("term").toString());

    reader.readNext();
    while (!reader.atEnd()) {
        if (reader.isEndElement()) {
            reader.readNext();
            break;
        }

        if (reader.isStartElement()) {
            if (reader.name() == "entry") {
                readEntryElement(item);
            } else if (reader.name() == "page") {
                readPageElement(item);
            } else {
                skipUnknownElement();
            }
        } else {
            reader.readNext();
        }
    }
}

The readEntryElement() function is called whenever an <entry> start tag is encountered. We want a tree widget item to be created for every index entry, so we create a new QTreeWidgetItem and set the text of its first column to the value of the entry’s term attribute.

Once the entry has been added to the tree, the next token is read. If it is an end tag, we skip the tag and break out of the loop. If a start tag is encountered, it will be an <entry> tag (signifying a sub-entry), a <page> tag (a page number for this entry), or an unknown tag. If the start tag is a sub-entry, we call readEntryElement() recursively. If the tag is a <page> tag, we call readPageElement().
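The cursor convention used by these functions (each one is entered on a start tag and returns just past the matching end tag) can be tried out without Qt. The sketch below is a plain C++ model under stated assumptions: Token, Entry, and IndexReader are hypothetical types, the tokens are supplied as a ready-made vector, and a start token for <entry> carries the term attribute in its value field.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class Kind { Start, End, Text };

struct Token {
    Kind kind;
    std::string name;   // tag name (empty for Text)
    std::string value;  // term attribute of <entry>, or the text itself
};

struct Entry {
    std::string term;
    std::vector<std::string> pages;
    std::vector<Entry> children;
};

class IndexReader
{
public:
    explicit IndexReader(std::vector<Token> tokens)
        : tokens_(std::move(tokens)) {}

    // Entered on an <entry> start token; returns just past its end token.
    Entry readEntry()
    {
        assert(!atEnd() && current().kind == Kind::Start
               && current().name == "entry");
        Entry entry;
        entry.term = current().value;
        ++pos_;
        while (!atEnd()) {
            if (current().kind == Kind::End) {
                ++pos_;
                break;
            }
            if (current().kind == Kind::Start) {
                if (current().name == "entry")
                    entry.children.push_back(readEntry());
                else if (current().name == "page")
                    entry.pages.push_back(readPage());
                else
                    skipElement();
            } else {
                ++pos_;
            }
        }
        return entry;
    }

private:
    // Collects the text of a <page> element and steps past </page>.
    std::string readPage()
    {
        ++pos_;
        std::string text;
        while (!atEnd() && current().kind == Kind::Text) {
            text += current().value;
            ++pos_;
        }
        if (!atEnd() && current().kind == Kind::End)
            ++pos_;
        return text;
    }

    // Skips an unrecognized element, including anything nested in it.
    void skipElement()
    {
        ++pos_;
        while (!atEnd()) {
            if (current().kind == Kind::End) {
                ++pos_;
                break;
            }
            if (current().kind == Kind::Start)
                skipElement();
            else
                ++pos_;
        }
    }

    bool atEnd() const { return pos_ >= tokens_.size(); }
    const Token &current() const { return tokens_[pos_]; }

    std::vector<Token> tokens_;
    std::size_t pos_ = 0;
};
```

Keeping the same entry and exit convention in every function is what makes it safe for them to call one another recursively.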

void XmlStreamReader::readPageElement(QTreeWidgetItem *parent)
{
    QString page = reader.readElementText();
    if (reader.isEndElement())
        reader.readNext();

    QString allPages = parent->text(1);
    if (!allPages.isEmpty())
        allPages += ", ";
    allPages += page;
    parent->setText(1, allPages);
}

The readPageElement() function is called whenever we get a <page> tag. It is passed the tree item that corresponds to the entry to which the page text belongs. We begin by reading the text between the <page> and </page> tags. On success, the readElementText() function will leave the parser on the </page> tag, which we must skip.

The pages are stored in the tree widget item’s second column. We begin by extracting the text that is already there. If the text is not empty, we append a comma and a space to it, ready for the new page text. We then append the new text and update the column’s text accordingly.
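The accumulation step is independent of Qt and can be isolated as a small helper. This is a sketch using std::string in place of QString; appendPage() is a hypothetical name, and the assertion on empty input is an assumption of the sketch rather than something the original code checks.

```cpp
#include <cassert>
#include <string>

// Appends one page reference to a comma-separated list, inserting the
// ", " separator only when the list already has content.
std::string appendPage(std::string allPages, const std::string &page)
{
    assert(!page.empty());  // an empty page would leave a dangling separator
    if (!allPages.empty())
        allPages += ", ";
    allPages += page;
    return allPages;
}
```

Applied in turn to the three <page> elements of the "sidebearings" entry, the helper builds the text "10, 34-35, 307-308".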

void XmlStreamReader::skipUnknownElement()
{
    reader.readNext();
    while (!reader.atEnd()) {
        if (reader.isEndElement()) {
            reader.readNext();
            break;
        }

        if (reader.isStartElement()) {
            skipUnknownElement();
        } else {
            reader.readNext();
        }
    }
}

Finally, when unknown tags are encountered, we keep reading until we get the unknown element’s end tag, which we also skip. This means that we will skip over well-formed but unrecognized elements, and read as much of the recognizable data as possible from the XML file.
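The same skip can also be written without recursion by counting nesting depth. The sketch below is a plain C++ illustration of that alternative, not the approach used in the chapter: Token, Kind, and skipElement() are hypothetical, and the tokens are assumed to be available as a vector rather than read from a stream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Kind { Start, End, Text };

struct Token {
    Kind kind;
    std::string name;  // tag name, or the text for Text tokens
};

// Given the index of a Start token, returns the index just past the
// matching End token. If the element is never closed, returns
// tokens.size().
std::size_t skipElement(const std::vector<Token> &tokens, std::size_t start)
{
    assert(start < tokens.size() && tokens[start].kind == Kind::Start);
    int depth = 0;
    std::size_t i = start;
    while (i < tokens.size()) {
        if (tokens[i].kind == Kind::Start)
            ++depth;           // entering a nested element
        else if (tokens[i].kind == Kind::End)
            --depth;           // leaving one
        ++i;
        if (depth == 0)
            break;             // the matching end tag has been consumed
    }
    return i;
}
```

The depth counter plays the role that the call stack plays in the recursive version, which can matter for input that is nested very deeply.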

The example presented here could be used as the basis for similar XML recursive descent parsers. Nonetheless, implementing a parser like this can be tricky, because a single missing or misplaced readNext() call is enough to break it. Some programmers address the problem by using assertions in their code. For example, at the beginning of readBookindexElement(), we could add the line

Q_ASSERT(reader.isStartElement() && reader.name() == "bookindex");

A similar assertion could be made in the readEntryElement() and readPageElement() functions. For skipUnknownElement(), we would simply assert that we have a start element.

A QXmlStreamReader can take input from any QIODevice, including QFile, QBuffer, QProcess, and QTcpSocket. Some input sources may not be able to provide the data that the parser needs when it needs it—for example, due to network latency. It is still possible to use QXmlStreamReader under such circumstances; more information on this is provided in the reference documentation for QXmlStreamReader under the heading “Incremental Parsing”.

The QXmlStreamReader class used in this application is part of the QtXml library. To link against this library, we must add this line to the .pro file:

QT += xml

In the next two sections, we will see how to write the same application with DOM and SAX.

Reading XML with DOM

DOM is a standard API for parsing XML developed by the W3C. Qt provides a non-validating DOM Level 2 implementation for reading, manipulating, and writing XML documents.

DOM represents an XML file as a tree in memory. We can navigate through the DOM tree as much as we want, and we can modify the tree and save it back to disk as an XML file.

Let’s consider the following XML document:

<doc>
    <quote>Scio me nihil scire</quote>
    <translation>I know that I know nothing</translation>
</doc>

It corresponds to the following DOM tree:

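Document
    Element (doc)
        Element (quote)
            Text ("Scio me nihil scire")
        Element (translation)
            Text ("I know that I know nothing")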

The DOM tree contains nodes of different types. For example, an Element node corresponds to an opening tag and its matching closing tag. The material that falls between the tags appears as child nodes of the Element node. In Qt, the node types (like all other DOM-related classes) have a QDom prefix; thus, QDomElement represents an Element node, and QDomText represents a Text node.

Different types of nodes can have different kinds of child nodes. For example, an Element node can contain other Element nodes, as well as EntityReference, Text, CDATASection, ProcessingInstruction, and Comment nodes. Figure 16.3 shows which nodes can have which kinds of child nodes. The nodes shown in gray cannot have any child nodes of their own.

Figure 16.3. Parent–child relationships between DOM nodes

To illustrate how to use DOM for reading XML files, we will write a parser for the book index file format described in the preceding section.

class DomParser
{
public:
    DomParser(QTreeWidget *tree);

    bool readFile(const QString &fileName);

private:
    void parseBookindexElement(const QDomElement &element);
    void parseEntryElement(const QDomElement &element,
                           QTreeWidgetItem *parent);
    void parsePageElement(const QDomElement &element,
                          QTreeWidgetItem *parent);
    QTreeWidget *treeWidget;
};

We define a class called DomParser that will parse a book index XML file and display the result in a QTreeWidget.

DomParser::DomParser(QTreeWidget *tree)
{
    treeWidget = tree;
}

In the constructor, we simply assign the given tree widget to the member variable. All the parsing is done inside the readFile() function.

bool DomParser::readFile(const QString &fileName)
{
    QFile file(fileName);
    if (!file.open(QFile::ReadOnly | QFile::Text)) {
        std::cerr << "Error: Cannot read file " << qPrintable(fileName)
                  << ": " << qPrintable(file.errorString())
                  << std::endl;
        return false;
    }

    QString errorStr;
    int errorLine;
    int errorColumn;

    QDomDocument doc;
    if (!doc.setContent(&file, false, &errorStr, &errorLine,
                        &errorColumn)) {
        std::cerr << "Error: Parse error at line " << errorLine << ", "
                  << "column " << errorColumn << ": "
                  << qPrintable(errorStr) << std::endl;
        return false;
    }

    QDomElement root = doc.documentElement();
    if (root.tagName() != "bookindex") {
        std::cerr << "Error: Not a bookindex file" << std::endl;
        return false;
    }

    parseBookindexElement(root);
    return true;
}

In readFile(), we begin by trying to open the file whose name was passed in. If an error occurs, we output an error message and return false to signify failure. Otherwise, we set up some variables to hold parse error information, should it be needed, and then create a QDomDocument. When we call setContent() on the DOM document, the entire XML document provided by the QIODevice is read and parsed. The setContent() function automatically opens the device if it isn’t already open. The false argument to setContent() disables namespace processing; refer to the QtXml reference documentation for an introduction to XML namespaces and how to handle them in Qt.

If an error occurs, we output an error message and return false to indicate failure. If the parse is successful, we call documentElement() on the QDomDocument to obtain its single QDomElement child, and we check that it is a <bookindex> element. If we have a <bookindex>, we call parseBookindexElement() to parse it. As in the preceding section, the parsing is done using recursive descent.

void DomParser::parseBookindexElement(const QDomElement &element)
{
    QDomNode child = element.firstChild();
    while (!child.isNull()) {
        if (child.toElement().tagName() == "entry")
            parseEntryElement(child.toElement(),
                              treeWidget->invisibleRootItem());
        child = child.nextSibling();
    }
}

In parseBookindexElement(), we iterate over all the child nodes. We expect each node to be an <entry> element, and for each one that is, we call parseEntryElement() to parse it. We ignore unknown nodes, to allow for the book index format to be extended in the future without preventing old parsers from working. All <entry> nodes that are direct children of the <bookindex> node are top-level nodes in the tree widget we are populating to reflect the DOM tree, so for each one we pass both the element and the tree’s invisible root item, which becomes the parent of the new tree widget item.

The QDomNode class can store any type of node. If we want to process a node further, we must first convert it to the right data type. In this example, we only care about Element nodes, so we call toElement() on the QDomNode to convert it to a QDomElement and then call tagName() to retrieve the element’s tag name. If the node is not of type Element, the toElement() function returns a null QDomElement object, with an empty tag name.

void DomParser::parseEntryElement(const QDomElement &element,
                                  QTreeWidgetItem *parent)
{
    QTreeWidgetItem *item = new QTreeWidgetItem(parent);
    item->setText(0, element.attribute("term"));

    QDomNode child = element.firstChild();
    while (!child.isNull()) {
        if (child.toElement().tagName() == "entry") {
            parseEntryElement(child.toElement(), item);
        } else if (child.toElement().tagName() == "page") {
            parsePageElement(child.toElement(), item);
        }
        child = child.nextSibling();
    }
}

In parseEntryElement(), we create a tree widget item. The parent item that is passed in is either the tree’s invisible root item (if this is a top-level entry) or another entry (if this is a sub-entry). We call setText() to set the text shown in the item’s first column to the value of the <entry> tag’s term attribute.

Once we have initialized the QTreeWidgetItem, we iterate over the child nodes of the QDomElement node corresponding to the current <entry> tag. For each child element that is an <entry> tag, we call parseEntryElement() recursively with the current item as the second argument. Each child’s QTreeWidgetItem will then be created with the current entry as its parent. If the child element is a <page>, we call parsePageElement().

void DomParser::parsePageElement(const QDomElement &element,
                                 QTreeWidgetItem *parent)
{
    QString page = element.text();
    QString allPages = parent->text(1);
    if (!allPages.isEmpty())
         allPages += ", ";
    allPages += page;
    parent->setText(1, allPages);
}

In parsePageElement(), we call text() on the element to obtain the text that occurs between the <page> and </page> tags; then we add the text to the comma-separated list of page numbers in the QTreeWidgetItem’s second column. The QDomElement::text() function navigates through the element’s child nodes and concatenates all the text stored in Text and CDATA nodes.
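The shape of this recursive walk does not depend on the QDom classes. The sketch below models it in plain C++ under stated assumptions: Node, Entry, textNode(), element(), textOf(), and parseEntry() are hypothetical names, a node with an empty tag stands for a Text node, and textOf() is only a loose analogue of QDomElement::text().

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory tree node; an empty tag marks a text node.
struct Node {
    std::string tag;
    std::string text;
    std::map<std::string, std::string> attributes;
    std::vector<Node> children;
};

Node textNode(std::string text)
{
    Node node;
    node.text = std::move(text);
    return node;
}

Node element(std::string tag, std::map<std::string, std::string> attributes,
             std::vector<Node> children)
{
    Node node;
    node.tag = std::move(tag);
    node.attributes = std::move(attributes);
    node.children = std::move(children);
    return node;
}

// Concatenates the text found anywhere below the given node.
std::string textOf(const Node &node)
{
    if (node.tag.empty())
        return node.text;
    std::string result;
    for (const Node &child : node.children)
        result += textOf(child);
    return result;
}

struct Entry {
    std::string term;
    std::string pages;  // comma-separated, as in the tree widget column
    std::vector<Entry> children;
};

// Recursive descent over an <entry> element; unknown children are ignored.
Entry parseEntry(const Node &entryElement)
{
    assert(entryElement.tag == "entry");
    Entry entry;
    auto it = entryElement.attributes.find("term");
    if (it != entryElement.attributes.end())
        entry.term = it->second;
    for (const Node &child : entryElement.children) {
        if (child.tag == "entry") {
            entry.children.push_back(parseEntry(child));
        } else if (child.tag == "page") {
            if (!entry.pages.empty())
                entry.pages += ", ";
            entry.pages += textOf(child);
        }
    }
    return entry;
}
```

Unlike the stream-based reader, the walk never has to worry about cursor position, because every element already knows its children.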

Let’s now see how we can use the DomParser class to parse a file:

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);
    QStringList args = QApplication::arguments();
    ...
    QTreeWidget treeWidget;
    ...
    DomParser parser(&treeWidget);
    for (int i = 1; i < args.count(); ++i)
        parser.readFile(args[i]);

    return app.exec();
}

We start by setting up a QTreeWidget. Then we create a DomParser. For each file listed on the command line, we call DomParser::readFile() to open and parse it and to populate the tree widget.

As in the previous example, we need the following line in the application’s .pro file to link against the QtXml library:

QT += xml

As the example illustrates, navigating through a DOM tree is straightforward, although not quite as convenient as using QXmlStreamReader. Programmers who use DOM a lot often write their own higher-level wrapper functions to simplify commonly needed operations.

Reading XML with SAX

SAX is a public domain de facto standard API for reading XML documents. Qt’s SAX classes are modeled after the SAX2 Java implementation, with some differences in naming to match the Qt conventions. Compared with DOM, SAX is more low-level and usually faster. But since the QXmlStreamReader class presented earlier in this chapter offers a more Qt-like API and is faster than the SAX parser, the main use of the SAX parser is for porting code that uses the SAX API into Qt. For more information about SAX, see http://www.saxproject.org/.

Qt provides a SAX-based non-validating XML parser called QXmlSimpleReader. This parser recognizes well-formed XML and supports XML namespaces. When the parser goes through the document, it calls virtual functions in registered handler classes to indicate parsing events. (These “parsing events” are unrelated to Qt events, such as key and mouse events.) For example, let’s assume the parser is analyzing the following XML document:

<doc>
    <quote>Gnothi seauton</quote>
</doc>

The parser would call the following parsing event handlers:

startDocument()
startElement("doc")
startElement("quote")
characters("Gnothi seauton")
endElement("quote")
endElement("doc")
endDocument()

The preceding functions are all declared in QXmlContentHandler. For simplicity, we omitted some of the arguments to startElement() and endElement().

QXmlContentHandler is just one of many handler classes that can be used in conjunction with QXmlSimpleReader. The others are QXmlEntityResolver, QXmlDTDHandler, QXmlErrorHandler, QXmlDeclHandler, and QXmlLexicalHandler. These classes only declare pure virtual functions and give information about different kinds of parsing events. For most applications, QXmlContentHandler and QXmlErrorHandler are the only two that are needed. The class hierarchy we have used is shown in Figure 16.4.

Figure 16.4. Inheritance tree for SaxHandler

For convenience, Qt also provides QXmlDefaultHandler, a class that is derived from all the handler classes and that provides trivial implementations for all the functions. This design, with many abstract handler classes and one trivial subclass, is unusual for Qt; it was adopted to closely follow the model Java implementation.

The most significant difference between using the SAX API and QXmlStreamReader or the DOM API is that the SAX API requires us to manually keep track of the parser’s state using member variables, something that is not necessary in the other two approaches, which both allowed recursive descent.

To illustrate how to use SAX for reading XML files, we will write a parser for the book index file format described earlier in this chapter. Here we will parse using a QXmlSimpleReader and a QXmlDefaultHandler subclass called SaxHandler.

The first step to implement the parser is to subclass QXmlDefaultHandler:

class SaxHandler : public QXmlDefaultHandler
{
public:
    SaxHandler(QTreeWidget *tree);

    bool readFile(const QString &fileName);

protected:
    bool startElement(const QString &namespaceURI,
                      const QString &localName,
                      const QString &qName,
                      const QXmlAttributes &attributes);
    bool endElement(const QString &namespaceURI,
                    const QString &localName,
                    const QString &qName);
    bool characters(const QString &str);
    bool fatalError(const QXmlParseException &exception);

private:
    QTreeWidget *treeWidget;
    QTreeWidgetItem *currentItem;
    QString currentText;
};

The SaxHandler class is derived from QXmlDefaultHandler and reimplements four functions: startElement(), endElement(), characters(), and fatalError(). The first three functions are declared in QXmlContentHandler; the last function is declared in QXmlErrorHandler.

SaxHandler::SaxHandler(QTreeWidget *tree)
{
    treeWidget = tree;
}

The SaxHandler constructor accepts the QTreeWidget we want to populate with the information stored in the XML file.

bool SaxHandler::readFile(const QString &fileName)
{
    currentItem = 0;

    QFile file(fileName);
    QXmlInputSource inputSource(&file);
    QXmlSimpleReader reader;
    reader.setContentHandler(this);
    reader.setErrorHandler(this);
    return reader.parse(inputSource);
}

This function is called when we have the name of a file to be parsed. We create a QFile object for the file and create a QXmlInputSource to read the file’s contents. Then we create a QXmlSimpleReader to parse the file. We set the reader’s content and error handlers to this class (SaxHandler), and then we call parse() on the reader to perform the parsing. In SaxHandler, we only reimplement functions from the QXmlContentHandler and QXmlErrorHandler classes; if we had implemented functions from other handler classes, we would also have needed to call the corresponding setXxxHandler() functions.

Instead of passing a simple QFile object to the parse() function, we pass a QXmlInputSource. This class opens the file it is given, reads it (taking into account any character encoding specified in the <?xml?> declaration), and provides an interface through which the parser reads the file.

bool SaxHandler::startElement(const QString & /* namespaceURI */,
                              const QString & /* localName */,
                              const QString &qName,
                              const QXmlAttributes &attributes)
{
    if (qName == "entry") {
        currentItem = new QTreeWidgetItem(currentItem ?
                currentItem : treeWidget->invisibleRootItem());
        currentItem->setText(0, attributes.value("term"));
    } else if (qName == "page") {
        currentText.clear();
    }
    return true;
}

The startElement() function is called when the reader encounters a new opening tag. The third parameter is the tag’s name (or more precisely, its “qualified name”). The fourth parameter is the list of attributes. In this example, we ignore the first and second parameters. They are useful for XML files that use XML’s namespace mechanism, a subject that is discussed in detail in the reference documentation.

If the tag is <entry>, we create a new QTreeWidgetItem. If the tag is nested within another <entry> tag, the new tag defines a sub-entry in the index, and the new QTreeWidgetItem is created as a child of the QTreeWidgetItem that represents the encompassing entry. Otherwise, we create the QTreeWidgetItem as a top-level item, using the tree widget’s invisible root item as its parent. We call setText() to set the text shown in column 0 to the value of the <entry> tag’s term attribute.

If the tag is <page>, we set the currentText variable to be an empty string. The variable serves as an accumulator for the text located between the <page> and </page> tags.

At the end, we return true to tell SAX to continue parsing the file. If we wanted to report unknown tags as errors, we would return false in those cases. We would then also reimplement errorString() from QXmlDefaultHandler to return an appropriate error message.

bool SaxHandler::characters(const QString &str)
{
    currentText += str;
    return true;
}

The characters() function is called to report character data in the XML document. We simply append the characters to the currentText variable.

bool SaxHandler::endElement(const QString & /* namespaceURI */,
                            const QString & /* localName */,
                            const QString &qName)
{
    if (qName == "entry") {
        currentItem = currentItem->parent();
    } else if (qName == "page") {
        if (currentItem) {
            QString allPages = currentItem->text(1);
            if (!allPages.isEmpty())
                allPages += ", ";
            allPages += currentText;
            currentItem->setText(1, allPages);
        }
    }
    return true;
}

The endElement() function is called when the reader encounters a closing tag. Just as with startElement(), the third parameter is the name of the tag.

If the tag is </entry>, we update the currentItem private variable to point to the current QTreeWidgetItem’s parent. (For historical reasons, top-level items return 0 as their parent rather than the invisible root item.) This ensures that the currentItem variable is restored to the value it held before the corresponding <entry> tag was read.

If the tag is </page>, we add the specified page number or page range to the comma-separated list in the current item’s text in column 1.
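To see how currentItem and currentText evolve across the three callbacks, here is a minimal sketch that mirrors the handler's logic using only the standard C++ library. The Item struct and IndexBuilder class are hypothetical stand-ins for QTreeWidgetItem and SaxHandler, and the callbacks take simplified parameters; this is not the Qt API, only an illustration of the state handling.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for QTreeWidgetItem: two text columns and a parent link.
struct Item {
    std::string term;                             // column 0
    std::string pages;                            // column 1
    Item *parent = nullptr;                       // null for top-level items
    std::vector<std::unique_ptr<Item>> children;
};

// Hypothetical stand-in for SaxHandler; an XML reader would drive these callbacks.
class IndexBuilder {
public:
    void startElement(const std::string &qName, const std::string &term = "")
    {
        if (qName == "entry") {
            std::unique_ptr<Item> item(new Item);
            item->term = term;
            item->parent = currentItem;
            Item *raw = item.get();
            // A nested <entry> becomes a child; otherwise it is a top-level item.
            if (currentItem)
                currentItem->children.push_back(std::move(item));
            else
                topLevel.push_back(std::move(item));
            currentItem = raw;
        } else if (qName == "page") {
            currentText.clear();                  // start accumulating page text
        }
    }

    // May be called several times for one run of text, so we append.
    void characters(const std::string &str) { currentText += str; }

    void endElement(const std::string &qName)
    {
        if (qName == "entry") {
            // Restore the item that was current before this <entry> was read.
            // The null check is an extra safeguard added in this sketch.
            if (currentItem)
                currentItem = currentItem->parent;
        } else if (qName == "page") {
            if (currentItem) {
                if (!currentItem->pages.empty())
                    currentItem->pages += ", ";
                currentItem->pages += currentText;
            }
        }
    }

    std::vector<std::unique_ptr<Item>> topLevel;

private:
    Item *currentItem = nullptr;
    std::string currentText;
};
```

Feeding the sketch the events for an entry with two <page> elements leaves that entry's second column holding both values joined by ", ", and closing a nested entry makes its parent current again, so later <page> elements are attached to the right item.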

bool SaxHandler::fatalError(const QXmlParseException &exception)
{
    std::cerr << "Parse error at line " << exception.lineNumber()
              << ", " << "column " << exception.columnNumber() << ": "
              << qPrintable(exception.message()) << std::endl;
    return false;
}

The fatalError() function is called when the reader fails to parse the XML file. If this occurs, we simply print a message to the console, giving the line number, the column number, and the parser’s error text.

This completes the implementation of SaxHandler. The main() function that uses it is almost identical to the one we reviewed in the previous section for DomParser, the difference being that we use a SaxHandler rather than a DomParser.

Writing XML

Most applications that can read XML files also need to write such files. There are three approaches for generating XML files from Qt applications:

  • We can use a QXmlStreamWriter.

  • We can build a DOM tree and call save() on it.

  • We can generate XML by hand.

The choice between these approaches is mostly independent of whether we use QXmlStreamReader, DOM, or SAX for reading XML documents, although if the data is held in a DOM tree it often makes sense to save the tree directly.

Writing XML using the QXmlStreamWriter class is particularly easy since the class takes care of escaping special characters for us. If we wanted to output the book index data from a QTreeWidget using QXmlStreamWriter, we could do so using just two functions. The first function would take a file name and a QTreeWidget *, and would iterate over all the top-level items in the tree:

bool writeXml(const QString &fileName, QTreeWidget *treeWidget)
{
    QFile file(fileName);
    if (!file.open(QFile::WriteOnly | QFile::Text)) {
        std::cerr << "Error: Cannot write file "
                  << qPrintable(fileName) << ": "
                  << qPrintable(file.errorString()) << std::endl;
        return false;
    }

    QXmlStreamWriter xmlWriter(&file);
    xmlWriter.setAutoFormatting(true);
    xmlWriter.writeStartDocument();
    xmlWriter.writeStartElement("bookindex");
    for (int i = 0; i < treeWidget->topLevelItemCount(); ++i)
        writeIndexEntry(&xmlWriter, treeWidget->topLevelItem(i));
    xmlWriter.writeEndDocument();
    file.close();
    if (file.error()) {
        std::cerr << "Error: Cannot write file "
                  << qPrintable(fileName) << ": "
                  << qPrintable(file.errorString()) << std::endl;
        return false;
    }
    return true;
}

If we switch on auto-formatting, the XML is output in a more human-friendly style, with indentation used to show the data’s recursive structure. The writeStartDocument() function writes the XML header line

<?xml version="1.0" encoding="UTF-8"?>

The writeStartElement() function generates a new start tag with the given tag text. The writeEndDocument() function closes any open start tags. For each top-level item, we call writeIndexEntry(), passing it the QXmlStreamWriter, and the item to output. Here is the code for writeIndexEntry():

void writeIndexEntry(QXmlStreamWriter *xmlWriter, QTreeWidgetItem *item)
{
    xmlWriter->writeStartElement("entry");
    xmlWriter->writeAttribute("term", item->text(0));
    QString pageString = item->text(1);
    if (!pageString.isEmpty()) {
        QStringList pages = pageString.split(", ");
        foreach (QString page, pages)
            xmlWriter->writeTextElement("page", page);
    }
    for (int i = 0; i < item->childCount(); ++i)
        writeIndexEntry(xmlWriter, item->child(i));
    xmlWriter->writeEndElement();
}

The function creates an <entry> element corresponding to the QTreeWidgetItem it receives as a parameter. The writeAttribute() function adds an attribute to the tag that has just been written; for example, it might turn <entry> into <entry term="sidebearings">. If there are page numbers, the string in column 1 is split on the ", " (comma-space) separator, and for each resulting piece, a separate <page>...</page> tag pair is written, with the page text in between. This is all achieved by calling writeTextElement() and passing it a tag name and the text to put between the start and end tags. In all cases, QXmlStreamWriter takes care of escaping XML special characters, so we never have to worry about this.

If the item has child items, we recursively call writeIndexEntry() on each of them. Finally, we call writeEndElement() to output </entry>.
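The shape of the recursion may be easier to follow in a version that has no Qt dependencies. The following sketch uses a hypothetical Entry struct in place of QTreeWidgetItem and writes to a std::ostream rather than through QXmlStreamWriter. The split() and writeEntry() helpers are illustrative names, the four-space indentation is an assumption, and, unlike QXmlStreamWriter, the sketch does no escaping of special characters.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a QTreeWidgetItem with two text columns.
struct Entry {
    std::string term;               // column 0
    std::string pages;              // column 1, for example "10, 34-35"
    std::vector<Entry> children;
};

// Splits text at every occurrence of sep and returns the pieces in order.
std::vector<std::string> split(const std::string &text, const std::string &sep)
{
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type pos = text.find(sep, start);
        if (pos == std::string::npos) {
            parts.push_back(text.substr(start));
            return parts;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + sep.size();
    }
}

// Writes one <entry>, its <page> elements, and then its children recursively.
// Note: no escaping is performed here; real code must escape term and page text.
void writeEntry(std::ostream &out, const Entry &entry, int depth)
{
    const std::string indent(4 * depth, ' ');
    out << indent << "<entry term=\"" << entry.term << "\">\n";
    if (!entry.pages.empty()) {
        for (const std::string &page : split(entry.pages, ", "))
            out << indent << "    <page>" << page << "</page>\n";
    }
    for (const Entry &child : entry.children)
        writeEntry(out, child, depth + 1);
    out << indent << "</entry>\n";
}
```

Each call opens its own tag, emits its pages, recurses into the children one level deeper, and only then closes the tag, which is the same order of operations as in writeIndexEntry().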

Using QXmlStreamWriter to write XML is the easiest and safest approach, but if we already have the XML in a DOM tree, we can simply ask the tree to output the relevant XML by calling save() on the QDomDocument object. By default, save() uses UTF-8 as the encoding for the generated file. We can use another encoding by prepending an <?xml?> declaration such as

<?xml version="1.0" encoding="ISO-8859-1"?>

to the DOM tree. The following code snippet shows how to do this:

const int Indent = 4;

QDomDocument doc;
...
QTextStream out(&file);
QDomNode xmlNode = doc.createProcessingInstruction("xml",
                             "version=\"1.0\" encoding=\"ISO-8859-1\"");
doc.insertBefore(xmlNode, doc.firstChild());
doc.save(out, Indent);

Starting with Qt 4.3, an alternative is to set the encoding on the QTextStream using setCodec() and to pass QDomNode::EncodingFromTextStream as the third parameter to save().

Generating XML files by hand isn’t much harder than using DOM. We can use QTextStream and write the strings as we would do with any other text file. The trickiest part is to escape special characters in text and attribute values. The Qt::escape() function escapes the characters ‘<’, ‘>’, and ‘&’. Here’s some code that makes use of it:

QTextStream out(&file);
out.setCodec("UTF-8");
out << "<doc>\n"
    << "   <quote>" << Qt::escape(quoteText) << "</quote>\n"
    << "   <translation>" << Qt::escape(translationText)
    << "</translation>\n"
    << "</doc>\n";

When generating XML files like this, in addition to having to write the correct <?xml?> declaration and setting the right encoding, we must also remember to escape the text we write, and if we use attributes we must escape single or double quotes in their values. Using QXmlStreamWriter is much easier since it handles all of this for us.
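To make the escaping requirement concrete, here is a minimal sketch of a helper that also handles quotes, so that its result can be used inside attribute values as well as in element text. The escapeXml() function is a hypothetical name, not part of Qt, and it works on std::string so that it needs only the standard C++ library.

```cpp
#include <string>

// Hypothetical helper: replaces the five characters that XML treats
// specially with the corresponding predefined entities.
std::string escapeXml(const std::string &text)
{
    std::string result;
    result.reserve(text.size());
    for (char ch : text) {
        switch (ch) {
        case '&':  result += "&amp;";  break;
        case '<':  result += "&lt;";   break;
        case '>':  result += "&gt;";   break;
        case '"':  result += "&quot;"; break;
        case '\'': result += "&apos;"; break;
        default:   result += ch;
        }
    }
    return result;
}
```

Because the input is scanned once and each character is replaced independently, the ampersands that the function itself introduces are never escaped a second time. A hand-written attribute could then be emitted as out << "<entry term=\"" << escapeXml(term) << "\">", assuming term holds the raw attribute value.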



[*] Qt 4.4 is expected to include additional high-level classes for handling XML, providing support for XQuery and XPath, in a separate module called QtXmlPatterns.
