XML (eXtensible Markup Language) is a general-purpose text file format that is popular for data interchange and data storage. It was developed by the World Wide Web Consortium (W3C) as a lightweight alternative to SGML (Standard Generalized Markup Language). The syntax is similar to HTML, but XML is a metalanguage and as such does not mandate specific tags, attributes, or entities. The XML-compliant version of HTML is called XHTML.
For the popular SVG (Scalable Vector Graphics) XML format, the QtSvg module provides classes that can load and render SVG images. For rendering documents that use the MathML (Mathematical Markup Language) XML format, the QtMmlWidget from Qt Solutions can be used.
For general XML processing, Qt provides the QtXml module, which is the subject of this chapter.[*] The QtXml module offers three distinct APIs for reading XML documents:
QXmlStreamReader is a fast parser for reading well-formed XML.
DOM (Document Object Model) converts an XML document into a tree structure, which the application can then navigate.
SAX (Simple API for XML) reports “parsing events” directly to the application through virtual functions.
The QXmlStreamReader class is the fastest and easiest to use and offers an API that is consistent with the rest of Qt. It is ideal for writing one-pass parsers. DOM’s main benefit is that it lets us navigate a tree representation of the XML document in any order, allowing us to implement multi-pass parsing algorithms. Some applications even use the DOM tree as their primary data structure. SAX is provided mainly for historical reasons; using QXmlStreamReader usually leads to simpler and faster code.
For writing XML files, Qt also offers three options:
Using a QXmlStreamWriter
Building a DOM tree and asking it to write itself to a file
Generating XML by hand
Using QXmlStreamWriter is by far the easiest approach, and is more reliable than hand-generating XML. Using DOM to produce XML really makes sense only if a DOM tree is already used as the application’s primary data structure. All three approaches to reading and writing XML are shown in this chapter.
Using QXmlStreamReader is the fastest and easiest way to read XML in Qt. Because the parser works incrementally, it is particularly useful for finding all occurrences of a given tag in an XML document, for reading very large files that may not fit in memory, and for populating custom data structures to reflect an XML document’s contents.
The QXmlStreamReader parser works in terms of the tokens listed in Table 16.1. Each time the readNext() function is called, the next token is read and becomes the current token. The current token’s properties depend on the token’s type and are accessible using the getter functions listed in the table.
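Table 16.1. The QXmlStreamReader’s tokens
Token Type | Example | Getter Functions |
---|---|---|
StartDocument | N/A | isStandaloneDocument() |
EndDocument | N/A | isStandaloneDocument() |
StartElement | <item> | namespaceUri(), name(), attributes(), namespaceDeclarations() |
EndElement | </item> | namespaceUri(), name() |
Characters | AT&amp;T | text(), isWhitespace(), isCDATA() |
Comment | <!-- fix --> | text() |
DTD | <!DOCTYPE ...> | text(), notationDeclarations(), entityDeclarations() |
EntityReference | &trade; | name(), text() |
ProcessingInstruction | <?alert?> | processingInstructionTarget(), processingInstructionData() |
Invalid | >&<! | error(), errorString() |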
Consider the following XML document:
<doc>
    <quote>Einmal ist keinmal</quote>
</doc>
If we parse this document, each readNext() call will produce a new token, with extra information available using getter functions:
StartDocument
StartElement (name() == "doc")
StartElement (name() == "quote")
Characters (text() == "Einmal ist keinmal")
EndElement (name() == "quote")
EndElement (name() == "doc")
EndDocument
After each readNext() call, we can test for the current token’s type using isStartElement(), isCharacters(), and similar functions, or simply using tokenType().
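To make the pull model concrete, here is a deliberately tiny tokenizer in standard C++ (no Qt) that hands out one token per readNext() call, in the same order as the token list above. The class MiniPullReader and its members are hypothetical. It ignores attributes, comments, and entities, and, unlike QXmlStreamReader, which reports whitespace-only text as a Characters token (see isWhitespace()), it silently drops whitespace between tags.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>

// Hypothetical, greatly simplified pull tokenizer in the style of
// QXmlStreamReader::readNext(). It understands only start tags without
// attributes, end tags, and character data. It is an illustration, not a
// conforming XML parser.
enum class TokenType { StartDocument, StartElement, Characters, EndElement, EndDocument };

struct Token {
    TokenType type;
    std::string value;  // element name or character data, depending on type
};

class MiniPullReader {
public:
    explicit MiniPullReader(std::string input) : in_(std::move(input)) {}

    // Reads the next token and makes it the current one. Returns false
    // once EndDocument has already been reported.
    bool readNext() {
        if (done_) return false;
        if (!started_) {
            started_ = true;
            current_ = {TokenType::StartDocument, ""};
            return true;
        }
        for (;;) {
            if (pos_ >= in_.size()) {
                done_ = true;
                current_ = {TokenType::EndDocument, ""};
                return true;
            }
            if (in_[pos_] == '<') {
                std::size_t close = in_.find('>', pos_);
                assert(close != std::string::npos && "unterminated tag");
                bool isEnd = in_[pos_ + 1] == '/';
                std::size_t nameStart = pos_ + (isEnd ? 2 : 1);
                current_ = {isEnd ? TokenType::EndElement : TokenType::StartElement,
                            in_.substr(nameStart, close - nameStart)};
                pos_ = close + 1;
                return true;
            }
            std::size_t next = in_.find('<', pos_);
            if (next == std::string::npos) next = in_.size();
            std::string text = in_.substr(pos_, next - pos_);
            pos_ = next;
            // Unlike QXmlStreamReader, this sketch drops whitespace-only text.
            bool blank = true;
            for (unsigned char c : text)
                if (!std::isspace(c)) { blank = false; break; }
            if (!blank) {
                current_ = {TokenType::Characters, text};
                return true;
            }
        }
    }

    const Token &current() const { return current_; }

private:
    std::string in_;
    std::size_t pos_ = 0;
    bool started_ = false;
    bool done_ = false;
    Token current_{TokenType::StartDocument, ""};
};
```

As with QXmlStreamReader, the caller drives the loop: it calls readNext() repeatedly and inspects current() after each call, so the parser never does more work than the application asks for.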
We will review an example that shows how to use QXmlStreamReader to parse an ad hoc XML file format and render its contents in a QTreeWidget. The format we will parse is that of a book index, with index entries and sub-entries. Here’s the book index file that is displayed in the QTreeWidget in Figure 16.2:
<?xml version="1.0"?>
<bookindex>
    <entry term="sidebearings">
        <page>10</page>
        <page>34-35</page>
        <page>307-308</page>
    </entry>
    <entry term="subtraction">
        <entry term="of pictures">
            <page>115</page>
            <page>244</page>
        </entry>
        <entry term="of vectors">
            <page>9</page>
        </entry>
    </entry>
</bookindex>
We will begin by looking at an extract from the application’s main() function, to see how the XML reader is used in context, and then we will look at the reader’s implementation.
int main(int argc, char *argv[])
{
    QApplication app(argc, argv);
    QStringList args = QApplication::arguments();
    ...
    QTreeWidget treeWidget;
    ...
    XmlStreamReader reader(&treeWidget);
    for (int i = 1; i < args.count(); ++i)
        reader.readFile(args[i]);
    return app.exec();
}
The application shown in Figure 16.2 begins by creating a QTreeWidget. It then creates an XmlStreamReader, passing it the tree widget and asking it to parse each file specified on the command line.
class XmlStreamReader
{
public:
    XmlStreamReader(QTreeWidget *tree);

    bool readFile(const QString &fileName);

private:
    void readBookindexElement();
    void readEntryElement(QTreeWidgetItem *parent);
    void readPageElement(QTreeWidgetItem *parent);
    void skipUnknownElement();

    QTreeWidget *treeWidget;
    QXmlStreamReader reader;
};
The XmlStreamReader class provides two public functions: the constructor and readFile(). The class uses a QXmlStreamReader instance to parse the XML file, and populates the QTreeWidget to reflect the XML data that is read. The parsing is done using recursive descent:
readBookindexElement() parses a <bookindex>...</bookindex> element that contains zero or more <entry> elements.
readEntryElement() parses an <entry>...</entry> element that contains zero or more <page> elements and zero or more <entry> elements nested to any depth.
readPageElement() parses a <page>...</page> element.
skipUnknownElement() skips an unrecognized element.
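The same recursive-descent structure can be sketched in plain C++ without Qt, which makes the control flow easy to test in isolation. In this sketch, a hypothetical Tok vector stands in for the tokens QXmlStreamReader would deliver, a hypothetical Entry struct stands in for QTreeWidgetItem, and each element type gets one function that consumes exactly the tokens of that element, including its end tag.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token stream: in the real program these would come from
// QXmlStreamReader; here they are supplied as a vector.
struct Tok {
    enum Kind { Start, End, Text } kind;
    std::string name;   // element name for Start/End
    std::string value;  // "term" attribute for Start, character data for Text
};

// One node per <entry>: the term, its comma-separated pages, and sub-entries.
struct Entry {
    std::string term;
    std::string pages;
    std::vector<std::unique_ptr<Entry>> children;
};

class IndexParser {
public:
    explicit IndexParser(std::vector<Tok> toks) : t_(std::move(toks)) {}

    // Expects the cursor on <bookindex>; returns the top-level entries.
    std::vector<std::unique_ptr<Entry>> parseBookindex() {
        assert(!atEnd() && peek().kind == Tok::Start && peek().name == "bookindex");
        std::vector<std::unique_ptr<Entry>> roots;
        ++i_;  // consume <bookindex>
        while (!atEnd() && peek().kind != Tok::End) {
            if (peek().kind == Tok::Start && peek().name == "entry")
                roots.push_back(parseEntry());
            else if (peek().kind == Tok::Start)
                skipUnknown();
            else
                ++i_;  // ignore stray character data
        }
        ++i_;  // consume </bookindex>
        return roots;
    }

private:
    std::unique_ptr<Entry> parseEntry() {
        auto entry = std::make_unique<Entry>();
        entry->term = peek().value;
        ++i_;  // consume <entry>
        while (!atEnd() && peek().kind != Tok::End) {
            if (peek().kind == Tok::Start && peek().name == "entry")
                entry->children.push_back(parseEntry());  // sub-entry: recurse
            else if (peek().kind == Tok::Start && peek().name == "page")
                parsePage(*entry);
            else if (peek().kind == Tok::Start)
                skipUnknown();
            else
                ++i_;
        }
        ++i_;  // consume </entry>
        return entry;
    }

    void parsePage(Entry &entry) {
        ++i_;  // consume <page>
        std::string page;
        while (!atEnd() && peek().kind == Tok::Text) page += t_[i_++].value;
        ++i_;  // consume </page>
        if (!entry.pages.empty()) entry.pages += ", ";
        entry.pages += page;
    }

    void skipUnknown() {
        ++i_;  // consume the unknown start tag
        while (!atEnd() && peek().kind != Tok::End) {
            if (peek().kind == Tok::Start) skipUnknown(); else ++i_;
        }
        ++i_;  // consume its end tag
    }

    bool atEnd() const { return i_ >= t_.size(); }
    const Tok &peek() const { return t_[i_]; }

    std::vector<Tok> t_;
    std::size_t i_ = 0;
};
```

Each function returns with the cursor just past its own end tag, which is the same invariant the Qt version maintains with its trailing readNext() calls.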
We will now look at the XmlStreamReader class’s implementation, beginning with the constructor.
XmlStreamReader::XmlStreamReader(QTreeWidget *tree)
{
    treeWidget = tree;
}
The constructor is used only to establish which QTreeWidget the reader should use. All the action takes place in the readFile() function (called from main()), which we will look at in three parts.
bool XmlStreamReader::readFile(const QString &fileName)
{
    QFile file(fileName);
    if (!file.open(QFile::ReadOnly | QFile::Text)) {
        std::cerr << "Error: Cannot read file " << qPrintable(fileName)
                  << ": " << qPrintable(file.errorString())
                  << std::endl;
        return false;
    }

    reader.setDevice(&file);
The readFile() function begins by trying to open the file. If it fails, it outputs an error message and returns false. If the file is opened successfully, it is set as the QXmlStreamReader’s input device.
    reader.readNext();
    while (!reader.atEnd()) {
        if (reader.isStartElement()) {
            if (reader.name() == "bookindex") {
                readBookindexElement();
            } else {
                reader.raiseError(QObject::tr("Not a bookindex file"));
            }
        } else {
            reader.readNext();
        }
    }
The QXmlStreamReader’s readNext() function reads the next token from the input stream. If a token is successfully read and the end of the XML file has not been reached, execution enters the while loop. Because of the structure of the index files, we know that inside this loop there are just three possibilities: a <bookindex> start tag has just been read, another start tag has been read (in which case the file is not a book index), or some other token has been read.
If we have the correct start tag, we call readBookindexElement() to continue processing. Otherwise, we call QXmlStreamReader::raiseError() with an error message. The next time atEnd() is called (in the while loop condition), it will return true. This ensures that parsing stops as soon as possible after an error has been encountered. The error can be queried later by calling error() and errorString() on the QXmlStreamReader. An alternative would have been to return right away when we detect an error in the book index file. Using raiseError() is usually more convenient, because it lets us use the same error-reporting mechanism for low-level XML parsing errors, which are raised automatically when QXmlStreamReader runs into invalid XML, and for application-specific errors.
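The effect of raiseError() on the parsing loops can be modeled in a few lines of standard C++. TokenCursor and countKnownTags() are hypothetical names, and the "keep only the first error" rule is this sketch’s own choice; the point is that once an error is recorded, atEnd() returns true, so a loop written as while (!atEnd()) stops without any extra error checks, and the caller inspects the error afterwards.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cursor illustrating the raiseError() idea: an error makes
// atEnd() report true, so every "while (!atEnd())" loop unwinds by itself.
class TokenCursor {
public:
    explicit TokenCursor(std::vector<std::string> tokens) : tokens_(std::move(tokens)) {}

    bool atEnd() const { return failed_ || pos_ >= tokens_.size(); }
    const std::string &current() const { return tokens_[pos_]; }
    void readNext() { if (!atEnd()) ++pos_; }

    void raiseError(const std::string &message) {
        if (!failed_) { failed_ = true; error_ = message; }  // keep the first error
    }
    bool hasError() const { return failed_; }
    const std::string &errorString() const { return error_; }

private:
    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
    bool failed_ = false;
    std::string error_;
};

// Hypothetical validation loop: every token must be one of the known tags.
// There is no explicit "return on error"; raiseError() alone ends the loop.
int countKnownTags(TokenCursor &cursor) {
    int count = 0;
    while (!cursor.atEnd()) {
        const std::string &tag = cursor.current();
        if (tag == "bookindex" || tag == "entry" || tag == "page")
            ++count;
        else
            cursor.raiseError("Unexpected tag: " + tag);
        cursor.readNext();  // a no-op once an error has been raised
    }
    return count;
}
```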
    file.close();
    if (reader.hasError()) {
        std::cerr << "Error: Failed to parse file "
                  << qPrintable(fileName) << ": "
                  << qPrintable(reader.errorString()) << std::endl;
        return false;
    } else if (file.error() != QFile::NoError) {
        std::cerr << "Error: Cannot read file " << qPrintable(fileName)
                  << ": " << qPrintable(file.errorString())
                  << std::endl;
        return false;
    }
    return true;
}
Once the processing has finished, the file is closed. If there was a parser error or a file error, the function outputs an error message and returns false; otherwise, it returns true to report a successful parse.
void XmlStreamReader::readBookindexElement()
{
    reader.readNext();
    while (!reader.atEnd()) {
        if (reader.isEndElement()) {
            reader.readNext();
            break;
        }

        if (reader.isStartElement()) {
            if (reader.name() == "entry") {
                readEntryElement(treeWidget->invisibleRootItem());
            } else {
                skipUnknownElement();
            }
        } else {
            reader.readNext();
        }
    }
}
The readBookindexElement() function is responsible for reading the main part of the file. It starts by skipping the current token (which at this point can be only a <bookindex> start tag) and then loops over the input.
If an end tag is read, it can be only the </bookindex> tag, since otherwise QXmlStreamReader would have reported an error (NotWellFormedError, for a mismatched end tag). In that case, we skip the tag and break out of the loop. Otherwise, we should have a top-level index <entry> start tag. If this is the case, we call readEntryElement() to process the entry’s data; if not, we call skipUnknownElement(). Using skipUnknownElement() rather than calling raiseError() means that if we extend the book index format in the future to include new tags, this reader will continue to work, since it will simply ignore the tags it does not recognize.
The readEntryElement() function takes a QTreeWidgetItem * argument that identifies a parent item. We pass QTreeWidget::invisibleRootItem() as the parent to make the new items top-level items. Inside readEntryElement(), we will call readEntryElement() recursively, with a different parent.
void XmlStreamReader::readEntryElement(QTreeWidgetItem *parent)
{
    QTreeWidgetItem *item = new QTreeWidgetItem(parent);
    item->setText(0, reader.attributes().value("term").toString());

    reader.readNext();
    while (!reader.atEnd()) {
        if (reader.isEndElement()) {
            reader.readNext();
            break;
        }

        if (reader.isStartElement()) {
            if (reader.name() == "entry") {
                readEntryElement(item);
            } else if (reader.name() == "page") {
                readPageElement(item);
            } else {
                skipUnknownElement();
            }
        } else {
            reader.readNext();
        }
    }
}
The readEntryElement() function is called whenever an <entry> start tag is encountered. We want a tree widget item to be created for every index entry, so we create a new QTreeWidgetItem and set its first column’s text to the value of the entry’s term attribute.
Once the entry has been added to the tree, the next token is read. If it is an end tag, we skip the tag and break out of the loop. If a start tag is encountered, it will be an <entry> tag (signifying a sub-entry), a <page> tag (a page number for this entry), or an unknown tag. If the start tag is a sub-entry, we call readEntryElement() recursively. If the tag is a <page> tag, we call readPageElement(). Otherwise, we call skipUnknownElement().
void XmlStreamReader::readPageElement(QTreeWidgetItem *parent)
{
    QString page = reader.readElementText();
    if (reader.isEndElement())
        reader.readNext();

    QString allPages = parent->text(1);
    if (!allPages.isEmpty())
        allPages += ", ";
    allPages += page;
    parent->setText(1, allPages);
}
The readPageElement() function is called whenever we get a <page> tag. It is passed the tree item that corresponds to the entry to which the page text belongs. We begin by reading the text between the <page> and </page> tags. On success, the readElementText() function will leave the parser on the </page> tag, which we must skip.
The pages are stored in the tree widget item’s second column. We begin by extracting the text that is already there. If the text is not empty, we append a comma to it, ready for the new page text. We then append the new text and update the column’s text accordingly.
void XmlStreamReader::skipUnknownElement()
{
    reader.readNext();
    while (!reader.atEnd()) {
        if (reader.isEndElement()) {
            reader.readNext();
            break;
        }

        if (reader.isStartElement()) {
            skipUnknownElement();
        } else {
            reader.readNext();
        }
    }
}
Finally, when unknown tags are encountered, we keep reading until we get the unknown element’s end tag, which we also skip. This means that we will skip over well-formed but unrecognized elements, and read as much of the recognizable data as possible from the XML file.
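skipUnknownElement() uses recursion to find the end of the element it is skipping. An equivalent approach, shown here as a standard-C++ sketch over a hypothetical vector of token kinds, is to count the nesting depth and stop when it returns to zero. This avoids deep recursion on pathologically nested input; otherwise the two styles are interchangeable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified token kinds; character data, comments, and the like are "Other".
enum class Kind { Start, End, Other };

// Iterative alternative to a recursive skipUnknownElement(): given the index
// of an unknown start tag, return the index just past its matching end tag
// by tracking the nesting depth. Assumes well-formed input, which the XML
// parser guarantees before handing us the tokens.
std::size_t skipElement(const std::vector<Kind> &tokens, std::size_t start) {
    assert(tokens[start] == Kind::Start);
    int depth = 0;
    std::size_t i = start;
    do {
        if (tokens[i] == Kind::Start) ++depth;
        else if (tokens[i] == Kind::End) --depth;
        ++i;
    } while (depth > 0 && i < tokens.size());
    return i;
}
```

Note that an empty element such as <b/> still counts as a start and an end token, which matches how QXmlStreamReader reports empty elements.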
The example presented here could be used as the basis for similar XML recursive descent parsers. Nonetheless, implementing a parser like this can sometimes be tricky if a readNext() call is missing or out of place. Some programmers address the problem by using assertions in their code. For example, at the beginning of readBookindexElement(), we could add the line
Q_ASSERT(reader.isStartElement() && reader.name() == "bookindex");
A similar assertion could be made in the readEntryElement() and readPageElement() functions. For skipUnknownElement(), we would simply assert that we have a start element.
A QXmlStreamReader can take input from any QIODevice, including QFile, QBuffer, QProcess, and QTcpSocket. Some input sources may not be able to provide the data that the parser needs when it needs it, for example, due to network latency. It is still possible to use QXmlStreamReader under such circumstances; more information on this is provided in the reference documentation for QXmlStreamReader under the heading “Incremental Parsing”.
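The buffering idea behind incremental parsing can be sketched in standard C++. The hypothetical IncrementalTagReader below accepts data in arbitrary chunks through addData() and reports a token only once it is complete, returning false when it needs more input. QXmlStreamReader offers the analogous mechanism through its own addData() function and the PrematureEndOfDocumentError it reports when it runs out of data; this class is only an analogy and validates nothing.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: data may arrive in arbitrary pieces (as from a
// socket), so the reader buffers it and only reports complete tokens.
class IncrementalTagReader {
public:
    void addData(const std::string &chunk) { buffer_ += chunk; }

    // Stores the next complete tag or text run in token and returns true,
    // or returns false if more data is needed first. A text run is only
    // reported once the following '<' has arrived, because the text might
    // continue in the next chunk.
    bool readNext(std::string &token) {
        if (buffer_.empty()) return false;
        std::size_t end;
        if (buffer_[0] == '<') {
            std::size_t close = buffer_.find('>');
            if (close == std::string::npos) return false;  // tag split across chunks
            end = close + 1;
        } else {
            end = buffer_.find('<');
            if (end == std::string::npos) return false;  // text may continue
        }
        token = buffer_.substr(0, end);
        buffer_.erase(0, end);
        return true;
    }

private:
    std::string buffer_;  // data received but not yet reported
};
```

A caller reading from a network connection would call addData() whenever new bytes arrive and then drain readNext() until it returns false.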
The QXmlStreamReader class used in this application is part of the QtXml library. To link against this library, we must add this line to the .pro file:
QT += xml
In the next two sections, we will see how to write the same application with DOM and SAX.
DOM is a standard API for parsing XML developed by the W3C. Qt provides a non-validating DOM Level 2 implementation for reading, manipulating, and writing XML documents.
DOM represents an XML file as a tree in memory. We can navigate through the DOM tree as much as we want, and we can modify the tree and save it back to disk as an XML file.
Let’s consider the following XML document:
<doc>
    <quote>Scio me nihil scire</quote>
    <translation>I know that I know nothing</translation>
</doc>
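It corresponds to the following DOM tree:
Document
    Element (doc)
        Element (quote)
            Text ("Scio me nihil scire")
        Element (translation)
            Text ("I know that I know nothing")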
The DOM tree contains nodes of different types. For example, an Element node corresponds to an opening tag and its matching closing tag. The material that falls between the tags appears as child nodes of the Element node. In Qt, the node types (like all other DOM-related classes) have a QDom prefix; thus, QDomElement represents an Element node, and QDomText represents a Text node.
Different types of nodes can have different kinds of child nodes. For example, an Element node can contain other Element nodes, as well as EntityReference, Text, CDATASection, ProcessingInstruction, and Comment nodes. Figure 16.3 shows which nodes can have which kinds of child nodes. The nodes shown in gray cannot have any child nodes of their own.
To illustrate how to use DOM for reading XML files, we will write a parser for the book index file format described in the preceding section.
class DomParser
{
public:
    DomParser(QTreeWidget *tree);

    bool readFile(const QString &fileName);

private:
    void parseBookindexElement(const QDomElement &element);
    void parseEntryElement(const QDomElement &element,
                           QTreeWidgetItem *parent);
    void parsePageElement(const QDomElement &element,
                          QTreeWidgetItem *parent);

    QTreeWidget *treeWidget;
};
We define a class called DomParser that will parse a book index XML file and display the result in a QTreeWidget.
DomParser::DomParser(QTreeWidget *tree)
{
    treeWidget = tree;
}
In the constructor, we simply assign the given tree widget to the member variable. All the parsing is done inside the readFile() function.
bool DomParser::readFile(const QString &fileName)
{
    QFile file(fileName);
    if (!file.open(QFile::ReadOnly | QFile::Text)) {
        std::cerr << "Error: Cannot read file " << qPrintable(fileName)
                  << ": " << qPrintable(file.errorString())
                  << std::endl;
        return false;
    }

    QString errorStr;
    int errorLine;
    int errorColumn;

    QDomDocument doc;
    if (!doc.setContent(&file, false, &errorStr, &errorLine,
                        &errorColumn)) {
        std::cerr << "Error: Parse error at line " << errorLine << ", "
                  << "column " << errorColumn << ": "
                  << qPrintable(errorStr) << std::endl;
        return false;
    }

    QDomElement root = doc.documentElement();
    if (root.tagName() != "bookindex") {
        std::cerr << "Error: Not a bookindex file" << std::endl;
        return false;
    }

    parseBookindexElement(root);
    return true;
}
In readFile(), we begin by trying to open the file whose name was passed in. If an error occurs, we output an error message and return false to signify failure. Otherwise, we set up some variables to hold parse error information, should they be needed, and then create a QDomDocument. When we call setContent() on the DOM document, the entire XML document provided by the QIODevice is read and parsed. The setContent() function automatically opens the device if it isn’t already open. The false argument to setContent() disables namespace processing; refer to the QtXml reference documentation for an introduction to XML namespaces and how to handle them in Qt.
If an error occurs, we output an error message and return false to indicate failure. If the parse is successful, we call documentElement() on the QDomDocument to obtain its single QDomElement child, and we check that it is a <bookindex> element. If we have a <bookindex>, we call parseBookindexElement() to parse it. As in the preceding section, the parsing is done using recursive descent.
void DomParser::parseBookindexElement(const QDomElement &element)
{
    QDomNode child = element.firstChild();
    while (!child.isNull()) {
        if (child.toElement().tagName() == "entry")
            parseEntryElement(child.toElement(),
                              treeWidget->invisibleRootItem());
        child = child.nextSibling();
    }
}
In parseBookindexElement(), we iterate over all the child nodes. We expect each node to be an <entry> element, and for each one that is, we call parseEntryElement() to parse it. We ignore unknown nodes, to allow for the book index format to be extended in the future without preventing old parsers from working. All <entry> nodes that are direct children of the <bookindex> node are top-level nodes in the tree widget we are populating to reflect the DOM tree, so for each one we pass both the element and the tree’s invisible root item, which becomes the parent of the new tree widget item.
The QDomNode class can store any type of node. If we want to process a node further, we must first convert it to the right data type. In this example, we only care about Element nodes, so we call toElement() on the QDomNode to convert it to a QDomElement and then call tagName() to retrieve the element’s tag name. If the node is not of type Element, the toElement() function returns a null QDomElement object, with an empty tag name.
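The toElement() idiom, where a non-element node converts to a null element whose tag name is simply empty, lets the loop in parseBookindexElement() compare tag names without first checking node types. A standard-C++ sketch of the same idea follows; Node, elementTagName(), and countEntries() are hypothetical stand-ins, not Qt classes or functions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, minimal stand-in for a DOM node: just enough to show the
// "convert, then check the tag name" idiom.
struct Node {
    enum Type { Element, Text, Comment } type;
    std::string tagName;  // meaningful only for Element nodes
    std::string data;     // meaningful only for Text/Comment nodes
};

// Mimics toElement().tagName(): a non-element yields a "null" element whose
// tag name is empty, so callers can compare without a separate type check.
std::string elementTagName(const Node &node) {
    return node.type == Node::Element ? node.tagName : std::string();
}

// Counts the children that are <entry> elements and ignores everything
// else, in the same spirit as the loop in parseBookindexElement().
int countEntries(const std::vector<Node> &children) {
    int count = 0;
    for (const Node &child : children)
        if (elementTagName(child) == "entry") ++count;
    return count;
}
```

A Text node whose content happens to be the word "entry" is not counted, because only Element nodes expose a tag name.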
void DomParser::parseEntryElement(const QDomElement &element,
                                  QTreeWidgetItem *parent)
{
    QTreeWidgetItem *item = new QTreeWidgetItem(parent);
    item->setText(0, element.attribute("term"));

    QDomNode child = element.firstChild();
    while (!child.isNull()) {
        if (child.toElement().tagName() == "entry") {
            parseEntryElement(child.toElement(), item);
        } else if (child.toElement().tagName() == "page") {
            parsePageElement(child.toElement(), item);
        }
        child = child.nextSibling();
    }
}
In parseEntryElement(), we create a tree widget item. The parent item that is passed in is either the tree’s invisible root item (if this is a top-level entry) or another entry (if this is a sub-entry). We call setText() to set the text shown in the item’s first column to the value of the <entry> tag’s term attribute.
Once we have initialized the QTreeWidgetItem, we iterate over the child nodes of the QDomElement node corresponding to the current <entry> tag. For each child element that is an <entry> tag, we call parseEntryElement() recursively with the current item as the second argument. Each child’s QTreeWidgetItem will then be created with the current entry as its parent. If the child element is a <page>, we call parsePageElement().
void DomParser::parsePageElement(const QDomElement &element,
                                 QTreeWidgetItem *parent)
{
    QString page = element.text();
    QString allPages = parent->text(1);
    if (!allPages.isEmpty())
        allPages += ", ";
    allPages += page;
    parent->setText(1, allPages);
}
In parsePageElement(), we call text() on the element to obtain the text that occurs between the <page> and </page> tags; then we add the text to the comma-separated list of page numbers in the QTreeWidgetItem’s second column. The QDomElement::text() function navigates recursively through the element’s child nodes and concatenates all the text stored in Text and CDATASection nodes.
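The recursive concatenation that QDomElement::text() performs can be sketched with a hypothetical DomNode type in standard C++. The function collects Text and CDATASection content from the whole subtree and skips comments; it is a model of the behavior described above, not Qt’s implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tree node; not Qt's QDomNode.
struct DomNode {
    enum Type { Element, Text, CDATASection, Comment } type;
    std::string value;              // tag name for elements, content otherwise
    std::vector<DomNode> children;  // used only by elements
};

// Walks the element's descendants depth-first and concatenates the content
// of Text and CDATASection nodes; comments contribute nothing.
std::string elementText(const DomNode &node) {
    std::string result;
    for (const DomNode &child : node.children) {
        switch (child.type) {
        case DomNode::Text:
        case DomNode::CDATASection:
            result += child.value;
            break;
        case DomNode::Element:
            result += elementText(child);  // text nested in sub-elements counts too
            break;
        case DomNode::Comment:
            break;
        }
    }
    return result;
}
```

For a <page> element, the subtree is normally a single Text node, so the result is just the page number or range.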
Let’s now see how we can use the DomParser class to parse a file:
int main(int argc, char *argv[])
{
    QApplication app(argc, argv);
    QStringList args = QApplication::arguments();
    ...
    QTreeWidget treeWidget;
    ...
    DomParser parser(&treeWidget);
    for (int i = 1; i < args.count(); ++i)
        parser.readFile(args[i]);
    return app.exec();
}
We start by setting up a QTreeWidget. Then we create a DomParser. For each file listed on the command line, we call DomParser::readFile() to open and parse the file and populate the tree widget.
As with the previous example, we need the following line in the application’s .pro file to link against the QtXml library:
QT += xml
As the example illustrates, navigating through a DOM tree is straightforward, although not quite as convenient as using QXmlStreamReader. Programmers who use DOM a lot often write their own higher-level wrapper functions to simplify commonly needed operations.
SAX is a public domain de facto standard API for reading XML documents. Qt’s SAX classes are modeled after the SAX2 Java implementation, with some differences in naming to match the Qt conventions. Compared with DOM, SAX is more low-level and usually faster. But since the QXmlStreamReader class presented earlier in this chapter offers a more Qt-like API and is faster than the SAX parser, the main use of the SAX parser is for porting code that uses the SAX API into Qt. For more information about SAX, see http://www.saxproject.org/.
Qt provides a SAX-based non-validating XML parser called QXmlSimpleReader. This parser recognizes well-formed XML and supports XML namespaces. When the parser goes through the document, it calls virtual functions in registered handler classes to indicate parsing events. (These “parsing events” are unrelated to Qt events, such as key and mouse events.) For example, let’s assume the parser is analyzing the following XML document:
<doc>
    <quote>Gnothi seauton</quote>
</doc>
The parser would call the following parsing event handlers:
startDocument()
startElement("doc")
startElement("quote")
characters("Gnothi seauton")
endElement("quote")
endElement("doc")
endDocument()
The preceding functions are all declared in QXmlContentHandler. For simplicity, we omitted some of the arguments to startElement() and endElement().
QXmlContentHandler is just one of many handler classes that can be used in conjunction with QXmlSimpleReader. The others are QXmlEntityResolver, QXmlDTDHandler, QXmlErrorHandler, QXmlDeclHandler, and QXmlLexicalHandler. These classes only declare pure virtual functions and give information about different kinds of parsing events. For most applications, QXmlContentHandler and QXmlErrorHandler are the only two that are needed. The class hierarchy we have used is shown in Figure 16.4.
For convenience, Qt also provides QXmlDefaultHandler, a class that is derived from all the handler classes and that provides trivial implementations for all the functions. This design, with many abstract handler classes and one trivial subclass, is unusual for Qt; it was adopted to closely follow the model Java implementation.
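The handler design described above, with abstract interfaces plus one default class that implements everything trivially, can be sketched in standard C++. ContentHandler, ErrorHandler, DefaultHandler, and TextCollector are hypothetical and much smaller than Qt’s classes, and the return values of the default functions are this sketch’s own choice (true meaning "keep going").

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of the SAX handler design: abstract interfaces...
class ContentHandler {
public:
    virtual ~ContentHandler() = default;
    virtual bool startElement(const std::string &qName) = 0;
    virtual bool characters(const std::string &text) = 0;
};

class ErrorHandler {
public:
    virtual ~ErrorHandler() = default;
    virtual bool fatalError(const std::string &message) = 0;
};

// ...plus one class that implements all of them with do-nothing bodies.
class DefaultHandler : public ContentHandler, public ErrorHandler {
public:
    bool startElement(const std::string &) override { return true; }
    bool characters(const std::string &) override { return true; }
    bool fatalError(const std::string &) override { return true; }
};

// A subclass overrides only what it needs; here, just characters().
class TextCollector : public DefaultHandler {
public:
    bool characters(const std::string &text) override {
        collected += text;
        return true;
    }
    std::string collected;
};
```

A parser would hold separate ContentHandler and ErrorHandler pointers, and a single TextCollector object can be registered as both, just as SaxHandler is registered as both content and error handler.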
The most significant difference between using the SAX API and using QXmlStreamReader or the DOM API is that the SAX API requires us to keep track of the parser’s state manually using member variables. This is not necessary with the other two approaches, both of which allow recursive descent.
To illustrate how to use SAX for reading XML files, we will write a parser for the book index file format described earlier in this chapter. Here we will parse using a QXmlSimpleReader and a QXmlDefaultHandler subclass called SaxHandler.
The first step to implement the parser is to subclass QXmlDefaultHandler:
class SaxHandler : public QXmlDefaultHandler
{
public:
    SaxHandler(QTreeWidget *tree);

    bool readFile(const QString &fileName);

protected:
    bool startElement(const QString &namespaceURI,
                      const QString &localName,
                      const QString &qName,
                      const QXmlAttributes &attributes);
    bool endElement(const QString &namespaceURI,
                    const QString &localName,
                    const QString &qName);
    bool characters(const QString &str);
    bool fatalError(const QXmlParseException &exception);

private:
    QTreeWidget *treeWidget;
    QTreeWidgetItem *currentItem;
    QString currentText;
};
The SaxHandler class is derived from QXmlDefaultHandler and reimplements four functions: startElement(), endElement(), characters(), and fatalError(). The first three functions are declared in QXmlContentHandler; the last function is declared in QXmlErrorHandler.
SaxHandler::SaxHandler(QTreeWidget *tree)
{
    treeWidget = tree;
}
The SaxHandler constructor accepts the QTreeWidget we want to populate with the information stored in the XML file.
bool SaxHandler::readFile(const QString &fileName)
{
    currentItem = 0;

    QFile file(fileName);
    QXmlInputSource inputSource(&file);
    QXmlSimpleReader reader;
    reader.setContentHandler(this);
    reader.setErrorHandler(this);
    return reader.parse(inputSource);
}
This function is called when we have the name of a file to be parsed. We create a QFile object for the file and create a QXmlInputSource to read the file’s contents. Then we create a QXmlSimpleReader to parse the file. We set the reader’s content and error handlers to this class (SaxHandler), and then we call parse() on the reader to perform the parsing. In SaxHandler, we only reimplement functions from the QXmlContentHandler and QXmlErrorHandler classes; if we had implemented functions from other handler classes, we would also have needed to call the corresponding setXxxHandler() functions.
Instead of passing a simple QFile object to the parse() function, we pass a QXmlInputSource. This class opens the file it is given, reads it (taking into account any character encoding specified in the <?xml?> declaration), and provides an interface through which the parser reads the file.
bool SaxHandler::startElement(const QString & /* namespaceURI */,
                              const QString & /* localName */,
                              const QString &qName,
                              const QXmlAttributes &attributes)
{
    if (qName == "entry") {
        currentItem = new QTreeWidgetItem(currentItem ?
                currentItem : treeWidget->invisibleRootItem());
        currentItem->setText(0, attributes.value("term"));
    } else if (qName == "page") {
        currentText.clear();
    }
    return true;
}
The startElement()
function is called when the reader encounters a new opening tag. The third parameter is the tag’s name (or more precisely, its “qualified name”). The fourth parameter is the list of attributes. In this example, we ignore the first and second parameters. They are useful for XML files that use XML’s namespace mechanism, a subject that is discussed in detail in the reference documentation.
If the tag is <entry>, we create a new QTreeWidgetItem. If the tag is nested within another <entry> tag, the new tag defines a sub-entry in the index, and the new QTreeWidgetItem is created as a child of the QTreeWidgetItem that represents the encompassing entry. Otherwise, we create the QTreeWidgetItem as a top-level item, using the tree widget’s invisible root item as its parent. We call setText() to set the text shown in column 0 to the value of the <entry> tag’s term attribute.
If the tag is <page>, we set the currentText variable to be an empty string. The variable serves as an accumulator for the text located between the <page> and </page> tags.
At the end, we return true to tell SAX to continue parsing the file. If we wanted to report unknown tags as errors, we would return false in those cases. We would then also reimplement errorString() from QXmlDefaultHandler to return an appropriate error message.
bool SaxHandler::characters(const QString &str)
{
    currentText += str;
    return true;
}
The characters() function is called to report character data in the XML document. We simply append the characters to the currentText variable.
bool SaxHandler::endElement(const QString & /* namespaceURI */,
                            const QString & /* localName */,
                            const QString &qName)
{
    if (qName == "entry") {
        currentItem = currentItem->parent();
    } else if (qName == "page") {
        if (currentItem) {
            QString allPages = currentItem->text(1);
            if (!allPages.isEmpty())
                allPages += ", ";
            allPages += currentText;
            currentItem->setText(1, allPages);
        }
    }
    return true;
}
The endElement() function is called when the reader encounters a closing tag. Just as with startElement(), the third parameter is the name of the tag.
If the tag is </entry>, we update the currentItem private variable to point to the current QTreeWidgetItem’s parent. (For historical reasons, top-level items return 0 as their parent rather than the invisible root item.) This ensures that the currentItem variable is restored to the value it held before the corresponding <entry> tag was read.
If the tag is </page>, we add the specified page number or page range to the comma-separated list in the current item’s text in column 1.
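The state tracking that SaxHandler performs with currentItem and currentText can be modeled without Qt. In this sketch, a hypothetical Item struct replaces QTreeWidgetItem (a null parent pointer marks a top-level item, much as a top-level QTreeWidgetItem reports 0 as its parent), and IndexHandler’s functions are meant to be called in document order, the way a SAX parser would call them.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for QTreeWidgetItem: a term, a page list, a parent.
struct Item {
    std::string term;
    std::string pages;
    Item *parent = nullptr;
    std::vector<std::unique_ptr<Item>> children;
};

// SAX-style handler: because the parser calls these functions one event at
// a time, the handler remembers its position in member variables
// (currentItem, currentText) instead of on the call stack.
class IndexHandler {
public:
    void startElement(const std::string &qName, const std::string &term = "") {
        if (qName == "entry") {
            auto item = std::make_unique<Item>();
            item->term = term;
            item->parent = currentItem;
            Item *raw = item.get();
            if (currentItem)
                currentItem->children.push_back(std::move(item));
            else
                topLevel.push_back(std::move(item));
            currentItem = raw;
        } else if (qName == "page") {
            currentText.clear();
        }
    }

    // Character data may arrive in several pieces, so it is accumulated.
    void characters(const std::string &str) { currentText += str; }

    void endElement(const std::string &qName) {
        if (qName == "entry") {
            currentItem = currentItem->parent;  // nullptr again for top-level items
        } else if (qName == "page" && currentItem) {
            if (!currentItem->pages.empty()) currentItem->pages += ", ";
            currentItem->pages += currentText;
        }
    }

    std::vector<std::unique_ptr<Item>> topLevel;

private:
    Item *currentItem = nullptr;
    std::string currentText;
};
```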
bool SaxHandler::fatalError(const QXmlParseException &exception)
{
    std::cerr << "Parse error at line " << exception.lineNumber()
              << ", " << "column " << exception.columnNumber() << ": "
              << qPrintable(exception.message()) << std::endl;
    return false;
}
The fatalError()
function is called when the reader fails to parse the XML file. If this occurs, we simply print a message to the console, giving the line number, the column number, and the parser’s error text.
This completes the implementation of SaxHandler. The main() function that uses it is almost identical to the one we reviewed in the previous section for DomParser, the difference being that we use a SaxHandler rather than a DomParser.
Most applications that can read XML files also need to write such files. There are three approaches for generating XML files from Qt applications:
We can use a QXmlStreamWriter.
We can build a DOM tree and call save() on it.
We can generate XML by hand.
The choice between these approaches is mostly independent of whether we use QXmlStreamReader, DOM, or SAX for reading XML documents, although if the data is held in a DOM tree it often makes sense to save the tree directly.
Writing XML using the QXmlStreamWriter class is particularly easy since the class takes care of escaping special characters for us. If we wanted to output the book index data from a QTreeWidget using QXmlStreamWriter, we could do so using just two functions. The first function would take a file name and a QTreeWidget *, and would iterate over all the top-level items in the tree:
bool writeXml(const QString &fileName, QTreeWidget *treeWidget)
{
    QFile file(fileName);
    if (!file.open(QFile::WriteOnly | QFile::Text)) {
        std::cerr << "Error: Cannot write file "
                  << qPrintable(fileName) << ": "
                  << qPrintable(file.errorString()) << std::endl;
        return false;
    }

    QXmlStreamWriter xmlWriter(&file);
    xmlWriter.setAutoFormatting(true);
    xmlWriter.writeStartDocument();
    xmlWriter.writeStartElement("bookindex");
    for (int i = 0; i < treeWidget->topLevelItemCount(); ++i)
        writeIndexEntry(&xmlWriter, treeWidget->topLevelItem(i));
    xmlWriter.writeEndDocument();
    file.close();
    if (file.error()) {
        std::cerr << "Error: Cannot write file "
                  << qPrintable(fileName) << ": "
                  << qPrintable(file.errorString()) << std::endl;
        return false;
    }
    return true;
}
If we switch on auto-formatting, the XML is output in a more human-friendly style, with indentation used to show the data’s recursive structure. The writeStartDocument() function writes the XML header line
<?xml version="1.0" encoding="UTF-8"?>
The writeStartElement() function generates a new start tag with the given tag name. The writeEndDocument() function closes any open start tags. For each top-level item, we call writeIndexEntry(), passing it the QXmlStreamWriter and the item to output. Here is the code for writeIndexEntry():
void writeIndexEntry(QXmlStreamWriter *xmlWriter, QTreeWidgetItem *item)
{
    xmlWriter->writeStartElement("entry");
    xmlWriter->writeAttribute("term", item->text(0));
    QString pageString = item->text(1);
    if (!pageString.isEmpty()) {
        QStringList pages = pageString.split(", ");
        foreach (QString page, pages)
            xmlWriter->writeTextElement("page", page);
    }
    for (int i = 0; i < item->childCount(); ++i)
        writeIndexEntry(xmlWriter, item->child(i));
    xmlWriter->writeEndElement();
}
The function creates an <entry> element corresponding to the QTreeWidgetItem it receives as a parameter. The writeAttribute() function adds an attribute to the tag that has just been written; for example, it might turn <entry> into <entry term="sidebearings">. If there are page numbers, they are split on comma-spaces, and for each one, a separate <page>...</page> tag pair is written, with the page text in between. This is all achieved by calling writeTextElement() and passing it a tag name and the text to put between the start and end tags. In all cases, QXmlStreamWriter takes care of escaping XML special characters, so we never have to worry about this.
If the item has child items, we recursively call writeIndexEntry() on each of them. Finally, we call writeEndElement() to output </entry>.
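To show what QXmlStreamWriter keeps track of on our behalf, here is a much-reduced writer in standard C++. MiniXmlWriter and IndexEntry are hypothetical; the writer produces compact output without auto-formatting, escapes text and attribute values, and keeps a stack of open elements so that writeEndElement() knows which tag to close and writeEndDocument() can close the rest. writeIndexEntry() follows the same recursion as the Qt version above, with a page list instead of a comma-separated string.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, much-reduced analogue of QXmlStreamWriter (compact output).
class MiniXmlWriter {
public:
    explicit MiniXmlWriter(std::ostream &out) : out_(out) {}

    void writeStartElement(const std::string &name) {
        closeStartTag();
        out_ << '<' << name;
        open_.push_back(name);
        startTagOpen_ = true;  // attributes may still follow
    }
    void writeAttribute(const std::string &name, const std::string &value) {
        assert(startTagOpen_ && "attributes must follow writeStartElement()");
        out_ << ' ' << name << "=\"" << escape(value) << '"';
    }
    void writeCharacters(const std::string &text) { closeStartTag(); out_ << escape(text); }
    void writeTextElement(const std::string &name, const std::string &text) {
        writeStartElement(name);
        writeCharacters(text);
        writeEndElement();
    }
    void writeEndElement() {
        assert(!open_.empty() && "no open element to close");
        closeStartTag();
        out_ << "</" << open_.back() << '>';
        open_.pop_back();
    }
    // Closes every element that is still open, innermost first.
    void writeEndDocument() { while (!open_.empty()) writeEndElement(); }

private:
    void closeStartTag() {
        if (startTagOpen_) { out_ << '>'; startTagOpen_ = false; }
    }
    static std::string escape(const std::string &s) {
        std::string r;
        for (char c : s) {
            switch (c) {
            case '<': r += "&lt;"; break;
            case '>': r += "&gt;"; break;
            case '&': r += "&amp;"; break;
            case '"': r += "&quot;"; break;
            default: r += c;
            }
        }
        return r;
    }

    std::ostream &out_;
    std::vector<std::string> open_;  // stack of element names still open
    bool startTagOpen_ = false;
};

// Hypothetical in-memory entry, standing in for a QTreeWidgetItem.
struct IndexEntry {
    std::string term;
    std::vector<std::string> pages;
    std::vector<IndexEntry> children;
};

void writeIndexEntry(MiniXmlWriter &writer, const IndexEntry &entry) {
    writer.writeStartElement("entry");
    writer.writeAttribute("term", entry.term);
    for (const std::string &page : entry.pages)
        writer.writeTextElement("page", page);
    for (const IndexEntry &child : entry.children)
        writeIndexEntry(writer, child);  // sub-entries nest inside this <entry>
    writer.writeEndElement();
}
```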
Using QXmlStreamWriter to write XML is the easiest and safest approach, but if we already have the XML in a DOM tree, we can simply ask the tree to output the relevant XML by calling save() on the QDomDocument object. By default, save() uses UTF-8 as the encoding for the generated file. We can use another encoding by prepending an <?xml?> declaration such as
<?xml version="1.0" encoding="ISO-8859-1"?>
to the DOM tree. The following code snippet shows how to do this:
const int Indent = 4;

QDomDocument doc;
...
QTextStream out(&file);
QDomNode xmlNode = doc.createProcessingInstruction("xml",
                         "version=\"1.0\" encoding=\"ISO-8859-1\"");
doc.insertBefore(xmlNode, doc.firstChild());
doc.save(out, Indent);
Starting with Qt 4.3, an alternative is to set the encoding on the QTextStream using setCodec() and to pass QDomNode::EncodingFromTextStream as the third parameter to save().
Generating XML files by hand isn’t much harder than using DOM. We can use QTextStream and write the strings as we would do with any other text file. The trickiest part is to escape special characters in text and attribute values. The Qt::escape() function escapes the characters ‘<’, ‘>’, and ‘&’. Here’s some code that makes use of it:
QTextStream out(&file);
out.setCodec("UTF-8");
out << "<doc>\n"
    << "   <quote>" << Qt::escape(quoteText) << "</quote>\n"
    << "   <translation>" << Qt::escape(translationText)
    << "</translation>\n"
    << "</doc>\n";
When generating XML files like this, in addition to having to write the correct <?xml?> declaration and setting the right encoding, we must also remember to escape the text we write, and if we use attributes we must escape single or double quotes in their values. Using QXmlStreamWriter is much easier since it handles all of this for us.
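The escaping rules for hand-written XML can be sketched as two small standard-C++ functions. escapeText() handles the same three characters that Qt::escape() is described as handling above, and escapeAttribute() additionally escapes both quote characters for use in attribute values. The names are hypothetical, and quoteElement() simply shows the two in use.

```cpp
#include <cassert>
#include <string>

// Escapes character data. '<' and '&' must always be escaped; escaping '>'
// as well guarantees that the forbidden sequence "]]>" cannot appear.
std::string escapeText(const std::string &text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '&': out += "&amp;"; break;
        default: out += c;
        }
    }
    return out;
}

// Attribute values may be delimited by either quote character, so both are
// escaped on top of the text rules.
std::string escapeAttribute(const std::string &value) {
    std::string out;
    for (char c : escapeText(value)) {
        if (c == '"') out += "&quot;";
        else if (c == '\'') out += "&apos;";
        else out += c;
    }
    return out;
}

// Hand-written element with one attribute; every dynamic piece is escaped.
std::string quoteElement(const std::string &lang, const std::string &text) {
    return "<quote lang=\"" + escapeAttribute(lang) + "\">" + escapeText(text) + "</quote>";
}
```

Forgetting either call in quoteElement() would produce malformed XML for some inputs, which is exactly the class of bug QXmlStreamWriter rules out.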
[*] Qt 4.4 is expected to include additional high-level classes for handling XML, providing support for XQuery and XPath, in a separate module called QtXmlPatterns.