XML APIs have always been available to the Java developer, usually supplied as third-party libraries that could be added to the runtime classpath. However, in Java 7, you will find that the Java API for XML Processing (JAXP), Java Architecture for XML Binding (JAXB), and even the Java API for XML Web Services (JAX-WS) are included in the core runtime libraries.
The most fundamental XML processing tasks that you will encounter involve only a few use cases: writing and reading XML documents, validating those documents, and using JAXB to assist in marshalling/unmarshalling Java objects. This chapter provides recipes for these common tasks.
Note The source code for this chapter’s examples is available in the org.java7recipes.chapter22 package. Please see the introductory chapters for instructions on how to find and download this book’s sample source code.
You want to create an XML document to store application data.
To write an XML document, use the javax.xml.stream.XMLStreamWriter
class. The following code iterates over an array of Patient
objects and writes their data to an .xml
file. This sample code comes from the org.java7recipes.chapter22.DocWriter
example:
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
…
public void run(String outputFile) throws FileNotFoundException, XMLStreamException, IOException {
    // Build the sample data; each array slot needs a Patient instance before its setters are called
    Patient[] patients = new Patient[3];
    for (int i = 0; i < patients.length; i++) {
        patients[i] = new Patient();
    }
    patients[0].setId(BigInteger.valueOf(1));
    patients[0].setName("John Smith");
    patients[0].setDiagnosis("Common Cold");
    patients[1].setId(BigInteger.valueOf(2));
    patients[1].setName("Jane Doe");
    patients[1].setDiagnosis("Broken ankle");
    patients[2].setId(BigInteger.valueOf(3));
    patients[2].setName("Jack Brown");
    patients[2].setDiagnosis("Food allergy");
    XMLOutputFactory factory = XMLOutputFactory.newFactory();
    // try-with-resources closes the file stream even if writing fails
    try (FileOutputStream fos = new FileOutputStream(outputFile)) {
        XMLStreamWriter writer = factory.createXMLStreamWriter(fos, "UTF-8");
        writer.writeStartDocument();
        writer.writeCharacters("\n");
        writer.writeStartElement("patients");
        writer.writeCharacters("\n");
        for (Patient p : patients) {
            writer.writeCharacters(" ");
            writer.writeStartElement("patient");
            // Attributes must be written right after the start element
            writer.writeAttribute("id", String.valueOf(p.getId()));
            writer.writeCharacters("\n");
            writer.writeStartElement("name");
            writer.writeCharacters(p.getName());
            writer.writeEndElement();
            writer.writeCharacters("\n");
            writer.writeStartElement("diagnosis");
            writer.writeCharacters(p.getDiagnosis());
            writer.writeEndElement();
            writer.writeCharacters("\n");
            writer.writeEndElement();
            writer.writeCharacters("\n");
        }
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.close();
    }
}
The previous code writes the following file contents:
<?xml version="1.0" ?>
<patients>
<patient id="1">
<name>John Smith</name>
<diagnosis>Common Cold</diagnosis>
</patient>
<patient id="2">
<name>Jane Doe</name>
<diagnosis>Broken ankle</diagnosis>
</patient>
<patient id="3">
<name>Jack Brown</name>
<diagnosis>Food allergy</diagnosis>
</patient>
</patients>
Java 7 provides several ways to write XML documents. One model is the Simple API for XML (SAX). The newer, simpler, and more efficient model is the Streaming API for XML (StAX). This recipe uses StAX, defined in the javax.xml.stream package. Writing an XML document takes only five steps:
Create a file output stream using the java.io.FileOutputStream class. You can use a try-with-resources block to open and close this stream. Learn more about the try-with-resources syntax in Chapter 6.
Create an output factory using the static newFactory() method of javax.xml.stream.XMLOutputFactory.
Use the factory to create a javax.xml.stream.XMLStreamWriter that wraps the file output stream.
Use the writer's various write methods to create the XML document elements and attributes.
Close the writer when you finish writing to the file.
Some of the more useful methods of the XMLStreamWriter instance are these:
writeStartDocument()
writeStartElement()
writeEndElement()
writeEndDocument()
writeAttribute()
After creating the file and XMLStreamWriter, you should always begin the document by calling the writeStartDocument() method. Follow this by writing individual elements using the writeStartElement() and writeEndElement() methods in combination. Of course, elements can have nested elements. You are responsible for calling these methods in the proper sequence to create well-formed documents. Use the writeAttribute() method to place an attribute name and value into the current element. You should call writeAttribute() immediately after calling the writeStartElement() method. Finally, signal the end of the document with the writeEndDocument() method and close the XMLStreamWriter instance.
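The call sequence described above can be sketched in a few lines. The following is a minimal illustration, not the book's DocWriter class: the class name StaxCallOrderSketch and the writePatient() helper are hypothetical, and the document is written to an in-memory StringWriter instead of a file so that the result is easy to inspect.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxCallOrderSketch {

    // Hypothetical helper: writes a one-patient document into a String.
    static String writePatient(String id, String name, String diagnosis)
            throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer =
                XMLOutputFactory.newFactory().createXMLStreamWriter(out);
        try {
            writer.writeStartDocument();           // always first
            writer.writeStartElement("patients");  // root element
            writer.writeStartElement("patient");   // nested element
            writer.writeAttribute("id", id);       // right after its start element
            writer.writeStartElement("name");
            writer.writeCharacters(name);
            writer.writeEndElement();              // closes name
            writer.writeStartElement("diagnosis");
            writer.writeCharacters(diagnosis);
            writer.writeEndElement();              // closes diagnosis
            writer.writeEndElement();              // closes patient
            writer.writeEndElement();              // closes patients
            writer.writeEndDocument();
        } finally {
            writer.close();                        // XMLStreamWriter is not AutoCloseable
        }
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writePatient("1", "John Smith", "Common Cold"));
    }
}
```

Each writeEndElement() call closes the most recently opened element, so the order of the calls alone determines the nesting.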
One interesting point of using the XMLStreamWriter is that it does not format the document output. Unless you specifically use the writeCharacters() method to output space and new-line characters, the output will stream to a single unformatted line. Of course, this doesn’t invalidate the resulting XML file, but it does make it inconvenient and difficult for a human to read. Therefore, you should consider using the writeCharacters() method to output spacing and new-line characters as needed to create a human-readable document. You can safely skip this additional whitespace and line breaks if you do not need the document to be human readable. Regardless of the format, the XML document will be well formed in that it adheres to correct XML syntax.
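One way to keep the formatting calls manageable is to wrap them in a small helper. The sketch below is only an illustration under that assumption: the class StaxIndentSketch and its newLine() helper are hypothetical names, and the two-space indent is an arbitrary choice. The same document is written once without and once with the extra whitespace.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxIndentSketch {

    // Hypothetical helper: writes a line break followed by two spaces per nesting level.
    static void newLine(XMLStreamWriter w, int depth) throws XMLStreamException {
        StringBuilder sb = new StringBuilder("\n");
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        w.writeCharacters(sb.toString());
    }

    // Writes the same small document, with or without formatting whitespace.
    static String write(boolean pretty) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newFactory().createXMLStreamWriter(out);
        try {
            w.writeStartElement("patients");
            if (pretty) newLine(w, 1);
            w.writeStartElement("patient");
            w.writeAttribute("id", "1");
            if (pretty) newLine(w, 2);
            w.writeStartElement("name");
            w.writeCharacters("John Smith");
            w.writeEndElement();
            if (pretty) newLine(w, 1);
            w.writeEndElement();
            if (pretty) newLine(w, 0);
            w.writeEndElement();
        } finally {
            w.close();
        }
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write(false)); // everything on a single line
        System.out.println(write(true));  // one element per line, indented
    }
}
```

Both variants are well formed; only the whitespace between the elements differs.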
The command-line usage pattern for this example code is this:
java org.java7recipes.chapter22.DocWriter <outputXmlFile>
Invoke this application to create a file named patients.xml
in the following way:
java org.java7recipes.chapter22.DocWriter patients.xml
You need to parse an XML document, retrieving known elements and attributes.
Use the javax.xml.stream.XMLStreamReader interface to read documents. Using this API, your code pulls XML elements through a cursor-like interface, similar to a SQL cursor, and processes each element in turn. The following code snippet from org.java7recipes.chapter22.DocReader demonstrates how to read the patients.xml file from the previous recipe:
public void cursorReader(String xmlFile)
throws FileNotFoundException, IOException, XMLStreamException {
XMLInputFactory factory = XMLInputFactory.newFactory();
try (FileInputStream fis = new FileInputStream(xmlFile)) {
XMLStreamReader reader = factory.createXMLStreamReader(fis);
boolean inName = false;
boolean inDiagnosis = false;
String id = null;
String name = null;
String diagnosis = null;
while (reader.hasNext()) {
int event = reader.next();
switch (event) {
case XMLStreamConstants.START_ELEMENT:
String elementName = reader.getLocalName();
switch (elementName) {
case "patient":
id = reader.getAttributeValue(0);
break;
case "name":
inName = true;
break;
case "diagnosis":
inDiagnosis = true;
break;
default:
break;
}
break;
case XMLStreamConstants.END_ELEMENT:
String elementname = reader.getLocalName();
if (elementname.equals("patient")) {
System.out.printf("Patient: %s\nName: %s\nDiagnosis: %s\n", id, name, diagnosis);
id = name = diagnosis = null;
inName = inDiagnosis = false;
}
break;
case XMLStreamConstants.CHARACTERS:
if (inName) {
name = reader.getText();
inName = false;
} else if (inDiagnosis) {
diagnosis = reader.getText();
inDiagnosis = false;
}
break;
default:
break;
}
}
reader.close();
}
}
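A StAX parser is allowed to report an element's text in more than one CHARACTERS event unless coalescing is enabled, so a variation on the cursor code is to let the reader gather the text with getElementText() and to look up the attribute by name. The following is a minimal sketch of that variation, not part of the book's DocReader class: the class name CursorTextSketch and the readPatients() helper are hypothetical, and the XML is read from a String to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CursorTextSketch {

    // Hypothetical helper: returns one "id|name|diagnosis" row per patient element.
    static List<String> readPatients(String xml) throws XMLStreamException {
        List<String> rows = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
        try {
            String id = null;
            String name = null;
            String diagnosis = null;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    switch (reader.getLocalName()) {
                        case "patient":
                            // A null namespace URI means the namespace is not checked
                            id = reader.getAttributeValue(null, "id");
                            break;
                        case "name":
                            // Reads all text up to the matching end element
                            name = reader.getElementText();
                            break;
                        case "diagnosis":
                            diagnosis = reader.getElementText();
                            break;
                        default:
                            break;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "patient".equals(reader.getLocalName())) {
                    rows.add(id + "|" + name + "|" + diagnosis);
                }
            }
        } finally {
            reader.close();
        }
        return rows;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<patients><patient id=\"1\"><name>John Smith</name>"
                + "<diagnosis>Common Cold</diagnosis></patient></patients>";
        for (String row : readPatients(xml)) {
            System.out.println(row);
        }
    }
}
```

Because getElementText() consumes events up to the matching end element, the inName and inDiagnosis flags are no longer needed in this variation.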
Use the XMLEventReader to read and process events using an event-oriented interface. This API is also called an iterator-oriented API. The following code is much like that of Solution 1, except that it uses the event-oriented API instead of the cursor-oriented API. This code snippet is also available from the same org.java7recipes.chapter22.DocReader class used in Solution 1:
public void eventReader(String xmlFile)
throws FileNotFoundException, IOException, XMLStreamException {
XMLInputFactory factory = XMLInputFactory.newFactory();
XMLEventReader reader = null;
try(FileInputStream fis = new FileInputStream(xmlFile)) {
reader = factory.createXMLEventReader(fis);
boolean inName = false;
boolean inDiagnosis = false;
String id = null;
String name = null;
String diagnosis = null;
while(reader.hasNext()) {
XMLEvent event = reader.nextEvent();
String elementName = null;
switch(event.getEventType()) {
case XMLEvent.START_ELEMENT:
StartElement startElement = event.asStartElement();
elementName = startElement.getName().getLocalPart();
switch(elementName) {
case "patient":
id =
startElement.getAttributeByName(QName.valueOf("id")).getValue();
break;
case "name":
inName = true;
break;
case "diagnosis":
inDiagnosis = true;
break;
default:
break;
}
break;
case XMLEvent.END_ELEMENT:
EndElement endElement = event.asEndElement();
elementName = endElement.getName().getLocalPart();
if (elementName.equals("patient")) {
System.out.printf("Patient: %s\nName: %s\nDiagnosis: %s\n", id, name, diagnosis);
id = name = diagnosis = null;
inName = inDiagnosis = false;
}
break;
case XMLEvent.CHARACTERS:
String value = event.asCharacters().getData();
if (inName) {
name = value;
inName = false;
} else if (inDiagnosis) {
diagnosis = value;
inDiagnosis = false;
}
break;
}
}
}
if(reader != null) {
reader.close();
}
}
Java 7 provides several ways to read XML documents. One way is to use StAX, a streaming model. It is better than the older SAX API in that it allows you to both read and write XML documents. Although StAX is not quite as powerful as a DOM API, it is an excellent and efficient API that is less taxing on memory resources.
StAX provides two methods for reading XML documents: a cursor-oriented API and an iterator-based, event-oriented API. The event-oriented, iterator API is preferred over the cursor API at this time because it provides XMLEvent objects with the following benefits:
XMLEvent objects are immutable and can persist even though the StAX parser has moved on to subsequent events. You can pass these XMLEvent objects to other processes or store them in lists, arrays, and maps.
You can subclass XMLEvent, creating your own specialized events as needed.
To use StAX to read documents, create an XML event reader on your file input stream. Check that events are still available with the hasNext() method, and read each event using the nextEvent() method. The nextEvent() method returns a specific type of XMLEvent, which corresponds to the start and end elements, attributes, and value data in the XML file. Remember to close your readers and file streams when finished with those objects.
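To illustrate the first benefit, the following sketch stores events in a list and inspects them only after the reader has been closed. It is a minimal example under stated assumptions: the class name EventListSketch and the collectStartElements() helper are hypothetical, and the XML comes from a String instead of a file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class EventListSketch {

    // Hypothetical helper: keeps every start-element event for later processing.
    static List<XMLEvent> collectStartElements(String xml) throws XMLStreamException {
        List<XMLEvent> kept = new ArrayList<>();
        XMLEventReader reader =
                XMLInputFactory.newFactory().createXMLEventReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    kept.add(event);
                }
            }
        } finally {
            reader.close();
        }
        return kept;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<patients><patient id=\"1\"><name>John Smith</name></patient></patients>";
        // The reader is already closed here, yet the stored events remain usable
        for (XMLEvent event : collectStartElements(xml)) {
            StartElement start = event.asStartElement();
            Attribute id = start.getAttributeByName(new QName("id"));
            System.out.printf("%s id=%s%n", start.getName().getLocalPart(),
                    id == null ? "(none)" : id.getValue());
        }
    }
}
```

With the cursor API, the equivalent code would have to copy each name and attribute value out of the reader before advancing to the next event.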
You can invoke the example application like this, using the patients.xml
file as your <xmlFile>
argument:
java org.java7recipes.chapter22.DocReader <xmlFile>
You want to convert an XML document to another format, for example HTML.
Use the javax.xml.transform
package to transform an XML document to another document format.
The following code demonstrates how to read a source document, apply an Extensible Stylesheet Language (XSL) transform file, and produce the transformed, new document. Use the sample code from the org.java7recipes.chapter22.TransformXml
class to read the patients.xml
file and create a patients.html
file. The following snippet shows the important pieces of this class:
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
…
public void run(String xmlFile, String xslFile, String outputFile)
throws FileNotFoundException, TransformerConfigurationException, TransformerException {
InputStream xslInputStream = new FileInputStream(xslFile);
Source xslSource = new StreamSource(xslInputStream);
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(xslSource);
InputStream xmlInputStream = new FileInputStream(xmlFile);
StreamSource in = new StreamSource(xmlInputStream);
StreamResult out = new StreamResult(outputFile);
transformer.transform(in, out);
…
}
The javax.xml.transform
package contains all the classes you need to transform an XML document into any other document type. The most common use case is to convert data-oriented XML documents to user-readable HTML documents.
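Transforming from one document type to another involves three files: the XML source document, the XSL transformation document, and the output document that the transformation produces.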
The XML source document is, of course, your source data file. It will most often contain data-oriented content that is easy to parse programmatically. However, people don’t easily read XML files, especially complex, data-rich files. Instead, people are much more comfortable reading properly rendered HTML documents.
The XSL transformation document specifies how an XML document should be transformed into a different format. An XSL file will usually contain an HTML template that specifies dynamic fields that will hold the extracted contents of a source XML file.
In this example’s source code, you’ll find two source documents:
resources/patients.xml
resources/patients.xsl
The patients.xml
file is short, containing the following data:
<?xml version="1.0" encoding="UTF-8"?>
<patients>
<patient id="1">
<name>John Smith</name>
<diagnosis>Common Cold</diagnosis>
</patient>
<patient id="2">
<name>Jane Doe</name>
<diagnosis>Broken ankle</diagnosis>
</patient>
<patient id="3">
<name>Jack Brown</name>
<diagnosis>Food allergy</diagnosis>
</patient>
</patients>
The patients.xml file defines a root element called patients. It has three nested patient elements. Each patient element contains three pieces of data:
The id attribute of the patient element
The name subelement
The diagnosis subelement
The transformation XSL document (patients.xsl) is quite small as well, and it simply maps the patient data to a more user-readable HTML format using XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html"/>
<xsl:template match="/">
<html>
<head>
<title>Patients</title>
</head>
<body>
<table border="1">
<tr>
<th>Id</th>
<th>Name</th>
<th>Diagnosis</th>
</tr>
<xsl:for-each select="patients/patient">
<tr>
<td>
<xsl:value-of select="@id"/>
</td>
<td>
<xsl:value-of select="name"/>
</td>
<td>
<xsl:value-of select="diagnosis"/>
</td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Using this stylesheet, the sample code transforms the XML into an HTML table containing all the patients and their data. Rendered in a browser, the HTML table should look like the one in Figure 22-1.
Figure 22-1. A common rendering of an HTML table
The process for using this XSL file to convert the XML file to an HTML file is straightforward, but every step can be enhanced with additional error checking and processing. For this example, refer to the previous code in the solution section.
The most basic transformation steps are these:
Read the XSL document into a Source object.
Create a Transformer instance and provide your XSL Source instance for it to use during its operation.
Create a StreamSource that represents the source XML contents.
Create a StreamResult instance for your output document, which is an HTML file in this case.
Call the Transformer object’s transform() method to perform the conversion.
If you choose to execute the sample code, you should invoke it in the following way, using patients.xml, patients.xsl, and patients.html as arguments:
java org.java7recipes.chapter22.TransformXml <xmlFile> <xslFile> <outputFile>
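The same transformation steps also work on in-memory data, which can be convenient for experimenting with a stylesheet. The following is a minimal sketch, not the book's TransformXml class: the class name InMemoryTransformSketch, the transform() helper, and the small list-producing stylesheet are all hypothetical, chosen only to keep the example short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class InMemoryTransformSketch {

    // Hypothetical stylesheet: renders each patient as an HTML list item.
    static final String XSL =
            "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
            + "<xsl:output method='html' indent='no'/>"
            + "<xsl:template match='/'>"
            + "<ul><xsl:for-each select='patients/patient'>"
            + "<li><xsl:value-of select='@id'/>: <xsl:value-of select='name'/></li>"
            + "</xsl:for-each></ul>"
            + "</xsl:template></xsl:stylesheet>";

    // Applies the given stylesheet to the given XML and returns the result as a String.
    static String transform(String xml, String xsl) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Steps 1 and 2: wrap the XSL in a Source and build a Transformer from it
        Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xsl)));
        // Steps 3 and 4: wrap the XML input and the output target
        StreamSource in = new StreamSource(new StringReader(xml));
        StringWriter out = new StringWriter();
        // Step 5: perform the conversion
        transformer.transform(in, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<patients>"
                + "<patient id='1'><name>John Smith</name></patient>"
                + "<patient id='2'><name>Jane Doe</name></patient>"
                + "</patients>";
        System.out.println(transform(xml, XSL));
    }
}
```

Swapping the StringReader and StringWriter for file streams gives the file-based form used in the recipe.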
You want to confirm that your XML is valid, conforming to a known document definition or schema.
Validate that your XML conforms to a specific schema by using the javax.xml.validation
package. The following code snippet from org.java7recipes.chapter22.ValidateXml
demonstrates how to validate against an XML schema file:
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;
…
public void run(String xmlFile, String validationFile) {
boolean valid = true;
SchemaFactory sFactory =
SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
try {
Schema schema = sFactory.newSchema(new File(validationFile));
Validator validator = schema.newValidator();
Source source = new StreamSource(new File(xmlFile));
validator.validate(source);
} catch (SAXException | IOException | IllegalArgumentException ex) {
valid = false;
}
System.out.printf("XML file is %s.\n", valid ? "valid" : "invalid");
}
…
The javax.xml.validation
package provides all the classes needed to reliably validate an XML file against a variety of schemas. The most common schemas that you will use for XML validation are defined as constant URIs within the XMLConstants
class:
XMLConstants.W3C_XML_SCHEMA_NS_URI
XMLConstants.RELAXNG_NS_URI
Begin by creating a SchemaFactory
for a specific type of schema definition. A SchemaFactory
knows how to parse a particular schema type and prepares it for validation. Use the SchemaFactory
instance to create a Schema
object. The Schema
object is an in-memory representation of the schema definition grammar. You can use the Schema
instance to retrieve a Validator
instance that understands this grammar. Finally, use the validate() method to check your XML. If anything goes wrong during the validation, the method throws an exception, such as a SAXException when the document does not conform to the schema or an IOException when the file cannot be read. Otherwise, the validate() method returns quietly, and you can continue to use the XML file.
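The factory, schema, and validator chain can be tried out without any files. The sketch below is a minimal illustration under stated assumptions: the class name ValidateSketch, the isValid() helper, and the one-element schema are hypothetical and much smaller than the book's patients.xsd. It relies on the default behavior of a Validator with no ErrorHandler installed, which is to throw a SAXException on the first validation error.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class ValidateSketch {

    // Hypothetical schema: a patient element with a name child and a required integer id.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='patient'>"
            + "<xs:complexType>"
            + "<xs:sequence><xs:element name='name' type='xs:string'/></xs:sequence>"
            + "<xs:attribute name='id' type='xs:integer' use='required'/>"
            + "</xs:complexType>"
            + "</xs:element>"
            + "</xs:schema>";

    // Returns true when the XML conforms to the schema, false otherwise.
    static boolean isValid(String xml, String xsd) throws IOException {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;   // validate() returned quietly
        } catch (SAXException ex) {
            // The message describes the problem that was found
            System.out.println("Validation problem: " + ex.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(isValid("<patient id='1'><name>John Smith</name></patient>", XSD));
        System.out.println(isValid("<patient><name>John Smith</name></patient>", XSD));
    }
}
```

Reporting the exception message, as this sketch does, tells you why a document was rejected, which a bare valid or invalid result does not.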
Note The XML Schema was the first schema to receive “Recommendation” status from the World Wide Web Consortium (W3C) in 2001. Competing schemas have since become available. One competing schema is the Regular Language for XML Next Generation (RELAX NG) schema. RELAX NG may be a simpler schema, and its specification also defines a non-XML, compact syntax. This recipe’s example uses the XML Schema.
Run the example code using the following command-line syntax, preferably with the sample .xml file and validation file provided as resources/patients.xml and patients.xsd, respectively:
java org.java7recipes.chapter22.ValidateXml <xmlFile> <validationFile>
You would like to generate a set of Java classes (Java bindings) that represent the objects within an XML schema.
The JDK provides a tool that can turn schema documents into representative Java class files. Use the <JDK_HOME>/bin/xjc
command-line tool to generate Java bindings for your XML schemas. To create the Java classes for the patients.xsd
file from section 22-3, you could issue the following command from within a console:
xjc -p org.java7recipes.chapter22 patients.xsd
This command will process the patients.xsd
file and create all the classes needed to process an XML file that validates with this schema. For this example, the patients.xsd
file looks like the following:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:element name="patients">
<xs:complexType>
<xs:sequence>
<xs:element maxOccurs="unbounded" name="patient" type="Patient"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:complexType name="Patient">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="diagnosis" type="xs:string"/>
</xs:sequence>
<xs:attribute name="id" type="xs:integer" use="required"/>
</xs:complexType>
</xs:schema>
Executed on the previous xsd
file, the xjc
command creates the following files in the org.java7recipes.chapter22
package:
ObjectFactory.java
Patients.java
Patient.java
The JDK includes the <JDK_HOME>/bin/xjc
utility. The xjc
utility is a command-line application that creates Java bindings from schema files. The source schema files can be of several types, including XML Schemas, RELAX NG, and others.
The xjc
command has several options for performing its work. Some of the most common options specify the source schema file, the package of the generated Java binding files, and the output directory that will receive the Java binding files.
You can get detailed descriptions of all the command-line options by using the tool’s -help option:
xjc -help
A Java binding contains annotated fields that correspond to the fields defined in the XML Schema file. These annotations mark the root element of the schema file and all other subelements. This is useful during the next step of XML processing, which is either unmarshalling or marshalling these bindings.
You want to unmarshall an XML file and create its corresponding Java object tree.
JAXB provides an unmarshalling service that parses an XML file and generates the Java objects from the bindings you created in recipe 22-4. The following code can read the file patients.xml
from the org.java7recipes.chapter22
package to create a Patients
root object and its list of Patient
objects:
public void run(String xmlFile, String context)
throws JAXBException, FileNotFoundException {
JAXBContext jc = JAXBContext.newInstance(context);
Unmarshaller u = jc.createUnmarshaller();
FileInputStream fis = new FileInputStream(xmlFile);
Patients patients = (Patients)u.unmarshal(fis);
for (Patient p: patients.getPatient()) {
System.out.printf("ID: %s\n", p.getId());
System.out.printf("NAME: %s\n", p.getName());
System.out.printf("DIAGNOSIS: %s\n", p.getDiagnosis());
}
}
If you run the sample code on the resources/patients.xml
file and use the org.java7recipes.chapter22
context, the application will print the following to the console as it iterates over the Patient
object list:
ID: 1
NAME: John Smith
DIAGNOSIS: Common Cold
ID: 2
NAME: Jane Doe
DIAGNOSIS: Broken ankle
ID: 3
NAME: Jack Brown
DIAGNOSIS: Food allergy
Note The previous output comes directly from instances of the Java Patient class that were created from their XML representations. The code does not print the contents of the XML file directly. Instead, it prints the contents of the Java bindings after the XML has been unmarshalled into the appropriate Java binding instances.
Unmarshalling an XML file into its Java object representation requires at least two things: a well-formed, valid XML file and a set of corresponding Java bindings.
The Java bindings don’t have to be autogenerated from the xjc
command. Once you’ve gained some experience with Java bindings and the annotation features, you may prefer to create and control all aspects of Java binding by handcrafting your Java bindings. Whatever your preference, Java’s unmarshalling service utilizes the bindings and their annotations to map XML objects to a target Java object and to map XML elements to target object fields.
Execute the example application for this recipe using this syntax, substituting patients.xml
and org.java7recipes.chapter22
for the respective parameters:
java org.java7recipes.chapter22.UnmarshalPatients <xmlfile> <context>
You need to write an object’s data to an XML representation.
Assuming you have created Java binding files for your XML schema as described in recipe 22-4, use a JAXBContext
instance to create a Marshaller
object. Use the Marshaller
object to serialize your Java object tree to an XML document. The following code demonstrates this:
public void run(String xmlFile, String context)
throws JAXBException, FileNotFoundException {
Patients patients = new Patients();
List<Patient> patientList = patients.getPatient();
Patient p = new Patient();
p.setId(BigInteger.valueOf(1));
p.setName("John Doe");
p.setDiagnosis("Schizophrenia");
patientList.add(p);
JAXBContext jc = JAXBContext.newInstance(context);
Marshaller m = jc.createMarshaller();
m.marshal(patients, new FileOutputStream(xmlFile));
}
The previous code produces an unformatted but well-formed and valid XML document. For readability, the XML document is formatted here:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<patients>
<patient id="1">
<name>John Doe</name>
<diagnosis>Schizophrenia</diagnosis>
</patient>
</patients>
Note The getPatient()
method in the previous code returns a List of Patient objects instead of a single patient. This is a naming oddity of the JAXB code generation from the XSD schema in this example.
A Marshaller
object understands JAXB annotations. As it processes classes, it uses the JAXB annotations to provide it the context it needs for creating the object tree in XML.
You can run the previous code from the org.java7recipes.chapter22.MarshalPatients
application using the following command line:
java org.java7recipes.chapter22.MarshalPatients <xmlfile> <context>
The context argument refers to the package of the Java classes that you will marshal. In the previous example, because the code marshals a Patients
object tree, the correct context is the package name of the Patients
class. In this case, the context is org.java7recipes.chapter22
.