Parsing XML

We’ve downloaded the XML as a Data object, and created two structs to hold the useful parts of that data. Now the question is how we go through the data to populate those structures. The way we do this is to use a parser, an object that understands the syntax and structure of XML. The parser can walk through the tree-structured XML data and expose its element names, attributes, text, parent-child relationships, and so on.

As we might expect from a modern computing platform, the iOS SDK does have built-in parsing for XML, in the form of Foundation’s XMLParser class.

XMLParser is an event-driven parser, rather than a document-oriented parser like you might be used to from working with the DOM of a web page. The difference is that instead of parsing and delivering an entire document model all at once, an event-driven parser fires off events as it goes. Our code can be notified when the parser starts or ends XML elements, encounters text or CDATA, stops on an error, and so on.
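To make the event-driven model concrete, here is a minimal sketch that records the parser's callbacks, in order, for a tiny inline document. The EventLogger class and the sample XML are made up for this illustration and are not part of the podcast project; the callbacks arrive through a delegate object, which the next section explains.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on non-Apple platforms
#endif

// Hypothetical helper for illustration: records each parser event in order.
final class EventLogger: NSObject, XMLParserDelegate {
    private(set) var events: [String] = []

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String : String] = [:]) {
        events.append("start \(elementName)")
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        events.append("text \(string)")
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        events.append("end \(elementName)")
    }
}

let sampleXML = "<feed><title>Hi</title></feed>"  // made-up sample document
let eventLogger = EventLogger()
let eventParser = XMLParser(data: Data(sampleXML.utf8))
eventParser.delegate = eventLogger
eventParser.parse()

// Print the events in the order the parser delivered them.
eventLogger.events.forEach { print($0) }
```

Nothing here ever holds the whole document as a tree. We only see a stream of start, text, and end events, and it is up to us to decide which ones matter.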

The trick is to act only on the events relevant to our needs and build up a data model that makes sense for our app—that is to say, using the PodcastFeed and PodcastEpisode types we just created.

Setting Up the XMLParser

Looking at the XMLParser docs, there are inits that take URLs and Data, which is great, since our URLSession downloads the podcast feed to a Data. But there doesn’t seem to be anything about parsing the XML elements themselves.

The way it works is that the parser has a delegate property of type XMLParserDelegate. Look up XMLParserDelegate and that’s where we find all the methods about starting and ending elements, finding text, and so on. Notice also that XMLParserDelegate is a protocol: the idea is that we implement this protocol in one of our classes and assign an instance of that class to the parser’s delegate property, and it gets callbacks as the parser does its work.

This is a lot like the target-action pattern we saw with key-value observing back in Constantly Changing Outlets. Here the pattern is called delegation. The idea is that when the parser doesn’t know what to do at some point (for example, when it encounters a new XML element), it can delegate that work to another object. Delegation is an older pattern in iOS, but we still see it in many of the frameworks.
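Stripped of XML entirely, the delegation pattern can be sketched in a few lines of plain Swift. Every name here (Downloader, DownloaderDelegate, DownloadLogger) is hypothetical and exists only for this illustration; the sketch shows the shape of the pattern, not how XMLParser is implemented internally.

```swift
// The protocol lists the callbacks a delegate may receive.
protocol DownloaderDelegate: AnyObject {
    func downloader(_ downloader: Downloader, didFinishWith name: String)
}

// The delegating object does its own work, then hands off via its delegate.
final class Downloader {
    weak var delegate: DownloaderDelegate?  // weak, to avoid a retain cycle

    func finish(with name: String) {
        // If no delegate is set, optional chaining makes this a no-op.
        delegate?.downloader(self, didFinishWith: name)
    }
}

// Any class can opt in by adopting the protocol.
final class DownloadLogger: DownloaderDelegate {
    private(set) var received: [String] = []

    func downloader(_ downloader: Downloader, didFinishWith name: String) {
        received.append(name)
    }
}

let downloader = Downloader()
let downloadLogger = DownloadLogger()
downloader.delegate = downloadLogger
downloader.finish(with: "feed.xml")
print(downloadLogger.received)
```

The Downloader knows nothing about its delegate beyond the protocol, which is what lets a general-purpose class like XMLParser call back into code it has never seen.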

The first thing we’ll want to do is declare that the PodcastFeedParser class we’ve been working in can be the delegate. Unfortunately, if we just append : XMLParserDelegate to its declaration, we get an error that it doesn’t conform to NSObjectProtocol. The XMLParser API is so old that any delegate has to be a subclass of NSObject, so change the class declaration like this:

 class PodcastFeedParser : NSObject, XMLParserDelegate {
Joe asks:
Why Does an XMLParserDelegate Need to Be an NSObject?

An essential part of how delegate protocols work is that the class calling back to the delegate (XMLParser, in this case) needs to be able to discover whether a given delegate method has or hasn’t been implemented, since we’re free to ignore any delegate methods that aren’t useful to us. If the delegate method exists, it gets called; if not, the parser continues on.

This is an example of language dynamism, the ability to perform certain tasks at runtime that more static languages perform at compile time. In some languages, you can invoke a function by just providing its name as a string, but in something like C, a function is basically a memory address, and must be figured out at compile time. Swift can be somewhat dynamic, but dynamism is not as baked into the language and basic types as it was in Objective-C, which the older iOS APIs were built around. Swift prefers figuring things out at compile time—which tends to produce safer and faster code—and this makes it a more static language than Objective-C.

So the trade-off is that for using something like XMLParser, we can’t use Swift types like structures, enumerations, or even basic classes: we have to subclass the legacy NSObject to pick up those dynamic behaviors.

Now that we’re a delegate, we can start parsing. Rewrite the init method as follows (we’re only changing a few lines, as described after the listing).

  1: init(contentsOf url: URL) {
       super.init()
       let urlSession = URLSession(configuration: .default)
       let dataTask = urlSession.dataTask(with: url) { dataMb, responseMb, errorMb in
  5:     if let data = dataMb {
           let parser = XMLParser(data: data)
           parser.delegate = self
           parser.parse()
         }
 10:   }
       dataTask.resume()
     }

Lines 6-8 replace the dataString we were logging earlier. On line 6, we create the XMLParser, and then set its delegate to self on line 7. Then we can kick off the parser on line 8 by calling its parse method.

The only other change is that since we’re referring to self inside the init that creates self, we need to add the explicit call to super.init() on line 2, so that self already exists before we capture it in the closure.
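The ordering rule is easier to see in isolation. This is a minimal sketch using a made-up TickCounter class (not part of the podcast project): the closure that mentions self can only be created after super.init() has returned. Because this closure is stored on the object itself, the sketch captures self weakly, a choice that applies to this example and not necessarily to our data task.

```swift
import Foundation

// Hypothetical class for illustration only.
final class TickCounter: NSObject {
    private(set) var ticks = 0
    var onTick: (() -> Void)?

    override init() {
        // Creating the closure below before this call would not compile,
        // because self is not fully initialized until super.init() returns.
        super.init()
        onTick = { [weak self] in
            self?.ticks += 1
        }
    }
}

let tickCounter = TickCounter()
tickCounter.onTick?()
tickCounter.onTick?()
print(tickCounter.ticks)  // prints 2
```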

Go ahead and run this and…well…nothing happens! The player UI just comes up as usual. The XMLParser actually has parsed the podcast feed, but since we haven’t implemented any of the delegate methods, the parser never called back to our code. Let’s try adding something simple. After the end of the init method, but still inside the class, add a trivial implementation of the XMLParserDelegate method parserDidStartDocument.

 func parserDidStartDocument(_ parser: XMLParser) {
   print("parserDidStartDocument, " +
     "currently on line \(parser.lineNumber)")
 }

Run again, check the console at the bottom of the Xcode window, and you should see:

 PragmaticPodcasts[6520:2911015] parserDidStartDocument, currently on line 1

And there we go; we are now parsing XML. Now to pull out the parts we want.

Parsing XML Elements

So now that we are downloading the podcast feed and have a parser set up, how do we pull out the title, description, and other fields that will go into our PodcastFeed and PodcastEpisode structures?

Let’s think about how the XMLParserDelegate works. Looking at the documentation, it can call us when it starts an element, ends an element, and gets characters (text) inside an element. It’s a little primitive, but it does give us everything we need. Here’s a strategy:

  1. When the parser starts an element that we care about, initialize a “current element text” property.

  2. When the parser gets characters, append them to the current element text.

  3. When an element ends, look at its name to see which field of our structure it goes with, and write the current element text to it.

To take it easy at first, we’ll start with populating the PodcastFeed, before we move on to building individual PodcastEpisodes. In our PodcastFeedParser.swift, start by declaring two variable properties.

 var currentFeed: PodcastFeed?
 var currentElementText: String?

These properties represent the PodcastFeed we’re creating, plus the “current element text” from step 1 of our strategy. We’ll reset the currentFeed when we start parsing a new document, so let’s rewrite the parserDidStartDocument delegate method to do that:

 func​ parserDidStartDocument(_ parser: ​XMLParser​) {
  currentFeed = ​PodcastFeed​()
 }

So far, so good. Next, we’ll implement the “start element” callback.

 func parser(_ parser: XMLParser, didStartElement elementName: String,
             namespaceURI: String?, qualifiedName qName: String?,
             attributes attributeDict: [String : String] = [:]) {
   switch elementName {
   case "title", "link", "description", "itunes:author":
     currentElementText = ""
   case "itunes:image":
     if let urlAttribute = attributeDict["href"] {
       currentFeed?.iTunesImageURL = URL(string: urlAttribute)
     }
   default:
     currentElementText = nil
   }
 }

There are three things we might want to do here, based on the elementName, so we deal with them in a switch:

  • If the element is one whose text we want to capture, we set currentElementText to an empty string, which we will append to later as we receive text callbacks.

  • If the element is itunes:image, then the URL we want is available from the tag’s attributes, so we try to grab it from attributeDict.

  • Any other cases are tags we don’t care to parse, so we nil out currentElementText, as a sign we can use later to tell our parser to not bother with this element’s text.

Next, we’ll deal with text inside an element. That means that if we’re currently parsing a title element whose markup looks like this:

 <title>CocoaConf Podcast</title>

Then the text in this case is CocoaConf Podcast. But we have to be careful. The documentation for parser(foundCharacters:) reads:

Sent by a parser object to provide its delegate with a string representing all or part of the characters of the current element.

The important part here is “all or part of the characters.” It’s not guaranteed we’ll get the string all at once. We could receive it over the course of several callbacks. And that means we need to gradually build up currentElementText over the course of however many times parser(foundCharacters:) is called:

 func​ parser(_ parser: ​XMLParser​, foundCharacters string: ​String​) {
  currentElementText?.append(string)
 }

Notice the use of the optional-chaining operator, ?. If currentElementText is nil—as it is for all the tags we aren’t interested in parsing—this line will quietly do nothing.
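Both behaviors, the piecewise accumulation and the quiet no-op, can be checked without a parser at all. This sketch simulates several text callbacks with a made-up list of chunks; the variable names are hypothetical.

```swift
// Simulates several foundCharacters callbacks delivering one element's text in pieces.
var wantedText: String? = ""     // an element we care about: starts as an empty string
var ignoredText: String? = nil   // an element we skip: stays nil

for chunk in ["CocoaConf", " ", "Podcast"] {
    wantedText?.append(chunk)    // appends each piece in turn
    ignoredText?.append(chunk)   // does nothing, because the optional is nil
}

print(wantedText ?? "nil")   // prints "CocoaConf Podcast"
print(ignoredText ?? "nil")  // prints "nil"
```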

Finally, our third step is to implement the element-did-end method, using the currentElementText to populate the corresponding field of the currentFeed. This delegate method is parser(didEndElement:namespaceURI:qualifiedName:), another long signature that you’ll probably want to use autocomplete for, rather than writing by hand.

 func parser(_ parser: XMLParser, didEndElement elementName: String,
             namespaceURI: String?, qualifiedName qName: String?) {
   switch elementName {
   case "title":
     currentFeed?.title = currentElementText
   case "link":
     if let linkText = currentElementText {
       currentFeed?.link = URL(string: linkText)
     }
   case "description":
     currentFeed?.description = currentElementText
   case "itunes:author":
     currentFeed?.iTunesAuthor = currentElementText
   default:
     break
   }
 }

Our implementation is basically one big switch statement, switching on the various elementName values we care about, and either assigning the currentElementText value to the struct fields directly, or converting them to URLs as needed.
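If you want to try the three callbacks without waiting on a network download, they can be assembled into a self-contained sketch that parses an inline string. The SketchFeed struct is an assumed, cut-down stand-in for our PodcastFeed (the itunes: fields are left out for brevity), and the feed text and the SketchFeedParser name are made up for this illustration.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on non-Apple platforms
#endif

// Assumed, simplified stand-in for the PodcastFeed struct.
struct SketchFeed {
    var title: String?
    var link: URL?
    var description: String?
}

final class SketchFeedParser: NSObject, XMLParserDelegate {
    private(set) var currentFeed: SketchFeed?
    private var currentElementText: String?

    func parserDidStartDocument(_ parser: XMLParser) {
        currentFeed = SketchFeed()
    }

    // Step 1: start collecting text only for elements we care about.
    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String : String] = [:]) {
        switch elementName {
        case "title", "link", "description":
            currentElementText = ""
        default:
            currentElementText = nil
        }
    }

    // Step 2: accumulate text, which may arrive in several pieces.
    func parser(_ parser: XMLParser, foundCharacters string: String) {
        currentElementText?.append(string)
    }

    // Step 3: store the collected text in the matching field.
    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        switch elementName {
        case "title":
            currentFeed?.title = currentElementText
        case "link":
            if let linkText = currentElementText {
                currentFeed?.link = URL(string: linkText)
            }
        case "description":
            currentFeed?.description = currentElementText
        default:
            break
        }
    }
}

// Made-up feed text standing in for the downloaded Data.
let sketchXML = """
<rss version="2.0"><channel>
<title>Example Cast</title>
<link>https://example.com/podcast</link>
<description>A made-up feed.</description>
</channel></rss>
"""

let sketchFeedParser = SketchFeedParser()
let sketchXMLParser = XMLParser(data: Data(sketchXML.utf8))
sketchXMLParser.delegate = sketchFeedParser
sketchXMLParser.parse()
print(sketchFeedParser.currentFeed?.title ?? "no title")
```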

This is all we need to parse the information at the beginning of the feed. We might as well stop before parsing any of the episodes—something we’ll tackle in the next section—particularly because they use some of the same element names we’re currently handling, but for different purposes. So go up to the didStartElement we wrote earlier, and add another case to the switch, prior to the default:

 case "item":
   parser.abortParsing()
   print("aborted parsing. podcastFeed = \(currentFeed)")

The idea here is that the first time we encounter an element named item, we know it’s an episode, so we abort parsing (for now, anyways) and log what we’ve received so far. Look in the console and you should see output like this:

 PragmaticPodcasts[8393:3720948] aborted parsing. podcastFeed =
 Optional(PragmaticPodcasts.PodcastFeed(title: Optional("CocoaConf Podcast"),
 link: Optional(http://cocoaconf.com/podcast), description: Optional("The
 CocoaConf Podcast features members of the iOS and OS X community offering
 tips, insight, facts, and opinions for other iOS and OS X developers. You'll
 recognize many of the voices from the popular CocoaConf conference series."),
 iTunesAuthor: Optional("Daniel H Steinberg"), iTunesImageURL: nil))
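To see why stopping at the first item matters, here is a small sketch with a made-up document in which both the feed and an episode have a title element. The FirstTitleParser class is hypothetical and tracks only the title; it assumes, as we rely on in our own parser, that no further element callbacks arrive once abortParsing has been called.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on non-Apple platforms
#endif

// Hypothetical delegate: keeps the feed-level <title> and stops at the first <item>.
final class FirstTitleParser: NSObject, XMLParserDelegate {
    private(set) var title: String?
    private var currentElementText: String?

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String : String] = [:]) {
        switch elementName {
        case "title":
            currentElementText = ""
        case "item":
            parser.abortParsing()  // everything from here on belongs to episodes
        default:
            currentElementText = nil
        }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        currentElementText?.append(string)
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "title" {
            title = currentElementText
        }
    }
}

// Made-up document: the episode reuses the same element name as the feed.
let abortXML = "<channel><title>Example Cast</title>"
    + "<item><title>Episode 1</title></item></channel>"

let firstTitleParser = FirstTitleParser()
let abortingParser = XMLParser(data: Data(abortXML.utf8))
abortingParser.delegate = firstTitleParser
abortingParser.parse()
print(firstTitleParser.title ?? "no title")
```

Without the item case, the episode’s title would reach the same didEndElement branch and overwrite the feed’s title.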

So we’re now populating our PodcastFeed structure from the contents at the top of the XML feed. All we have to do now is get the individual episodes, and we’ll finally have a complete data model for our podcast app.

Marking Sections of Code


When we’re editing source, the jump bar at the top of the content pane shows a series of pop-up menu items. Going left to right, they become more specific: the current project (click to see its targets); the group folder for the current source file (click to see other groups); the source filename (click to see other files in the group); and finally, the property, method, or function that the cursor is currently in (click to jump around to other methods).

As source files grow, this last one can become a huge list of method names, so long that it becomes unreadable. One way to clean it up is to add special comments. If you add //MARK: and some text, it creates a section header. For example, to create the image seen here:

images/network/jump-bar-section-mark.png

we just used the following syntax on a line by itself:

 // MARK: - XMLParserDelegate implementation

The hyphen after the colon is optional; it creates a divider line. You can also use the syntax //TODO: or //FIXME: to create reminders for yourself that will appear in this pop-up menu.
