Combining XML Parsers

Once the parser gets through the XML tags describing the podcast feed as a whole, it gets into a series of <item> tags that describe individual episodes. If we think of the XML as a tree structure, the contents of each episode are one level down from the top-level metadata. Let’s look at what an episode item looks like in the XML (we’ve added some line breaks and omitted a huge block of HTML in the <description> tag to fit the book’s formatting):

 <item>
  <title>Episode 1: The Pilot Episode</title>
  <pubDate>Tue, 27 May 2014 16:50:00 +0000</pubDate>
  <guid isPermaLink="false">
  <![CDATA[https://www.signalleaf.com/podcasts/CocoaConf-Podcast/
  5384fccb9caead020000000f]]></guid>
  <link><![CDATA[https://www.signalleaf.com/podcasts/CocoaConf-Podcast/
  5384fccb9caead020000000f]]></link>
  <itunes:image href=
  "http://static.libsyn.com/p/assets/5/f/d/1/
  5fd1ec09a6797067/podcast-icon-1400x1400.jpg" />
  <description>...</description>
  <enclosure length="25453104" type="audio/x-m4a"
  url="http://traffic.libsyn.com/cocoaconf/
  CocoaConf001.m4a?dest-id=305853" />
  <itunes:duration>29:07</itunes:duration>
  <itunes:explicit>no</itunes:explicit>
  <itunes:keywords />
  <itunes:subtitle><![CDATA[In which we begin]]></itunes:subtitle>
 </item>

As it is, the parser will descend into each of these tags and keep emitting its element-started and -ended callbacks. So we’d get a didStartElement for the <item> tag, and then another for its first child, like the episode’s <title>.

We could let the parser keep going like this, digging deeper and deeper into the XML tags’ hierarchy. But this will get hard to code after a while, because some tags—like title or description—are used for different purposes depending where in the tree structure we are. That means some of the cases in our switch might need their own switches or if-elses to figure it all out. For example, case "title" would then have to have some logic for “is this the title of an episode or the whole podcast?” That could get really ugly fast.
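
To make the problem concrete, here is a rough sketch of what the single-delegate approach would need. The class name SingleDelegateSketch, its properties, and the tiny inline XML are all made up for illustration and are not part of our project. Notice the insideItem flag that case "title" has to consult:

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML   // needed for XMLParser on Linux; a no-op on Apple platforms
#endif

// Hypothetical single-delegate parser: it has to remember where it is in the tree.
final class SingleDelegateSketch: NSObject, XMLParserDelegate {
    var feedTitle: String?
    var episodeTitles: [String] = []
    private var insideItem = false
    private var currentElementText: String?

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String : String] = [:]) {
        switch elementName {
        case "item":
            insideItem = true
            currentElementText = nil
        case "title":
            currentElementText = ""
        default:
            currentElementText = nil
        }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        currentElementText?.append(string)
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        switch elementName {
        case "title":
            // The same tag means two different things, so we have to branch.
            if insideItem {
                if let text = currentElementText { episodeTitles.append(text) }
            } else {
                feedTitle = currentElementText
            }
        case "item":
            insideItem = false
        default:
            break
        }
        currentElementText = nil
    }
}

let xml = "<rss><channel><title>Feed</title><item><title>Ep 1</title></item></channel></rss>"
let sketch = SingleDelegateSketch()
let xmlParser = XMLParser(data: Data(xml.utf8))
xmlParser.delegate = sketch
xmlParser.parse()
```

With only one ambiguous tag this is tolerable, but every additional shared tag (description, link, itunes:image) would need the same branching.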

What works better, actually, is to have multiple XMLParsers. That’s what we’re going to do next.

Parsing Podcast Episodes

First, though, since we know we need to store the parsed episodes somewhere, let’s switch over to PodcastFeed.swift and add an array to hold them:

 var episodes : [PodcastEpisode] = []
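
In case the PodcastEpisode type isn't fresh in your mind, here is a sketch of its likely shape. This is an assumption on our part, inferred from the property names and order that appear in the console output at the end of this section; your actual PodcastEpisode.swift may differ:

```swift
import Foundation

// Assumed shape of PodcastEpisode, inferred from the logged output, not copied from the project.
struct PodcastEpisode {
    var title: String?
    var enclosureURL: URL?
    var iTunesDuration: String?
    var iTunesImageURL: URL?
}

// Because every property is an optional var, Swift provides a no-argument initializer.
var episodes: [PodcastEpisode] = []
var episode = PodcastEpisode()
episode.title = "Episode 1: The Pilot Episode"
episodes.append(episode)
```

Making every property optional lets the parser create an empty episode up front and fill in fields as it encounters them.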

Next, we’ll create a new class that knows how to parse episodes. Choose File > New > File… and once again choose an iOS “Swift File” from the template sheet. Name it PodcastEpisodeParser.swift.

We’ve called this class a “parser,” but really all that matters is that it can implement the XMLParserDelegate protocol.

 class PodcastEpisodeParser : NSObject, XMLParserDelegate {

 }

The reason this class needs to also implement the XMLParserDelegate protocol is the crucial trick of iOS XML parsing: we’re going to pass the delegate from one object to another during the parsing process, and it’s going to work like this:

  1. When the PodcastFeedParser starts an item (i.e., an episode), create a PodcastEpisodeParser and make it the XMLParser’s delegate.

  2. The PodcastEpisodeParser is then responsible for populating the fields of a PodcastEpisode.

  3. When the PodcastEpisodeParser reaches the end of an item, reassign the XMLParser’s delegate back to the PodcastFeedParser, which can then collect the completed PodcastEpisode and add it to the feed’s array.
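
Before we write the real thing, the three steps above can be sketched in miniature. The FeedHandler and ItemHandler classes below are hypothetical stand-ins for our two parsers, stripped down to collect only item titles from a made-up XML string, so treat this as an illustration of the handoff rather than the code we're about to write:

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML   // needed for XMLParser on Linux; a no-op on Apple platforms
#endif

// Hypothetical child delegate: handles the contents of a single <item>.
final class ItemHandler: NSObject, XMLParserDelegate {
    let parent: FeedHandler
    var title: String?
    private var text: String?

    init(parent: FeedHandler, xmlParser: XMLParser) {
        self.parent = parent
        super.init()
        xmlParser.delegate = self          // step 1: take over the callbacks
    }

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String : String] = [:]) {
        text = (elementName == "title") ? "" : nil
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        text?.append(string)
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "title" {
            title = text                   // step 2: populate the item's fields
        } else if elementName == "item" {
            parser.delegate = parent       // step 3: hand the delegate back...
            parent.parser(parser, didEndElement: elementName,
                          namespaceURI: namespaceURI, qualifiedName: qName)
        }
    }
}

// Hypothetical parent delegate: creates a child for each <item>, then collects its result.
final class FeedHandler: NSObject, XMLParserDelegate {
    var itemTitles: [String] = []
    private var itemHandler: ItemHandler?

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String : String] = [:]) {
        if elementName == "item" {
            itemHandler = ItemHandler(parent: self, xmlParser: parser)
        }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "item" {
            if let title = itemHandler?.title { itemTitles.append(title) }
            itemHandler = nil
        }
    }
}

let xml = "<rss><channel><title>Feed</title>"
    + "<item><title>One</title></item><item><title>Two</title></item>"
    + "</channel></rss>"
let feedHandler = FeedHandler()
let xmlParser = XMLParser(data: Data(xml.utf8))
xmlParser.delegate = feedHandler
xmlParser.parse()
```

Neither handler needs to know where it sits in the tree. The feed handler never sees the item's <title>, and the item handler never sees the feed's.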

Let’s start putting the pieces in place. In our empty PodcastEpisodeParser.swift, add two new variables: one for the PodcastFeedParser that creates the episode parser, and one for the episode that will be built up.

 let feedParser : PodcastFeedParser
 var currentEpisode : PodcastEpisode

Those are non-optional properties, so they’ll give an error for a moment, because they haven’t been initialized by the end of init. Good thing we’re ready to write the init:

 init(feedParser: PodcastFeedParser, xmlParser: XMLParser) {
   self.feedParser = feedParser
   self.currentEpisode = PodcastEpisode()
   super.init()
   xmlParser.delegate = self
 }

Providing this one init means the only way to create our parser is by providing the feed parser that we can return the delegate to when we’re done, and the XMLParser itself, which we need only so we can reassign its delegate. The practical upshot of this is that as soon as the feed parser creates the episode parser, the latter will be all set to parse out the contents of the <item> tag.

As before, we need to build up a string of the current element’s text, so add that as a property.

 var currentElementText: String?

Now we’re ready to get started with our delegate callbacks as before. One thing that’s different is that the URL of the podcast episode is not in its own tag. Instead, it is an attribute of the <enclosure> tag. The started-element callback is the only time we get a look at a tag’s attributes, so we’ll want to grab the URL attribute then.

 func parser(_ parser: XMLParser, didStartElement elementName: String,
             namespaceURI: String?, qualifiedName qName: String?,
             attributes attributeDict: [String : String] = [:]) {
   switch elementName {
   case "title", "itunes:duration":
     currentElementText = ""
   case "itunes:image":
     if let urlAttribute = attributeDict["href"] {
       currentEpisode.iTunesImageURL = URL(string: urlAttribute)
     }
     currentElementText = nil
   case "enclosure":
     if let href = attributeDict["url"], let url = URL(string: href) {
       currentEpisode.enclosureURL = url
     }
     currentElementText = nil
   default:
     currentElementText = nil
   }
 }

Our switch statement here has four cases:

  • For tags whose text we need to store in PodcastEpisode, which are currently just title and itunes:duration, reset the currentElementText to an empty string.

  • If and when the itunes:image element is encountered, get its href attribute, and save it as the iTunesImageURL property.

  • Similarly, when the enclosure element is encountered, find the url attribute, and save it to the currentEpisode.

  • As with the feed parser, for tags we don’t care about, nil out the currentElementText, so that appending characters does nothing. Notice that in the cases where we get our contents from an attribute, we don’t want to collect any textual contents of the tag, so we nil out currentElementText there too.
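
If you'd like to see attribute delivery in isolation, here is a small standalone sketch. The EnclosureSketch class and its endedElements array are made up for this illustration; the XML is a trimmed version of the <enclosure> tag from the sample item above:

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML   // needed for XMLParser on Linux; a no-op on Apple platforms
#endif

// Hypothetical delegate that only cares about the enclosure's url attribute.
final class EnclosureSketch: NSObject, XMLParserDelegate {
    var enclosureURL: URL?
    var endedElements: [String] = []

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String : String] = [:]) {
        // Attributes arrive here, as a dictionary keyed by attribute name.
        if elementName == "enclosure",
           let urlString = attributeDict["url"],
           let url = URL(string: urlString) {
            enclosureURL = url
        }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        // No attributes parameter here: by now it's too late to read them.
        endedElements.append(elementName)
    }
}

let xml = "<item><enclosure length=\"25453104\" type=\"audio/x-m4a\" "
    + "url=\"http://traffic.libsyn.com/cocoaconf/CocoaConf001.m4a?dest-id=305853\" /></item>"
let sketch = EnclosureSketch()
let xmlParser = XMLParser(data: Data(xml.utf8))
xmlParser.delegate = sketch
xmlParser.parse()
```

Even though <enclosure /> is self-closing, the parser still reports both a start and an end for it, which is why our switch is free to do all its enclosure work in the started-element callback.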

Now we can move on to collecting the text of elements we care about, which is exactly as before:

 func parser(_ parser: XMLParser, foundCharacters string: String) {
   currentElementText?.append(string)
 }
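
The optional chaining in that one-liner is what makes the nil-versus-empty-string distinction work. XMLParser is allowed to deliver an element's text in more than one foundCharacters call, so we append rather than assign. Here is a tiny sketch of both behaviors, using a local variable in place of the property:

```swift
// Stand-in for the parser's property; nil means "we don't care about this element".
var currentElementText: String? = nil

// Appending through optional chaining on nil is a silent no-op.
currentElementText?.append("ignored")
let ignoredResult = currentElementText

// Resetting to "" (as didStartElement does for tags we want) turns collection on.
currentElementText = ""
currentElementText?.append("Episode 1: ")
currentElementText?.append("The Pilot Episode")
```

After this runs, ignoredResult is still nil, while currentElementText holds the two fragments joined together.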

The last delegate callback method we need to implement is when the element ends. In the feed parser, this was where we would store the contents of currentElementText. For the episode parser, this is also the time we will look for the end of the <item> tag itself, which is our signal to hand control back to the feed parser.

  1: func parser(_ parser: XMLParser, didEndElement elementName: String,
                 namespaceURI: String?, qualifiedName qName: String?) {
       switch elementName {
       case "title":
  5:     currentEpisode.title = currentElementText
       case "itunes:duration":
         currentEpisode.iTunesDuration = currentElementText
       case "item":
         parser.delegate = feedParser
 10:     feedParser.parser(parser, didEndElement: elementName,
                           namespaceURI: namespaceURI, qualifiedName: qName)
       default:
         break
       }
 15: }

The handling of title is similar to how we stored the fields in the feed parser, so the real difference here is how we handle the ending of the item itself. On line 9, we reset the XMLParser’s delegate back to the feedParser, so it (and not this episode parser) will get the subsequent callbacks for later contents of the XML.

Then, on lines 10-11, we manually call the feed parser’s did-end-element method. With the episode parser’s work done, this gives the feed parser a chance to collect the parsed episode.

Using the Episode Parser

That’s it for the PodcastEpisodeParser. Now we just have to create and use it from the feed parser. Switch back to PodcastFeedParser.swift, and start by creating a property for the episode parser:

 var episodeParser : PodcastEpisodeParser?

Next, in parser(_:didStartElement:namespaceURI:qualifiedName:attributes:), take out the case that aborted parsing when we first start an item, and instead use the opportunity to create the PodcastEpisodeParser:

 case "item":
   episodeParser = PodcastEpisodeParser(feedParser: self,
                                        xmlParser: parser)
   parser.delegate = episodeParser

This is the first step of our original plan: passing the XMLParser’s delegate to the episode parser, so it starts getting the callbacks as elements are encountered. We’ve written all the episode-parsing logic, and when it’s done, it re-points the delegate back to the feed parser and manually calls the did-end-element method. Now, we need to grab the completed episode and add it to the array of episodes. So add a case in parser(_:didEndElement:namespaceURI:qualifiedName:) to handle that:

 case "item":
   if var episode = episodeParser?.currentEpisode {
     if episode.iTunesImageURL == nil {
       episode.iTunesImageURL = currentFeed?.iTunesImageURL
     }
     currentFeed?.episodes.append(episode)
   }
   episodeParser = nil

Notice we do one other thing here: if the episode did not find an episode-specific iTunesImageURL, we assign it the feed’s iTunesImageURL, if any. That way, we should pretty much always have an image to show in our UI.
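
The same fallback can also be written with the nil-coalescing operator. The sketch below is only an illustration of that idea: EpisodeSketch is a hypothetical stand-in for PodcastEpisode, and the example.com URLs are placeholders, not real artwork:

```swift
import Foundation

// Hypothetical stand-in for PodcastEpisode with just the fields this example needs.
struct EpisodeSketch {
    var title: String?
    var iTunesImageURL: URL?
}

// Placeholder URLs, assumed for illustration only.
let feedImageURL = URL(string: "https://example.com/feed.jpg")
let parsed = [
    EpisodeSketch(title: "Has its own art",
                  iTunesImageURL: URL(string: "https://example.com/episode.jpg")),
    EpisodeSketch(title: "No art", iTunesImageURL: nil)
]

// Keep the episode's own image when present; otherwise fall back to the feed's.
let withFallback = parsed.map { episode -> EpisodeSketch in
    var copy = episode
    copy.iTunesImageURL = copy.iTunesImageURL ?? feedImageURL
    return copy
}
```

Either spelling works. The explicit if in our feed parser reads a little more plainly when the fallback itself comes from an optional chain like currentFeed?.iTunesImageURL.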

Finally, since we took out the one logging statement that showed whether or not any of this worked, let’s log the currentFeed that we end up with when we reach the end of the document:

 func parserDidEndDocument(_ parser: XMLParser) {
   print("parsing done, feed is \(currentFeed)")
 }

We are finally done! Run the app again, and look in the console for the results of our parsing. You should see something like this (we’ve reformatted the output for the book and cut it off after the first few episodes):

 parsing done, feed is Optional(PragmaticPodcasts.PodcastFeed(title:
 Optional("CocoaConf Podcast"), link:
 Optional(http://cocoaconf.com/podcast), description:
 Optional("The CocoaConf Podcast features members of the iOS and OS X
 community offering tips, insight, facts, and opinions for other iOS and OS X
 developers. You'll recognize many of the voices from the popular CocoaConf
 conference series."), iTunesAuthor: Optional("Daniel H Steinberg"),
 iTunesImageURL:
 Optional(http://static.libsyn.com/p/assets/5/f/d/1/5fd1ec09a6797067/
 podcast-icon-1400x1400.jpg), episodes: [PragmaticPodcasts.PodcastEpisode(title:
 Optional("Episode 22: Anastasiia Voitova"), enclosureURL:
 Optional(http://traffic.libsyn.com/cocoaconf/anastasiia.mp3?dest-id=305853),
 iTunesDuration: Optional("29:01"), iTunesImageURL:
 Optional(http://static.libsyn.com/p/assets/5/f/d/1/5fd1ec09a6797067/
 podcast-icon-1400x1400.jpg)), PragmaticPodcasts.PodcastEpisode(title:
 Optional("Episode 21: Marc Edwards"), enclosureURL:
 Optional(http://traffic.libsyn.com/cocoaconf/marc.mp3?dest-id=305853),
 iTunesDuration: Optional("01:00:01"), iTunesImageURL:
 Optional(http://static.libsyn.com/p/assets/5/f/d/1/5fd1ec09a6797067/
 podcast-icon-1400x1400.jpg)),
 ...

Notice this includes the metadata for the podcast itself in the first few lines, and then the properties we parsed for each episode: title, enclosure URL, duration, and image.

What have we accomplished? We started with a Data object that was mostly XML syntax and many tags that aren’t useful to our app. At the end of the parse, we have Swift objects that contain just the data that we need to show a podcast feed’s metadata and its individual episodes in our app. With those URLs, we can use our player scene to present and play any episode of the podcast.
