Chapter 14

XML

This chapter deals with XML, short for eXtensible Markup Language. XML itself is completely language-independent, but Scala was developed with XML in mind and makes it easy to work with.

In chapter 9, we learned how to read from and write to text files. The files that we used in that and the following chapters are what are called "flat" text files. They have the data in them with nothing that tells us about the nature of the data other than formatting. The advantages of this are that such files are fairly simple to read and write, and that both can be done with standard tools like a text editor. The disadvantages are that it can be slow and that the data lacks any inherent meaning, so it is hard to move the information from one program to another. It is also somewhat error prone because so much information is in the formatting. XML addresses the lack of inherent meaning without losing the advantages.

The eXtensible Markup Language is a standard for formatting text files to encode any type of information. It is a markup language, not a programming language. The standard simply defines a format for encoding information in a structured way. It is likely that you have heard of a different markup language called HTML, the HyperText Markup Language. HTML is used to encode web pages and has a format very similar to XML. Indeed, a newer standard called XHTML is basically HTML that conforms to the XML rules.

14.1 Description of XML

Everything in an XML file can be classified as either markup or content. Markup in XML is found between '<' and '>' or between '&' and ';'. The content is anything that is not markup. To help you understand XML we will look at an example XML file. This example is built on the idea of calculating grades for a course.

<course name="CSCI 1320">
  <student fname="Jason" lname="Hughes">
    <quiz grade="98"/>
    <quiz grade="100"/>
    <quiz grade="90"/>
    <test grade="94"/>
    <assignment grade="100">
      <!-- Feedback -->
      Code compiled and runs fine.
    </assignment>
  </student>
  <student fname="Kevin" lname="Peese">
    <quiz grade="85"/>
    <quiz grade="78"/>
    <test grade="67"/>
    <assignment grade="20">
      Code didn't compile.
    </assignment>
  </student>
</course>

Just reading this should tell you what information it contains. Now we want to go through the different pieces of XML to see how this was built.

14.1.1 Tags

Text between a '<' character and the matching '>' character is called a tag. Nearly everything in our sample XML file is inside of a tag. The first word in the tag is the name of the tag and it is the only thing that is required in the tag. There are three types of tags that can appear in an XML file.

  • Start tag - Begins with '<' and ends with '>',
  • End tag - Begins with '</' and ends with '>',
  • Empty-element tag - Begins with '<' and ends with '/>'.

Tags can also include attributes, which come after the name and before the close of the tag. These are discussed below.

14.1.2 Elements

The tags are used to define elements which give structure to the XML document. An element is either a start tag and an end tag with everything in between or else it is an empty element tag. The empty element tag is simply a shorter version of a start tag followed by an end tag with nothing in between.

Elements can be nested inside of one another. In our sample file, there is a course element that encloses everything else. There are two student elements inside of that. Each of those includes elements for the different grades. Most of these are empty elements, but the assignments are not empty and have contents in them.

Any time there is a start tag, there should be a matching end tag. When elements are nested, they have to be nested properly. That is to say that if element-2 begins inside of element-1, then element-2 must also end before the end of element-1. In addition to other elements, you can place general text inside of elements.
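The nesting rule can be checked mechanically. The following is a minimal sketch, assuming the scala.xml library described in section 14.2 is available on the classpath. The helper name isWellFormed is made up for this illustration. It relies on XML.loadString throwing an exception when the text it is given is not well-formed XML.

```scala
import scala.util.Try
import scala.xml.XML

// Hypothetical helper: true if the text parses as well-formed XML.
def isWellFormed(text: String): Boolean = Try(XML.loadString(text)).isSuccess

// element2 ends before element1 ends, so the nesting is proper.
val proper = isWellFormed("<element1><element2>text</element2></element1>")

// element1 is closed while element2 is still open, which is not allowed.
val improper = isWellFormed("<element1><element2>text</element1></element2>")

// A start tag with no matching end tag is also rejected.
val unclosed = isWellFormed("<element1>text")
```

Here proper is true while improper and unclosed are both false.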

14.1.3 Attributes

Additional information can be attached to both start tags and empty-element tags in the form of attributes. An attribute is a name-value pair where the value is in quotes and the two are separated by an equals sign. The example XML is loaded with attributes. In fact, every start or empty-element tag in the example has an attribute. It is not required that these tags have an attribute, but it is a good way to associate simple data with a tag. Some of the tags in the example show that you can also have multiple attributes associated with them.

14.1.4 Content

Between a start tag and an end tag you can not only put other tags, you can put plain text. This text is the contents of the element. In the example XML, the assignment element has text in it that serves as a comment on the grade. The text that you put inside of an element is unformatted and can include anything you want. Unlike the markup part of the XML document, there is no special formatting on content.

14.1.5 Special Characters

While the content does not have any special formatting, it is still embedded in an XML document. There are certain characters that are special in XML that you can not include directly in the content. For example, if you put a '<' in the content it will be interpreted as the beginning of a tag. For this reason, there is a special syntax that you can use to include symbols in the content of an XML file. This syntax uses the other form of markup that begins with & and ends with ;. There are five standard values defined for XML.

  • &amp; = &
  • &apos; = '
  • &gt; = >
  • &lt; = <
  • &quot; = "

There are many more defined for other specific markup languages. If you have ever looked at HTML you have probably seen &nbsp; used to represent a non-breaking space.
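As an illustration of the five standard values, here is a small sketch of a function that prepares plain text for use as XML content. The names escapeChar and escapeXML are hypothetical and the function uses nothing beyond basic String and List methods. It is meant to show the idea, not to replace the escaping that an XML library does for you.

```scala
// Hypothetical helper: the replacement for a single character.
def escapeChar(c: Char): String = c match {
  case '&' => "&amp;"
  case '\'' => "&apos;"
  case '>' => "&gt;"
  case '<' => "&lt;"
  case '"' => "&quot;"
  case other => other.toString
}

// Hypothetical helper: escape every special character in a piece of text.
def escapeXML(text: String): String = text.toList.map(escapeChar).mkString

val escaped = escapeXML("if (a < b && b > c) print(\"ok\")")
```

After this runs, escaped holds the text if (a &lt; b &amp;&amp; b &gt; c) print(&quot;ok&quot;), which can safely appear between a start tag and an end tag.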

14.1.6 Comments

Just like with code, it is helpful to occasionally put comments in your XML. A comment begins with '<!--' and ends with '-->'. You can put whatever text you want between these as long as it does not include the two-character sequence '--', which the XML standard does not allow inside of a comment.

14.1.7 Overall Format

There are two other rules that are significant for XML. First, the entire XML document must be inside of a single element. In the example above everything was in the course element. Had we wanted to have more than one course element, we would have had to create some higher-level element to hold them.

In addition, most XML files will begin with an XML declaration that comes right before the element containing the information. The declaration has a slightly different syntax and might look like the following line.

<?xml version="1.0" encoding="UTF-8" ?>

14.1.8 Comparison to Flat File

To better understand the benefit of XML, we will compare our XML file to a flat file that might be used to store basically the same information. Here is a flat file representation of the grade information from the XML above.

CSCI 1320
2
Jason Hughes
98 100 90
94
100
Kevin Peese
85 78
67
20

This is a lot shorter, but unless you happen to know what it is encoding, you can not figure much out about it. Namely, the numbers in the file are hard to distinguish. The 2 near the top you might be able to figure out, but without additional information, it is impossible to determine which lines of grades are assignments, quizzes, or tests.

The flat file also lacks some of the information that was in the XML. In particular, the comments on the assignments in the XML format are missing in this file. It would be possible to make the flat file contain such information, but doing so would cause the code required to parse the flat file to be much more complex.

14.1.8.1 Flexibility in XML

A significant "Catch 22" of XML is that there are lots of different ways to express the same information. For example, the comment could have been given as an attribute with the name comment instead of as contents of the element. Similarly, if you do not want to allow comments, you could shorten the XML to be more like the flat file by changing it to the following format.

<course name="CSCI 1320">
  <student fname="Jason" lname="Hughes">
    <quizzes>98 100 90</quizzes>
    <tests>94</tests>
    <assignments>100</assignments>
  </student>
  <student fname="Kevin" lname="Peese">
    <quizzes>85 78</quizzes>
    <tests>67</tests>
    <assignments>20</assignments>
  </student>
</course>

Here all the grades of the same type have been given as contents of elements with the proper names. This makes things much shorter and does not significantly increase the difficulty of parsing the grades. It does remove the flexibility of attaching additional information with each grade such as the comments.
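To make the claim about parsing concrete, here is a hedged sketch of reading the grades from this shorter format. It uses the \ operator and the text method from the scala.xml library, both of which are covered in section 14.2.2. The helper name grades is an assumption made for this example.

```scala
import scala.xml.Elem

val student: Elem =
  <student fname="Jason" lname="Hughes">
    <quizzes>98 100 90</quizzes>
    <tests>94</tests>
    <assignments>100</assignments>
  </student>

// Hypothetical helper: read the space separated grades stored as the text of
// one child element. A missing element gives an empty list.
def grades(elem: Elem, tag: String): List[Double] =
  (elem \ tag).text.trim.split(" +").toList.filter(_.nonEmpty).map(_.toDouble)

val quizzes = grades(student, "quizzes")
val tests = grades(student, "tests")
```

The extra work compared to the first format is a single call to split, the same thing we would do for a line of numbers in a flat file.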

There is not a rule that tells you if information should be stored in separate tags, as attributes in tags, or as text inside of elements. For specific applications there will be certain standards. XHTML is a perfect example of that. For your own data, it will be up to you as a developer or your team to determine how you want to encode the information. Whatever is decided, you need to hold to it consistently.

14.2 XML in Scala

XML is not part of the normal first semester topics in Computer Science, but Scala makes it so easy that there is no reason not to cover it. To see this, simply go to the REPL and type in valid XML.

scala> <tag>Some XML.</tag>
res0: scala.xml.Elem = <tag>Some XML.</tag>

The Scala language has a built-in XML parser that allows you to write XML directly into your code. You can see that the type of this expression is scala.xml.Elem.

The scala.xml package contains the types related to XML. We will run through some of the more significant ones here.

  • Elem - Represents a single element. This is a subtype of Node.
  • Node - Represents a more general node in the XML document. This is a subtype of NodeSeq.
  • NodeSeq - Represents a sequence of Nodes.
  • XML - A helper object that has methods for reading and writing XML files.

There are quite a few other types that you can see in the API, but we will focus on these as they give us the functionality that we need for our purposes.
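The subtype relationships in this list can be seen directly in code. This is a small sketch, assuming the scala.xml library is available. The variable names are only for illustration.

```scala
import scala.xml.{Elem, Node, NodeSeq}

val elem: Elem = <tag>Some XML.</tag>
val node: Node = elem       // allowed because Elem is a subtype of Node
val nodes: NodeSeq = node   // allowed because Node is a subtype of NodeSeq

val label = elem.label      // the name of the tag
val count = nodes.length    // a single Node is a sequence of length one
```

Because of these relationships, any method that accepts a NodeSeq can be handed a single Elem.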

14.2.1 Loading XML

The XML object has methods we can use to either read from files or write to files. The loadFile method can be used to read in a file. If the first example XML that was shown is put in a file with the name 'grades.xml', then the following call would load it in.

scala> xml.XML.loadFile("grades.xml")
res4: scala.xml.Elem =
<course name="CSCI 1320">
  <student lname="Hughes" fname="Jason">
    <quiz grade="98"></quiz>
    <quiz grade="100"></quiz>
    <quiz grade="90"></quiz>
    <test grade="94"></test>
    <assignment grade="100">
      Code compiled and runs fine.
    </assignment>
  </student>
  <student lname="Peese" fname="Kevin">
    <quiz grade="85"></quiz>
    <quiz grade="78"></quiz>
    <test grade="67"></test>
    <assignment grade="20">
      Code did not compile.
    </assignment>
  </student>
</course>

Clearly this is the contents of the XML file that we had created. All that is missing is the comment, which was there for human purposes, not for other programs to worry about. In addition, the empty tags have also been converted to start and end tags with nothing in between. This illustrates that the empty tags were also just for human convenience and their meaning is the same as an empty pair of start and end tags.
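The equivalence of the two forms can be checked without creating a file. The sketch below assumes the loadString method of the XML object, which parses a String instead of a file. It also uses the \ operator with an attribute name, which is explained in the next section.

```scala
import scala.xml.XML

// The same element written as an empty-element tag and as a start/end pair.
val shortForm = XML.loadString("<quiz grade=\"98\"/>")
val longForm = XML.loadString("<quiz grade=\"98\"></quiz>")

val sameLabel = shortForm.label == longForm.label
val sameGrade = (shortForm \ "@grade").text == (longForm \ "@grade").text
val bothEmpty = shortForm.child.isEmpty && longForm.child.isEmpty
```

All three values are true, so a program reading the file sees no difference between the two ways of writing an empty element.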

14.2.2 Parsing XML

Once we have this Elem object stored in res4, the question becomes how you get the information out of it. The NodeSeq type, and hence the Node and Elem types which are subtypes of it, declare operators called \ and \\. These operators are used to search inside the contents of an object. Both operators take a second argument of a String that gives the name of what you want to look for. The difference is how far they search. The \ operator looks only one level down, at the children of this node if we have a Node or Elem, or at the children of each node in the sequence if we have a true NodeSeq. The \\ operator, on the other hand, finds anything that matches at any depth below the current level. To illustrate this, we will do three example searches.

scala> res4 \ "student"
res5: scala.xml.NodeSeq =
NodeSeq(<student lname="Hughes" fname="Jason">
    <quiz grade="98"></quiz>
    <quiz grade="100"></quiz>
    <quiz grade="90"></quiz>
    <test grade="94"></test>
    <assignment grade="100">
      Code compiled and runs fine.
    </assignment>
  </student>, <student lname="Peese" fname="Kevin">
    <quiz grade="85"></quiz>
    <quiz grade="78"></quiz>
    <test grade="67"></test>
    <assignment grade="20">
      Code did not compile.
    </assignment>
  </student>)

scala> res4 \ "test"
res6: scala.xml.NodeSeq = NodeSeq()

scala> res4 \\ "test"
res7: scala.xml.NodeSeq = NodeSeq(<test grade="94"></test>, <test grade="67"></test>)

The first two searches use the \ operator. The first one searches for elements that have the tag name "student". It finds two of them because they are direct children of the course element and gives us back a NodeSeq with them in it. The second search looks for tags that have the name "test". This search returns an empty NodeSeq. This is because while there are tags with the name "test" in res4, they are nested more deeply inside of the "student" elements and, as such, are not found by the \ operator. The last example searches for the same tag name, but does so with \\, which searches more deeply, and hence gives back a NodeSeq with two Nodes inside of it.

The \ and \\ operators can also be used to get the attributes from elements. To get an attribute instead of a tag, simply put a '@' at the beginning of the string you are searching for. Here are three searches to illustrate this.

scala> res4 \ "@name"
res8: scala.xml.NodeSeq = CSCI 1320

scala> res4 \ "@grade"
res9: scala.xml.NodeSeq = NodeSeq()

scala> res4 \\ "@grade"
res10: scala.xml.NodeSeq = NodeSeq(98, 100, 90, 94, 100, 85, 78, 67, 20)

The first search uses \ to get the name of the top level node. Using \ to look for a @grade at the top level node does not give us anything, but using \\ will return the values of all the @grades in the document.
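The searches above depend on having grades.xml loaded in the REPL. The following self-contained sketch repeats the same kinds of searches on a smaller XML literal so that the results are easy to check by hand. It assumes the scala.xml library is available. The shortened course data is only for illustration.

```scala
val course =
  <course name="CSCI 1320">
    <student fname="Jason" lname="Hughes">
      <quiz grade="98"/>
      <test grade="94"/>
    </student>
    <student fname="Kevin" lname="Peese">
      <quiz grade="85"/>
      <test grade="67"/>
    </student>
  </course>

val students = course \ "student"     // direct children of course
val directTests = course \ "test"     // empty, tests are one level deeper
val allTests = course \\ "test"       // found at any depth
val courseName = (course \ "@name").text
val lastNames = students.map(s => (s \ "@lname").text).toList
val testGrades = allTests.map(t => (t \ "@grade").text.toInt).toList
val allGrades = (course \\ "@grade").map(_.text).toList
```

A common pattern is the one used for lastNames and testGrades: search for elements first, then map over the result and pull an attribute out of each one.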

Of course, what you really want to do is put the information from the XML file into a structure that can be used in the program. Given what we have learned, this would mean that we want to put things into case classes. The data in this XML file corresponds very closely to the student type that was created in chapter 10. That case class looked like this.

case class Student(name:String,assignments:List[Double],tests:List[Double], quizzes:List[Double])

In that chapter we parsed a flat file into an Array of Students. Now we will demonstrate how to do the same thing using the XML. We will start with a function that takes a Node that should be a student Element and returns a Student object. Such a function might look like the following.

def studentFromXML(elem:xml.Node):Student =
  Student((elem \ "@fname").text+" "+(elem \ "@lname").text,
    (elem \ "assignment").map(n => (n \ "@grade").text.toDouble).toList,
    (elem \ "test").map(n => (n \ "@grade").text.toDouble).toList,
    (elem \ "quiz").map(n => (n \ "@grade").text.toDouble).toList)

This function builds a Student object and passes in the four required arguments. The first is the name, which is made from the fname and lname attributes of the element. After that are three Lists of grades for the assignments, tests, and quizzes, respectively. These all have a similar form. They start by doing a search for the proper tag name and mapping the result to a function that converts the value of the grade attribute to a Double. The call to text is required because the result of \ here is a NodeSeq, not a String, and the NodeSeq type does not have a toDouble method. The last part of each grade type is a call to toList. This is required because the map is working on a NodeSeq and will give back a Seq, but a List is required for the Student type.

The use of map probably does not jump out to you at first. Hopefully at this point you have become quite comfortable with it and other higher-order methods. However, if you think a bit you will realize that it is a bit surprising here because the thing it is being called on is not a List or an Array. Instead, it is a NodeSeq. This works because the NodeSeq is itself a subtype of Seq[Node], meaning that all the methods we have been using on other sequences work just fine on this as well.

This is useful for getting our array of students as well. The following line shows how we can use map and toArray to get the result that we want with the studentFromXML function.

scala> (res4 \ "student").map(studentFromXML).toArray
res15: Array[Student] = Array(Student(Jason Hughes,List(100.0),List(94.0),
 List(98.0, 100.0, 90.0)), Student(Kevin Peese,List(20.0),List(67.0),
 List(85.0, 78.0)))

Again, the call to toArray gives us back the desired Array[Student] instead of a more general Seq[Student].

The text method applied to a full Elem will give you all of the text that appears inside of it and all subelements. So calling text on res4 gives the two comments along with a lot of whitespace. To get just the comment on any particular grade, you would parse down to that specific element and call the text method on it.
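As a short sketch of that last point, the code below pulls out the feedback for each assignment of one student. It assumes the scala.xml library is available and uses a cut-down student element. The call to trim removes the whitespace that comes from the line breaks and indentation in the XML.

```scala
val student =
  <student fname="Kevin" lname="Peese">
    <quiz grade="85"/>
    <assignment grade="20">
      Code didn't compile.
    </assignment>
  </student>

// Parse down to the assignment elements, then ask each one for its text.
val feedback = (student \ "assignment").map(_.text.trim).toList

// Calling text on the whole student gives the same words surrounded by
// the whitespace between the other elements.
val allText = student.text
```

Here feedback is a List holding the single String "Code didn't compile.", while allText contains that sentence with extra whitespace around it.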

14.2.3 Building XML

So now you know how to get the contents of an XML file into a useful form in Scala. What about going the other way? Assume that the code we just wrote were used in the menu-based application from chapter 10 and that changes were made and now we want to write the results back out to a file. The first step in this would be to build the Node that represents the data.

We saw above that we can put XML directly into a Scala program or the REPL and it will be parsed and understood. However, that alone does not give us the ability to put values from the program back into the XML file. Fortunately, this is not hard to do either. Inside of XML that is embedded in a Scala program you can embed Scala expressions inside of curly braces. We will start with a simple example.

scala> <tag>4+5 is {4+5}</tag>
res19: scala.xml.Elem = <tag>4+5 is 9</tag>

Here the expression 4+5 has been put in curly braces and as you can see it evaluates to the value 9 as it should. The code you put inside of the curly braces can be far more complex and build additional XML content or tags.

We will use this to write a function that packs a Student object into an XML node. This code looks like the following.

def studentToXML(stu:Student):xml.Node = {
  val nameParts=stu.name.split(" +")
  <student fname={nameParts(0)} lname={nameParts(1)}>
    {stu.quizzes.map(q => <quiz grade={q.toString}/>)}
    {stu.tests.map(t => <test grade={t.toString}/>)}
    {stu.assignments.map(a => <assignment grade={a.toString}/>)}
  </student>
}

The first line splits the student name into pieces around spaces. It is assumed that the first element is the first name and the second element is the last name. These are used as attribute values in the student tag. Inside of that element are three lines of code, one each for quizzes, tests, and assignments. Each of these maps the corresponding List to a set of elements with grade attributes.

It is worth noting two things about using code for the attribute values. First, the quotes are not written anywhere. They are automatically provided when the value is Scala code. Second, the type of the Scala expression for the value has to be String. This is apparent with the grade values. They are Doubles and have to be explicitly converted to Strings.
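One way to gain confidence in a pair of functions like studentToXML and studentFromXML is to check that converting a Student to XML and back gives the original value. The sketch below does that. It repeats both functions so that it can be run on its own, and the helper grades is an addition made for this example to avoid repeating the same expression three times.

```scala
import scala.xml.Node

case class Student(name: String, assignments: List[Double], tests: List[Double],
    quizzes: List[Double])

def studentToXML(stu: Student): Node = {
  val nameParts = stu.name.split(" +")
  <student fname={nameParts(0)} lname={nameParts(1)}>
    {stu.quizzes.map(q => <quiz grade={q.toString}/>)}
    {stu.tests.map(t => <test grade={t.toString}/>)}
    {stu.assignments.map(a => <assignment grade={a.toString}/>)}
  </student>
}

// Hypothetical helper: all grade attributes of child elements with this name.
def grades(elem: Node, tag: String): List[Double] =
  (elem \ tag).map(n => (n \ "@grade").text.toDouble).toList

def studentFromXML(elem: Node): Student =
  Student((elem \ "@fname").text + " " + (elem \ "@lname").text,
    grades(elem, "assignment"), grades(elem, "test"), grades(elem, "quiz"))

val original = Student("Jason Hughes", List(100.0), List(94.0), List(98.0, 100.0, 90.0))
val roundTrip = studentFromXML(studentToXML(original))
```

The value of roundTrip is equal to original. Note that this only holds for names with exactly two parts, because studentToXML ignores anything after the second word.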

14.2.4 Writing XML to File

Once you have the Node you want to write, the writing process is as easy as a call to the save method of the XML object.

xml.XML.save("grades.xml",node)

The first argument is the name of the file you want to write to. The second is the Node that you want to write.
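The following sketch writes a small Node to a temporary file and reads it back to confirm that nothing was lost. It assumes the scala.xml library is available. The save method also accepts optional arguments for the encoding and for whether an XML declaration should be written, and both are given explicitly here. The temporary file is used only so that the example does not overwrite a real grades.xml.

```scala
import java.io.File
import scala.xml.XML

val node = <course name="CSCI 1320"><student fname="Jason" lname="Hughes"/></course>

// Write to a temporary file that is removed when the program exits.
val file = File.createTempFile("grades", ".xml")
file.deleteOnExit()

// The third argument is the encoding, the fourth asks for an XML declaration.
XML.save(file.getPath, node, "UTF-8", true)

val reloaded = XML.loadFile(file)
val name = (reloaded \ "@name").text
val studentCount = (reloaded \ "student").length
```

After this runs, name is "CSCI 1320" and studentCount is 1, matching what was saved.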

Validating XML (Advanced)

When XML is used for large applications, it is important to be able to verify that the contents of a file match the format that is required by the applications. This process is called validation. When the XML standard was first released, validation was done with Document Type Definition files (DTDs). A DTD is a text file that has a fairly simple format that allows you to specify what types of elements should be in a file. For each element you can say what needs to be in it or what could be in it. This ability includes attributes as well as subelements.

DTDs were generally considered to have two problems. First, they were a bit limited and simplistic. For example, you could say that an element must contain an attribute, but you could not put any constraints on the nature of that attribute. You could not say that it had to be a number or a date. In addition, DTDs had their own syntax. The goal of XML was to be a general data storage format and some found it unfitting that you had to use some other format to specify what should go into an XML file.

For these reasons, XML schema were created. An XML schema is an XML document that uses certain tags to specify what can and can not go into a certain XML file. XML schema tend to be large and complex, but they provide great control over the format of an XML file.

There are tools for both DTDs and XML schema that will run through XML files and tell you whether or not they adhere to a given specification. There is also a package in the Scala standard libraries called scala.xml.dtd that can help with validation using DTDs.

14.2.5 XML Patterns

We have seen patterns in a number of places so far. With XML we can add another one. Not only can XML be written directly into the Scala language, it can be used to build patterns. The patterns look just like what you would use to build XML, except that what you put into curly braces should be names you want bound as variables. There is one significant limitation: you can not put attributes into your patterns. This usage can be seen with the following little example in the REPL.

scala> val personXML = <person><name>Mark</name><gender>M</gender></person>
personXML: scala.xml.Elem = <person><name>Mark</name><gender>M</gender></person>

scala> val <person><name>{name}</name><gender>{sex}</gender></person> = personXML
name: scala.xml.Node = Mark
sex: scala.xml.Node = M
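XML patterns can also be used as the cases of a match expression. The sketch below is one possible use, assuming the scala.xml library is available. The function name describe and the strings it builds are made up for this illustration.

```scala
import scala.xml.Node

// Hypothetical helper: pick a description based on which element was found.
def describe(node: Node): String = node match {
  case <name>{n}</name> => "name is " + n.text
  case <gender>{g}</gender> => "gender is " + g.text
  case other => "unrecognized element " + other.label
}

val personXML = <person><name>Mark</name><gender>M</gender></person>

// The children of person are the name and gender elements.
val descriptions = personXML.child.map(describe).toList
```

Each of the first two patterns matches an element with the given tag name and exactly one thing inside of it, which is bound to the variable in curly braces. Anything else falls through to the last case.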

14.3 Putting it Together

To illustrate the real power of XML we will make a more complete theme park program that includes the functionality of some of the earlier scripts along with editing abilities. All the information will be stored in a single XML file. This last part is something that was not highlighted before, but it implicitly comes with the ability to give meaning to data. In a flat text file, it is the position in the file that gives meaning to something. This makes it very hard to insert new information of different types. That is not a problem for XML as new tags can be added as desired. As long as the new tags have names that do not conflict with earlier tags, the earlier code will continue to work just fine.
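The claim that earlier code keeps working when new tags are added can be demonstrated with a small sketch. It assumes the scala.xml library is available. The helper rideNames and the sample ride and employee names are invented for this example and are not part of the script that follows.

```scala
import scala.xml.Elem

// Hypothetical helper written when the file held only ride elements.
def rideNames(data: Elem): List[String] =
  (data \ "ride").map(r => (r \ "@name").text).toList

// The original form of the data.
val before =
  <themeParkData>
    <ride name="Coaster"/>
    <ride name="Carousel"/>
  </themeParkData>

// A later form of the data with a new kind of element mixed in.
val after =
  <themeParkData>
    <ride name="Coaster"/>
    <employee name="Mark"/>
    <ride name="Carousel"/>
  </themeParkData>

val namesBefore = rideNames(before)
val namesAfter = rideNames(after)
```

Both calls give List("Coaster", "Carousel"). The search for "ride" simply skips over the employee element, which is something position-based parsing of a flat file can not do.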

In chapter 8, we wrote a script that would help with building schedules. In chapter 13 we wrote another script that could be used to determine the employee of the month. Both of these deal with information related to employees and rides, but there is not a 100% overlap between the required data and the file formats are very different. We want to write a script here that will include the functionality of both of those scripts, along with the ability to add ride and employee information while keeping all of the information stored in a single XML file.

Code for this is shown below. It starts with the definition of a number of case classes followed by functions that can build instances of those case classes from XML or build XML from them. After those functions are four lines that declare the main data for the program while reading it in from an XML file specified on the command-line. This is followed by slightly modified versions of the schedule builder and the employee ranker from previous chapters.

import scala.io.Source
import scala.xml._

case class DayData(ride:String, dayOfWeek:String, operators:Array[String],
    numRiders:Int)
case class MonthData(month:Int, days:List[DayData])
case class YearData(year:Int, months:List[MonthData])
case class RideData(name:String, numberOfOperators:Int, heavyCount:Int)
case class EmployeeData(name:String, rides:List[String])
  
def parseDay(node:Node):DayData = {
  val ride = (node \ "@ride").text
  val dow = (node \ "@dayOfWeek").text
  val num = (node \ "@numRiders").text.toInt
  val ops = (node \ "operator").map(_.text).toArray
  DayData(ride, dow, ops, num)
}

def dayToXML(day:DayData):Node = {
  <day ride={day.ride} dayOfWeek={day.dayOfWeek} numRiders={day.numRiders.toString}>
    {day.operators.map(op => <operator>{op}</operator>)}
  </day>
}

def parseMonth(node:Node):MonthData = {
  val month = (node \ "@month").text.toInt
  val days = (node \ "day").map(parseDay).toList
  MonthData(month, days)
}

def monthToXML(month:MonthData):Node = {
  <month month={month.month.toString}>
    {month.days.map(dayToXML)}
  </month>
}

def parseYear(node:Node):YearData = {
  val year = (node \ "@year").text.toInt
  val months = (node \ "month").map(parseMonth).toList
  YearData(year, months)
}

def yearToXML(year:YearData):Node = {
  <year year={year.year.toString}>
    {year.months.map(monthToXML)}
  </year>
}

def parseRideData(node:Node):RideData = {
  val name = (node \ "@name").text
  val numOps = (node \ "@numberOfOperators").text.toInt
  val heavy = (node \ "@heavyCount").text.toInt
  RideData(name, numOps, heavy)
}

def rideDataToXML(rd:RideData):Node = {
  <ride name={rd.name} numberOfOperators={rd.numberOfOperators.toString}
    heavyCount={rd.heavyCount.toString}/>
}

def parseEmployeeData(node:Node):EmployeeData = {
  val name = (node \ "@name").text
  val rides = (node \ "trainedRide").map(_.text).toList
  EmployeeData(name, rides)
}

def employeeToXML(ed:EmployeeData):Node = {
  <employee name={ed.name}>
    {ed.rides.map(r => <trainedRide>{r}</trainedRide>)}
  </employee>
}
  
val xmlData = XML.loadFile(args(0))
var years = (xmlData \ "year").map(parseYear).toList
var rideInfo = (xmlData \ "ride").map(parseRideData).toList
var employeeInfo = (xmlData \ "employee").map(parseEmployeeData).toList
  
def buildWeeklySchedules {
  val daysInfo = for(y <- years; m <- y.months; d <- m.days) yield d
  val days = daysInfo.map(_.dayOfWeek).distinct
  for(day <- days) {
    val thisDay = daysInfo.filter(_.dayOfWeek==day)
    val rides = thisDay.map(_.ride).distinct
    val operatorRides = rides.flatMap(ride => {
      val nums = thisDay.filter(_.ride==ride).map(_.numRiders)
      val avg = nums.sum/nums.length
      val rideData = rideInfo.find(_.name==ride).get
      Array.fill(rideData.numberOfOperators+(if(avg>=rideData.heavyCount) 1 else
        0))(ride)
    })
    val totalOps = operatorRides.length
    for(choice <- employeeInfo.combinations(totalOps)) {
      val perms = operatorRides.permutations
      var works = false
      while(!works && perms.hasNext) {
        val perm = perms.next
        if((perm,choice).zipped.forall((r,op) => op.rides.contains(r)))
          works = true
      }
      if(works) {
        println(day+" - "+choice.map(_.name).mkString(", "))
      }
    }
  }
}
  
case class RideAverage(ride:String, avNum:Double)
case class OperatorDailyData(name:String, ride:String, numRiders:Int)
case class OperatorRideAverages(name:String, rideAvs:List[RideAverage])
case class OperatorEfficiencyFactor(name:String,factor:Double)

def insertionSortByEfficiency(a:Array[OperatorEfficiencyFactor]) {
  for(j <- 1 until a.length) {
    var i=j-1
    val tmp=a(j)
    while(i>=0 && a(i).factor>tmp.factor) {
      a(i+1) = a(i)
      i -= 1
    }
    a(i+1) = tmp
  }
}
  
def rankEmployees(data:List[DayData]):Array[OperatorEfficiencyFactor] = {
  val rides = data.map(_.ride).distinct
  val averages = for(ride <- rides) yield {
    val days = data.filter(_.ride==ride)
    RideAverage(ride, days.map(_.numRiders).sum.toDouble/days.length)
  }
  val dataByOperator = for(day <- data; op <- day.operators) yield {
    OperatorDailyData(op, day.ride, day.numRiders)
  }
  val operators = dataByOperator.map(_.name).distinct
  val opRideAverages = for(op <- operators) yield {
    val opDays = dataByOperator.filter(_.name == op)
    val rideAvs = for(ride <- rides; if opDays.exists(_.ride==ride)) yield {
      val opRides = opDays.filter(_.ride == ride)
      RideAverage(ride, opRides.map(_.numRiders).sum.toDouble/opRides.length)
    }
    OperatorRideAverages(op, rideAvs)
  }
  val operatorFactors = (for(OperatorRideAverages(op, rideAvs) <- opRideAverages)
      yield {
    val factors = for(RideAverage(ride,av) <- rideAvs) yield {
      av/averages.filter(_.ride==ride).head.avNum
    }
    OperatorEfficiencyFactor(op,factors.sum/factors.length)
  }).toArray
  insertionSortByEfficiency(operatorFactors)
  operatorFactors
}
  
def rideInput(ri:RideData):Array[String] = {
  println(ri.name)
  // Number the employees before filtering so that the numbers shown are
  // indexes into employeeInfo, which is how inputDay looks them up.
  println(employeeInfo.zipWithIndex.filter(_._1.rides.contains(ri.name)).
    map(t => (t._1.name, t._2)).mkString(" "))
  readLine().split(" +")
}

def inputDay:List[DayData] = {
  println("What day of the week is this for?")
  val dow = readLine()
  println("For each ride displayed, enter the number of riders for the day followed "+
    "by employee numbers from this list with spaces in between.")
  for {
    ri <- rideInfo
    input = rideInput(ri)
    if input.head.toInt>=0
  } yield {
    DayData(ri.name, dow, input.tail.map(_.toInt).map(employeeInfo).map(_.name),
      input.head.toInt)
  }
}
  
def inputRideDayData {
  println("What month/year do you want to enter data for?")
  readLine().trim.split("/") match {
    case Array(monthText, yearText) =>
      val (month, year) = (monthText.toInt, yearText.toInt)
      if(years.exists(_.year==year)) {
        years = for(y <- years) yield {
          if(y.year==year) {
            y.copy(months = {
              if(y.months.exists(_.month==month)) {
                for(m <- y.months) yield {
                  if(m.month==month) {
                    m.copy(days = inputDay ::: m.days)
                  } else m
                }
              } else MonthData(month, inputDay) :: y.months
            })
          } else y
        }
      } else {
        years ::= YearData(year,MonthData(month, inputDay)::Nil)
      }
    case _ =>
      println("Improper format. Needs to be numeric month followed by numeric year "+
        "with a / between them.")
  }
}
  
def hireEmployee {
  println("What is the new employee's name?")
  val name = readLine()
  employeeInfo ::= EmployeeData(name,Nil)
}

def trainEmployee {
  println("Which employee is training for a new ride?")
  println(employeeInfo.map(_.name).zipWithIndex.mkString(" "))
  val empNum = readInt()
  employeeInfo = for((e,i) <- employeeInfo.zipWithIndex) yield {
    if(i==empNum) {
      val avail = rideInfo.map(_.name).diff(e.rides)
      println("Which rides should be added? (Enter space separated numbers.)")
      println(avail.zipWithIndex.mkString(" "))
      e.copy(rides = (readLine().split(" +").map(_.toInt)).map(avail).toList :::
        e.rides)
    } else e
  }
}

def addRide {
  println("What is the name of the new ride?")
  val name = readLine()
  println("How many operators does it need?")
  val ops = readInt()
  println("At what rider count should another operator be added?")
  val heavy = readInt()
  rideInfo ::= RideData(name, ops, heavy)
}
  
var input = 0
do {
  println("""What would you like to do?
 1) Add ridership for a day.
 2) Add an Employee.
 3) Add training to an employee.
 4) Add a ride.
 5) Get schedule options for a week.
 6) Rank Employees.
 7) Quit.""")
  input = readInt()
  input match {
    case 1 => inputRideDayData
    case 2 => hireEmployee
    case 3 => trainEmployee
    case 4 => addRide
    case 5 => buildWeeklySchedules
    case 6 =>
      println("What month/year or year do you want to rank for?")
      println(readLine().trim.split("/") match {
        case Array(monthText,yearText) =>
          val year = yearText.toInt
          val month = monthText.toInt
          val y = years.filter(_.year==year)
          if(y.isEmpty) "Year not found."
          else {
            val m = y.head.months.filter(_.month==month)
            if(m.isEmpty) "Month not found."
            else {
              rankEmployees(m.head.days).mkString("\n")
            }
          }
        case Array(yearText) =>
          val year = yearText.toInt
          val y = years.filter(_.year==year)
          if(y.isEmpty) "Year not found."
          else {
            rankEmployees(y.head.months.flatMap(_.days)).mkString("\n")
          }
        case _ => "Invalid input"
      })
    case _ =>
  }
} while(input!=7)
  
XML.save(args(0), <themeParkData>
  {years.map(yearToXML)}
  {rideInfo.map(rideDataToXML)}
  {employeeInfo.map(employeeToXML)}
</themeParkData>)

There is completely new code at the bottom to allow for data entry that is added to the main variables. There is also a do-while loop that handles the menu functionality. The script ends by saving the main data elements back out to the same XML file.

14.4 End of Chapter Material

14.4.1 Summary of Concepts

  • XML is a text markup language that can be used to encode arbitrary data. Being text means that it is as easy to work with as flat text files, but it allows you to attach meaning to the values in the file, and the flexibility of the parsing makes it easier to extend XML files than flat text files.
  • An XML file is composed of markup and content. The content is plain text. Markup has a number of options and must follow a certain format.
    • The primary markup is tags. A tag starts with < and ends with >. Each tag has a name.
    • The combination of a matching start and end tag defines an element. An element can contain content and other elements.
    • Start tags can be given attributes to store basic information.
    • Special characters that cannot appear as plain text can be specified by markup tokens that begin with & and end with ;.
    • You can put comments into an XML file by starting them with <!-- and ending them with -->.
    • The entire contents of an XML file must be held inside of a single element.
  • The Scala language has native support for XML. This makes it significantly easier to work with XML in Scala than in most other languages. XML elements can be written directly into Scala source code.
    • An XML file can be loaded using XML.loadFile(fileName:String):Elem.
    • The \ and \\ operators can be used to pull things out of XML.
    • You can build XML by typing in XML literals. You can put Scala code into the XML by surrounding it with curly braces.
    • XML can be written back out to file using XML.save(fileName:String,node:Node).
    • Patterns can be made using XML, with the limitation that attributes cannot be part of the match. Values can be bound by including names in curly braces.
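
The points above can be seen together in one short script. This is only a sketch: it assumes the scala.xml library is on the classpath (it ships separately from the core library in newer Scala releases), the element names echo the course example from the start of the chapter, and the Quiz case class and the temporary file are made up for illustration.

```scala
import scala.xml._

// Hypothetical case class used only for this illustration.
case class Quiz(grade: Int)

// An XML literal. Curly braces embed Scala expressions in the XML.
val grades = List(98, 100, 90)
val course: Elem =
  <course name="CSCI 1320">
    <student fname="Jason" lname="Hughes">
      {grades.map(g => <quiz grade={g.toString}/>)}
    </student>
  </course>

// \ looks only at the children of a node, \\ looks at all descendants.
val students = course \ "student"
val quizzes = (course \\ "quiz").map(q => Quiz((q \ "@grade").text.toInt)).toList

// Attributes are pulled out by putting @ in front of the name.
val courseName = (course \ "@name").text

// A pattern can bind the contents of an element, but not an attribute.
val <title>{t}</title> = <title>XML</title>

// Save to a file and load it back. The temporary file is an assumption
// made so that the sketch does not overwrite anything.
val tmp = java.io.File.createTempFile("course", ".xml")
XML.save(tmp.getPath, course)
val reloaded = XML.loadFile(tmp.getPath)
```

Note that course \ "quiz" would come back empty here, because the quiz elements are grandchildren of course and not direct children. That is the case \\ is for.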

14.4.2 Self-Directed Study

Enter the following statements into the REPL and see what they do. Some will produce errors. You should try to figure out why. Try some variations to make sure you understand what is going on.

scala> val xml1 = <tag>contents</tag>
scala> xml1.text
scala> val <tag>{str}</tag> = xml1
scala> val xml2 = <data type="simple">
  | <language>Scala</language>
  | <lesson>Programming is an art.</lesson>
  | <lesson>Software runs the world</lesson>
  | </data>
scala> xml2 \ "language"
scala> xml2 \ "lesson"
scala> val xml3 = <randPoints>
  | {(1 to 20).map(i => <point x={math.random.toString} y={math.random.toString}/>)}
  | </randPoints>
scala> (xml3 \ "point").map(p => {
  | val x = (p \ "@x").text.toDouble
  | val y = (p \ "@y").text.toDouble
  | math.sqrt(x*x+y*y)
  |})
scala> for(<point x={x} y={y}/> <- xml3) yield math.sqrt(x*x+y*y)
scala> val xml4 = <randPoints>
  | {(1 to 20).map(i => <point><x>{math.random}</x><y>{math.random}</y></point>)}
  | </randPoints>
scala> for(<point><x>{x}</x><y>{y}</y></point> <- xml4 \ "point") println(x+" "+y)

14.4.3 Exercises

  1. Chapter 10 had several exercises where you were supposed to design case classes for a number of different types of data. For each of those, write code to convert to and from XML.
    • A transcript for a student.
    • Realtor information for a house.
    • Data for a sports team.
  2. On the website for the book there are a number of XML data files. Write case classes to represent the data in each one and then write code to load the files and build objects from them.
  3. Pick some other case class that you have written and create code to convert it to and from XML.
  4. Find a Really Simple Syndication (RSS) feed for a website you visit and save the feed as a file. Use Scala to look through the XML.

14.4.4 Projects

  1. If you have been working on the different graphics options for earlier projects, the material in this chapter gives you a clear extension: store your geometry data in an XML file instead of a flat text file. You can use tags like "sphere" and "plane" to give meaning to the information. Add in a "light" tag as well in anticipation of adding lighting in a future project.

    After you have the XML format set up and some data to play with, alter the code from project 7 (p.311) to use this data format.

  2. Project 3 (p.244) on the text adventure is very nicely extended with the use of XML data files. Using XML also makes it easier to extend the file to include additional information. For this project you should convert your map over to an XML format and have the code read in that format. In addition, you should add items. This will involve adding another case class for an item and putting some items into rooms in the XML data file.

    To make the items significant, you need to have it so that the case class for your room includes the items in that room and the text description of the room lists what items are there. You should also give your player an inventory and implement commands for "get" and "drop". So if the player enters "get" followed by the name of an item in the room, that item will be taken out of the room and added to the player's inventory. An "inv" command would be nice to let the player see what is in his/her inventory. If the player uses the "drop" command followed by an item in inventory, that item should be removed from inventory and placed in the current room. If "get" or "drop" is given an invalid item name, print an appropriate message for the user.

  3. If you did project 4 (p.346) extending your game, you had a text file that specified the high scores for players of a particular game. For this project you should modify your code and the text file to use XML instead of a flat file.
  4. If you have been doing the other recipe projects, you can change the format so that it uses XML to save the data. If you do this, you need to add in instructions for the recipes as well as the ability to add comments and other information like how much certain recipes are favored. You can merge what are currently separate text files into a single XML file if you want.

    The fact that your script keeps track of recipes and pantry contents points to one other piece of functionality you should be able to add in. Allow the user to see only the recipes that they have the ingredients to cook. If you have a preference level, sort them by preference.

  5. If you have been doing the scheduling options, convert the data file for courses over to XML. In doing this, you can also add the ability to include comments on courses, instructors, or other things that make sense to you, but which did not fit as well in the limited formatting of a plain text file.
  6. You can extend project 14 (p.312) on your music library by changing the format of the data file from a flat file to an XML file. Use the hierarchical nature of XML to simplify the file. You can have tags for <artist> at a higher level with <album> tags inside of those and <song> tags at the lowest level. With the XML you could add the ability for the user to insert notes about any element that will be displayed in a manner similar to the album cover when the appropriate item is selected in the GUI. Menu options could be used to allow editing of the notes.
  7. This is a continuation of project 5 (p.310) on turtle graphics to draw fractal shapes generated with L-systems. You can find a full description of L-systems in "The Algorithmic Beauty of Plants", which can be found online at http://algorithmicbotany.org/papers/#abop. The first chapter has all the material that will be used for this project and a later one.

    In the last turtle project you made it so that you could use a turtle to draw figures from Strings using the characters 'F', 'f', '+', and '-'. L-systems are formal grammars that we will use to generate strings that have interesting turtle representations. An L-system is defined by an initial String and a set of productions. Each production maps a character to a String. So the production F -> F-F++F-F will cause any F in a String to be replaced by F-F++F-F. The way L-systems work is that all productions are applied at the same time to all characters. Characters that do not have productions just stay the same.

    So with this example you might start with F. After one iteration you would have F-F++F-F. After the second iteration you would have F-F++F-F-F-F++F-F++F-F++F-F-F-F++F-F. The next iteration will be about four times longer than that, because every F is replaced by a String containing four of them. The Strings in an L-system grow exponentially in length. As a result, you probably want to have the length of the turtle move for an F or f get exponentially shorter. Start with a good value and divide by an appropriate value for each generation. For this example dividing by a factor of 3 is ideal. This one also works best with a turn angle of 60 degrees.

    The productions for an L-system can be implemented as a List[(Char,String)]. You can use the find method on the List and combine that with flatMap to run through generations of your String. You can decide how elaborate you want your GUI to be and if users should be able to enter productions or if they will be hard coded. Look in "The Algorithmic Beauty of Plants", chapter 1 for examples of other interesting production rules.
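
    One way the find and flatMap combination might look is sketched below. The function names nextGeneration, generate, and stepLength are made up for this illustration, and the sketch covers only the String generation, not the turtle drawing or the GUI.

```scala
// Productions map a character to its replacement String.
val productions = List('F' -> "F-F++F-F")

// Apply every production to every character at once. A character
// with no production is copied through unchanged.
def nextGeneration(s: String, prods: List[(Char, String)]): String =
  s.flatMap(c => prods.find(_._1 == c).map(_._2).getOrElse(c.toString))

// Run n generations starting from the initial String.
def generate(start: String, prods: List[(Char, String)], n: Int): String =
  (1 to n).foldLeft(start)((s, _) => nextGeneration(s, prods))

// Shrink the turtle step by a constant factor for each generation.
def stepLength(initial: Double, factor: Double, n: Int): Double =
  initial / math.pow(factor, n)
```

    With these definitions, generate("F", productions, 2) gives the second-iteration String shown above, and stepLength(270.0, 3.0, 2) gives 30.0.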

  8. If you have done the box score options from any of the last three chapters you can extend to use XML encoding for the box score. You probably want to do this by having functionality to load in a plain text box score and add it to an XML file with all the box scores. To really take advantage of the XML formatting, you should allow the user to add comments to whatever elements you feel are appropriate.

Additional exercises and projects, along with data files, are available on the book's website.
