Chapter 13. Collections

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 13. Collections

Topics in This Chapter A2

In this chapter, you will learn about the Scala collections library from a library user’s point of view. In addition to arrays and maps, which you have already encountered, you will see other useful collection types. There are many methods that can be applied to collections, and this chapter presents them in an orderly way.

The key points of this chapter are:

All collections extend the Iterable trait.
The three major categories of collections are sequences, sets, and maps.
Scala has mutable and immutable versions of most collections.
A Scala list is either empty, or it has a head and a tail which is again a list.
Sets are unordered collections.
Use a LinkedHashSet to retain the insertion order or a SortedSet to iterate in sorted order.
+ adds an element to an unordered collection; +: and :+ prepend or append to a sequence; ++ concatenates two collections; - and -- remove elements.
The Iterable and Seq traits have dozens of useful methods for common operations. Check them out before writing tedious loops.
Mapping, folding, and zipping are useful techniques for applying a function or operation to the elements of a collection.
You can consider an Option to be a collection of size 0 or 1.
A lazy list is evaluated on demand and can hold an unbounded number of elements.

13.1 The Main Collections Traits

Figure 13–1 shows the most important traits that make up the Scala collections hierarchy.

Images — **Figure 13–1** Key traits in the Scala collections hierarchy

An Iterable is any collection that can yield an Iterator with which you can access all elements in the collection:

Click here to view code image


val coll = ... // some Iterable
val iter = coll.iterator
while iter.hasNext do
  process(iter.next())

In a Seq, the elements are ordered, and the iterator visits the elements in order. An IndexedSeq allows fast random access through an integer index. For example, an ArrayBuffer is indexed but a linked list is not. A LinearSeq allows fast head and tail operations. A Buffer allows fast insertion and removal at the end. An ArrayBuffer is both a Buffer and, as already mentioned, an IndexedSeq.

A Set is an unordered collection of values. In a SortedSet, elements are always visited in sorted order.

A Map is a set of (key, value) pairs. A SortedMap visits the entries as sorted by the keys. See Chapter 4 for more information.

This hierarchy is similar to that in Java, with a couple of welcome improvements:

Maps are a part of the hierarchy and not a separate hierarchy.
IndexedSeq is the supertype of arrays but not of lists, allowing you to tell the two apart.

Note

In Java, both ArrayList and LinkedList implement a common List interface, making it difficult to write efficient code when random access is preferred, for example when searching in a sorted sequence. This was a flawed design decision in the original Java collections framework. In a later version, a marker interface RandomAccess was added to deal with this problem.

Each Scala collection trait has a companion object with an apply method for constructing an instance of the collection. For example,

Click here to view code image


Iterable(0xFF, 0xFF00, 0xFF0000)
Set(Color.RED, Color.GREEN, Color.BLUE)
Map(Color.RED -> 0xFF0000, Color.GREEN -> 0xFF00, Color.BLUE -> 0xFF)
SortedSet("Hello", "World")

This is called the “uniform creation principle.”

There are methods toSeq, toSet, toMap, and so on, as well as a generic to method, that you can use to translate between collection types.

Click here to view code image


val coll = Seq(1, 1, 2, 3, 5, 8, 13)
val set = coll.toSet
val buffer = coll.to(ArrayBuffer)

Note

You can use the == operator to compare any sequence, set, or map with another collection of the same kind. For example, Seq(1, 2, 3) == (1 to 3) yields true. But comparing different kinds, for example Seq(1, 2, 3) == Set(1, 2, 3), is a compilation error. In that case, use the sameElements method.

13.2 Mutable and Immutable Collections

Scala supports both mutable and immutable collections. An immutable collection can never change, so you can safely share a reference to it, even in a multithreaded program. For example, there is a scala.collection.mutable.Map and a scala.collection.immutable.Map. Both have a common supertype scala.collection.Map (which, of course, contains no mutation operations).

Note

When you have a reference to a scala.collection.immutable.Map, you know that nobody can change the map. If you have a scala.collection.Map, then you can’t change it, but someone else might.

Scala gives a preference to immutable collections. The scala package and the Predef object, which are always imported, have type aliases Seq, IndexedSeq, List, Set, and Map that refer to the immutable traits. For example, Predef.Map is the same as scala.collection.immutable.Map.

Tip

With the statement

Click here to view code image

import scala.collection.mutable

you can get an immutable map as Map and a mutable one as mutable.Map.

If you had no prior experience with immutable collections, you may wonder how you can do useful work with them. The key is that you can create new collections out of old ones. For example, if numbers is an immutable set, then

numbers + 9

is a new set containing the numbers together with 9. If 9 was already in the set, you just get a reference to the old set. This is particularly natural in recursive computations. For example, here we compute the set of all digits of an integer:

Click here to view code image

def digits(n: Int): Set[Int] =
  if n < 0 then digits(-n)
  else if n < 10 then Set(n)
  else digits(n / 10) + (n % 10)

This method starts out with a set containing a single digit. At each step, another digit is added. However, adding the digit doesn’t mutate a set. Instead, in each step, a new set is constructed.

Note

In addition to the scala.collection.immutable and scala.collection.mutable packages, there is a scala.collection.concurrent package. It has a Map trait with methods for atomic processing: putIfAbsent, remove, replace, getOrElseUpdate, updateWith. A TrieMap implementation is provided. You can also adapt a Java ConcurrentHashMap to this trait—see Section 13.13, “Interoperability with Java Collections,” on page 204.

13.3 Sequences

Figure 13–2 shows the most important immutable sequences.

A Vector is the immutable equivalent of an ArrayBuffer: an indexed sequence with fast random access. Vectors are implemented as trees where each node has up to 32 children. For a vector with one million elements, one needs four layers of nodes. (Since 10³ ≈ 2¹⁰, 10⁶ ≈ 32⁴.) Accessing an element in such a list will take 4 hops, whereas in a linked list it would take an average of 500,000.

A Range represents an integer sequence, such as 0,1,2,3,4,5,6,7,8,9 or 10,20,30. Of course a Range object doesn’t store all sequence values but only the start, end, and increment. You construct Range objects with the to and until methods, as described in Chapter 2.

We discuss lists in the next section, and lazy lists in Section 13.12, “Lazy Lists,” on page 202.

See Figure 13–3 for the most useful mutable sequences.

We discussed array buffers in Chapter 3. Stacks, queues, and priority queues are standard data structures that are useful for implementing certain algorithms. If you are familiar with these structures, the Scala implementations won’t surprise you.

13.4 Lists

In Scala, a list is either Nil (that is, empty) or an object with a head element and a tail that is again a list. For example, consider the list

val digits = List(4, 2)

The value of digits.head is 4, and digits.tail is List(2). Moreover, digits.tail.head is 2 and digits.tail.tail is Nil.

The :: operator makes a new list from a given head and tail. For example,

9 :: List(4, 2)

is List(9, 4, 2). You can also write that list as

9 :: 4 :: 2 :: Nil

Note that :: is right-associative. With the :: operator, lists are constructed from the end:

9 :: (4 :: (2 :: Nil))

Note

There is a similar right-associative operator to build tuples:

Click here to view code image

1 *: 2.0 *: "three" *: EmptyTuple

yields the tuple (1, 2.0, "three").

In Java or C++, one uses an iterator to traverse a linked list. You can do this in Scala as well, but it is often more natural to use recursion. For example, the following function computes the sum of all elements in a linked list of integers:

Click here to view code image

def sum(lst: List[Int]): Int =
  if lst == Nil then 0 else lst.head + sum(lst.tail)

Or, if you prefer, you can use pattern matching:

Click here to view code image

def sum(lst: List[Int]): Int = lst match
  case Nil => 0
  case h :: t => h + sum(t) // h is lst.head, t is lst.tail

Note the :: operator in the second pattern. It “destructures” the list into head and tail.

Note

Recursion works so naturally because the tail of a list is again a list.

Of course, for this particular example, you do not need to use recursion at all. The Scala library already has a sum method:

Click here to view code image

List(9, 4, 2).sum // Yields 15

If you want to mutate list elements in place, you can use a ListBuffer, a data structure that is backed by a linked list with references to the first and last node. This makes it efficient to add or remove elements at either end of the list.

However, adding or removing elements in the middle is not efficient. For example, suppose you want to remove every second element of a mutable list. With a Java LinkedList, you use an iterator and call remove after every second call to next. There is no analogous operation on a ListBuffer. Of course, removing multiple elements by their index positions is very inefficient in a linked list. Your best bet is to generate a new list with the result (see Exercise 3 on page 206).

13.5 Sets

A set is a collection of distinct elements. Trying to add an existing element has no effect. For example,

Set(2, 0, 1) + 1

is the same as Set(2, 0, 1).

Unlike lists, sets do not retain the order in which elements are inserted. By default, sets are implemented as hash sets in which elements are organized by the value of the hashCode method. (In Scala, as in Java, every object has a hashCode method.)

For example, if you iterate over

Set(1, 2, 3, 4, 5, 6)

the elements are visited in the order

5 1 6 2 3 4

You may wonder why sets don’t retain the element order. It turns out that you can find elements much faster if you allow sets to reorder their elements. Finding an element in a hash set is much faster than in an array or list.

A linked hash set remembers the order in which elements were inserted. It keeps a linked list for this purpose. For example,

Click here to view code image

val weekdays = scala.collection.mutable.LinkedHashSet("Mo", "Tu", "We", "Th", "Fr")

If you want to iterate over elements in sorted order, use a sorted set:

Click here to view code image

val numbers = scala.collection.mutable.SortedSet(1, 2, 3, 4, 5, 6)

A bit set is an implementation of a set of non-negative integers as a sequence of bits. The ith bit is 1 if i is present in the set. This is an efficient implementation as long as the maximum element is not too large. Scala provides both mutable and immutable BitSet classes.

The contains method checks whether a set contains a given value. The subsetOf method checks whether all elements of a set are contained in another set.

Click here to view code image

val digits = Set(1, 7, 2, 9)
digits.contains(0) // false
Set(1, 2).subsetOf(digits) // true

The union, intersect, and diff methods carry out the usual set operations. If you prefer, you can write them as |, &, and &~. You can also write union as ++ and difference as --. For example, if we have the set

val primes = Set(2, 3, 5, 7)

then digits.union(primes) is Set(1, 2, 3, 5, 7, 9), digits & primes is Set(2, 7), and digits -- primes is Set(1, 9).

13.6 Operators for Adding or Removing Elements

When you want to add or remove an element, or a number of elements, the operators to use depend on the collection type. Table 13–1 provides a summary.

Table 13–1 Operators for Adding and Removing Elements

Operator	Description	Collection type
`coll :+ elem` `elem +: coll`	A collection of the same type as `coll` to which `elem` has been appended or prepended.	`Seq`
`coll + elem` `coll - elem`	A collection of the same type as `coll` with the given element added or removed.	Immutable `Set`, `Map`
`coll ++ coll2`	A collection of the same type as `coll`, containing the elements of both collections.	`Iterable`
`coll -- coll2`	A collection of the same type as `coll` from which the elements of `coll2` have been removed. (For sequences, use `diff`.)	Immutable `Set`, `Map`
`elem :: lst` `lst2 ::: lst`	A list with the element or given list prepended to `lst`. Same as `+:` and `++:`.	`List`
`set \| set2` `set & set2` `set &~ set2`	Set union, intersection, difference. `\|` is the same as `++`, and `&~` is the same as `--`.	`Set`
`coll += elem` `coll ++= coll2` `coll -= elem` `coll --= coll2`	Modifies `coll` by adding or removing the given elements.	Mutable collections
`elem +=: coll` `coll2 ++=: coll`	Modifies `coll` by prepending the given element or collection, alias for `prepend`.	`Buffer`

Note that + is used for adding an element to an unordered immutable collection, while +: and :+ add an element to the beginning or end of an ordered collection.

Click here to view code image

Vector(1, 2, 3) :+ 5 // Yields Vector(1, 2, 3, 5)
0 +: 1 +: Vector(1, 2, 3) // Yields Vector(0, 1, 1, 2, 3)

Note that +:, like all operators ending in a colon, is right-associative, and that it is a method of the right operand.

These operators return new collections (of the same type as the original ones) without modifying the original. Mutable collections have a += operator that mutates the left-hand side. For example,

Click here to view code image

val numberBuffer = ArrayBuffer(1, 2, 3)
numberBuffer += 5 // Adds 5 to numberBuffer

With an immutable collection, you can use += or :+= with a var, like this:

Click here to view code image

var numberSet = Set(1, 2, 3)
numberSet += 5 // Sets numberSet to the immutable set numberSet + 5
var numberVector = Vector(1, 2, 3)
numberVector :+= 5 // += does not work since vectors don’t have a + operator

To remove an element, use the - operator:

Click here to view code image

Set(1, 2, 3) - 2 // Yields Set(1, 3)

You can add multiple elements with the ++ operator:

coll ++ coll2

yields a collection of the same type as coll that contains both coll and coll2. Similarly, the -- operator removes multiple elements.

Note

For lists, you can use +: instead of :: for consistency, with one exception: Pattern matching (case h::t) does not work with the +: operator.

CAUTION

The + and - operators are deprecated for mutable collections, as well as collections of unknown mutability (such as scala.collection.Set). They don’t mutate the collection but compute new collections with elements added or removed. If you really want to do this with a (potentially) mutable collection, use the ++ and -- operators.

CAUTION

The operators coll += (e1, e2, ...) and coll -= (e1, e2, ...) with multiple arguments are deprecated, since infix operators with more than two arguments are now discouraged.

As you can see, Scala provides many operators for adding and removing elements. Here is a cheat sheet:

Append (:+) or prepend (+:) to a sequence.
Add (+) to an unordered collection.
Remove with the - operator.
Use ++ and -- for bulk add and remove.
Mutations are += ++= -= --=.
For lists, many Scala programmers prefer the :: and ::: operators to +: and ++.
Stay away from deprecated += (e1, e2, ...), -= (e1, e2, ...), ++: and enigmatic +=:, ++=:.

13.7 Common Methods

Table 13–2 gives a brief overview of the most important methods of the Iterable trait, sorted by functionality.

Table 13–2 Important Methods of the Iterable Trait

Methods	Description
`head`, `last`, `headOption`, `lastOption`	Returns the first or last element; or, the first or last element as an `Option`.
`tail`, `init`	Returns everything but the first or last element.
`length`, `isEmpty`	Returns the length, or `true` if the length is zero.
`map(f)`, `flatMap(f)`, `foreach(f)`, `mapInPlace(f)collect(pf)`	Applies a function to all elements; see Section 13.8.
`reduceLeft(op)`, `reduceRight(op)`, `foldLeft(init)(op)`, `foldRight(init)(op)`	Applies a binary operation to all elements in a given order; see Section 13.9.
`reduce(op)`, `fold(init)(op)`, `aggregate(init)(op, combineOp)`	Applies a binary operation to all elements in arbitrary order.
`sum`, `product`, `max`, `min`, `maxBy(f)`, `minBy(f)`, `maxOption`, `minOption`, `maxByOption(f)`, `minByOption(f)`	Returns the sum or product (provided the element type can be implicitly converted to the `Numeric` trait), or the maximum or minimum. The `max` and `min` functions require that the element type has an `Ordering`. The `maxBy` and `minBy` functions measure the elements by applying the function `f`. The methods ending in `Option` return an `Option` so that they can be safely applied to empty collections.
`count(pred)`, `forall(pred)`, `exists(pred)`	Returns the count of elements fulfilling the predicate; `true` if all elements do, or at least one element does.
`filter(pred)`, `filterNot(pred)`, `partition(pred)`	Returns all elements fulfilling or not fulfilling the predicate; the pair of both.
`takeWhile(pred)`, `dropWhile(pred)`, `span(pred)`	Returns the first elements fulfilling `pred`; all but those elements; the pair of both.
`take(n)`, `drop(n)`, `splitAt(n)`	Returns the first `n` elements; everything but the first `n` elements; the pair of both.
`takeRight(n)`, `dropRight(n)`	Returns the last `n` elements; everything but the last `n` elements.
`slice(from, to)`, `view(from, to)`	Returns the elements in the range `from` until `to`, or a view thereto.
`zip(coll2)`, `zipAll(coll2, fill, fill2)`, `lazyZip(coll2)`, `zipWithIndex`	Returns pairs of elements from this collection and another; see Section 13.10.
`grouped(n)`, `sliding(n)`	Returns iterators of subcollections of length `n`; `grouped` yields elements with index `0 until n`, then with index `n until 2 * n`, and so on; `sliding` yields elements with index `0 until n`, then with index `1 until n + 1`, and so on.
`groupBy(k)`, `groupMap(k, m)`, `groupMapReduce(k, m, r)`	The `groupBy` method yields a map whose keys are `k(x)` for all elements `x`. The value for each key is the collection of elements with that key. The `groupMap` method further applies the function `m` to each value, and `groupMapReduce` applies `m` and then reduces the collection with `r`.
`mkString(before, between, after)`, `addString(sb, before, between, after)`	Makes a string of all elements, adding the given strings before the first, between each, and after the last element. The second method appends that string to a string builder.
`toIterable`, `toSeq`, `toIndexedSeq`, `toArray`, `toBuffer`, `toList`, `toSet`, `toVector`, `toMap`, `to`	Converts the collection to a collection of the specified type.

The Seq trait adds several methods to the Iterable trait. Table 13–3 shows the most important ones.

Table 13–3 Important Methods of the Seq Trait

Methods	Description
`coll(k)` (i.e., `coll.apply(k)`)	The `k`th sequence element.
`contains(elem)`, `containsSlice(seq)`, `startsWith(seq)`, `endsWith(seq)`	Returns `true` if this sequence contains the given element or sequence; if it starts or ends with the given sequence.
`indexOf(elem)`, `lastIndexOf(elem)`, `indexOfSlice(seq)`, `lastIndexOfSlice(seq)`	Returns the index of the first or last occurrence of the given element or element sequence.
`indexWhere(pred)`	Returns the index of the first element fulfilling `pred`.
`prefixLength(pred)`, `segmentLength(pred, n)`	Returns the length of the longest sequence of elements fulfilling `pred`, starting with `0` or `n`.
`padTo(n, fill)`	Returns a copy of this sequence, with `fill` appended until the length is `n`.
`intersect(seq)`, `diff(seq)`	Returns the “multiset” intersection or difference of the sequences. For example, if `a` contains five 1s and `b` contains two, then `a intersect b` contains two (the smaller count), and `a diff b` contains three (the difference).
`reverse`	The reverse of this sequence.
`sorted`, `sortWith(less)`, `sortBy(f)`	The sequence sorted using the element ordering, the binary `less` function, or a function `f` that maps each element to an ordered type.
`permutations`, `combinations(n)`	Returns an iterator over all permutations or combinations (subsequences of length `n`).

Note

These methods never mutate a collection. If their result is a collection, the type is of the same type as the original, or as close as possible to it. (With types such as Range and BitSet, it can happen that the result is a more general sequence or set.) This is sometimes called the “uniform return type” principle.

13.8 Mapping a Function

You often want to transform all elements of a collection. The map method applies a function to a collection and yields a collection of the results. For example, given a list of strings

Click here to view code image

val names = List("Peter", "Paul", "Mary")

you get a list of the uppercased strings as

Click here to view code image

names.map(_.toUpperCase) // List("PETER", "PAUL", "MARY")

This is exactly the same as

Click here to view code image

n <- names yield n.toUpperCase

If the function yields a collection instead of a single value, you may want to concatenate all results. In that case, use flatMap. For example, consider

Click here to view code image

def ulcase(s: String) = Vector(s.toUpperCase, s.toLowerCase)

Then names.map(ulcase) is

Click here to view code image

List(Vector("PETER", "peter"), Vector("PAUL", "paul"), Vector("MARY", "mary"))

but names.flatMap(ulcase) is

Click here to view code image

List("PETER", "peter", "PAUL", "paul", "MARY", "mary")

Tip

If you use flatMap with a function that returns an Option, the resulting collection contains all values v for which the function returns Some(v).

For example, given a list of keys and a map, here is a list of the matching values that are actually present:

Click here to view code image

val scores = Map("Alice" -> 10, "Bob" -> 3, "Cindy" -> 8)
val keys = Array("Alice", "Cindy", "Eloïse")
keys.flatMap(k => scores.get(k)) // Array(10, 8)

Note

The map and flatMap methods are important because they are used for translating for expressions. For example, the expression

for i <- 1 to 10 yield i * i

is translated to

(1 to 10).map(i => i * i)

and

Click here to view code image

for i <- 1 to 10; j <- 1 to i yield i * j

becomes

Click here to view code image

(1 to 10).flatMap(i => (1 to i).map(j => i * j))

Why flatMap? See Exercise 9 on page 206.

The mapInPlace method is the in-place equivalent of map. It applies to mutable collections, and replaces each element with the result of a function. For example, the following code changes all buffer elements to uppercase:

Click here to view code image

val buffer = ArrayBuffer("Peter", "Paul", "Mary")
buffer.mapInPlace(_.toUpperCase)

If you just want to apply a function for its side effect and don’t care about the function values, use foreach:

names.foreach(println)

The collect method works with partial functions—functions that may not be defined for all inputs. It yields a collection of all function values of the arguments on which it is defined. For example,

Click here to view code image

"-3+4".collect({ case '+' => 1 ; case '-' => -1 }) // Vector(-1, 1)

The groupBy method yields a map whose keys are the function values, and whose values are the collections of elements whose function value is the given key. For example,

Click here to view code image

val words = ...
val map = words.groupBy(_.substring(0, 1).toUpperCase)

builds a map that maps "A" to all words starting with A, and so on.

13.9 Reducing, Folding, and Scanning (A3)

The map method applies a unary function to all elements of a collection. The methods that we discuss in this section combine elements with a binary function. The call c.reduceLeft(op) applies op to successive elements, like this:

            .
            .
           .
          op
         /  
        op   coll(3)
       /  
      op   coll(2)
     /  
coll(0)  coll(1)

For example,

List(1, 7, 2, 9).reduceLeft(_ - _)

Click here to view code image

((1 - 7) - 2) - 9 = 1 - 7 - 2 - 9 = -17

The reduceRight method does the same, but it starts at the end of the collection. For example,

Click here to view code image

List(1, 7, 2, 9).reduceRight(_ - _)

Click here to view code image

1 - (7 - (2 - 9)) = 1 - 7 + 2 - 9 = -13

Note

Reducing assumes that the collection has at least one element. To avoid an exception for potentially empty collections, use reduceLeftOption and reduceRightOption.

Often, it is useful to start the computation with an initial element other than the initial element of a collection. The call coll.foldLeft(init)(op) computes

          .
          .
         .
        op
       /  
      op   coll(2)
     /  
    op   coll(1)
   /  
init   coll(0)

For example,

Click here to view code image

List(1, 7, 2, 9).foldLeft(0)(_ - _)

0 - 1 - 7 - 2 - 9 = -19

Note

The initial value and the operator are separate “curried” parameters so that Scala can use the type of the initial value for type inference in the operator. For example, in List(1, 7, 2, 9).foldLeft("")(_ + _), the initial value is a string, so the operator must be a function (String, Int) => String.

There is a foldRight variant as well, computing

     .
       .
        .
        op
       /  
coll(n-3)  op
          /  
   coll(n-2)  op
             /  
      coll(n-1)  init

These examples don’t seem to be very useful. Of course, coll.reduceLeft(_ + _) or coll.foldLeft(0)(_ + _) computes the sum, but you can get that directly with coll.sum.

Folding is sometimes attractive as a replacement for a loop. Suppose, for example, we want to count the frequencies of the letters in a string. One way is to visit each letter and update a mutable map.

Click here to view code image

val freq = scala.collection.mutable.Map[Char, Int]()
for c <- "Mississippi" freq(c) = freq.getOrElse(c, 0) + 1
// Now freq is Map('i' -> 4, 'M' -> 1, 's' -> 4, 'p' -> 2)

Here is another way of thinking about this process. At each step, combine the frequency map and the newly encountered letter, yielding a new frequency map. That’s a fold:

               .
               .
              .
             op
            /  
           op   's'
          /  
         op   'i'
        /  
empty map  'M'

What is op? The left operand is the partially filled map, and the right operand is the new letter. The result is the augmented map. It becomes the input to the next call to op, and at the end, the result is a map with all counts. The code is

Click here to view code image

"Mississippi".foldLeft(Map[Char, Int]())((m, c) => m + (c -> (m.getOrElse(c, 0) + 1)))

Note that this is an immutable map. We compute a new map at each step.

Note

It is possible to replace any while loop with a fold. Build a data structure that combines all variables updated in the loop, and define an operation that implements one step through the loop. I am not saying that this is always a good idea, but you may find it interesting that loops and mutations can be eliminated in this way.

Finally, the scanLeft and scanRight methods combine folding and mapping. You get a collection of all intermediate results. For example,

(1 to 10).scanLeft(0)(_ + _)

yields all partial sums:

Click here to view code image

Vector(0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55)

13.10 Zipping

The methods of the preceding section apply an operation to adjacent elements in the same collection. Sometimes, you have two collections, and you want to combine corresponding elements. For example, suppose you have a list of product prices and corresponding quantities:

Click here to view code image

val prices = List(5.0, 20.0, 9.95)
val quantities = List(10, 2, 1)

The zip method lets you combine them into a list of pairs. For example,

prices.zip(quantities)

is a List[(Double, Int)]:

Click here to view code image

List[(Double, Int)] = List((5.0, 10), (20.0, 2), (9.95, 1))

The method is called “zip” because it combines the two collections like the teeth of a zipper.

Now it is easy to apply a function to each pair.

Click here to view code image

prices.zip(quantities).map(_ * _)

It is perhaps surprising that you can provide a function of type (Double, Int) => Double when the collection contains (Double, Int) pairs. Exercise 7 on page 206 explores how this “tupling” works.

The result is a list of prices:

List(50.0, 40.0, 9.95)

The total price of all items is then

Click here to view code image

prices.zip(quantities).map(_ * _).sum

If one collection is shorter than the other, the result has as many pairs as the shorter collection. For example,

Click here to view code image

List(5.0, 20.0, 9.95).zip(List(10, 2))

List((5.0, 10), (20.0, 2))

The zipAll method lets you specify defaults for the shorter list:

Click here to view code image

List(5.0, 20.0, 9.95).zipAll(List(10, 2), 0.0, 1)

Click here to view code image

List((5.0, 10), (20.0, 2), (9.95, 1))

The zipWithIndex method returns a list of pairs where the second component is the index of each element. For example,

"Scala".zipWithIndex

Click here to view code image

Vector(('S', 0), ('c', 1), ('a', 2), ('l', 3), ('a', 4))

This can be useful if you want to compute the index of an element with a certain property. For example,

"Scala".zipWithIndex.max

is ('l', 3), giving you both the maximum and the position where it occurs.

13.11 Iterators

You can obtain an iterator from a collection with the iterator method. This isn’t as common as in Java or C++ because you can usually get what you need more easily with one of the methods from the preceding sections.

However, iterators are useful for collections that are expensive to construct fully. For example, Source.fromFile yields an iterator because it might not be efficient to read an entire file into memory. There are a few Iterable methods that yield an iterator, such as permutations, grouped, or sliding.

When you have an iterator, you can iterate over the elements with the next and hasNext methods and stop when you have seen enough:

Click here to view code image

val iter = ... // some Iterator
var done = false
while !done && iter.hasNext do
  done = process(iter.next())

Be aware that the iterator operations move the iterator. There is no way to reset it to the start of the iteration.

If the stopping condition is simple, you can avoid the loop. The Iterator class has many methods that work identically to the methods on collections. In particular, all Iterable methods listed in Section 13.7, “Common Methods,” on page 193 are available, except for head, headOption, last, lastOption, tail, init, takeRight, and dropRight. After calling a method such as map, filter, count, sum, or even length, the iterator is at the end of the collection, and you can’t use it again. With other methods, such as find or take, the iterator is past the found element or the taken ones.

Click here to view code image

val result = iter.take(500).toList
val result2 = iter.takeWhile(isNice).toList
// isNice is some function returning Boolean.

Sometimes, you want to be able to look at the next element before deciding whether to consume it. In that case, use the buffered method to turn an Iterator into a BufferedIterator. The head method yields the next element without advancing the iterator.

Click here to view code image

val iter = scala.io.Source.fromFile(filename).buffered
while iter.hasNext && iter.head == '#' do
  while iter.hasNext && iter.head != '
' do
    iter.next
// Now iter points to the first line not starting with #

Tip

If you find it too tedious to work with an iterator, and it doesn’t produce a huge number of elements, just dump the elements into a collection by calling toSeq, toArray, toBuffer, toSet, or toMap to copy the values into a collection.

13.12 Lazy Lists (A3)

In the preceding section, you saw that an iterator is a “lazy” alternative to a collection. You get the elements as you need them. If you don’t need any more elements, you don’t pay for the expense of computing the remaining ones.

However, iterators are fragile. Each call to next mutates the iterator. Lazy lists offer an immutable alternative. A lazy list is an immutable list in which the elements are computed lazily—that is, only when you ask for them.

Here is a typical example:

Click here to view code image

def numsFrom(n: BigInt): LazyList[BigInt] = { log(n) ; n } #:: numsFrom(n + 1)

The #:: operator is like the :: operator for lists, but it constructs a lazy list.

You couldn’t do this with a regular list. You would get a stack overflow when the numsFrom function calls itself recursively. With a lazy list, the expressions to the left and right of #:: are only executed when needed.

When you call

val tenOrMore = numsFrom(10)

you get a lazy list that is displayed as

LazyList(<not computed>)

The elements are unevaluated, and the log shows nothing. If you call

tenOrMore.tail.tail.tail.head

the result is 13 and the log shows

10 11 12 13

The lazy list caches the evaluated head values. If you call

tenOrMore.tail.tail.tail.head

one more time, you get the result without another call to log.

When you apply a method to a lazy list that yields another list, such as map or filter, the result is also lazy. For example,

Click here to view code image

val squares = numsFrom(1).map(x => x * x)

yields

LazyList(<not computed>)

You have to call squares.head to force evaluation of the first entry.

If you want to get more than one answer, you can invoke take followed by force, which forces evaluation of all values. For example,

squares.take(5).force

produces LazyList(1, 4, 9, 16, 25).

Of course, you don’t want to call

squares.force // No!

That call would attempt to evaluate all members of an infinite list, eventually causing an OutOfMemoryError.

You can construct a lazy list from an iterator. For example, the Source.getLines method returns an Iterator[String]. With that iterator, you can only visit the lines once. A lazy list caches the visited lines so you can revisit them:

Click here to view code image

val words = Source.fromFile("/usr/share/dict/words").getLines.to(LazyList)

You just saw that methods such as map and filter are computed on demand when applied to a lazy list. You can get a similar effect with other collections by applying the view method. For example,

Click here to view code image

val palindromicSquares = (1 to 1000000).view
  .map(x => x * x)
  .filter(x => x.toString == x.toString.reverse)

yields a collection that is unevaluated. When you call

Click here to view code image

palindromicSquares.take(10).mkString(",")

then enough squares are generated until ten palindromes have been found, and then the computation stops.

Unlike lazy lists, views do not cache any values. If you call palindromicSquares.take(10).mkString(",") again, the computation starts over.

13.13 Interoperability with Java Collections

At times you may need to use a Java collection, and you will likely miss the rich set of methods that you get with Scala collections. Conversely, you may want to build up a Scala collection and then pass it to Java code. The CollectionConverters object provides a set of conversions between Scala and Java collections.

For example,

Click here to view code image

import scala.jdk.CollectionConverters.*

Table 13–4 shows the conversions from Scala to Java collections.

Table 13–4 Conversions between Scala Collections and Java Collections

Conversion function	Type in `scala.collection`	Type (generally in `java.util`)
`asJava`/`asScala`	`Iterable`	`java.lang.Iterable`
`asJavaCollection`/`asScala`	`Iterable`	`Collection`
`asJava`/`asScala`	`Iterator`	`Iterator`
`asJavaEnumeration`/`asScala`	`Iterator`	`Enumeration`
`asJava`/`asScala`	`mutable.Buffer`	`List`
`asJava`	`Seq`, `mutable.Seq`	`List`
`asJava`/`asScala`	`mutable.Set`	`Set`
`asJava`	`Set`	`Set`
`asJava`/`asScala`	`mutable.Map`	`Map`
`asJava`	`Map`	`Map`
`asJavaDictionary`/`asScala`	`Map`	`Dictionary`
`asJava`/`asScala`	`concurrent.Map`	`concurrent.ConcurrentMap`

Note that the conversions yield wrappers that let you use the target interface to access the original type. For example, if you use

Click here to view code image

val props = System.getProperties.asScala

then props is a wrapper whose methods call the methods of the underlying Java object. If you call

Click here to view code image

props("com.horstmann.scala") = "impatient"

then the wrapper calls put("com.horstmann.scala", "impatient") on the underlying Properties object.

To convert from a Scala collection to a Java stream (sequential or parallel), start with the statement

Click here to view code image

import scala.jdk.StreamConverters.*

Then use one of the following methods:

asJavaSeqStream, asJavaParStream for sequences, maps, or strings
asJavaParKeyStream, asJavaParValueStream for maps
asJavaSeqCodePointStream, asJavaParCodePointStream for code points of strings
asJavaSeqCharStream, asJavaParCharStream for UTF-16 code units of strings

If the element type is a primitive type, the resulting Java stream is a DoubleStream, IntStream, or LongStream.

To convert from a Java stream to a Scala collection, use the toScala method and pass the collection type.

Click here to view code image

val lineBuffer = Files.lines(Path.of(filename)).toScala(Buffer)

Exercises

1. Write a function that, given a string, produces a map of the indexes of all characters. For example, indexes("Mississippi") should return a map associating 'M' with the set {0}, 'i' with the set {1, 4, 7, 10}, and so on. Use a mutable map of characters to mutable sets. How can you ensure that the set is sorted?

2. Repeat the preceding exercise, using an immutable map of characters to lists.

3. Write a function that removes every second element from a ListBuffer. Try it two ways. Call remove(i) for all even i starting at the end of the list. Copy every second element to a new list. Compare the performance.

4. Write a function that receives a collection of strings and a map from strings to integers. Return a collection of integers that are values of the map corresponding to one of the strings in the collection. For example, given Array("Tom", "Fred", "Harry") and Map("Tom" -> 3, "Dick" -> 4, "Harry" -> 5), return Array(3, 5). Hint: Use flatMap to combine the Option values returned by get.

5. Implement a function that works just like mkString, using reduceLeft.

6. Given a list of integers lst, what is lst.foldRight(List[Int]())(_ :: _)? lst.foldLeft(List[Int]())(_ :+ _)? How can you modify one of them to reverse the list?

7. In Section 13.10, “Zipping,” on page 200, it is not obvious why the expression prices.zip(quantities).map(_ * _) works. Shouldn’t that be prices.zip(quantities).map(t => t(0) * t(1))? After all, the parameter of map is a function that takes a single parameter. Verify this by passing a function

((Double, Int)) => Double

The Scala compiler converts the expression _ * _ to a function with a single tuple parameter. Explore in which contexts this happens. Does it work for function literals with explicit parameters (x: Double, y: Int) => x * y? Can you assign a literal to a variable of type ((Double, Int)) => Double?

Conversely, what happens when you pass a variable f of type (Double, Int) => Double, that is, prices.zip(quantities).map(f)? How can you fix that? (Hint: tupled.) Does tupled work with function literals?

8. Write a function that turns an array of Double values into a two-dimensional array. Provide a parameter for the number of columns. For example, with Array(1, 2, 3, 4, 5, 6) and three columns, return Array(Array(1, 2, 3), Array(4, 5, 6)). Use the grouped method.

9. The Scala compiler transforms a for/yield expression

Click here to view code image

for i <- 1 to 10; j <- 1 to i yield i * j

to invocations of flatMap and map, like this:

Click here to view code image

(1 to 10).flatMap(i => (1 to i).map(j => i * j))

Explain the use of flatMap. Hint: What is (1 to i).map(j => i * j) when i is 1, 2, 3?

What happens when there are three generators in the for/yield expression?

10. Write a function that computes the sum of the non-None values in a List[Option[Int]]. Hint: flatten.

11. The method java.util.TimeZone.getAvailableIDs yields time zones such as Africa/Cairo and Asia/Chungking. Which continent has the most time zones? Hint: groupBy.

12. Produce a lazy list of random numbers. Hint: continually.

13. A fixed point of a function f is an argument x such that f(x) == x. Sometimes, it is possible to find fixed points by starting with a value a and then computing f(a), f(f(a)), f(f(f(a))), and so on, until the sequence converges to a fixed point x. I happen to know this from personal experience. Bored in math class, I kept hitting the cosine key of my calculator in radian mode, and pretty soon got a value for which cos(x) == x.

Produce a lazy list of iterated function applications and search for identical consecutive values. Hint: sliding(2).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 13. Collections

Create new playlist

Sign In

Sign Up

Chapter 13. Collections

13.1 The Main Collections Traits

13.2 Mutable and Immutable Collections

13.3 Sequences

13.4 Lists

13.5 Sets

13.6 Operators for Adding or Removing Elements

13.7 Common Methods

13.8 Mapping a Function

13.9 Reducing, Folding, and Scanning (A3)

13.10 Zipping

13.11 Iterators

13.12 Lazy Lists (A3)

13.13 Interoperability with Java Collections

Exercises

Table of Contents for
Chapter 13. Collections