Chapter 6. Sequences

One of the key features of the Lisp family of languages is their orientation around lists—not just as data structures, but as a structural metaphor for algorithms and execution flow. Recursive algorithms, for example, can be very cleanly structured around lists in Lisp variants.

Unfortunately, in most Lisps, this metaphor is tightly bound to the actual implementation of a singly-linked list, which has performance characteristics that make it unsuitable for many purposes.

To resolve this problem, Clojure introduced a new abstraction around the concept of a list, called a sequence, which is shared by ClojureScript. A sequence is a logical list, similar to those in most Lisps, with a well-defined set of operations. However, Clojure sequences are not a concrete type, but rather an abstract contract that may be satisfied concretely by a variety of different types of objects. All of ClojureScript’s collections, as well as many other types of logical collections in JavaScript, can be used as sequences. This allows ClojureScript code to be constructed in an idiomatic list-based Lisp style, while using whatever data structure is actually most appropriate for the job.

Many common operations in ClojureScript are part of the sequence API. Functions from the sequence API are used to select items from sequences, add items to sequences, and produce, consume, and transform sequences. Understanding how sequences work will give you a major leg up in understanding and writing idiomatic ClojureScript code and, once you get the hang of it, will help you write functions of your own that are highly general and composable with other sequence functions.

The Sequence Abstraction

The basic definition of a sequence is very simple. All sequences have two elements: a first, which is the first element, and a rest, a sequence of the remaining elements. An empty first and rest in ClojureScript are directly analogous to the car and cdr of older Lisps. They are renamed to be clearer and to emphasize that they are abstract concepts, not inherently bound to any particular implementation of a sequence. A nil rest signals the end of the sequence. You can obtain the first or rest of a sequence by using the first or rest functions.

Anything that can be represented as a first and a rest can be a sequence. The seq function is used to polymorphically obtain a sequence view of any object that supports it. Calling seq explicitly to convert a collection is rarely necessary, however, since sequence functions (including first and rest) call seq on their argument for you.

Lists and vectors are the most obvious sequences, being naturally ordered collections of elements:

(first [:a :b :c])  ;=> :a
(rest [:a :b :c])   ;=> (:b :c)

(first '(1 2 3))    ;=> 1
(rest '(1 2 3))     ;=> (2 3)

Sets are also sequences. Although they are unordered, you can still use sequence functions on them; the elements will just be cast into an arbitrary (though consistent, for the same set) order:

(first #{:b :c :a})  ;=> :a
(rest #{:b :c :a})   ;=> (:b :c)

Maps are also sequences of key-value pairs, represented as two-element vectors and returned in arbitrary order:

(first {:b 2 :a 1 :c 3})  ;=> [:a 1]
(rest {:b 2 :a 1 :c 3})   ;=> ([:c 3] [:b 2])

Other items which can be viewed as a sequence (sequable objects) include native JavaScript arrays and strings (as sequences of their constituent characters).

Lazy Sequences

Despite their obvious utility for working with data structures, one of the advantages of sequences is that they don’t need to be backed by an actual data structure in memory. All they have to do is implement first and rest meaningfully.

This leads to an interesting and extremely useful feature: it is possible to have sequences with a rest that isn’t actually created until you call the rest function. Sequences with a nonrealized rest are known as lazy sequences. Because lazy sequences don’t have to fully exist in memory all at once, they can be arbitrarily large, even infinite.

For example, in ClojureScript, it is possible to construct an infinite sequence of every positive integer using the iterate function:

(do
  (def i (iterate inc 0))
  nil)

As an aside, note that the def is wrapped in a do, which returns nil. This is only to prevent the value of the i symbol from being printed back at the REPL, which ClojureScript does by default. i can exist as a lazy sequence using hardly any memory, but if you try to print it, it will try to print the entire thing to the REPL. Obviously, this is impossible. This is one thing to be careful of when using infinite lazy sequences: don’t do something that would cause them to be printed! This will almost certainly crash your process and force you to restart.

iterate is a higher-order function that takes two arguments, a function and an initial value. It returns a lazy sequence with a first element of the initial value. Its rest sequence is lazily constructed, and in turn has an first value of the function applied to the initial value. Its rest, also lazy, is the result of applying the function to the previous value, and so on.

So, by starting with the inc function and an initial value of 0, the above expression constructs a sequence where each successive element is constructed by incrementing the previous element; this is a sequence of the positive integers. But the calculation is only performed when the sequence is actually realized, so it doesn’t end up using an infinite amount of memory.

Letting Go of the Head

Lazy sequences are cached, so each element is realized only once, and then stored instead of being recalculated. This means that expensive computations won’t be run unnecessarily. It also means that it’s safe to have sequence generating functions with side effects—they may not happen at all, if the sequence is never realized, but they’ll never be executed repeatedly.

The downside is that lazy sequences do consume memory, once they’ve been realized. For this reason, if you’re processing a very large or infinite sequence, it’s important not to maintain a reference to the head of the sequence as you iterate over it. That way, the earlier parts of the sequence can be garbage collected each time you move forward on the sequence. But if you maintain a reference to the entire sequence, its elements will be cached, not garbage collected, and you’re likely to get out-of-memory errors.

The Sequence API

ClojureScript has a large library of functions that operate on sequences. Although you don’t have to know them all, familiarity with the basic ones covered here is critical to writing idiomatic functional code. In particular, you should become very comfortable with map, reduce, and filter if you’re not already; these are the staples of the functional programming style.

map

map is a higher order function that takes a function (which takes a single argument) and a sequence, and returns a sequence of items resulting from applying the function to each item in the input sequence. It is lazy; the input sequence is only consumed and the mapping function applied when the resulting sequence is realized. As such, it can be used on an infinite sequence to return a new infinite sequence.

For example, the following expression applies a function to obtain the square of every number in the input sequence:

(map (fn [n] (* n n)) [1 2 3 4 5])  ;=> (1 4 9 16 25)

map can also take more than one sequence. In this case, the function provided must take the same number of arguments as there are sequences, and the function is applied to the first member of each sequence, then the second, third, and so on. Processing stops at the end of the shortest sequence. For example:

(map (fn [a b] (+ a b)) [1 2 3] [10 20 30 40])  ;=> (11, 22, 33)

reduce

reduce is a function that takes a function and a sequence and uses the provided function to consume the entire sequence and return a single value. As such, it is not lazy and should not be used on infinite sequences.

The supplied function must take two arguments; it is first invoked on the first two elements of the sequence, then invoked again with the resulting value and the third item, then with the result of that and the fourth item, and so on until the sequence is consumed.

The following example uses the + function to obtain a sum of the numbers in a sequence:

(reduce + [1 2 3 4])  ;=> 10

You can also invoke reduce with three arguments, supplying an initial value as the second argument to reduce. It will be used along with the first element from the sequence in the first function invocation, instead of using the first two values from the sequence.

filter

filter takes a function and a sequence, and returns a sequence consisting only of the items for which applying the function to items in the input sequence returns logical true. It is fully lazy, although consuming a single item in the resulting sequence may “jump” forward in the input sequence until it finds one that meets the specified criteria.

The following example uses the built-in integer? function to return a filtered sequence only of numbers that are integers:

(filter integer? [1 2.71 3.14 5 42])  ;=> (1 5 42)

Other Useful Sequence Functions

Although there are a great many other sequence functions you’ll want to explore, here are some of the most frequently used:

cons

Takes an item and a sequence, and returns a new sequence with the item as its first and the sequence as its rest.

count

Takes a sequence as its argument, and returns the length of a sequence. It must realize the entire sequence to do so, so don’t call it on an infinite sequence as it will never return.

nth

Takes a sequence and an index (starting from zero), and returns the item at that location in the sequence. For example, (nth x 5) returns the sixth item in x.

take

Takes a number n and a sequence, and returns a new sequence consisting of the first n items in the input sequence. It is fully lazy; it will only realize the input sequence when and if the returned sequence is realized.

drop

Takes a number n and a sequence, and returns a new sequence of all the items in the input sequence except for the first n items. It is also lazy.

concat

Takes any number of sequences as arguments, and returns a single sequence containing all the items in each of the input sequences. It is lazy.

reverse

Takes a single sequence and returns a sequence of its items in reverse order. It cannot be lazy, since it needs to realize the entire input sequence in order to get its last item.

Finally, there is a special sequence generator that’s actually a macro, not a function:

lazy-seq

This is the basic, low-level way to create lazy sequences. It is a macro that takes any number of forms as its body. The body, when evaluated, should return a sequence or nil. However, the body isn’t evaluated right away. Instead, it’s stored in a closure, and lazy-seq returns an unrealized lazy sequence. Only if the sequence is eventually realized will the body be invoked, returning the result of the body as the realized sequence.

By making recursive function calls in the body of lazy-seq, it’s possible to construct lazy sequences of arbitrary or infinite length.

Normally, it’s not necessary to use lazy-seq directly; usually, one of the provided sequence-generating functions is more suitable. iterate, in particular, is very flexible. However, when you need it, lazy-seq can be a powerful tool to create arbitrary lazy sequences.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset