Chapter 6. Lists

So far, we’ve seen lists in a few contexts, but we haven’t really explored how they work and why they’re useful. Since lists are central to Lisp, this chapter provides a thorough look at this data structure.

The Simple View of Lists

As we’ve already seen, a list in Lisp is a sequence of zero or more Lisp expressions enclosed in parentheses. Lists may be nested; that is, the enclosed subexpressions may include one or more lists. Here are a few examples:

(a b c)        ;list of three symbols
(7 "foo")      ;list of number and string
((4.12 31178)) ;list of one element: a sublist of two numbers

The empty list () is synonymous with the symbol nil.

The functions car and cdr [23] are used to access parts of a list: car yields the first element in a list, and cdr yields the remainder of the list (everything but the first element).

(car '(a b c)) ⇒ a
(cdr '(a b c)) ⇒ (b c)
(car (cdr '(a b c))) ⇒ b

(Recall that quoting an expression—even a complete list—means to use that expression literally. So '(a b c) means the list containing a, b, and c, not the result of calling function a on arguments b and c.)

The cdr of a one-element list is nil:

(cdr '(x)) ⇒ nil

The car and cdr of the empty list are both nil:

(car '()) ⇒ nil
(cdr '()) ⇒ nil

Note that this is also true of the list containing nil:

(car '(nil)) ⇒ nil
(cdr '(nil)) ⇒ nil

This does not mean that () is the same as (nil).

You don’t have to take my word for any of this. Just go into the *scratch* buffer and, as explained in the section on Evaluating Lisp Expressions in Chapter 1, try any of these examples for yourself.

Lists are constructed with the functions list, cons, and append. The function list makes a list out of any number of arguments:

(list 'a "b" 7) ⇒ (a "b" 7)
(list '(x y z) 3) ⇒ ((x y z) 3)

The function cons takes an arbitrary Lisp expression and an existing list. It makes a new list by prepending the arbitrary expression to the old list:

(cons 'a '(3 4 5)) ⇒ (a 3 4 5)
(cons "hello" '()) ⇒ ("hello")
(cons '(a b) '(c d)) ⇒ ((a b) c d)

Note that consing onto a list creates a new list without affecting the old list:

(setq x '(a b c))     ;assign (a b c) to variable x
(setq y (cons 17 x))  ;cons 17 onto it and put it in y
y ⇒ (17 a b c)        ;as expected
x ⇒ (a b c)           ;no change in x

The function append takes any number of lists and makes a new list by concatenating the top-level elements of all the lists. It effectively strips off the outer parentheses of each list, sticks all the resulting elements together, and surrounds them with a new pair of parentheses:

(append '(a b) '(c d)) ⇒ (a b c d)
(append '(a (b c) d) '(e (f))) ⇒ (a (b c) d e (f))

The function reverse takes a list and makes a new list by reversing its top-level elements.

(reverse '(a b c)) ⇒ (c b a)
(reverse '(1 2 (3 4) 5 6)) ⇒ (6 5 (3 4) 2 1)

Note that reverse does not reverse elements in sublists.

List Details

This section explains the inner workings of Lisp lists. Since most Lisp programs employ lists to some degree or other, it is beneficial to understand why they work the way they do. This will help you to understand what Lisp lists are and aren’t good at.

Lists are composed of smaller data structures called cons cells. A cons cell is a structure that contains two Lisp expressions, referred to, you may not be surprised to learn, as the cell’s car and cdr.

The function cons creates a new cons cell from its two arguments. Contrary to the implication in the preceding section, both arguments to cons may be arbitrary Lisp expressions. The second one need not be an existing list.

(cons 'a 'b) ⇒ a cons cell with car a and cdr b
(car (cons 'a 'b)) ⇒ a
(cdr (cons 'a 'b)) ⇒ b

The resulting cons cell is usually depicted as in Figure 6-1.

The result of (cons ' a ' b)
Figure 6-1. The result of (cons ' a ' b)

When you cons something onto a list, as in

(cons 'a '(b c))

the result is (a b c), which is merely a cons cell whose car is a and whose cdr is (b c). More about this below.

There’s a special syntax for cons cells whose cdrs aren’t lists. It’s called dotted pair notation, and cons cells themselves are sometimes referred to as dotted pairs:

(cons 'a 'b) ⇒ (a . b)
(cons '(1 2) 3) ⇒ ((1 2) . 3)

When the cdr of a cons cell is nil, as in Figure 6-2, the dotted pair notation can be abbreviated to omit the dot and the nil .

A single-element list: (a)
Figure 6-2. A single-element list: (a)

Another abbreviation rule says that if the cdr of a cons cell is another cons cell, then the dot can be discarded along with the parentheses surrounding the cdr. See Figure 6-3.

One cons cell points to another
Figure 6-3. One cons cell points to another

When combined with the abbreviation rule about nil cdrs, we recognize the lists with which we’re already familiar:

(a . (b . nil)) ≡ (a b . nil) ≡ (a b)

Generally speaking, a Lisp list is a chain of cons cells where each cdr is another cons cell and the last cdr is nil. It doesn’t matter what the cars of the cons cells are. Figure 6-4 shows a list as part of another list.

A list containing a sublist
Figure 6-4. A list containing a sublist

When you write

(setq x '(a b c))

you make x point to the first cons cell in a three-cons-cell chain. If you then write

(setq y (cdr x))      ;now y is (b c)

you make y point to the second cons cell in the same chain. A list is really only a pointer to a cons cell.

A list where the last cdr is not nil is sometimes called an improper list. Frequently, the entries in an association list (see below) are improper lists.

There are several functions for testing whether a Lisp object is a list or a list component.

  • consp tests whether its argument is a cons cell. (consp x) is true when x is any list except the empty list, and false for all other objects.

  • atom tests whether its argument is atomic. (atom x) is the opposite of (consp x)—everything that’s not a cons cell, including nil, numbers, strings, and symbols, is an atom.

  • listp tests whether its argument is a list. (listp x) is true for all cons cells and for nil, false for everything else.

  • null tests whether its argument is nil.

Now that you know about cons cells, you might find it odd that (car nil) and (cdr nil) are both defined to be nil, when nil isn’t even a cons cell and therefore has no car or cdr. Indeed, a few dialects of Lisp make it an error to call car and cdr on nil. Most Lisps behave like Emacs Lisp in this regard, however, mainly for convenience—but this special case does have the bizarre side effect (previously noted) of making () and (nil) behave the same way with respect to car and cdr.

Recursive List Functions

Traditional Lisp textbooks use a variety of short programming exercises to illustrate the behavior of lists and cons cells. Let’s just take a moment now to look at two of the better-known examples, then move on.

Our goal in this exercise is to define a function called flatten that, given a list, undoes any nesting of sublists, causing all elements to be at the top level. For example:

(flatten '(a ((b) c) d)) ⇒ (a b c d)

The solution calls for recursion, flattening the car and the cdr separately, then recombining them in a way that preserves flatness. Suppose the input to flatten is the list

((a (b)) (c d))

The car is (a (b)) which when flattened yields (a b). The cdr is ((c d)) which when flattened becomes (c d). The function append can be used to combine (a b) with (c d) and preserve flatness, yielding (a b c d). So at the heart of flatten is this code:

(append (flatten (car lst))
        (flatten (cdr lst)))

(I’ve chosen lst as the name for flatten’s parameter. I prefer not to use list, which is the name of a Lisp function.) Now, flatten is only meant to work on lists, so (flatten (car lst)) is an error if (car lst) is not a list. We must therefore elaborate as follows:

(if (listp (car lst))
    (append (flatten (car lst))
            (flatten (cdr lst))))

This if has no “else” clause. What if (car lst) is not a list? For example, suppose lst were

(a ((b) c))

The car, a, is not a list. In this case, we will simply need to flatten the cdr, (((b) c)), to get (b c); then cons the car back onto it.

(if (listp (car lst))
    (append (flatten (car lst))
            (flatten (cdr lst)))
  (cons (car lst)
        (flatten (cdr lst))))

Finally, we need a way to terminate the recursion. In recursive functions that work on smaller and smaller pieces of lists, the smallest piece you can wind up with is nil, and nil is almost always used as the “basis case” for such functions. In this case, the result of flattening nil is simply nil, so our complete function is

(defun flatten (lst)
  (if (null lst)           ; Is lst nil?
      nil                  ; If so, return nil
    (if (listp (car lst))
        (append (flatten (car lst))
                (flatten (cdr lst)))
      (cons (car lst)
            (flatten (cdr lst))))))

Try running this function on a few sample lists in the *scratch* buffer, and try following the logic of the function by hand on a few examples. Remember that the return value of a function in Lisp is the value of the last expression to execute.

Iterative List Functions

Recursion isn’t always the right solution for a list-related programming problem. Sometimes plain old iteration is needed. In this example, we’ll demonstrate the Emacs Lisp idiom for operating on a list of elements one at a time, sometimes called “cdr-ing down” a list (because in each iteration, the list is shortened by taking its cdr).

Suppose we need a function that counts the number of symbols in a list, skipping over other kinds of list elements like numbers, strings, and sublists. A recursive solution is wrong:

(defun count-syms (lst)
  (if (null lst)
      0
    (if (symbolp (car lst))
        (+ 1 (count-syms (cdr lst)))
      (count-syms (cdr lst)))))

Recursion—particularly deep recursion—introduces a fair amount of overhead in terms of keeping track of many nested function calls and return values, and should be avoided when possible. Furthermore, this problem naturally suggests an iterative solution, and code should usually reflect the natural approach to solving a problem, rather than obfuscating the solution by being too clever.

 (defun count-syms (lst)
   (let ((result 0))
     (while lst
       (if (symbolp (car lst))
           (setq result (+ 1 result)))
       (setq lst (cdr lst)))
     result))

Other Useful List Functions

Here are some more list-related Lisp functions that Emacs defines.

  • length returns the length of a list. It does not work on “improper” lists.

    (length nil) ⇒ 0
    (length '(x y z)) ⇒ 3
    (length '((x y z))) ⇒ 1
    (length '(a b . c)) ⇒ error
  • nthcdr calls cdr on a list n times.

    (nthcdr 2 '(a b c)) ⇒ (c)
  • nth returns the nth element of a list (where the first element is numbered zero). This is the same as the car of the nthcdr.

    (nth 2 '(a b c)) ⇒ c
    (nth 1 '((a b) (c d) (e f))) ⇒ (c d)
  • mapcar takes a function and a list as arguments. It calls the function once for each element of the list, passing that element as an argument to the function. The result of mapcar is a list of the results of calling the function on each element. So if you have a list of strings and want to capitalize each one, you could write:

    (mapcar '(lambda (x)
               (capitalize x))
            '("lisp" "is" "cool")) ⇒ ("Lisp" "Is" "Cool")
  • equal tests whether its two arguments are equal. This is a different kind of equality test from eq, which we first encountered in the section called Saving and Restoring Point in Chapter 3. Whereas eq tests whether its arguments are the same object, equal tests whether two objects have the same structure and contents.

    This distinction is important. In the following example:

    (setq x (list 1 2 3))
    (setq y (list 1 2 3))

    x and y are two different objects. That is, the first call to list creates a chain of three cons cells, and the second call to list creates a chain of three more cons cells. So (eq x y) is nil, even though the two lists have the same structure and contents. But since they do have the same structure and contents, (equal x y) is true.

    In Lisp programming, any time you wish to compare two objects for equality, you must be alert to whether eq or equal is appropriate. Another consideration is that eq is an instantaneous operation, whereas equal may have to recursively compare the structure of its two arguments.

    Note that when you write

    (setq x (list 1 2 3))
    (setq y x)

    (eq x y) becomes true.

  • assoc is a function that helps you use lists as lookup tables. When a list has the form

    ((key1 . value1)
     (key2 . value2)
     ...
     (keyn . valuen))

    it is called an association list, or assoc list for short. [24] The function assoc finds the first sublist of an assoc list whose car (the keyi “field”) matches the argument you give. So:

    (assoc 'green
           '((red . "ff0000")
             (green . "00ff00")
             (blue . "0000ff"))) ⇒ (green . "00ff00")

    If no matching sublist is found, assoc returns nil.

    This function uses equal to test whether each keyi matches the argument you give. Another function, assq, is like assoc but uses eq instead.

    Some programmers do not like dotted pairs, so instead of setting up a lookup table in this form:

    ((red . "ff0000")
     (green . "00ff00")
     (blue . "0000ff"))

    they’ll do this instead:

    ((red "ff0000")
     (green "00ff00")
     (blue "0000ff"))

    This is fine, because as far as assoc is concerned, each element of the list is still a dotted pair:

    ((red . ("ff0000"))
     (green . ("00ff00"))
     (blue . ("0000ff")))

    The only difference is that in the earlier example, each entry in the assoc list can be stored in a single cons cell, but now each entry requires two cons cells. And retrieving the value associated with a key could previously be done like this:

    (cdr (assoc 'green ...)) ⇒ "00ff00"

    but now must be done like this:

    (car (cdr (assoc 'green ...))) ⇒ "00ff00"

Destructive List Operations

So far, all the list operations we’ve looked at have been non-destructive. For instance, when you cons an object onto an existing list, the result is a brand new cons cell whose cdr points to the unaltered old list. Any other objects or variables that refer to the old list are unaffected. Similarly, append works by making a brand new list, creating as many new cons cells as necessary to hold the elements of the lists in its arguments. It cannot make the last cdr of x point directly to y, or the last cdr of y point directly to z, because the nil pointer at the end would be changed. x and y could no longer be used in their original forms. Instead append makes an unnamed copy of those lists as shown in Figure 6-5. Note that the value of z need not be copied; append always uses its last argument directly.[25]

The append function does not alter its arguments.
Figure 6-5. The append function does not alter its arguments.

Here’s what the non-destructiveness of append means in Lisp code:

(setq x '(a b c))
(setq y '(d e f))
(setq z '(g h i))
(append x y z) ⇒ (a b c d e f g h i)

Because append does not destructively modify its arguments, these three variables continue to have their old values:

x ⇒ (a b c)
y ⇒ (d e f)
z ⇒ (g h i)

But if destructive modification were used, then each variable would refer to some part of a single, long cons chain made when the three shorter cons chains are strung together as shown in Figure 6-6. The function that performs a destructive append is called nconc.

(nconc x y z) ⇒ (a b c d e f g h i)
x ⇒ (a b c d e f g h i)
y ⇒ (d e f g h i)
z ⇒ (g h i)
Unlike append, nconc alters its arguments
Figure 6-6. Unlike append, nconc alters its arguments

Usually it’s unwise to destructively modify lists. Many other variables and data structures may be using the same copies of the lists you modify, so it’s best not to change them in ways that would have unpredictable effects.

On the other hand, sometimes you do want to destructively modify a list. Perhaps you need the efficiency of nconc and you know that no other code depends on the data structure remaining unchanged.

One of the most common uses of destructive list operations is when changing values in an assoc list. For example, suppose you have an assoc list that maps people’s names to their email addresses:

(setq e-addrs
      '(("robin" . "[email protected]")
        ("marian" . "[email protected]")
         ...))

Now suppose someone’s email address changes. You could update the list like this:

(setq e-addrs
      (alist-replace e-addrs "john" "[email protected]"))

where alist-replace is some hideously expensive recursive operation that basically recopies the whole list:

(defun alist-replace (alist key new-value)
  (if (null alist)
      nil
    (if (and (listp (car alist))
             (equal (car (car alist))
                    key))
        (cons (cons key new-value)
              (cdr alist))
      (cons (car alist)
            (alist-replace (cdr alist) key new-value)))))

Not only is this slow (especially if the input is large), but this is a case where you probably do want to affect any other objects or variables referring to this data structure. Unfortunately, alist-replace doesn’t actually change the data structure. It makes a brand-new copy, and anything referring to the old copy does not see the update. In code, this means:

(setq alist '((a . b) (c . d)))              ;alist is an assoc list.
(setq alist-2 alist)                         ;alist-2 refers to the same list.
(setq alist (alist-replace alist 'c 'q))     ;alist is a new list.
alist ⇒ ((a . b) (c . q))                    ;alist reflects the change.
alist-2 ⇒ ((a . b) (c . d))                  ;alist-2 still refers to the old list.

Enter setcar and setcdr. [26] Given a cons cell and a new value, these functions replace the cell’s car or cdr with the new value. Examples:

(setq x (cons 'a 'b)) ⇒ (a . b)
(setcar x 'c)
x ⇒ (c . b)
(setcdr x 'd)
x ⇒ (c . d)

We can now easily write a destructive version of alist-replace like so:

(defun alist-replace (alist key new-value)
  (let ((sublist (assoc key alist)))
    (if sublist
        (setcdr sublist new-value))))

This finds the sublist of alist whose car is the sought-for key—e.g., the sublist ("john" . "[email protected]")—and replaces the cdr with new-value. Since this changes the data structure in place—that is, since it doesn’t work by making a new copy of anything—all variables and other objects that refer to the cons cell, particularly the assoc list containing it, reflect the change.

There is one other important destructive list operation: nreverse, a non-copying version of reverse.

(setq x '(a b c))
(nreverse x) ⇒ (c b a)
x ⇒ (a)

Why does x equal (a) after the last example? It’s because x continues to refer to the same cons cell, which has gotten shuffled around. Consider: the list (a b c) consists of three cons cells, one whose car is a, one whose car is b, and one whose car is c. At first, x refers to the list by referring to the first cons cell—the one whose car is a and whose cdr refers to the next cons cell in the chain (which is the one containing b). But after nreverse, the cdrs of all the cons cells are changed. Now the cons cell whose car is c is the first in the chain, and its cdr is the cons cell containing b. Meanwhile, x’s value hasn’t changed: it still refers to the original cons cell, whose car is a. But now that cell’s cdr is nil because it’s at the end of the chain, so x is (a).

If you need x to reflect the change in the list, you’d have to write

(setq x (nreverse x)) ⇒ (c b a)

Circular Lists?!

Because we can destructively modify lists after they’re created, we’re not limited to building lists only out of pre-existing parts. A list can be made to refer to part of itself! Consider:

(setq x '(a b c))
(nthcdr 2 x) ⇒ (c)
(setcdr (nthcdr 2 x) x)      ;don't try this yet!

What’s happening in this example? First we create a three-element list and place it in x. Next, we find the last cons cell with nthcdr . Finally, we replace that cell’s cdr with x—which is the first cons cell in the list. The list is now circular: the former end of the list now points back to the beginning.

What does this list look like? Well, it starts out like this:

(a b c a b c a b c a b c a b c a b c a b c a b c ...

and it never stops. The reason I wrote don’t try this yet! above is that if you executed the setcdr in the *scratch* buffer, Emacs would try to display the result—and would never finish. It would get caught in an infinite loop, albeit one that you can interrupt with C-g. Go ahead and try it now, but press C-g as soon as Emacs locks up. The longer you wait, the more it fills up the *scratch* buffer with repetitions of a b c.

Obviously, printing isn’t the only thing you can do to a circular data structure that can make Emacs loop forever. Any operation that iterates over all the elements in a list will never terminate. Here’s an important illustration:

(eq x (nthcdr 3 x)) ⇒ t       ;3rd cdr is same object as x
(equal x (nthcdr 3 x)) ⇒ never terminates!

If circular lists throw Emacs for a loop (pun intended), what good are they? One doesn’t normally think of lists as being circular, but if you stop thinking of them as lists and start thinking of them as connected cons cells, you can build any kind of linked data structure, from trees to lattices. Some data structures are self- referential, i.e., circular. If you ever find yourself needing to build such a data structure, you should not be daunted by the fact that Emacs loops forever trying to display it. Simply don’t evaluate it in a context where the result needs to be displayed. For instance, if we changed the setcdr above to

(setq x '(a b c))
(progn
  (setcdr (nthcdr 2 x) x)
  nil)

then Emacs would not try to display the result of the setcdr, and now x is a circular data structure that we can manipulate without trying to display the whole thing.

(nth 0 x) ⇒ a
(nth 1 x) ⇒ b
(nth 412 x) ⇒ b


[23] Pronounced “could-er.” These names are historical holdovers from the computer architecture on which Lisp was first designed.

[24] I’ve never found consensus on whether this should be pronounced a-SOAK, a-SOASH, or a-SOCK list. I’ve heard all three. Some avoid the problem by calling it an “a-list.”

[25] Because it’s pointed to directly, the last argument to append doesn’t even have to be a list! Try it and see.

[26] Also called rplaca and rplacd, for the same historical reasons that gave us car and cdr

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset