Now that we’ve got primitives out of the way, we need to start doing something with them. Single atomic values are great and all, but things get much more interesting when we start globbing them all together. As you’ll see soon enough, data manipulation is one of Clojure’s strong suits.
What makes Clojure so good at manipulating collections? It comes down to three things: immutability, persistence, and the sequence abstraction. Every one of Clojure’s built-in collection types has these properties and is thus unified in its API’s appearance and behavior.
As the great Alan J. Perlis (an early computer science pioneer) put it:
It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures.
This chapter introduces Clojure collections and where/how to use them. Finally, we wrap things up by showing you how to build your own feature-complete types that look and behave just like the rest of Clojure’s collections by leveraging Clojure’s capacity for interface polymorphism.
Immutability means that a Clojure data structure, once created, can never change. You can only “modify” an immutable data structure by creating a new data structure that is a copy of the old, with the desired changes in place.
Immutability also means that Clojure data structures, however deeply
nested, are simple values, just like the number 3
or the character
z
. It doesn’t make sense to speak of “changing” the value of 3
—
it just is. If you “change” it by, say, incrementing it, you don’t
modify 3
itself. Instead, you end up with an entirely new and
different value, 4
. Clojure extends this notion of value to all data
structures. In Clojure, any action that in another language would
perform any kind of update on a data structure will instead return an
entirely new one. You can continue to pass around and use both the old
and the new versions with confidence, knowing that nothing you can do
will cause any unintended changes elsewhere in your program.
This feature is extremely important in concurrent and parallel programming, where unexpected mutation is the source of a large class of bugs. With immutable data, any number of threads can read from the same data without any worrying about locks or race conditions—it’s always safe to read something that can’t change. “Clone” operations are not only free, but unnecessary.
But, you may ask, how can that possibly be efficient? Surely it is impractical to do a full copy of an object every time you need to add something?
Yes, it would be, except for the feature of persistence. Persistence means that Clojure’s data structures, although logically immutable, can still share pieces of their internal structure for efficiency in both time and space. Essentially, updated versions of immutable data only need to store the deltas from pervious versions, rather than doing a full deep copy.
To make it performant, all of this uses some extremely clever algorithms, of course. See the book Purely Functional Data Structures by Chris Okasaki (Cambridge University Press) for a detailed description of how they work.
From vectors, maps, sets, and lists to strings and streams, every last one of Clojure’s collections behaves in a similar, predictable fashion. They are simple tools for a more civilized age. This is on account of Clojure’s sequence abstraction.
The array of collection-manipulating functions in Clojure are all
implemented in terms of one simple abstraction: every collection can
be treated as a sequence of values. By implementing first
, rest
,
and cons
, any data structure—even ones you build yourself—can
participate in the ISeq
interface.
Then, you can use Clojure’s huge library of functions that can operate
on sequences, with any of the data structures. All of functional
programming’s most beloved functions (map
, reduce
, filter
, etc.)
will work interchangeably on any data structure. In essence, the
sequence abstraction allows all the expressiveness of traditional
list-based LISP programming without forcing you to actually use lists. Instead, use whatever type is most efficient for the task, knowing
that you can consume them all in the same way when it makes sense to
do so.
There are two basic ways to specifically construct a list (a
clojure.lang.PersistentList
).
You can use parentheses in combination with a single quote to indicate that the list should only be read as a data structure, not immediately evaluated:
'
(
1
:2
"3"
)
;; -> (1 :2 "3")
Or, more commonly, you can use the list
function, which takes a
variadic number of arguments and constructs a list from them:
(
list
1
:2
"3"
)
;; -> (1 :2 "3")
Typically, between these two approaches, using the list
function is
the better choice. The problem with constructing quoted lists is that
the quote also prevents evaluation of everything inside the list,
which means that symbols will be returned as literal symbols, instead
of resolving variables or calling functions. list
, however, will evaluate its
arguments in the normal way before constructing the list and is
usually what is desired for nonmacro code:
(
def
x
2
)
'
(
1
x
)
;; -> (1 x)
(
list
1
x
)
;; -> (1 2)
That said, '()
is the idiomatic way to create an empty list—it is
more terse, and the concern about evaluating its contents is
irrelevant when it’s empty.
You have an existing sequential data structure that you would like to convert into a list as its concrete data type.
The easiest solution is: don’t. Having a concrete list provides little or no advantage over simply using the sequence abstraction directly on your existing data, and for large data structures, conversion can be expensive.
If you do know that you need an explicit conversion of the concrete data structure, there are two ways to do it.
First, you could use the apply
function to call the list
function,
passing it your existing data structure as its arguments:
(
apply list
[
1
2
3
4
5
])
;; -> (1 2 3 4 5)
Alternatively, you could use the into
function to repeatedly conjoin
elements from your original data onto a list. Note, however, that
this approach has the effect of reversing the order of the original
collection:
(
into
'
()
[
1
2
3
4
5
])
;; -> (5 4 3 2 1)
These two approaches are both viable choices. However, what actually happens in each case is very different.
When using apply
, you are actually invoking the list
function with
however many arguments are in the data structure. This may sound
strange, particularly if the data structure contains millions of
items. What does it mean to invoke a function with a million
arguments? How does that even work, given that the JVM limits methods
to 255 arguments (see the JVM
class file specification)?
As it turns out, functions with variadic arguments (such as list
)
are handled in a somewhat special way: the argument list is passed in
as a sequence. apply
knows this and passes this sequence view of the
original structure directly through to the receiving function. This is
why it works; there is never actually a JVM method invocation with a
million arguments.
into
works quite differently: it takes two arguments, the first
being a data structure and the second being a sequence. It then
repeatedly conjoins items from the sequence onto the data structure
provided using the conj
function (discussed in greater detail
elsewhere). This is why the sequence is reversed; items are always
pulled from the front of the sequence, but conj
on a list prepends
the element being added. Therefore, the first element in the input
sequence will end up being the last item in the list, and so on.
So why ever choose into
over apply
, given that it reverses the
order? Speed. into
utilizes Clojure transients, which provide a
considerable performance improvement. On the author’s machine,
converting a million-item vector to a list using apply
took an
average of 750 milliseconds, while using into
took about half that
time, for an average of 350 milliseconds. Of course, the list was in
reverse order, and reversing either the input or the output negates
the speed advantage. In the end, into
is only advantageous in
situations where a reversed order is acceptable.
You want to add an item to a list; or, putting it in functional terms, you want to derive from an existing list a new list that contains an additional item.
Use the conj
function. conj
is used to add an item or items to a
logical collection and is polymorphic, meaning it works on multiple
concrete data types, including lists:
(
conj
(
list
1
2
3
)
4
)
;; -> (4 1 2 3)
You can also add multiple items at once:
(
conj
(
list
1
2
3
)
4
5
)
;; -> (5 4 1 2 3)
The behavior of conj
may vary slightly depending on the concrete
type. It always “adds” an item to an immutable collection by returning
a new collection containing the new item, but may add the item to
different places in the collection depending on what is most efficient
for the particular type.
In the case of lists, conj
will always add the item at the
beginning of the list, since a linked list data structure supports
constant-time insertion only at the beginning.
You want to obtain a list without a particular item in it, removing an item from the original list.
Removing the first item from a list is easily accomplished using one
of two functions, rest
or pop
. Both work identically when used on
a nonempty list:
(
pop
'
(
1
2
3
))
;; -> (2 3)
(
rest
'
(
1
2
3
))
;; -> (2 3)
rest
is actually a sequence function, used to obtain the tail of a
sequence. Since Clojure lists implement the sequence interface
directly, using rest
on a list will always return another (possibly
empty) list.
pop
is similar to conj
in that it operates on concrete data
structures rather than the sequence interface. Like conj
, it is
polymorphic; also like conj
, the position it removes the item
from depends on what’s most efficient for the concrete type.
When used on an empty list, the behavior does differ; pop
will throw
an exception, while rest
will return an empty list:
(
pop
'
())
;; -> IllegalStateException Can't pop empty list ...
(
rest
'
())
;; -> ()
Lists do not support removing items except at the first position. If you need to remove an item in the middle or at the end of a list, you’ll have to do so using the sequence manipulation functions, then convert the result back into a concrete list (if you absolutely need it to be a list, for some reason).
The list?
function may seem like the obvious choice, but in most
cases, it’s better to use the more general seq?
function as your
test.
The list?
function specifically tests if the argument implements
clojure.lang.IPersistentList
, but in most cases, you really want to
know if the value is a seq (implements clojure.lang.ISeq
), which
is a more general abstraction than a list.
Not everything that prints as a list (in parentheses) actually
satisfies the list?
test. In practice, you’ll often receive Cons
and LazySeq
values when manipulating lists. By focusing on the
fundamental seq abstraction, you don’t need to worry about the
details of those concrete implementations:
;; A list constructed via list satisfies both list? and seq?
(
list?
(
list
1
2
3
))
;; -> true
(
seq?
(
list
1
2
3
))
;; -> true
;; cons, however *looks* like a list, but is actually a Cons
(
list?
(
cons
1
'
(
2
3
)))
;; -> false
(
type
(
cons
1
'
(
2
3
)))
;; -> clojure.lang.Cons
(
seq?
(
cons
1
'
(
2
3
)))
;; -> true
;; range's lazy return value is a seq, but not a list
(
list?
(
range
3
))
;; -> false
(
seq?
(
range
3
))
;; -> true
(
type
(
range
3
))
;; -> clojure.lang.LazySeq
You want to create a vector data structure, either as a literal or from an existing data structure.
By far, the easiest way to create a vector is using the literal vector
notation of square brackets. However, it is also possible to use the
vector
function, which creates a vector of its arguments:
[
1
:2
"3"
]
;; -> [1 :2 "3"]
(
vector
1
:2
"3"
)
;; -> [1 :2 "3"]
To construct a vector from an existing data structure, you can use the
vec
function, which takes any collection and returns a vector
containing the same items:
(
vec
'
(
1
:2
"3"
))
;; -> [1 :2 "3"]
Alternatively, you can use the into
function, which takes two
collections and repeatedly invokes conj
on the first with items from
the second:
(
into
[]
'
(
1
:2
"3"
))
;; -> [1 :2 "3"]
There is rarely any reason to use the vector
function over the
literal vector syntax. Unlike lists, vectors are not evaluated as
function calls (or anything else) in Clojure, so quoting is not a
concern as it is with list literals.
Oddly enough, when constructing a vector from an existing collection,
using the into
approach is currently about 30% more performant on
large collections compared to vec
due to its use of transients to
speed things up. If you’re converting large collections and speed
matters, consider using into
. Otherwise, vec
is usually more
readable.
When used on a vector, the conj
function returns a vector with one
or more items appended to the end:
(
conj
[
1
2
3
]
4
)
;; -> [1 2 3 4]
(
conj
[
1
2
3
]
4
5
)
;; -> [1 2 3 4 5]
Vectors do not support adding new items anywhere aside from the end. If you need to insert an item in the middle, you will have to use a sequence manipulation function and convert back to a vector (if necessary) when you’re done.
Since vectors are associative (mapping integer indexes to values), you
can also use the assoc
function with an index equal to the current length of the vector
(one greater than the maximum index) to append an item:
(
assoc
[
:a
:b
:c
]
3
:x
)
;; -> [:a :b :c :x]
However, this approach is somewhat more fragile than conj
. If the
index you provide is too small, you might simply “overwrite” an
earlier value in the vector; and if it’s greater than the vector’s
current length, it will throw an IndexOutOfBoundsException
.
Still, this technique is worth remembering. If you have code that is
assoc
-ing to a vector already, you can use this technique to produce
new vectors with updated values.
To efficiently remove an item from the end of a vector, use the pop
function, which takes a vector and returns a new vector without the
last item:
(
pop
[
1
2
3
4
])
;; -> [1 2 3]
Although there is no operation designed specifically to remove items
from the beginning of a vector, as pop
does from the end, there is
a function, subvec
, that can be used to efficiently remove any
number of items from the beginning or end of a vector. Given a vector,
a start index, and an (optional) end index, it will return a vector
from the start (inclusive) to end (exclusive) indexes.
The following example drops a single item from the beginning of a
vector. You can use subvec
like so:
(
subvec
[
:a
:b
:c
:d
]
1
)
;; -> [:b :c :d]
Or, to remove items from the beginning and the end of a vector, pass
an end index to subvec
as well:
(
subvec
[
:a
:b
:c
:d
]
1
3
)
;; -> [:b :c]
Because subvec
exploits the internal representation of a vector to
create a subvector that shares the internal structure of the original,
it is extremely efficient and runs in constant time. It is the only
way to efficiently remove items from the beginning of a vector.
While it is certainly also possible to use a function like rest
or
drop
on a vector, these are technically sequence operations, not
vector operations. The value they return is only guaranteed to be a
sequence, not a concrete vector, and as such will not support the same
features or performance guarantees that vectors do.
Of course, you can convert any sequence back into a concrete vector
using vec
or into []
, but this can be an expensive operation for
large vectors.
You have a vector, and you want to retrieve the value the vector contains at a particular location (index).
There are several ways to do this.
The nth
function, which works on all sequences, is special-cased to
provide constant-time performance when used with indexed collections
such as vectors:
(
nth
[
:a
:b
:c
:d
]
2
)
;; -> :c
If given an index greater than the size of the vector, nth
will
throw an exception unless you pass it an optional third argument,
which will be returned if the provided index is out of bounds:
(
nth
[
:a
:b
:c
]
4
)
;; -> IndexOutOfBoundsException
(
nth
[
:a
:b
:c
]
4
:not-found
)
;; -> :not-found
Vectors are themselves functions that when called with an integer argument, will return the value at that index:
(
def
v
[
:a
:b
:c
])
(
v
2
)
;; -> :c
Using an out-of-range index when invoking a vector as a function will
result in an IndexOutOfBoundsException
.
Because vectors support the associative interface with integer indexes
as keys, you can also use the get
function to retrieve values by
index:
(
get
[
:a
:b
:c
]
2
)
;; -> :c
Unlike nth
, when you pass an out-of-range index to get
it will
return nil
, not throw an exception—unless, that is, you provide a
default value to be returned if the key (the index, in this case) is
not found:
(
get
[
:a
:b
:c
]
5
)
;; -> :nil
(
get
[
:a
:b
:c
]
5
:not-found
)
;; -> :not-found
Which technique should you use? All work equally well, but the choice
does emphasize the way in which you’re looking at your vector. nth
focuses on its sequential nature, whereas get
emphasizes its
indexed, associative quality. Using the vector as a function is also
consistent with the way all associative collections in Clojure act as
functions of their keys.
Ultimately, when making a choice like this, you should consider:
nth
), or fundamentally a correlation of values to indexes
(implying get
)?
nil
return value or an exception be preferable?
Given a vector, you would like to obtain a new vector with a different value at a particular index.
Use assoc
to set the value at a particular index:
(
assoc
[
:a
:b
:c
]
1
:x
)
;; -> [:a :x :c]
assoc
can also be used to set multiple indexes at the same time, by
providing additional index/value pairs:
(
assoc
[
:a
:b
:c
]
1
:x
2
:y
)
;; -> [:a :x :y]
As you may have noticed, assoc
is the same function used to set the
values of keys in a map. This is because vectors, like maps, are
associative and implement the same interface
(clojure.lang.Associative
), which is what assoc
uses under the
hood.
Unlike with maps, however, the keys used when using assoc
on a vector
must be integer indexes within the range of the vector. Attempting to
use a noninteger key will cause an IllegalArgumentException
, and
attempting to assoc
an index greater than the size of the vector will throw
an IndexOutOfBoundsException
.
Note that it is possible to assoc
to an index equal to the current size
of the vector (one greater than the maximum index). This will have
the result of appending the item to the end.
You want to create an unordered collection of distinct objects, which can be tested for membership quickly.
Use a set literal to create a set of objects:
#
{
:a
:b
:c
}
;; -> #{:a :c :b}
;; Duplicate elements in set literals are an error
#
{
:x
:y
:z
:z
:z
}
;; -> IllegalArgumentException Duplicate key: :y :z ...
Use hash-set
to create a set from arguments:
(
hash-set
:a
:b
:c
)
;; -> #{:a :c :b}
(
apply hash-set
:a
[
:b
:c
])
;; -> #{:a :c :b}
Use set
to create a set from another collection:
(
set
"hello"
)
;; -> #{e h l o}
Alternatively, use into
with a set to create a new set:
(
into
#
{}
[
:a
:b
:c
:a
])
;; -> #{:a :b :c}
(
into
#
{
:a
:b
}
[
:b
:c
:d
])
;; -> #{:a :b :c :d}
At the time of writing, the into
technique is about three times
faster than set
for large collections of objects. Use it whenever
you’re working with large sets where performance is a concern:
(
def
largeseq
(
doall
(
range
1
e5
)))
(
time
(
dotimes
[
_
100
]
(
set
largeseq
)))
;; *out*
;; "Elapsed time: 5594.961 msecs"
(
time
(
dotimes
[
_
100
]
(
into
#
{}
largeseq
)))
;; *out*
;; "Elapsed time: 1329.66 msecs"
Create a sorted set with sorted-set
:
(
sorted-set
99
4
32
7
)
;; -> #{4 7 32 99}
(
into
(
sorted-set
)
"the quick brown fox jumps over the lazy dog"
)
;; -> #{space a c d e f g h i j k l m o p
;; q s u v w x y z}
Sets are very useful data structures. They are commonly used when you have a collection of values but you are only concerned with the distinct values. Lookup of an element in a set is typically O(1).
The techniques just shown all construct hash sets—sets that are unordered and use a hash table as their internal representation.
Clojure also supports creating sorted sets, which maintain the order
of their elements. Sets created with sorted-set
keep their elements
in ascending order using compare
. This is useful when treating the
set as a seq:
(
def
alphabet
(
into
(
sorted-set
)
"qwertyuiopasdfghjklzxcvbnm"
))
(
last
alphabet
)
;; -> z
(
second
(
disj
alphabet
))
;; -> c
All of the elements in a sorted set must be comparable against one another (e.g., you cannot have a sorted set that contains both strings and numbers). Attempting to add an uncomparable value will result in a runtime error.
Adding or removing objects in a sorted set will always return another sorted set.
If the values you want to store don’t have a natural sort order (or
you don’t want to use their natural ordering), you can specify a
custom comparator using sorted-set-by
. The comparator used to create
the set is preserved when adding or removing objects:
(
def
descending-set
(
sorted-set-by
>
1
2
3
))
(
into
descending-set
[
-1
4
])
;; -> #{4 3 2 1 -1}
There are some performance trade-offs to consider when choosing between hash sets and sorted sets. Hash sets are based on hash tables, which offer constant time insert and lookup in most cases. However, they do require some degree of memory overhead. Sorted sets, based on a balanced red-black binary tree, are more memory-efficient but slower for lookup and insertion.
The conj
function supports sets, just as it does lists, vectors, and
maps. Use it to add an item or items to a set: just pass it the set
and any number of items to add:
(
conj
#
{
:a
:b
:c
}
:d
)
;; -> #{:a :c :b :d}
(
conj
#
{
:a
:b
:c
}
:d
:e
)
;; -> #{:a :c :b :d :e}
To remove one or more items, use the disj
function, which is
specific to sets. It takes a set and one or more keys to remove:
(
disj
#
{
:a
:b
:c
}
:b
)
;; -> #{:a :c}
(
disj
#
{
:a
:b
:c
}
:b
:c
)
;; -> #{:a}
Since sets are unordered, there is no concept of “where” items are added or removed; either a set contains an item, or it doesn’t.
Note that both conj
and disj
return a set of the same concrete
type as the original. A hash set will remain a hash set, and a sorted set
will remain a sorted set.
Also worth noting is that these operations are simply no-ops if the
set already does or does not contain the item being added or
removed. conj
returns the same set if it already contains the item,
just as disj
does if the specified item was already absent.
If you’re adding or removing large numbers of items to or from sets, you
should also consider using the dedicated set manipulation functions
from the clojure.set
namespace: particularly clojure.set/union
,
which can be used to add the items of multiple sets together, and
clojure.set/difference
, which can be used to obtain a set of items
not contained in another set. These are typically a far more natural
expression of set operations than issuing many calls to conj
or disj
, or
invoking them with large numbers of arguments.
The easiest way to check a single item is with the contains?
function, which takes a set and an item and returns true
if the item
is a member of the set:
(
contains?
#
{
:red
:white
:green
}
:blue
)
;; -> false
(
contains?
#
{
:red
:white
:green
}
:green
)
;; -> true
The get
function also works with sets and does basically the same
thing, except instead of returning true
or false
, it returns the
value itself if it is a member, or nil
if it is not:
(
get
#
{
:red
:white
:green
}
:blue
)
;; -> nil
(
get
#
{
:red
:white
:green
}
:green
)
;; -> :green
Finally, sets are also functions. When you invoke them with a single
argument, they work just like get
, returning the argument if it is a
member, and nil
otherwise:
(
def
my-set
#
{
:red
:white
:green
})
(
my-set
:blue
)
;; -> nil
(
my-set
:green
)
;; -> :green
Note as well that keywords behave in the same manner for sets as they
do with maps. Thus, the following is equivalent to having used get
:
(
:blue
#
{
:red
:white
:green
})
;; -> nil
(
:green
#
{
:red
:white
:green
})
;; -> :green
The choice between contains?
and get
is mainly an aesthetic one.
However, if your set might contain nil
as an actual value you care
about, you’ll definitely need to use contains?
, since a nil
return
from get
wouldn’t tell you anything in that case.
The ability to use a set as a function is interesting, but it’s especially useful when you want to use it as a predicate function on a sequence operation. For example, it’s fairly common to want to filter a sequence to only contain items in a set. In this case, using the set itself is both easy and idiomatic:
(
take
10
(
filter
#
{
1
2
3
}
(
repeatedly
#
(
rand-int
10
))))
;; -> (2 1 2 3 2 2 1 2 2 1)
This snippet first creates an infinite lazy sequence consisting of
random numbers between 1 and 10, using repeatedly
to call
rand-int
(wrapped in an anonymous function) over and over. Then it
feeds this sequence through a filter, with a set of the numbers 1–3
as the filter predicate.
The result is another infinite lazy sequence, but containing only members of the predicate set.
This example is contrived. However, using sets as predicate functions is an extremely useful technique that that pops up quite frequently in Clojure projects.
You want to perform common operations on sets, such as taking the union, intersection, or difference of two sets, or you want to test if one set is a subset or superset of another.
All these functions are available in the clojure.set
namespace,
which is built into Clojure.
union
takes any number of sets as arguments and returns a set
containing their union (i.e., a set containing all the elements from all the
sets):
(
clojure.set/union
#
{
:red
:white
}
#
{
:white
:blue
}
#
{
:blue
:green
})
;; -> #{:white :red :blue :green}
intersection
also takes any number of sets as args and returns
their intersection (a set consisting only of the items shared by all
the argument sets):
(
clojure.set/intersection
#
{
:red
:white
:blue
}
#
{
:red
:blue
:green
}
#
{
:yellow
:blue
:red
})
;; -> #{:red :blue}
difference
takes a set as its first argument and returns it without
elements from the sets given in the additional arguments:
(
clojure.set/difference
#
{
:red
:white
:blue
:yellow
}
#
{
:red
:blue
}
#
{
:white
})
;; -> #{:yellow}
subset?
returns true
if and only if the first argument is a subset
of the second (that is, if every member of the first set is also a
member of the second):
(
clojure.set/subset?
#
{
:blue
:white
}
#
{
:red
:white
:blue
})
;; -> true
(
clojure.set/subset?
#
{
:blue
:black
}
#
{
:red
:white
:blue
})
;; -> false
superset?
works the same way, except it returns true
only if the
first set is a superset of the second.
As you may have noticed, superset?
is actually identical to
subset?
, only with the order of the arguments reversed.
In general, you should try to use these set manipulation functions wherever they are applicable. Sets represent a sizable portion of the data most developers work with day to day, whether they are recognized and explicitly modeled as sets or not.
There are a large number of bugs that can be caused by assumptions regarding the behaviors of collections. In programming, the type of data structure used for a given purpose is actually a communication, from the initial writer of the code to future programmers, that tells a number of things about the nature of the collection. Sets are unordered, unique collections—they emphasize that the important fact is whether an item is a member of the set, not the order or number of occurrences.
If your data does represent a logical set, then model it using set data structures, and try to think about manipulating it in terms of set operations. In many cases, you will find that this makes your program substantially easier to reason about, and makes it more self-documenting regarding the source and intended use of the data it contains.
You want to create an association that maps keys to values. You possibly want to maintain a specific ordering of keys.
Use map literals (curly braces) with alternating keys and values to create simple maps:
{
:name
""
:class
:barbarian
:race
:half-orc
:level
20
:skills
[
:bashing
:hacking
:smashing
]}
Keys and values can be any type. Commas may be used to delimit key/value pairs where the structure would be hard to discern at a glance:
{
1
1
,8
64
,2
4
,9
81
}
In Clojure, commas are whitespace, which means that they can be used anywhere in a form with no effect on the value; it is just one way to make your code easier to read.
Create an empty, unsorted map with a pair of braces: {}
.
Create specific types of maps with map constructor
functions. array-map
, hash-map
, and sorted-map
each return a map
of the corresponding type:
(
array-map
)
;; -> {}
(
sorted-map
:key1
"val1"
:key2
"val2"
)
;; -> {:key1 "val1" :key2 "val2"}
If a key occurs multiple times in the argument list, the last value will be that used in the final return map.
Use sorted-map-by
to create a sorted map using a custom comparator:
(
sorted-map-by
#
(
<
(
count
%1
)
(
count
%2
))
"pigs"
14
"horses"
2
"elephants"
1
"manatees"
3
)
;; -> {"pigs" 14, "horses" 2, "manatees" 3, "elephants" 1}
Clojure maps can have one of three distinct concrete implementations:
clojure.lang.PersistentArrayMap
clojure.lang.PersistentHashMap
clojure.lang.PersistentTreeMap
Array maps are the default implementation for small maps (under 10
entries at the time of writing), and hash maps are the default for
larger ones. Sorted maps can only be created by explicitly invoking
the sorted-map
or sorted-map-by
functions.
Using assoc
or conj
on a sorted map will always yield another
sorted map. However, assoc
-ing onto an array map will yield a hash
map once it reaches a certain size. The inverse is not true; using
dissoc
on a hash map will not yield an array map, even if it becomes
very small.
As with sets, there are several ways to retrieve the value of a key.
The most straightforward way is to use the get
function, which, given
a map and a key, returns the value stored at the key or nil
if the
map does not contain the key:
(
get
{
:name
"Kvothe"
:class
"Bard"
}
:name
)
;; -> "Kvothe"
(
get
{
:name
"Kvothe"
:class
"Bard"
}
:race
)
;; -> nil
If desired, you can also pass a third argument to be used as the
default return value instead of nil
if a map doesn’t contain the key:
(
get
{
:name
"Kvothe"
:class
"Bard"
}
:race
"Human"
)
;; -> "Human"
If your map uses keywords as keys, you can use the keyword itself as a
function. Keywords implement the IFn
interface, and when invoked
with a map as an argument, they will look themselves up in the map,
returning the value if it is present or nil
if not. You can also
pass a second argument that will be used as a default return value in
the case of a missing key, just as you can with get
:
(
:name
{
:name
"Marcus"
:class
"Paladin"
})
;; -> "Marcus"
(
:race
{
:name
"Marcus"
:class
"Paladin"
}
"Human"
)
;; -> "Human"
Finally, the third basic way to look up a value in a map is to use
the map itself as a function, passing the key to be retrieved as the
argument. As with get
and keyword functions, it is also possible to
pass a second argument for use as a default value if the key is not
found; otherwise, nil
is returned:
(
def
character
{
:name
"Brock"
:class
"Barbarian"
})
(
character
:name
)
;; -> "Brock"
(
character
:race
)
;; -> nil
(
character
:race
"Human"
)
;; -> "Human"
There is a convenience function for looking up items in nested maps:
get-in
. Instead of passing a single key, you can pass a sequence of
keys, and they will be successively looked up in a nested structure,
as if repeatedly calling get
on each level of the nested data
structure. nil
is returned if any key is missing:
(
get-in
{
:name
"Marcus"
:weapon
{
:type
:greatsword
:damage
"2d6"
}}
[
:weapon
:damage
])
;; -> "2d6"
(
get-in
{
:name
"Marcus"
}
[
:weapon
:damage
])
;; -> nil
get-in
also takes an optional default value, which will be returned
if any key in the nested hierarchy is missing:
(
get-in
{
:name
"Marcus"
}
[
:weapon
:damage
]
"1d2 (fists)"
)
;; -> "1d2 (fists)"
Note that get-in
works with any associative data structure, not
just maps. This means that it can be combined to work with, for
example, indexes of vectors:
(
get-in
[{
:name
"Marcus"
:class
"Paladin"
}
{
:name
"Kvothe"
:class
"Bard"
}
{
:name
"Felter"
:class
"Druid"
}]
[
1
:class
])
;; -> "Bard"
Which technique of the three discussed is the best to use? All have identical semantics, but in idiomatic Clojure, they convey different implications about the scenario in which they are used.
Typically, keyword-as-a-function lookup is used when maps are being used as “objects” and the keys as “fields”; where the map contains a relatively small, well-known set of keys; and when there is a reasonable expectation that the key will actually be present.
The get
function and map-as-a-function lookup techniques, on the
other hand, are more frequently used with large maps where the set of
possible keys is more open-ended. There is less motivation for
choosing between these two; the only difference to be aware of is that
when the map provided is nil
for some reason, using it in function
position will throw an exception, while applying get
to nil
will
simply return nil
.
It is interesting to note, as well, that the ability to use a map itself as a function is not just an arbitrary convenience. In the technical sense of the word “function,” maps are functions of keys to values. Consider the following function definition and map:
(
defn
square
[
x
]
(
*
x
x
))
(
def
square
{
1
1
,2
4
,3
9
,4
16
,5
25
})
Using an invocation of the form (square 3)
, the caller can actually
be agnostic as to whether square
is a “real” function or a map. Of
course, the normally defined function has a number of advantages in
this case. For one, it has an unlimited domain instead of just the
keys enumerated. And the multiplication function is fairly fast, so
precomputing results is not a win. But in some cases, for functions
that do have a more naturally constrained domain and are more
expensive to compute, being able to use a map implementation of a
function can be a real boon to performance.
Because all of the different techniques for retrieving values from a
map return nil
if the key is not present, special handling is
required if you need to differentiate the case in which a key does
exist in a map with a value of nil
from the case in which the key does
not exist at all.
The easiest way to do this is to always provide a default value to be
returned in the case of a missing key. To be absolutely sure that you
can differentiate the default value from any possible value the map
might contain, you can use a namespace-qualified keyword
(e.g., ::not-found
).
It is also possible to use the contains?
function, which takes a
collection and a key, and returns true
if and only if the collection
has a specific entry for that key (even if the value is nil
).
Use vals
and select-keys
when the order of returned values is not
important:
;; How many red and green beans are there?
(
def
beans
{
:red
10
:blue
3
:green
1
})
(
reduce +
(
vals
(
select-keys
beans
[
:red
:green
])))
;; -> 11
Use juxt
when order matters:
;; What are the red and green bean totals?
((
juxt
:red
:green
)
beans
)
;; -> [10 1]
juxt
and the combination of vals
and select-keys
are both apt
tools for retrieving multiple keys from a map. There are subtleties to
their behavior that are important to understand, though.
At first glance, the juxt
approach seems to be the clear winner of
the two. However, it only goes so far: the approach falls
apart when any of the keys you wish to retrieve is not a keyword (more
specifically, not a function). This is because juxt
merely
juxtaposes the return values of multiple functions. Since keywords
are functions, it’s possible to juxt
them and retrieve a
strongly ordered list of values.
If the keys in the beans
map were strings, it would not be possible
to retrieve their values with juxt
:
((
juxt
"a"
"b"
)
beans
)
;; -> ClassCastException java.lang.String cannot be cast to clojure.lang.IFn ...
select-keys
, on the other hand, is capable of pulling values for
any number of arbitrary keys. The select-keys
function takes a map
and a sequence of keys and returns a new map populated with only those
keys:
(
def
weird-map
{
"a"
1
,{
:foo
:bar
}
:baz
,13
31
})
(
select-keys
weird-map
[
"a"
{
:foo
:bar
}])
;; -> {{:foo :bar} :baz, "a" 1}
(
vals
{{
:foo
:bar
}
:baz
,"a"
1
})
;; -> (1 :baz)
Since maps are not ordered, it is not safe to assume that the
ordering of keys and values is identical (even if you stumble upon an
example where it is). In cases where you’re pulling multiple values
from nonkeyword maps, it is probably easiest to wrap that interaction
up via juxt
:
(
def
a-str-then-foo-bar-map
(
juxt
#
(
get
%
"a"
)
#
(
get
%
{
:foo
:bar
})))
(
a-str-then-foo-bar-map
weird-map
)
;; -> [1 :baz]
You’ll avoid weird maps now, won’t you?
The most basic way to change a map is using the assoc
function.
Given a map and any number of additional key/value pairs as arguments,
it will return an updated map containing the respective keys and
values:
(
def
villain
{
:honorific
"Dr."
:name
"Mayhem"
})
(
assoc
villain
:occupation
"Mad Scientist"
:status
:at-large
)
;; -> {:honorific "Dr.", :name "Mayhem",
;; :occupation "Mad Scientist", :status :at-large}
If used on a map that already contains a key, the assoc
function
will return an updated map with the newly specified value for the key:
(
def
villain
{
:honorific
"Dr."
,:name
"Mayhem"
,:occupation
"Mad Scientist"
,:status
:at-large
})
(
assoc
villain
:status
:deceased
)
;; -> {:honorific "Dr.", :name "Mayhem",
;; :occupation "Mad Scientist", :status :deceased}
To remove keys, use the dissoc
function. Given a map and any
number of keys, it returns a map minus those keys:
(
def
villain
{
:honorific
"Dr."
,:name
"Mayhem"
,:occupation
"Mad Scientist"
,:status
:deceased
})
(
dissoc
villain
:occupation
:honorific
)
;; -> {:name "Mayhem", :status :deceased}
It’s fairly common to have maps contained in other maps. If it is
necessary to update a deeply nested value, nested calls to assoc
quickly become inconvenient, especially since they need to be
“inside-out.” Consider the following data structure:
(
def
book
{
:title
"Clojure Cookbook"
:author
{
:name
"Ryan Neufeld"
:residence
{
:country
"USA"
}}})
If Ryan were to move back to his native land of Canada, fully updating
the map representing this book using only assoc
would look something
like the following:
(
assoc
book
:author
(
assoc
(
:author
book
)
:residence
(
assoc
(
:residence
(
:author
book
))
:country
"Canada"
)))
Obviously, this is inconvenient and difficult to read.
The assoc-in
function removes this inconvenience, allowing you to
specify a key path instead of a sole key. Instead of changing a
value one key deep, a key path lists a sequence of keys, applied
recursively to change a deeply nested value:
(
assoc-in
book
[
:author
:residence
:country
]
"Canada"
)
;; -> {:author {:name "Ryan Neufeld"
;; :residence {:country "Canada"}}
;; :title "Clojure Cookbook"}
The preceding sample first looks up the map associated with the
:residence
key in the nested data structure, then associates
"Canada"
with the :country
key. Finally, the entire data structure
is returned.
What if you needed to update a value based on its previous value, instead of just changing it?
Fortunately, Clojure provides update-in
expressly for this purpose.
Instead of taking a new value, update-in
takes an update function.
This function is invoked with the value retrieved at key path and
any trailing arguments passed to update-in
. It’s a peculiar
function to wrap your head around at first. Perhaps it is best to
illustrate with an example:
(
def
website
{
:clojure-cookbook
{
:hits
1236
}})
;; Register 101 new hits to the Cookbook website
(
update-in
website
;
[
:clojure-cookbook
:hits
]
;
+
;
101
)
;
;; -> {:clojure-cookbook {:hits 1337}}
update-in
will also actually create maps for any of the keys in the
vector that don’t exist. This means it can be used to create structure
as well as to update values:
(
update-in
{}
[
:author
:residence
]
assoc
:country
"USA"
)
;; -> {:author {:residence {:country "USA"}}}
Even though the starting map is empty, two empty maps are created for
the values of the :author
and :residence
keys, meaning the assoc
will be applied to a new, empty map.
You’d like to use a value that isn’t a simple primitive type as a lookup key in a map. For example:
Because of its robust identity semantics on composite values, Clojure fully supports using any immutable value as a map key. More importantly, doing so is reasonably efficient.
For example, consider the data structure to represent a chessboard, an 8 × 8 grid where each position can have one of six possible types of piece. Rows are represented by the numbers 1–8, and columns by the letters a–h.
In Clojure, you can represent this directly as a map:
(
def
chessboard
{[
:a
5
]
[
:white
:king
]
[
:a
4
]
[
:white
:pawn
]
[
:d
4
]
[
:black
:king
]})
Moving a piece then requires two operations, dissoc
-ing the old
position for a piece and assoc
-ing the new position:
(
defn
move
"Given a map representing a chessboard, move the piece at src
to dest"
[
board
source
dest
]
(
->
board
(
dissoc
source
)
(
assoc
dest
(
board
source
))))
(
move
chessboard
[
:a
5
]
[
:a
4
])
;; -> {[:d 4] [:black :king]
;; [:a 4] [:white :king]}
As another example of nontraditional map keys, consider the situation where you have a set of functions, and you want to be able to assign them each a “weighting” and multiply the return value by the corresponding weight whenever the function is called.
An easy way to do this is to store the functions and weights in a map, with the functions as keys:
(
def
plus-two
(
partial +
2
))
(
def
plus-three
(
partial +
3
))
(
def
weight-map
{
plus-two
1.0
plus-three
0.8
})
Then you can use a simple wrapper function to apply the functions with the weights applied:
(
defn
apply-weighted
"Given a weight map, a function, and args, applies the function
to the args, multiplying the result by the weighting for the
function. If the weight map does not specify a weight for the
function, a default of 1.0 is used."
[
weight-map
f
&
args
]
(
*
(
get
weight-map
f
1.0
)
(
apply
f
args
)))
(
apply-weighted
weight-map
plus-two
2
)
;; -> 4.0
(
apply-weighted
weight-map
plus-three
1
)
;; -> 3.2
A more traditional way to model the chess game would be with a two-dimensional array, or, in Clojure’s case, with a vector of vectors.
This is certainly a reasonable thing to do, and is (possibly) slightly more performant. However, it is a less clean model of the actual problem domain. It requires a translation, for example, from chess’s row/column numbers and letters to zero-indexed indexes. Using a map lets you store the positions directly, in native chess terminology.
Similarly, there are alternative implementations for the
function-weighting example. It could be implemented using a cond
statement with all the functions and weights enumerated, or by
replacing the functions altogether with a protocol method that could
then have varying implementations with different weights.
However, storing the functions and weights in a map has the benefit of making it obvious at a glance what the weightings for particular functions are. More importantly, it is possible to store multiple different sets of weights, and switch between different weight schemes dynamically at runtime.
You want to treat the contents of a map as a sequence of entries. Alternatively, you want to convert a sequence of entries back into a map.
To obtain a sequence view of a map, simply call seq
on it. Note that
most sequence-processing functions call seq
on their arguments
themselves, so it’s usually not necessary to do this explicitly:
(
seq
{
:a
1
,:b
2
,:c
3
,:d
4
})
;; -> ([:a 1] [:c 3] [:b 2] [:d 4])
This creates a sequence of key/value pairs, which you can then process as you would any sequence.
To create a map from a sequence, you can exploit the fact that
conj
, when applied to a map, can take a two-element vector as a
key/value pair and use it to add the respective key and value on to
the map:
(
def
m
{
:a
1
,:b
2
})
(
conj
m
[
:c
3
])
;; -> {:a 1, :b 2, :c 3}
Because the into
function uses repeated applications of conj
to
add items from one sequence onto a collection, this means it can be
used to transform a sequence of pairs into a single map:
(
into
{}
[[
:a
1
]
[
:b
2
]
[
:c
3
]])
;; -> {:a 1, :b 2, :c 3}
It is also possible to construct a map from two sequences: one
containing keys and one containing values. This is the purpose of the
zipmap
function. Given two sequences, it will return a single map
with keys from the first argument sequence and values from the second:
(
zipmap
[
:a
:b
:c
]
[
1
2
3
])
;; -> {:c 3, :b 2, :a 1}
If one of the sequences passed to zipmap
is shorter than the other,
the extra values will be ignored, and the output map will only contain
entries up to the length of the shortest sequence.
When obtaining a sequence view of a hash map, the map entries will be returned in an arbitrary or undefined order. Conveniently, this order (although arbitrary) is guaranteed to be consistent if the same map is turned into a sequence multiple times.
When using a sorted map, the entries will be returned according to their sort order in the map. For example:
(
seq
(
hash-map
:a
1
,:b
2
,:c
3
,:d
4
))
;; -> ([:a 1] [:c 3] [:b 2] [:d 4])
(
seq
(
sorted-map
:a
1
,:b
2
,:c
3
,:d
4
))
;; -> ([:a 1] [:b 2] [:c 3] [:d 4])
There is another interesting fact about the entry values in this
sequence. They are printed as vectors, and they are vectors insofar
as they implement the full vector interface. However, their concrete
type is not actually clojure.lang.PersistentVector
; rather, they are
a different kind of vector called a map entry, which not only is a
vector but also supports the clojure.lang.MapEntry
interface.
The MapEntry
interface provides key
and val
functions that can
be used to retrieve the key and value of an entry:
(
def
entry
(
first
{
:a
1
:b
2
}))
(
class
entry
)
;; -> clojure.lang.MapEntry
(
key
entry
)
;; -> :a
(
val
entry
)
;; -> :1
These functions should be preferred to using the first
and second
functions on map entries when processing maps as sequences, since they
preserve the semantic of key/value pairs, making the code easier to
read.
Use one of these simple general-purpose functions, modified to suit any needs you have:
(
defn
map-keys
"Given a map and a function, returns the map resulting from applying
the function to each key."
[
m
f
]
(
zipmap
(
map
f
(
keys
m
))
(
vals
m
)))
(
map-keys
{
"a"
1
"b"
2
}
keyword
)
;; -> {:b 2, :a 1}
(
defn
map-vals
"Given a map and a function, returns the map resulting from applying
the function to each value."
[
m
f
]
(
zipmap
(
keys
m
)
(
map
f
(
vals
m
))))
(
map-vals
{
:a
1
,:b
1
}
inc
)
;; -> {:b 2, :a 2}
(
defn
map-kv
"Given a map and a function of two arguments, returns the map
resulting from applying the function to each of its entries. The
provided function must return a pair (a two-element sequence.)"
[
m
f
]
(
into
{}
(
map
(
fn
[[
k
v
]]
(
f
k
v
))
m
)))
(
map-kv
{
"a"
1
"b"
1
}
(
fn
[
k
v
]
[(
keyword
k
)
(
inc
v
)]))
;; -> {:a 2, :b 2}
map-keys
and map-vals
are extremely straightforward. They each
start by breaking the map, m
, down into a sequence of keys and a
sequence of values using the keys
and vals
functions, which return
a sequence of the keys or values of a map, respectively. Then, they
use the map
function to transform either the sequence of keys or the
sequence of vals. Finally, the zipmap
function is used to recombine
the key and value sequences into a single map, with the updates in
place.
map-kv
works a bit differently. It starts by converting the map into
a sequence of map entries, then uses map
to apply them to an
anonymous function that destructures the key and value, and then
passes the key and value to the caller-provided function. Finally, it
uses into
to repeatedly conjoin the resulting pairs onto an empty
map, returning a new map consisting of the transformed keys and values.
The following example is identical, but does not use destructuring, so the high-level structure is a bit more clear:
(
defn
map-kv
[
m
f
]
(
into
{}
(
map
(
fn
[
entry
]
(
f
(
key
entry
)
(
val
entry
)))
m
)))
It is easy to see that these three functions are all riffs on the
standard map
function, applied to map data structures. What about
the other staple of functional programming, reduce
?
Clojure already has a reduce-kv
function built in, which was added in version 1.4.
reduce-kv
takes three arguments: a function, an initial value, and
an associative collection. The provided function must also take three
arguments. reduce-kv
reduces the provided collection by first
applying the function to the initial value, the first key, and its
corresponding value from the map. The resulting value is then
reapplied along with the second key and value, the resulting value
with the third key and value, and so on.
The following example uses reduce-kv
to obtain the sum of all the
values in a map:
(
reduce-kv
(
fn
[
agg
_
val
]
(
+
agg
val
))
0
{
:a
1
:b
2
:c
3
})
;; -> 6
Note that an underscore (_
) is used instead of key
in the function
argument declaration. This is idiomatic in Clojure to name any argument
that isn’t actually used in the body.
It’s also possible to define map-kv
using reduce-kv
:
(
defn
map-kv
[
m
f
]
(
reduce-kv
(
fn
[
agg
k
v
]
(
conj
agg
(
f
k
v
)))
{}
m
))
which could be used in this example:
(
map-kv
{
:one
1
:two
2
:three
3
}
#
(
vector
(
->
%1
str
(
subs
1
))
(
inc
%2
)))
;; -> {"one" 2, "three" 4, "two" 3}
Normally, maps are strictly one value per key: if you assoc
an
existing key, the old value is replaced. However, sometimes it would
be useful to have a map-like interface (a “multimap”) capable
of storing multiple values for the same key.
You would like to create a map-like data structure that implements a multimap-like interface in Clojure.
To introduce such a capability on top of normal maps, create and
extend a protocol MultiAssociative
that defines this behavior:
(
defprotocol
MultiAssociative
"An associative structure that can contain multiple values for a key"
(
insert
[
m
key
value
]
"Insert a value into a MultiAssociative"
)
(
delete
[
m
key
value
]
"Remove a value from a MultiAssociative"
)
(
get-all
[
m
key
]
"Returns a set of all values stored at key in a
MultiAssociative. Returns the empty set if there
are no values."
))
(
defn-
value-set?
"Helper predicate that returns true if the value is a set that
represents multiple values in a MultiAssociative"
[
v
]
(
and
(
set?
v
)
(
::multi-value
(
meta
v
))))
(
defn
value-set
"Given any number of items as arguments, returns a set representing
multiple values in a MultiAssociative. If there is only one item,
simply returns the item."
[
&
items
]
(
if
(
=
1
(
count
items
))
(
first
items
)
(
with-meta
(
set
items
)
{
::multi-value
true
})))
(
extend-protocol
MultiAssociative
clojure.lang.Associative
(
insert
[
this
key
value
]
(
let
[
v
(
get
this
key
)]
(
assoc
this
key
(
cond
(
nil?
v
)
value
(
value-set?
v
)
(
conj
v
value
)
:else
(
value-set
v
value
)))))
(
delete
[
this
key
value
]
(
let
[
v
(
get
this
key
)]
(
if
(
value-set?
v
)
(
assoc
this
key
(
apply
value-set
(
disj
v
value
)))
(
if
(
=
v
value
)
(
dissoc
this
key
)
this
))))
(
get-all
[
this
key
]
(
let
[
v
(
get
this
key
)]
(
cond
(
value-set?
v
)
v
(
nil?
v
)
#
{}
:else
#
{
v
}))))
and, of course, corresponding unit tests (using clojure.test
):
(
require
'
[
clojure.test
:refer
:all
])
(
deftest
test-insert
(
testing
"inserting to a new key"
(
is
(
=
{
:k
:v
}
(
insert
{}
:k
:v
))))
(
testing
"inserting to an existing key (single existing item)"
(
let
[
m
(
insert
{}
:k
:v1
)]
(
is
(
=
{
:k
#
{
:v1
:v2
}}
(
insert
m
:k
:v2
)))))
(
testing
"inserting to an existing key (multiple existing items)"
(
let
[
m
(
insert
(
insert
{}
:k
:v1
)
:k
:v2
)]
(
is
(
=
{
:k
#
{
:v1
:v2
:v3
}}
(
insert
m
:k
:v3
))))))
(
deftest
test-delete
(
testing
"deleting a non-present key"
(
is
(
=
{
:k
:v
}
(
delete
{
:k
:v
}
:nosuch
:nada
))))
(
testing
"deleting a non-present value"
(
is
(
=
{
:k
:v
}
(
delete
{
:k
:v
}
:k
:nada
))))
(
testing
"deleting a single value"
(
is
(
=
{}
(
delete
{
:k
:v
}
:k
:v
))))
(
testing
"deleting one of two values"
(
let
[
m
(
insert
(
insert
{}
:k
:v1
)
:k
:v2
)]
(
is
(
=
{
:k
:v1
}
(
delete
m
:k
:v2
)))))
(
testing
"deleting one of several values"
(
let
[
m
(
insert
(
insert
(
insert
{}
:k
:v1
)
:k
:v2
)
:k
:v3
)]
(
is
(
=
{
:k
#
{
:v1
:v2
}}
(
delete
m
:k
:v3
))))))
(
deftest
test-get-all
(
testing
"get a non-present key"
(
is
(
=
#
{}
(
get-all
{}
:nosuch
))))
(
testing
"get a single value"
(
is
(
=
#
{
:v
}
(
get-all
{
:k
:v
}
:k
))))
(
testing
"get multiple values"
(
is
(
=
#
{
:v1
:v2
}
(
get-all
(
insert
(
insert
{}
:k
:v1
)
:k
:v2
)
:k
)))))
(
run-tests
)
;; -> {:type :summary, :pass 11, :test 3, :error 0, :fail 0}
First, this code defines a protocol to implement the set of functions that comprises the multimap behavior. A protocol is a great choice in this situation: it ties together several methods that perform related operations on the same object, and it allows for multiple concrete implementations.
In this case, there are three methods required to implement the
desired functionality. Note that the protocol implementation does not override or
reimplement any of the core map methods (assoc
, dissoc
, etc.). It is only the
semantics of the new behavior that differ from those of regular maps. Clojure defines very strong semantics around core
functions. Breaking or overriding these expectations is always a bad
idea, especially when using a distinct set of functions makes it clear
when multimap functionality is being used.
The concrete implementation of MultiAssociative
extends the protocol
to the clojure.lang.Associative
interface. It would certainly be
possible to implement it on something more targeted, such as
IPersistentSet
, but since it only requires something associative
for the implementation, it’s best not to be too specific. Coding
against clojure.lang.Associative
also gives several additional
capabilities for free. For example, there is now automatically a
“multivector” that can store multiple values at each index
(provided they are added using insert
).
Reading the code, you’ll notice that a good deal of the actual logic is
devoted to making sure that single values are stored plainly, while
multiple values are wrapped in a set. This is maintained both when
inserting and when deleting items, requiring the functions to run a
check on what type the value is and wrap or unwrap
accordingly. Similarly, get-all
needs to wrap single values in a set
before returning, since it specifies that it must return a set.
This is a design decision that has several benefits and trade-offs. The alternative would be to always wrap the values in a set, even single values. This would make the code a bit simpler and would eliminate most of the type checking as well as the wrapping and unwrapping of forms.
However, the simplicity would come with a price. If values (even
single values) were always wrapped in a set, the map being used as a
multimap would not be easily usable via the normal map functions. It
would contain a lot of odd-looking single-item sets, and if anything
were added to it using assoc
, it would be incompatible with future
uses of insert
on that key.
In essence, the wrapping and unwrapping is to allow any map to be
usable via both the standard Associative
and the
MultiAssociative
interfaces, without requiring the user to keep
track of which “kind” of a map it is. Values inserted using assoc
can be read with get-all
, and values inserted using insert
can be
removed with dissoc
. All expectations regarding normal maps should
hold. In the case of a normal get
on a key with multiple values, a
set containing multiple items will be returned. This is probably what
the user would expect upon inspecting the data.
There is one more feature of this code that deserves commentary: the
use of ::multi-value
metadata on the sets used to store multiple
values, applied and tested using the value-set
and value-set?
functions.
This is to handle the edge case where the intended value for a key is itself a set. The code needs a way to disambiguate between sets it creates in order to manage multiple keys for a value and sets that are simply values provided by users.
This is accomplished by placing metadata on sets created to contain
values. A namespace-scoped keyword is used to ensure that it will not
collide with any possible existing metadata on values provided by the
user. Then, all the code has to do is check if a set has the
::multi-value
metadata to know whether it’s a set containing values,
or is itself a value.
Use merge
to combine two or more maps with no keys in common:
(
def
arizona-bird-counts
{
:cactus-wren
8
})
(
def
florida-bird-counts
{
:gull
20
:pelican
14
})
(
merge
florida-bird-counts
arizona-bird-counts
)
;; -> {:pelican 14, :cactus-wren 8, :gull 20}
Use merge-with
when you want more explicit control of the merge
strategy for keys that exist in more than one map:
(
def
florida-bird-counts
{
:gull
20
:pelican
1
:egret
4
})
(
def
california-bird-counts
{
:gull
12
:pelican
4
:jay
3
})
;; Merge values with + to get their totals
(
merge-with +
california-bird-counts
florida-bird-counts
)
;; -> {:pelican 5, :egret 4, :gull 32, :jay 3}
In both merge
and merge-with
, maps are combined from left to
right, returning a new immutable map as a result. This functions much
like a “left fold.” merge
is the simpler function of the pair,
always returning the last value it sees for every key.
When mappings for the same key exist in more than one map, the latter mapping is used in the result. This can be useful, for example, when you receive new totals throughout the day, but only for values that have changed:
;; Favorite ice cream flavor votes throughout the day
(
def
votes-am
{
:vanilla
3
:chocolate
5
})
(
def
votes-pm
{
:vanilla
4
:neapoliton
2
})
(
merge
votes-am
votes-pm
)
;; -> {:vanilla 4, :chocolate 5, :neapoliton 2}
merge-with
facilitates powerful recipes for map combination by
allowing you to control how values are merged. You can imagine
merge-with
as reduce
for maps with common keys. The first argument
to merge-with
is a merge function that will be invoked for each pair
of duplicated values.
With careful choice of map value types, merge-with
provides some
concise solutions to common problems. For example, by merging with
clojure.set/intersection
, you can find the intersection of “like” and
“dislike” sets in a team of programmers:
(
def
Alice
{
:loves
#
{
:clojure
:lisp
:scheme
}
:hates
#
{
:fortran
:c
:c++
}})
(
def
Bob
{
:loves
#
{
:clojure
:scheme
}
:hates
#
{
:c
:c++
:algol
}})
(
def
Ted
{
:loves
#
{
:clojure
:lisp
:scheme
}
:hates
#
{
:algol
:basic
:c
:c++
:fortran
}})
(
merge-with
clojure.set/intersection
Alice
Bob
Ted
)
;; -> {:loves #{:scheme :clojure}, :hates #{:c :c++}}
It is also possible to merge nested maps by creating a recursive merge function:
(
defn
deep-merge
[
&
maps
]
(
apply merge-with
deep-merge
maps
))
(
deep-merge
{
:foo
{
:bar
{
:baz
1
}}}
{
:foo
{
:bar
{
:qux
42
}}})
;; -> {:foo {:bar {:qux 42, :baz 1}}}
As you saw in the previous examples, merge-with
is a versatile tool: we
used +
to add values of the same key, clojure.set/intersection
to
find shared values of multiple sets, and a recursive function
deep-merge
to recursively merge nested maps. merge-with
is a
very powerful function, indeed.
You want to compare two values according to some comparison function, or you want to sort a collection by comparing all the items in it.
Use the clojure.core/compare
function to compare two items. They must be
comparable with respect to each other. For example, a double
can be
compared to a ratio because they’re both numbers, but a string can’t
be compared to a vector.
compare
returns a negative number if the first argument is less than
the second, zero if it is logically equal, and a positive number if it
is greater:
(
compare
5
2
)
;; -> 1
(
compare
0.5
1
)
;; -> -1
(
compare
(
/
1
4
)
0.25
)
;; -> 0
(
compare
"brewer"
"aardvark"
)
;; -> 1
To sort an entire collection, pass it to the clojure.core/sort
function. sort
applies compare
as needed and returns a sorted
sequence.
For example, the following code breaks down a string into a sequence
of characters (sort
calls seq
on its argument), then sorts
them. The result is concatenated back to a string, for better
readability:
(
apply str
(
sort
"The quick brown fox jumped over the lazy dog"
))
;; -> " Tabcddeeeefghhijklmnoooopqrrtuuvwxyz"
As seen previously, many of Clojure’s data types have a natural
comparison order, which is what compare
uses. For example, numbers,
dates, and strings all sort as one would expect, from low to high,
based on the well-understood and accepted inherent ordering between
them.
If you want to sort a data type that does not have a natural ordering,
or if you want to override the natural sort (such as sorting a set
from high to low), you are not limited to using the built-in
comparator function. sort
allows you to specify a custom comparison
function that can perform any operation you like to determine the
relative ordering between two items. This function must take two
arguments. It can return values like compare
does (that is, a
positive integer, a negative integer, or zero). Alternatively, it can
return a Boolean value (i.e., a predicate function). The predicate
function should return true
if and only if the first argument should be sorted before the second argument.
This means that you can pass regular Clojure predicates to sort
:
(
sort >
[
1
4
3
2
])
;; -> (4 3 2 1)
(
sort >
[
1
4
3
2
])
;; -> (1 2 3 4)
Or, you can write your own arbitrary comparator. For example, the custom comparator used in the next example cares only about the length of a string, not the contents of it; strings will be sorted as equal if they have the same number of characters, whatever those characters are:
(
sort
#
(
<
(
count
%1
)
(
count
%2
))
[
"z"
"yy"
"zzz"
"a"
"bb"
"ccc"
])
;; -> ("z" "a" "yy" "bb" "zzz" "ccc")
Under the hood, Clojure uses Java’s built-in sort mechanism. Java uses a slightly modified merge sort algorithm that is highly performant for the vast majority of cases. It requires n log(n) comparisons in the worst case and performs at near O(n) when the input is largely sorted already.
The sort is also stable, meaning that if two items are equal in terms of the comparator function being used, their relative ordering will remain unchanged after sorting.
Although you can use any predicate as a comparison function or write your own comparison function that returns a positive/negative/zero integer, the actual function must behave properly in order to work. Specifically, it must:
x
is sorted before y
, and y
is sorted before z
, then x
must always be sorted before z
. In other words, there must
always be a single fully deterministic sort order for a given
collection and comparator, without any contradictions or
inconsistencies caused by the comparison function.
=
function
0
. When using a
predicate function, (pred x y)
and (pred y x)
should return the
same thing in the case where x
and y
are equal.
Clojure fully participates in Java’s comparison and sorting
mechanisms. All Clojure objects that have a natural order implement
java.lang.Comparable
and implement the compareTo
method.
More importantly, every Clojure function actually implements the
java.util.Comparator
interface. This means that you can pass a Clojure function to any Java
method that requires an instance of java.util.Comparator
, and it
will invoke the function with two arguments. This is what allows you
to pass arbitrary Clojure functions as the comparator to sort
. The
function object itself is actually being used as a Java comparator,
and invoking the Java .compare
method on a Clojure function will
actually call it, passing it the two values being compared as two
arguments.
Because predicate functions (those returning a Boolean value) do not
map exactly to the positive/negative/zero integer return values
expected from a java.util.Comparator
, Clojure itself handles the
logical mapping between them. If a function used as a comparator (that
is, (pred x y)
) returns true
, the implementation will return -1
,
indicating that x
is less than y
in the given sort. If not, it will
invoke the function again with the arguments reversed. If (pred x y)
and (pred y x)
are both false
, it is assumed that the objects are
equal, and the implementation returns 0
. Otherwise, it presumes x
is greater than y
and returns 1
.
Sometimes, you want to sort a collection not by the values themselves, but by some derivative function of the values. For example, say you have the following data, and you’d like to sort alphabetically by name. Unfortunately, maps don’t have a natural sort, so you’ll need to tell Clojure how to sort the data:
(
def
people
[{
:name
"Luke"
:role
:author
}
{
:name
"Ryan"
:role
:author
}
{
:name
"John"
:role
:reviewer
}
{
:name
"Travis"
:role
:reviewer
}
{
:name
"Tom"
:role
:reviewer
}
{
:name
"Meghan"
:role
:editor
}])
One option would be to use a custom comparator, which extracts the
:name
key and then invokes compare
on it:
(
sort
#
(
compare
(
:name
%1
)
(
:name
%2
))
people
)
;; -> ({:name "John", :role :reviewer}
;; {:name "Luke", :role :author}
;; {:name "Meghan", :role :editor}
;; {:name "Ryan", :role :author}
;; {:name "Tom", :role :reviewer}
;; {:name "Travis", :role :reviewer})
However, there’s an easier way. The sort-by
function works the same
as sort
, but takes an additional function keyfn
as an argument to
apply to the elements before sorting them. Instead of sorting on the
elements themselves, it sorts the result of applying keyfn
to the
elements.
So, passing in :name
as the keyfn
(as discussed in Recipe 2.16, “Retrieving Values from a Map”, keywords are
functions that look themselves up in a map), you can call:
(
sort-by
:name
people
)
;;-> ({:name "John", :role :reviewer}
;; {:name "Luke", :role :author}
;; {:name "Meghan", :role :editor}
;; {:name "Ryan", :role :author}
;; {:name "Tom", :role :reviewer}
;; {:name "Travis", :role :reviewer})
Like sort
, sort-by
also takes an optional comparator function
that it will use to compare the values extracted by the keyfn
.
For another example, the following expression uses the str
function
as a keyfn
to sort the numbers from 1 to 20 not on their numeric
value, but lexographically as strings (meaning that “2” is greater
than “10,” etc.). It also demonstrates using a custom comparator to
specify the results in descending order:
;; Descending lexographic order
(
sort-by str
#
(
*
-1
(
compare
%1
%2
))
(
range
1
20
))
;; -> (9 8 7 6 5 4 3 2 19 18 17 16 15 14 13 12 11 10 1)
Some compositive data structures can also be compared if they
implement Comparable
, are of the same type, and contain comparable
values. The comparison order is implementation dependent. For example,
by default, vectors are compared first by their length, then by the
result of applying compare
to their first value, then to their second
value if the first is equal, etc.:
(
sort
[[
2
1
]
[
1
]
[
1
2
]
[
1
1
1
]
[
2
]])
;; -> ([1] [2] [1 2] [2 1] [1 1 1])
Some data structures are not comparable. For example, the fact that a set is defined to be unordered means that a meaningful greater-than/less-than comparison is not possible in the general case, so no comparison is provided.
java.lang.Comparable
java.util.Comparator
You have a sequence of elements and you want to remove any duplicates, while possibly preserving the order of elements.
When the sequence of elements you’re working with is of a bounded,
reasonable size, use set
to coerce the collection into a hash set
containing only distinct values:
(
set
[
:a
:a
:g
:a
:b
:g
])
;; -> #{:a :b :g}
When the sequence is infinite, or you wish to maintain ordering, use
distinct
to return a lazy sequence of unique values in a collection
in the order they appear:
(
distinct
[
:a
:a
:g
:a
:b
:g
])
;; -> (:a :g :b)
There are a number of trade-offs between these two approaches. For
starters, set
consumes the entire sequence to produce a new set
collection. Because of this, set
cannot be used to filter an
infinite sequence. distinct
, on the other hand, is designed for
consuming lazy sequences. The value of distinct
is a lazy view or
projection over another sequence, yielding new values the first time
they appear:
(
defn
rand-int-seq
"Returns an infinite sequence of ints from [0, n)"
[
n
]
(
repeatedly
#
(
rand-int
n
)))
;; Taking the set of an infinite sequence will *never* return:
;; (set (rand-int-seq 10)) ; don't do it!
;; However, if you limit the seq, set will work
(
set
(
take
10
(
rand-int-seq
10
)))
;; -> #{0 1 2 3 4 7 8 9}
;; distinct works no matter what
(
take
10
(
distinct
(
rand-int-seq
10
)))
;; -> (8 3 4 6 0 5 9 7 1 2)
Since distinct
produces new values as it sees them, it does
maintain ordering. set
, on the other hand, returns an unordered set.
If distinct
is both ordered and lazy over sequences of any length,
what is the advantage of set
? Speed. Using distinct
is by far the
slowest option; simply calling set
is about two times faster.
Use some
, along with a set:
(
some
#
{
1
2
}
(
range
10
))
;; -> 1
(
some
#
{
10
}
(
range
10
))
;; -> nil
Since sets can act like functions, they can be used as predicates to test whether the argument is a member of the set. This idiom will
test each item in a collection, returning either the first match or
nil
if a match couldn’t be found. However, a problem arises when nil
or false
is a member of the set
you’re using to test a collection with. Consider the following:
(
if
(
some
#
{
nil
}
[
1
2
nil
3
])
::found
::not-found
)
;; -> :user/not-found
(
if
(
some
#
{
false
}
[
1
2
false
3
])
::found
::not-found
)
;; -> :user/not-found
Because the some
function returns the value returned from the
predicate function, not just true
or false
, using it with sets
that contain nil
or false
probably isn’t what you want—it will
return nil
or false
if the item actually is in the set. The simplest solution is to test for nil
or false
separately,
using the nil?
or false?
predicate functions built into Clojure:
(
if
(
some nil?
[
nil
false
])
::found
::not-found
)
;; -> :user/found
(
if
(
some false?
[
nil
false
])
::found
::not-found
)
;; -> :user/found
Or, to test both at once:
(
if
(
some
#
(
or
(
false?
%
)
(
nil?
%
))
[
nil
false
])
::found
::not-found
)
;; -> :user/found
You want to implement a data structure in Clojure with very specific performance characteristics.
For example, you need fast, efficient in-memory searches across a large, random, and ever-changing dataset.
After identifying that Clojure’s core data structures are not appropriate for your domain, your first step is to determine what data structure is appropriate.
For the purpose of this recipe, assume you are trying to choose and implement a data structure appropriate for fast in-memory search of a large, random, and ever-changing dataset. At first, a binary search tree (BST) seems like a good solution. A BST is most efficient over a sorted dataset, however. Adding and removing large amounts of data may unbalance a BST and degenerate its performance to that of a linked list.
Red-black trees (RBTs) are similar to BSTs, but are self-balancing. This would be an appropriate data structure for the dataset in question.
The next step is to implement the data structure itself. The
implementation of RBTs relies on pattern matching. Use
core.match
to simplify the
implementation of an RBT. Add [org.clojure/core.match "0.2.0"]
to your project’s
dependencies, or start a REPL with lein-try
:
$ lein try org.clojure/core.match
First, implement the core of an RBT, the balance
and insert-val
functions. By using core.match
, it is possible to succinctly express
the required behaviors based on the shape of the tree:
(
require
'
[
clojure.core.match
:refer
[
match
]])
(
defn
balance
"Ensures the given subtree stays balanced by rearranging black nodes
that have at least one red child and one red grandchild"
[
tree
]
(
match
[
tree
]
[(
:or
;; Left child red with left red grandchild
[
:black
[
:red
[
:red
a
x
b
]
y
c
]
z
d
]
;; Left child red with right red grandchild
[
:black
[
:red
a
x
[
:red
b
y
c
]]
z
d
]
;; Right child red with left red grandchild
[
:black
a
x
[
:red
[
:red
b
y
c
]
z
d
]]
;; Right child red with right red grandchild
[
:black
a
x
[
:red
b
y
[
:red
c
z
d
]]])]
[
:red
[
:black
a
x
b
]
y
[
:black
c
z
d
]]
:else
tree
))
(
defn
insert-val
"Inserts x in tree.
Returns a node with x and no children if tree is nil.
Returned tree is balanced. See also `balance`"
[
tree
x
]
(
let
[
ins
(
fn
ins
[
tree
]
(
match
tree
nil
[
:red
nil
x
nil
]
[
color
a
y
b
]
(
cond
(
<
x
y
)
(
balance
[
color
(
ins
a
)
y
b
])
(
>
x
y
)
(
balance
[
color
a
y
(
ins
b
)])
:else
tree
)))
[
_
a
y
b
]
(
ins
tree
)]
[
:black
a
y
b
]))
With insertion and balance out of the way, the only remaining
function to implement is a find-val
function for testing if a value
is present in an RBT. The easiest way to do this is by breaking down
individual tree nodes with match
and recursively scanning for the
desired value:
(
defn
find-val
"Finds value x in tree"
[
tree
x
]
(
match
tree
nil
nil
[
_
a
y
b
]
(
cond
(
<
x
y
)
(
recur
a
x
)
(
>
x
y
)
(
recur
b
x
)
:else
x
)))
With all of this in place, it is now possible to create and query an RBT:
(
def
rb-tree
(
reduce
insert-val
nil
(
range
4
)))
rb-tree
;; -> [:black [:black nil 0 nil] 1 [:black nil 2 [:red nil 3 nil]]]
(
find-val
rb-tree
2
)
;; -> 2
(
find-val
rb-tree
100
)
;; -> nil
For anyone who has ever had to implement a red-black tree—or at
least attended a class in computer science where the algorithm was
taught—the implementation of balance
might seem extremely short.
The reason for this is threefold:
balance
and find-val
use core.match
to codify logic as patterns
to match.
The two latter points are related, as you’ll see shortly.
Conveniently enough, core.match
allows us to match on the shape and
values of a data structure and perform structural binding at
the same time. For example, the following tries to match a-vector
against two clauses:
(
def
a-vector
[
1
2
3
])
(
match
a-vector
[
_
y
]
(
str
"Got y: "
y
)
[
_
_
z
]
(
str
"Got z: "
z
))
;; -> "Got z: 3"
The first clause matches a two-element vector, whereas the second
matches a three-element vector. Given that a-vector
has exactly three
elements, it matches the second clause. In the expression that
follows, named values (such as z
) are bound to the positions they
match.
This is why it was convenient to represent nodes as vectors—it makes pattern matching against them a breeze:
(
def
rb-node
[
:red
nil
3
[
:black
nil
4
nil
]])
(
match
rb-node
[
:red
left
value
right
]
(
str
"Red node with value: "
value
)
[
:black
left
value
right
]
(
str
"Black node with value: "
value
))
;; -> "Red node with value: 3"
Assuming this new custom data structure meets your performance
criteria, what is left? (You are benchmarking all of your custom data
structures, right?) Unlike built-in data structures, this custom data
structure doesn’t work with core functions such as map
and
filter
.
In the second part of this recipe, Recipe 2.28, “Implementing Custom Data Structures: Red-Black Trees—Part II”, we’ll rectify this situation by participating in the core sequence abstraction.
You want to use Clojure’s core sequence functions (conj
, map
,
filter
, etc.) with your custom data structure.
In part one of this recipe (Recipe 2.27, “Implementing Custom Data Structures: Red-Black Trees—Part I”), you implemented all the functions necessary for creating an efficient red-black tree. What’s missing is participation in Clojure’s sequence abstraction.
The most important part of participating in sequence abstraction
is the ability to expose values of a data structure sequentially. The
built-in tree-seq
is well suited for this task. One extra step is
needed, however; tree-seq
returns a sequence of nodes, not values.
Here’s the final rb-tree->seq
function:
(
defn-
rb-tree->tree-seq
"Return a seq of all nodes in an red-black tree."
[
rb-tree
]
(
tree-seq
sequential?
(
fn
[[
_
left
_
right
]]
(
remove nil?
[
left
right
]))
rb-tree
))
(
defn
rb-tree->seq
"Convert a red-black tree to a seq of its values."
[
rb-tree
]
(
map
(
fn
[[
_
_
val
_
]]
val
)
(
rb-tree->tree-seq
rb-tree
)))
(
rb-tree->seq
(
->
nil
(
insert-val
5
)
(
insert-val
2
)))
;; -> (5 2)
Since RBTs most closely resemble sets, they should adhere well to the
IPersistentSet
interface. Extend the IPersistentSet
and IFn
protocols to a new RedBlackTree
type, implementing all of the
necessary functions. It’s also wise to implement the multimethod
print-method
for RedBlackTree
, as the default implementation will
fail for RedBlackTree
as implemented:
(
deftype
RedBlackTree
[
tree
]
clojure.lang.IPersistentSet
(
cons
[
self
v
]
(
RedBlackTree.
(
insert-val
tree
v
)))
(
empty
[
self
]
(
RedBlackTree.
nil
))
(
equiv
[
self
o
]
(
if
(
instance?
RedBlackTree
o
)
(
=
tree
(
.tree
o
))
false
))
(
seq
[
this
]
(
if
tree
(
rb-tree->seq
tree
)))
(
get
[
this
n
]
(
find-val
tree
n
))
(
contains
[
this
n
]
(
boolean
(
get
this
n
)))
;; (disjoin [this n] ...) ;; Omitted due to complexity
clojure.lang.IFn
(
invoke
[
this
n
]
(
get
this
n
))
Object
(
toString
[
this
]
(
pr-str
this
)))
(
defmethod
print-method
RedBlackTree
[
o
^
java.io.Writer
w
]
(
.write
w
(
str
"#rbt "
(
pr-str
(
.tree
o
)))))
disjoin
and the corresponding remove-val
functions are left as
exercises for the reader.
It is now possible to use a RedBlackTree
instance like any other
collection—in particular, instances act like sets:
(
into
(
->RedBlackTree
nil
)
(
range
2
))
;; -> #rbt [:black nil 0 [:red nil 1 nil]]
(
def
to-ten
(
into
(
->RedBlackTree
nil
)
(
range
10
)))
(
seq
to-ten
)
;; -> (3 1 0 2 5 4 7 6 8 9)
(
get
to-ten
9
)
;; -> 9
(
contains?
to-ten
9
)
;; -> true
(
to-ten
9
)
;; -> 9
(
map inc
to-ten
)
;; -> (4 2 1 3 6 5 8 7 9 10)
In the end, it doesn’t take a lot to participate in the sequence
abstraction. By implementing a small handful of interface functions, the
red-black tree implementation from Recipe 2.27, “Implementing Custom Data Structures: Red-Black Trees—Part I”, can
participate in an array of sequence-oriented functions: map
,
filter
, reduce
, you name it.
At its essence, clojure.lang.IPersistentSet
is an abstraction of what it
means to represent a mathematical set structure; this matches a tree data
structure well. A set isn’t a list or sequence, though. So how is RedBlackTree
then said to be participating in the sequence abstraction?
In Clojure, types extending the clojure.lang.ISeq
interface are true
sequences, represented as a logical list of head and tail. While
IPersistentSet
does not inherit from ISeq
, it does share a common
ancestry with it. Both interfaces extend
clojure.lang.IPersistentCollection
and its parent
clojure.lang.Seqable
. As luck would have it,[5] sequence functions rely on collections being
Seqable
, not ISeq
. Since RedBlackTree
can be read as a
sequence, it is Seqable
and can be operated on by all of the
sequence functions you know and love.
Most of the functions in the IPersistentSet
interface are self-explanatory, but some deserve further explanation. The function cons
is a historical name for constructing a new list by appending a value
to an existing list. seq
is intended to produce a sequence from a
collection, or nil
if empty:
public
interface
IPersistentSet
extends
IPersistentCollection
,
Counted
{
public
IPersistentSet
disjoin
(
Object
key
);
public
boolean
contains
(
Object
key
);
public
Object
get
(
Object
key
);
}
public
interface
IPersistentCollection
extends
Seqable
{
int
count
();
IPersistentCollection
cons
(
Object
o
);
IPersistentCollection
empty
();
boolean
equiv
(
Object
o
);
}
public
interface
Seqable
{
ISeq
seq
();
}
The most challenging part of any Seqable
implementation is actually
making a sequence out of the underlying data structure. This would be
particularly challenging if you needed to write your own lazy
tree-traversal algorithms, but luckily Clojure has a built-in function,
tree-seq
, that does precisely this. By leveraging tree-seq
to
produce a sequence of nodes, it is trivial to write an rb-tree->seq
conversion function that lazily traverses a RedBlackTree
, yielding
node values as it goes.
tree-seq
accepts three arguments:
branch?
true
if a node is a branch (not a leaf node). For RedBlackTree
, sequential?
is an adequate check, as every node is a vector.
children
root
tree-seq
performs a depth-first traversal of trees. Given how
red-black trees are represented, this will not be an ordered
traversal.
With a sequence conversion function in hand, it is easy enough to
write the seq
function. Similarly, cons
and empty
are a breeze—simply utilize the existing tree functions. Equality testing can be a
bit more difficult, however.
For the sake of simplicity, we chose to implement equality (equiv
)
between only RedBlackTree
instances. Further, the implementation
compares a sorted sequence of their elements. In this case, equiv
is
answering the question, “Do these trees have the same values?” and not
the question, “Are these the same trees?” It’s an important
distinction, one you’ll need to consider carefully when implementing
your own data structures.
As discussed in Recipe 2.26, “Determining if a Collection Holds One of Several Values”, one of the big
bonuses of sets is their ability to be invoked just like any other
function. It’s easy enough to provide this ability to
RedBlackTree
s too. By implementing the single-arity invoke
function of the clojure.lang.IFn
interface, RedBlackTree
s can be
invoked like any other function (or set, for that matter):
(
some
(
rbt
[
2
3
5
7
])
[
6
])
;; -> nil
((
rbt
(
range
10
))
3
)
;; -> 3
Even with the full IPersistentSet
interface implemented, there are
still a number of conveniences RedBlackTree
is lacking. For one, you
need to use the kludgy /→RedBlackTree
or RedBlackTree.
functions
to create a new RedBlackTree
and add values to it manually. By
convention, many built-in collections provide convenience functions
for populating them (aside from literal tags like []
or {}
, of
course).
It’s easy enough to mirror vec
and vector
for RedBlackTree
s:
(
defn
rbt
"Create a new RedBlackTree with the contents of coll."
[
coll
]
(
into
(
->RedBlackTree
nil
)
coll
))
(
defn
red-black-tree
"Creates a new RedBlackTree containing the args."
[
&
args
]
(
rbt
args
))
(
rbt
(
range
3
))
;; -> #rbt [:black [:black nil 0 nil] 1 [:black nil 2 nil]]
(
red-black-tree
7
42
)
;; -> #rbt [:black nil 7 [:red nil 42 nil]]
You may also have noticed printing is not a concern of the sequence
abstraction, although it is certainly an important consideration to
make for developing developer- and machine-friendly data structures.
There are two types of printing in Clojure: toString
and pr
-based
printing. The toString
function is intended for printing
human-readable values at the REPL, while the pr
family of functions
are meant (more or less) to be readable by the Clojure reader.
To provide our own readable representation of RBT, we must implement
print-method
(the heart of pr
) for the RedBlackTree
type. By
writing in a “tagged literal” format (e.g., #rbt
), it is possible to
configure the reader to ingest and hydrate written values as
first-class objects:
(
require
'
[
clojure.edn
:as
edn
])
;; Recall ...
(
defmethod
print-method
RedBlackTree
[
o
^
java.io.Writer
w
]
(
.write
w
(
str
"#rbt "
(
pr-str
(
.tree
o
)))))
(
def
rbt-string
(
pr-str
(
rbt
[
1
4
2
])))
rbt-string
;; -> "#rbt [:black [:black nil 1 nil] 2 [:black nil 4 nil]]"
(
edn/read-string
rbt-string
)
;; -> RuntimeException No reader function for tag rbt ...
(
edn/read-string
{
:readers
{
'rbt
->RedBlackTree
}}
rbt-string
)
;; -> #rbt [:black [:black nil 1 nil] 2 [:black nil 4 nil]]