Chapter 5. Data and State

As discussed earlier, ClojureScript is a member of the functional family of programming languages, meaning that the function is the primary unit of abstraction and composition. You can view any ClojureScript program as a collection of functions, and interpret its structure by observing the function call graph.

However, with only a very slight shift in viewpoint, you can also understand any functional program in terms of the data that it manipulates and how that data flows through the system. Every function takes some data as arguments and returns data when it is complete. Usually, the end goal of a program is not to invoke certain execution paths, but to create, retrieve, or transform data in one form or another. Functions are simply the tool for doing so. In a very real sense, one could say that “data-oriented programming” is a synonym for “functional programming.”

Clojure and ClojureScript recognize this, and therefore provide a carefully-designed set of data primitives and composite data structures that are both easy to use and philosophically aligned with basic theories about what data is. It is a common remark among experienced Clojure programmers that they came to Clojure for the concurrency, but stayed for the data structures. ClojureScript brings these data structures and their associated mindset to the browser, where they have proven to be an equally good fit.

Primitives

ClojureScript provides a small set of primitive data types. Each type maps directly to one of JavaScript’s native types. As in JavaScript, all of ClojureScript’s primitives are immutable, meaning that each is a value unto itself and cannot be changed. Immutability is an important feature of ClojureScript, and will be discussed in much greater detail later on.

Strings

Strings represent textual data, as a sequence of characters. They can be entered as literals in a ClojureScript program using double quotes. Although they are primitives, it is also possible to obtain a sequence view of a string as a sequence of characters (see the next chapter).

Under the hood, ClojureScript strings are JavaScript strings, and may be freely passed to (or received from) JavaScript functions and libraries that expect (or return) strings.
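
As a small illustration (the name greeting is invented for the example), a string literal can be handed to ordinary ClojureScript functions or to a JavaScript string method without any conversion:

```clojure
;; A string literal, bound to a hypothetical name for this example.
(def greeting "Hello")

(count greeting)           ;=> 5
(count (seq greeting))     ;=> 5, the sequence view has one item per character
(str greeting ", world")   ;=> "Hello, world"

;; Interop: call the JavaScript string method toUpperCase directly.
(.toUpperCase greeting)    ;=> "HELLO"
```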

Keywords

Keywords are very similar to strings in that they are sequences of characters. Actually, in ClojureScript, they are nearly identical to strings except for their intended use. Typically keywords are used as keys in maps, for constants, or for enumerated sets of values. As a rule of thumb, it is idiomatic to use a keyword wherever the value is of interest to the program or programmer, rather than data for the user.

In Clojure, keywords are guaranteed to be interned (that is, all instances of the same keyword will always refer to the same object in memory, making them very efficient). This is not the case in ClojureScript, since at the JavaScript level keywords are implemented as plain old JavaScript strings. Still, it is good practice to create keywords only for a constrained, finite set of values.

Keywords may optionally be namespace qualified, meaning that they have a separate namespace component to them, and are logically associated with a particular namespace (namespaces are discussed in Chapter 7). To create a namespace-qualified keyword as a literal, include a slash in the keyword. For example, :my-ns/foo creates a keyword with a name of “foo” in a namespace called “my-ns.” You can also use a double leading colon to create a namespace-qualified keyword in the current namespace (e.g., ::foo).

Keywords also support some additional operations; for example, they can be used as functions that know how to look themselves up in maps, which we will see later.
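
The following sketch shows namespace-qualified keywords taken apart and rebuilt. The core functions name, namespace, and keyword are not introduced in the text above, and the map is made up for the example:

```clojure
;; Split a namespace-qualified keyword into its two parts.
(name :my-ns/foo)         ;=> "foo"
(namespace :my-ns/foo)    ;=> "my-ns"
(namespace :foo)          ;=> nil, a plain keyword has no namespace part

;; Build the same keyword from strings; equal keywords compare as equal.
(= :my-ns/foo (keyword "my-ns" "foo"))   ;=> true

;; A keyword used as a function looks itself up in a map.
(:color {:color "red" :size 3})          ;=> "red"
```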

Symbols

Symbols are also very similar to strings, and like keywords, they can be namespace-qualified. In ClojureScript they are used almost exclusively as named bindings (i.e., “variable” names, even though ClojureScript doesn’t really have variables as such). The literal form of a symbol is simply the raw text (foo), with a slash if it is namespace-qualified (foo/bar).

There is typically no reason to create or use symbols as data in your program, unless you’re working with macros (discussed in Chapter 8). Although they’re a key part of the data that represents your program itself (remember, Lisp code is data), keywords or strings are usually better choices for the data your program actually manipulates.

In ClojureScript, symbols are also implemented as JavaScript strings.

Characters

A character is a single textual character, and can be expressed as a literal with a leading backslash (e.g., \a for the character “a”).

Since JavaScript doesn’t have a native character type, ClojureScript characters are implemented as single-character strings, and behave identically to strings.

Numbers

ClojureScript’s numbers are the same as JavaScript numerics and can be either integers or floating-point numbers. They are expressed literally as numerals (for example, 42 or 3.14). Conversions and coercions between integer and floating point happen automatically; ClojureScript has the same arithmetic semantics as JavaScript.

You can pass a ClojureScript numeric value to any JavaScript function that expects a numeric, and use the numerics that JavaScript functions return in the same way.

Unlike Clojure, ClojureScript does not currently support additional numeric types such as Ratio, BigDecimal, or BigInteger.
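
A brief sketch of what this means in practice, assuming a ClojureScript REPL (JVM Clojure would print some of these results differently, for example a Ratio for the division):

```clojure
(+ 40 2)             ;=> 42
(+ 1 2.5)            ;=> 3.5, integer and floating point mix freely
(/ 1 2)              ;=> 0.5, there is no Ratio type to hold 1/2

;; Numbers pass straight through to JavaScript functions and back.
(js/Math.sqrt 16)    ;=> 4
(js/Math.max 3 7.5)  ;=> 7.5
```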

Booleans

Boolean values are always one of two values, true or false, representing logical truth and falsehood, respectively. ClojureScript Booleans, like strings and numerics, are implemented directly as JavaScript Boolean values and may be used accordingly in interop scenarios.

Note

Note that although the basic values for Boolean true and false are the same in ClojureScript and JavaScript, the semantics of what constitutes truth can be different. For example, the number zero, when used in a Boolean expression, is false in JavaScript but true in ClojureScript. See the sidebar on “Truthiness” in Chapter 4.

To use a Boolean as a literal, just type one of the special symbols true or false.
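
The difference in truthiness can be checked with a few if expressions. The keywords :truthy and :falsey are just labels chosen for this sketch:

```clojure
;; Only false and nil are logically false in ClojureScript.
(if false :truthy :falsey)   ;=> :falsey
(if nil   :truthy :falsey)   ;=> :falsey

;; Values that JavaScript treats as false are logically true here.
(if 0     :truthy :falsey)   ;=> :truthy
(if ""    :truthy :falsey)   ;=> :truthy
```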

Functions

In ClojureScript (like JavaScript), functions are first-class entities and, as befits a functional programming language, are themselves data. They can be created using the syntax discussed in the previous chapter, and once created can be passed around and added to composite data structures like any other data.

Importantly, ClojureScript functions are implemented as plain old JavaScript functions. This means that they can be passed to any JavaScript library that takes a function as a callback (for example), and given a JavaScript function, you can invoke it using ClojureScript syntax. (Unless, of course, the JavaScript function contains a reference to this. Internally, ClojureScript always invokes functions using their call method and passes in nil as the value for this.)
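
As a rough sketch of functions as data (double-it and ops are hypothetical names), a function can be stored in a map, retrieved, and called, or handed to a JavaScript method that expects a callback:

```clojure
;; A function is a value like any other.
(def double-it #(* 2 %))

;; Store functions in a map and look them up later.
(def ops {:double double-it :increment inc})

((:double ops) 21)               ;=> 42
(map (:increment ops) [1 2 3])   ;=> (2 3 4)

;; Interop: pass the ClojureScript function as the callback to the
;; JavaScript Array map method. The result is a JavaScript array
;; containing 2, 4, and 6.
(.map (array 1 2 3) double-it)
```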

nil

ClojureScript’s nil is identical to null in JavaScript; it is used where a value is logically absent, empty, or meaningless. To use it as a literal, just use the special symbol nil. ClojureScript does not use JavaScript’s undefined value, but you can refer to it as js/undefined.
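
A few expressions that produce or test for nil, as a minimal sketch. The core predicates nil? and undefined? are not mentioned above but are part of ClojureScript's core library:

```clojure
(nil? nil)                  ;=> true

;; Lookups that find nothing return nil rather than raising an error.
(get {:a 1} :missing)       ;=> nil
(first [])                  ;=> nil

;; JavaScript's undefined is a separate value, reachable through js/.
(undefined? js/undefined)   ;=> true
(undefined? nil)            ;=> false
```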

Table 5-1. Quick reference for primitive data types

ClojureScript type   Literal               Example(s)     JS type
string               double quotes         "string"       string
symbol               plain characters      symbol         string
keyword              leading colon         :keyword       string
character            leading backslash     \c             string
number               literal number        42, 3.14       numeric
boolean              ‘true’ or ‘false’     true, false    boolean
function             (fn ...) or #(...)    #(* 2 %)       function
nil                  ‘nil’                 nil            null

Data Structures

ClojureScript also provides a full complement of composite collection types. These collections can contain ClojureScript’s primitive types or other collections, as well as any other object that JavaScript itself supports. However, using non-ClojureScript objects as values in ClojureScript collections may invalidate some of the guarantees ClojureScript can make regarding equality semantics and serializability.

ClojureScript collections that contain only primitives or other ClojureScript collections do make certain guarantees:

Equality

Collections with the same semantics containing the same values are considered equal for all purposes, even if they are different instances in the JavaScript VM. ClojureScript equality is always value-dependent, and the value of a collection is defined in terms of its contents. Note that this is true even across implementations, as long as the semantics of the collection are the same. For example, a map can only be equal to another map, but that map may be any of the alternative map implementations (see the section on maps below).
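
Value-based equality can be seen in a short sketch. The names a and b are invented for the example, and identical? (which compares object identity rather than value) is used only for contrast:

```clojure
;; Two separately constructed vectors with the same contents.
(def a (vector 1 2 3))
(def b (vector 1 2 3))

(identical? a b)   ;=> false, they are different instances
(= a b)            ;=> true, they are the same value

;; Equality holds across map implementations and ignores entry order...
(= (array-map :a 1 :b 2) (hash-map :b 2 :a 1))   ;=> true
(= #{1 2} #{2 1})                                ;=> true

;; ...but a map is only ever equal to another map.
(= {:a 1} [[:a 1]])                              ;=> false
```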

Serializability

Obtaining the string value of a collection always results in a string that, when read back using the ClojureScript reader, will be equal to the original. This is extremely useful for simple cases of storing and transmitting data.
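
A minimal round-trip sketch follows. It assumes pr-str is used to obtain the string and cljs.reader/read-string to read it back; the namespace name and the sample data are hypothetical:

```clojure
(ns example.serialize
  (:require [cljs.reader :as reader]))

;; A nested collection made only of primitives and other collections.
(def data {:name "Ada" :tags #{:a :b} :scores [1 2 3]})

;; Print the collection to a string that the reader can parse.
(def text (pr-str data))

;; Reading the string back yields a value equal to the original.
(= data (reader/read-string text))   ;=> true
```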

Clojure compatibility

The serialized string representation of ClojureScript objects and collections is fully compatible with that of Clojure. Objects printed in ClojureScript can be read in Clojure, and vice versa. This makes it very easy to develop with Clojure on the server side and ClojureScript in the browser client. We will demonstrate this technique in Chapter 10.

Collection Types

Lists

Lists are ordered collections of items, implemented as singly-linked lists. As such, they support fast lookups and insertions at the head of the list and O(n) reads in the general case.

The literal syntax for writing lists is simply parentheses around the items (e.g., (1 2 3)). However, lists are also used in ClojureScript code to indicate a form that should be evaluated, meaning that if you try to enter a list that you don’t want evaluated (such as the one above), you’ll get an error when ClojureScript tries to call the first item (here, the number 1) as a function.

To avoid this and create a list literal, you can quote the form using either the quote special form or the single quote reader macro, which prevent evaluation of the forms to which they are applied. They are completely equivalent: '(1 2 3) is identical to (quote (1 2 3)), and both will evaluate to a list consisting of the numbers 1, 2, and 3 without attempting to evaluate 1 as a function.

To prepend an item to a list, use the conj function, which takes a collection as its first argument and any number of additional items to add. The items will be added at the beginning of the list. To retrieve items from a list, use the sequence functions (described in Vectors).
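
The quoting and conj behavior described above can be sketched as follows (numbers is a name chosen for the example; first and rest are two of the sequence functions):

```clojure
;; Quote the list so it is treated as data rather than evaluated.
(def numbers '(1 2 3))

;; The reader macro and the special form produce the same list.
(= numbers (quote (1 2 3)))   ;=> true

;; conj adds at the head; several items are added one after another.
(conj numbers 0)              ;=> (0 1 2 3)
(conj numbers :a :b)          ;=> (:b :a 1 2 3)

(first numbers)               ;=> 1
(rest numbers)                ;=> (2 3)
```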

Vectors

Vectors are also ordered collections of items, and should generally be preferred to lists in most ClojureScript code. They fill the role played by arrays in JavaScript and most other programming languages, having near-constant lookup, update, and append operations. Technically, the computational complexity of a vector lookup is O(log32(n)), but this is so close to constant time that the distinction is practically meaningless on any data structure that will fit in memory on a modern computer.

The literal syntax for a vector is square brackets surrounding the items, such as [1 2 3] or [:a :b :c]. You’ve already seen literal vectors: they are used for specifying function parameters.

To append an item to a vector, use the conj function as you would for a list. However, in the case of a vector, the item(s) will be appended rather than prepended (conj works differently depending on the type of collection).

You can retrieve items from a vector using the sequence functions. The nth function will efficiently retrieve the item at a particular index. Vectors themselves can also be invoked as functions: passing an integer as the argument will return the item stored at that index (e.g., ([:a :b :c] 1) returns :b). To return a vector with an updated value at a particular index, use the assoc function, which takes a vector, an index, and a value, and returns a vector with the update applied.
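
Put together, the vector operations look like this (letters is a hypothetical name):

```clojure
(def letters [:a :b :c])

;; conj appends to a vector.
(conj letters :d)      ;=> [:a :b :c :d]

;; Two ways to read the item at index 1.
(nth letters 1)        ;=> :b
(letters 1)            ;=> :b

;; assoc returns a vector with the item at index 1 replaced.
(assoc letters 1 :z)   ;=> [:a :z :c]

;; The original vector is untouched by all of the above.
letters                ;=> [:a :b :c]
```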

Maps

Maps are associative collections; that is, they associate keys with values, and allow efficient retrieval of a value by its key. They are similar to Hashes in Ruby, HashTables in Java, or associative arrays (i.e., objects) in JavaScript.

The literal syntax for a map is alternating key/value pairs surrounded by curly braces, such as {:key1 :val1 :key2 :val2}. Because commas are whitespace in ClojureScript, some people like to add them to maps for greater visual distinction between key/value pairs like {:k1 :v1, :k2 :v2}, but this is strictly optional.

Note that keys can be any primitive or data type that supports proper equality. Keywords are idiomatic and efficient, but strings and integers are also commonly used as map keys. It is even perfectly acceptable to use other data structures as keys if they support good equality semantics (as ClojureScript’s do).

Maps may actually be implemented in a number of different ways, using different algorithms. ClojureScript includes array maps (backed by arrays), hash maps (backed by hash tables), and tree maps (backed by red-black balanced binary search trees). There are no semantic differences between these implementations, although they do have different performance characteristics. (The sorted tree map does actually make one additional guarantee that other implementations don’t: when iterating over its entry set, the entries will be returned in the specified sort order of the keys.) Typically, however, you don’t need to worry about them. When you create a map using a literal, ClojureScript chooses the best algorithm based on the size of the map, and will swap out the type to keep it efficient as it grows. If you wish, however, you can create a particular type of map using the array-map, hash-map, or sorted-map/sorted-map-by functions (for array maps, hash maps, and tree maps, respectively).
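
As a small sketch of the constructor functions (the names unsorted and sorted are invented here), maps built in different ways are equal when their entries are equal, while a sorted map additionally fixes the iteration order of its keys:

```clojure
(def unsorted (hash-map :b 2 :a 1 :c 3))
(def sorted   (sorted-map :b 2 :a 1 :c 3))

;; Same entries, so the same value, regardless of implementation.
(= unsorted sorted)   ;=> true

;; A sorted map returns its keys in sort order.
(keys sorted)         ;=> (:a :b :c)

;; sorted-map-by takes a comparison function first; here > sorts
;; numeric keys in descending order.
(keys (sorted-map-by > 1 :one 3 :three 2 :two))   ;=> (3 2 1)
```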

There are several techniques for retrieving values from a map:

  • The get function, which takes a map and a key value, and returns the value mapped to the key.

  • The map itself can be invoked as a function. Passing it a key will return the value mapped to that key.

  • If the key is a keyword, you can invoke it as a function, passing the map as an argument. When used as a function, keywords can look themselves up in the map they are provided and return the associated value.

To obtain a map with an inserted or updated value at a particular field, use the assoc function, passing a map and a series of alternating keys and values. This will return the map, but with the specified keys mapped to the specified values. If the map previously contained values associated with the keys, they will be replaced.
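
The lookup techniques and assoc can be sketched together. The map person and its contents are made up for the example:

```clojure
(def person {:name "Ada" :lang "ClojureScript"})

;; Three equivalent lookups.
(get person :name)   ;=> "Ada"
(person :name)       ;=> "Ada"
(:name person)       ;=> "Ada"

;; A missing key yields nil.
(get person :age)    ;=> nil

;; assoc replaces the value for :lang and adds :age in one call.
(assoc person :lang "Clojure" :age 36)
;=> {:name "Ada", :lang "Clojure", :age 36}

;; The original map still holds its original entries.
(:lang person)       ;=> "ClojureScript"
```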

Sets

Sets are unordered collections of unique items, meaning that the same item cannot be duplicated in the set (similar to the mathematical notion of a set). If you add an item to a set that is equal to an item the set already contains, the set is unaffected. Sets can also be thought of as maps with only keys and no values. They support fast insertion, removal, and membership checks.

The literal syntax for a set is a pound sign followed by members enclosed in curly braces, like #{:a :b :c}.

To add an item to a set, use the conj function, passing the set and the item to add. Sets also support disj, which does the opposite of conj and returns a set with the item removed. To test if an item is a member of a set, use the contains? function, which takes a set and an item and returns true if the item is a member of the set.

ClojureScript also provides the clojure.set namespace containing dedicated set operations such as union, intersection, and difference.
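
A short sketch of the set operations described in this section. The namespace name example.sets and the alias set are choices made for the example:

```clojure
(ns example.sets
  (:require [clojure.set :as set]))

(def s #{:a :b :c})

;; Adding a new item grows the set; adding an existing one does not.
(= #{:a :b :c :d} (conj s :d))   ;=> true
(= s (conj s :a))                ;=> true

;; disj removes an item; contains? tests membership.
(= #{:b :c} (disj s :a))         ;=> true
(contains? s :b)                 ;=> true
(contains? s :z)                 ;=> false

;; Dedicated set operations from clojure.set.
(set/union        #{1 2} #{2 3})   ;=> #{1 2 3}
(set/intersection #{1 2} #{2 3})   ;=> #{2}
(set/difference   #{1 2} #{2 3})   ;=> #{1}
```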

Immutability

An important feature of all of ClojureScript’s collections is that they are immutable, meaning that they can’t be changed. Functions that “modify” collections don’t actually ever change them, but instead create and return a new one based on the original with the specified differences in place.

This is highly nonintuitive to most programmers who don’t have prior experience with purely functional languages. However, it becomes clearer once you understand Clojure(Script)’s concept of value.

Values don’t change. Consider the number 3. If you add 3 + 1, you haven’t changed the value of 3 (which would wreak havoc with math and physics everywhere). Instead, you’ve acquired a new value. The same is true of words: if you use the word “good” together with the word “morning” to say “good morning,” you haven’t changed the global meaning of the word “good,” you’ve used it to create a new utterance. In ClojureScript, the very definition of a value means that it can’t change—if it does, it’s no longer the same value.

ClojureScript’s collections are all values. If I take the vector [1 2] and append the value 4, I haven’t changed the meaning of [1 2]. I can’t change it. By definition, it can only ever mean “the two element vector consisting of the integers 1 and 2.” If I could literally change it, it would no longer meet its own definition. But what I can do is create an entirely new vector, using [1 2] as a base: [1 2 4].

The same thing is true of all ClojureScript’s other collection types. When you add a member to a set, you’re creating a different set with different members (which, incidentally, conforms to the mathematical definition of a set). When you add an item to the front of a list, you create a new list consisting of both the old list and the new item. When you add a new key to a map, you’re creating a new map, with a different set of keys.
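
The [1 2] example can be tried directly (original and extended are names chosen for this sketch):

```clojure
(def original [1 2])

;; "Appending" 4 produces a new vector...
(def extended (conj original 4))

extended   ;=> [1 2 4]

;; ...and the original still means what it always meant.
original   ;=> [1 2]

;; The same holds for maps and sets.
(def config {:debug false})
(assoc config :debug true)   ;=> {:debug true}
config                       ;=> {:debug false}
```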

Why immutability?

In Clojure, concurrency is always listed as a compelling reason to use immutable collections: preventing unexpected changes to data goes a long way towards preventing race conditions. HTML 5 WebWorkers do allow concurrent execution in modern browsers. However, they sidestep many of the difficulties associated with concurrent programming by forbidding shared state between threads, instead operating solely on the basis of message passing. But what about ClojureScript, which always runs in a single-threaded JavaScript environment?

There are two possible answers to this question. First, there is a sense in which treating collections as values is philosophically correct, irrespective of performance or design implications. It makes programs easier to formalize and reason about. For example, having a firm concept of collections as values also allows a rigorous notion of equality (which can greatly simplify programs), and allows functions dealing with collections to remain formally pure.

Second, there are indeed practical benefits to having immutable objects besides full concurrency. Even though JavaScript is single-threaded, code is often structured in terms of asynchronous callbacks and event loops, and it isn’t always easy to reconstruct the exact sequence of execution a program might take. With immutable values, you can rely on the fact that once you have obtained a collection, you can save it (either explicitly or by closing over it) and use it later without any risk that it will have changed. Having immutable objects means never having to worry about mentally keeping track of what’s going on—all value changes are explicit and apparent in the code.

Persistence

One question that almost invariably follows a discussion of immutability is that of the performance implications. No matter what the benefits are, isn’t cloning an entire data structure every time it’s updated prohibitively wasteful of computational resources?

The answer would be yes, if that were what actually happens. Fortunately, ClojureScript provides some extremely sophisticated data structure implementations that utilize the concept of persistence to provide objects that are logically immutable, but share structure with previous versions of themselves to minimize their computational overhead.

A full discussion of the implementation of persistent data structures is beyond the scope of this book, but essentially what happens is that when a data structure is modified, the new value is not a full clone of the original one. Instead, it incorporates the original (which it can safely do, because the old one is immutable) plus the changes, and then exposes a unified view of the whole package in a way that hides the internal structure.
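
Lists are the one collection type where this sharing is easy to observe from ordinary code, so the following sketch uses them for illustration. The names tail and longer are invented, and identical? is used to check that two references point at the same object in memory. Vectors, maps, and sets share structure too, but inside their internal trees, where it is not directly visible:

```clojure
(def tail '(2 3))

;; Prepending creates one new head cell that points at the old list.
(def longer (conj tail 1))

longer                           ;=> (1 2 3)

;; Everything after the first item IS the original list, not a copy.
(identical? tail (rest longer))  ;=> true

;; The original is unchanged.
tail                             ;=> (2 3)
```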

In practical terms, persistence means that while using immutable objects does incur some small overhead compared with mutating traditional objects, it (hopefully) falls well within the realm of acceptable cost relative to the benefit provided. Typically, unless you’re writing extremely performance-sensitive code (which is rare in JavaScript to begin with), ClojureScript’s immutable collections are more than fast enough. And if you ever do need to eke out every last drop of performance, ClojureScript’s interop syntax makes it easy to drop down to native JavaScript objects and arrays.

Identity and State

Having data structures be immutable values is all very well, but it opens another question: if values are immutable, then how does ClojureScript model state and change over time? After all, not every program can be a purely functional transformation of inputs to outputs. Most of the time, programs need to store and change values.

The answer lies in ClojureScript’s (and Clojure’s) conceptual distinctions among value, identity, and state.

  • A value is, as the name implies, an immutable value. As discussed above, values can’t change, by definition.

  • Identity refers to a named entity in the system that may refer to different values at different points in time.

  • State refers to the value of an identity at a particular point in time.

Most languages don’t make a clear distinction between these concepts—for example, a variable in JavaScript has bits of all three. It is a named thing, but it has a value, and its value can change.

By teasing apart these concepts, ClojureScript makes state management explicit. Identities are clearly visible as the only things that can change, and state transitions to new values are clearly intentional.

This leads to a unique program structure in large ClojureScript programs. Rather than having state smeared thinly across the whole program, it is isolated from the main bulk of the code. Only a few functions update state; the rest remain pure functions of values. When done correctly, this makes ClojureScript programs much easier to reason about than those written in object-oriented or imperative paradigms.

Atoms

In Clojure, there are several constructs for creating identities, including atoms, refs, and agents. The different types of identities differ in the concurrency semantics they support. In ClojureScript, which doesn’t need to support shared-memory concurrency, there is only one type: atoms.

Atoms are identities that refer to a single value (though that value, of course, may be one of Clojure’s collections). All updates to the state of an atom are atomic, that is, they occur in a single operation.

To create an atom, just use the atom function, passing a value for the initial state. For example:

(def my-atom (atom {}))

This constructs an atom with an initial state of an empty map, and binds it to a var called my-atom.

To retrieve the current value of an atom, use the deref function, which also has a shortened syntax using the reader macro @. The following two expressions are equivalent:

(deref my-atom)   ;=> {}
@my-atom   ;=> {}

There are two ways to update the state of an atom: swap! and reset!. swap! updates the atom’s state in terms of the previous state, while reset! sets the state without regard for the previous state. Both functions return the value of the atom’s new state.

swap! always takes at least two arguments; the first is the atom, the second is the update function. The update function will be applied with the value of the atom as its first argument, with any additional arguments to swap! used as additional arguments.

So, for example, to add a new entry to the map that is the current value of my-atom, you could invoke swap! like so:

(swap! my-atom assoc :a 1)  ;=> {:a 1}

Subsequently, retrieving the value of the atom returns the new value:

@my-atom   ;=> {:a 1}

Or, you can use reset!, passing the atom and the new value to update the state:

(reset! my-atom {:x 42})  ;=> {:x 42}
@my-atom   ;=> {:x 42}
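
As a further sketch (counter and snapshot are hypothetical names, and update-in is a core function not covered above), the extra arguments to sw! style updates are passed along to the update function, and a value read from the atom earlier is unaffected by later state changes:

```clojure
(def counter (atom {:clicks 0}))

;; Dereferencing yields an immutable value: the state at this moment.
(def snapshot @counter)

;; The update function receives the current state as its first
;; argument, followed by the additional arguments given to swap!.
(swap! counter update-in [:clicks] inc)    ;=> {:clicks 1}
(swap! counter update-in [:clicks] + 10)   ;=> {:clicks 11}

;; The identity now refers to a new value...
@counter   ;=> {:clicks 11}

;; ...while the value captured earlier has not changed.
snapshot   ;=> {:clicks 0}
```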

Initially, this might seem like too much ceremony to do something as easy as changing some state. But the ceremony is (almost) the whole point. State should not be something implicit in a program, quietly multiplying complexity exponentially with each new variable. Instead, it should be carefully, knowingly managed. In ClojureScript, atoms provide this capability.
