We use TLA+ to find flaws in our designs. But there’s another, subtler benefit: we also find places where the spec is ambiguous. Formally specifying your problem forces you to decide what you actually want out of your system. This is especially important when we model “business logic,” features, and requirements. To work through this, we’ll use TLA+ to spec a simple library system and show how the act of specifying can itself find faults in the spec.
In our system, people should be able to check out books, renew books, and return them. They will also be able to reserve books: a reserved book cannot be checked out by anybody else. The system should be internally consistent (all books are accounted for), and anybody who wants a book should eventually be able to check it out. Most of these seem like simple features, but how they interact can lead to surprising behavior.
In addition to the final specs, I’ll be showing the development process and the dead ends we can run into. This is an example of how we write specifications and would be incomplete without it.
The Requirements
We begin with the standard setup
, extensions and constants. There seem to be two constants here: one that represents the set of books and one that represents the set of people.
---- MODULE main ----
EXTENDS Integers, TLC, Sequences
CONSTANTS Books, People
PT == INSTANCE PT
On second thought, though, “books” is ambiguous. Are we going to assume we’re only looking at one type of book or multiple types? If we do one type the spec will be simpler and probably check faster, and if we do multiple types the spec will more closely mirror our problem domain. I decide to go with the latter. Since the library’s holdings will change over time
, we might assign that to a variable.
(*--algorithm library
variables
library = * ???
end algorithm; *)
Question two: Do we have a single copy of each book, or can we have multiple copies? In the former case, we can make
library a set. In the latter case, we actually want it to be a map of books to numbers, something like
[Books -> Nat]. Again, the second case is closer to what an actual library has. That means we have to introduce another constant for the range of possible copies. We can still test the model with one copy of each book by setting that range to
{1}.
CONSTANTS Books, People, NumCopies
ASSUME NumCopies subseteq Nat
(*--algorithm library
variables
library in [Books -> NumCopies];
end algorithm; *)
For each person, we’ll give them the ability to take a book from the library or return a book to the library. They have a private
books variable that tracks what they have, a
Checkout action, and a
Return action
.
process person in People
variables
books = * ???
begin
Person:
either
* Checkout
skip;
or
* Return
skip;
end either;
goto Person;
end process;
And again, we have a question: What should
books be? Should it be a set or an accumulator like
library? These lead to different behaviors
. If we specify that
books is a set, we’re assuming that people can only check out one copy of the book at a time. This is a question we’d have to ask the client: Should people be able to take out multiple copies? Let’s assume they said “no.” This gives us another requirement:
Since we’ll be adding and removing from sets, I want to add a couple of convenience binary operators. These would go above the algorithm, as they don’t depend on any of the variables to work.
set ++ x == set union {x}
set -- x == set {x}
For the implementation of the
person process
, I decide that for now, we’ll assume they always eventually act, so we can make them fair processes. This assumption might not hold over time; after all, the person might forget to return their book. I also decide not to introduce separate labels for
Checkout or
Return, as there are no concurrency issues here (yet).
define
AvailableBooks == {b in Books: library[b] > 0}
end define;
fair process person in People
variables
books = {};
begin
Person:
either
* Checkout:
with b in AvailableBooks books do
library[b] := library[b] - 1;
books := books ++ b;
end with;
or
* Return:
with b in books do
library[b] := library[b] + 1;
books := books -- b;
end with;
end either;
goto Person;
end process;
end algorithm; *)
In our system, a person can grab any library book that is available and that they don’t already have. This defines our minimal system without any invariants
.
Adding Invariants
Before deciding the invariants
, let’s create our first model. As mentioned before, our multi-copy system can “simulate” a single-copy system. Let’s call the model
OneCopyPerBook and use the following constant assignments:
NumCopies <- 1..1
People <- [model value] {p1, p2}
Books <- [model value] {b1}
Run this to confirm we don’t have any crashes. Looks good. Time to add some invariants. First we’ll focus on simple safety properties to make sure nothing’s going wrong.
The first we’ll add is a common TLA+ pattern called TypeInvariant.
TypeInvariant
is the conventional term for an operator that captures the ‘sensibility’ of the system. The system may still not satisfy the spec, but at least it’s physically possible. One example of this is that the library cannot have a negative number of books in it: it’s simply not meaningful to have that. Nor can the library have more than the possible NumCopies of books in it.
Similarly, it’s not meaningful that people, raw numbers, or anything
except books are in each person’s
books repository. The PlusCal abstraction leaks a little bit here: there are multiple processes, one for each element of the set
People, so TLA+ translates the private variable to a function
from
People to sets of
Books. Since
TypeInvariant checking a constraint on a private variable, we need to put it after the translation.
TypeInvariant ==
/ library in [Books -> NumCopies ++ 0]
/ books in [People -> SUBSET Books]
Have OneCopyPerBook check this invariant and confirm the system still works (16 states). As a sanity check, replace library in [Books -> NumCopies ++ 0] with library in [Books -> NumCopies] and confirm that TLC finds an error.
Adding Liveness
Now let’s add a temporal property
. Some people want certain books. We want to confirm that they eventually get these books. We add this with a couple of small changes.
fair process person in People
variables
books = {},
wants in SUBSET Books;
begin
Person:
either
* Checkout:
with b in AvailableBooks books do
library[b] := library[b] - 1;
books := books ++ b;
wants := wants -- b;
* Rest is same
TypeInvariant ==
/ library in [Books -> NumCopies ++ 0]
/ books in [People -> SUBSET Books]
/ wants in [People -> SUBSET Books]
Liveness ==
/ <>(A p in People: wants[p] = {})
Add Liveness to OneCopyPerBook and rerun. You should see this fail. p2 want to read the book, but p1 could keep checking it out, returning it, and then checking it out again. p2 never gets a chance to read the book, so our liveness constraint
is violated.
Adding Reservations
If a person reserves a book, it cannot be checked out by anybody else. There are a few possible types for a reserve variable. [Books -> People] means that every book is reserved by exactly one person. This immediately seems wrong for several reasons. First, we’d need to add a model value NULL
to represent a book that isn’t reserved, which adds unnecessary complexity to our model. Second, only one person can reserve a given book at a time, so what happens if somebody else tries? Do we simply prevent them, or does it override the existing hold? Neither of these seem like desired behavior for the library.
Our second choice is [Books -> SUBSET People]. On the one hand, this means that any book can be held by several people, and the order they placed the holds doesn’t matter. This naturally includes nobody reserving, as {} in SUBSET People. On the other hand, maybe the library wants there to be some sort of priority for holds, such as “people who placed them first get the book first.”
A third choice is
[Books -> Seq(People)], where
Books maps to an ordered sequence of people. So we have a question for the customer: Ordered reservations or unordered reservations? We’ll start with unordered because it makes the fewest assumptions. Note there’s peculiar behavior to how reservations work: if the set is
empty, then anybody can check out that book. If the set is nonempty, only people in that set can check it out.
variables
library in [Books -> NumCopies],
reserves = [b in Books |-> {}];
define
AvailableBooks == {b in Books: library[b] > 0}
BorrowableBooks(p) == {b in AvailableBooks: reserves[b] = {} / p in reserves[b]}
end define;
Another way we could write the filter in
BorrowableBooks is with the
=> operator:
reserves[b] /= {} => p in reserves[b]. We’ll keep using the version above, though. Then we update the
Person action
:
Person:
either
* Checkout:
with b in BorrowableBooks(self) books do
library[b] := library[b] - 1;
books := books ++ b;
wants := wants -- b;
end with;
or
* Return:
with b in books do
library[b] := library[b] + 1;
books := books -- b;
end with;
or
* Reserve:
with b in Books do
reserves[b] := reserves[b] ++ self;
end with;
end either;
goto Person;
This fails, as a borrower can simply keep reserving the book
and reborrowing it. Someone else is left out and never gets a chance to read it! If the library agrees with the change, we’d move to an ordered sequence of holds. But sequences can have duplicate entries. Should those be allowed? If so, then is the reservation queue bounded? And if duplicates are not allowed, then we have to design our system to prevent them. For this exercise, we’ll say that you can only hold one position in the list at a time.
NoDuplicateReservations ==
A b in Books:
A i, j in 1..Len(reserves[b]):
i /= j => reserves[b][i] /= reserves[b][j]
TypeInvariant ==
/ library in [Books -> NumCopies ++ 0]
/ books in [People -> SUBSET Books]
/ wants in [People -> SUBSET Books]
/ reserves in [Books -> Seq(People)]
/ NoDuplicateReservations
And let’s change the rest of the code:
variables
library in [Books -> NumCopies],
reserves = [b in Books |-> <<>>];
BorrowableBooks(p) ==
{b in AvailableBooks:
/ reserves[b] = <<>>
/ p = Head(reserves[b])}
* Reserve:
with b in Books do
reserves[b] := Append(reserves[b], self);
end with;
This fails
TypeInvariant
because it allows for a duplicate. Let’s fix that by preventing duplicate reservations:
* Reserve:
with b in {b in Books: self
otin PT!Range(reserves[b])} do
reserves[b] := Append(reserves[b], self);
end with;
This fails again, because while writing this spec I forgot to
remove reservations that have been fulfilled. Let’s fix that.
* Checkout:
with b in BorrowableBooks(self) books do
library[b] := library[b] - 1;
books := books ++ b;
wants := wants -- b;
if reserves[b] /= <<>> / self = Head(reserves[b]) then
reserves[b] := Tail(reserves[b]);
end if;
end with;
We need reserves[b] /= <<>> to avoid checking Head on an empty sequence. Confirm this passes with 80 states found.
Updating Assumptions
Next, I cloned
OneCopy
and created a new model
One Copy, Two Books, One Person:
NumCopies <- 1..1
People <- [model value] {p1}
Books <- [model value] {b1, b2}
This fails. The first error is that somebody can be interested in a book but never get around to checking it out. This does not seem so much an issue with our system as much as a missing caveat to our requirement: “people eventually get to check out the books they want
if they try to check them out.” We can add this assumption
to our spec by only having people check out books they want to read:
Person:
while TRUE do
either
with b in (BorrowableBooks(self) intersect wants) books do
Now the system deadlocks: if the person isn’t interested in any more books, the system can’t do anything. We could fix this by disabling deadlocks, but that may let an actual deadlock slip through. Instead, let’s add the assumption that people’s preferences aren’t fixed over time. Just because I don’t want
b1 now doesn’t mean I won’t eventually want to read it. I could also add an “Unwant” action, but adding it would weaken the spec: we don’t want the library system succeeding only if people give up on using it.
* Reserve
with b in {b in Books: self
otin PT!Range(reserves[b])} do
reserves[b] := Append(reserves[b], self);
end with;
or
* Want
with b in Books wants do
wants := wants ++ b;
end with;
end either;
On the plus side, this no longer deadlocks. On the minus side, it once again violates
Liveness:
- 1.
- 2.
p1 checks out b1. p1 now wants b2.
- 3.
p1 adds b1 to wants. p1 now wants b1 and b2.
- 4.
p1 checks out b2. p1 now wants b1.
- 5.
p1 adds b2 to wants. GOTO 1.
At no point is wants empty, so the spec is violated. Adding the extra actions revealed more ambiguity in our spec: currently liveness does not say “everybody gets to read every book they want.” It says, “there is some point where nobody wants to read any more books.” If I steadily add new books to read, the system fails, even if I still read every book I want to.
Additionally, it means that
everybody must be satisfied at that time. If you go back and rerun
OneCopyPerBook, you’ll see that TLC
can find a trace where at least one person has a book in their
wants. A more accurate property would be “if a person wants to read a book, eventually they don’t want to read it”:
Liveness ==
A p in People:
A b in Books:
b in wants[p] ~> b
otin wants[p]
Recall that
~> is “leads-to”: every time a person wants to read
b, there is a future state where they don’t want to read
b.
OneCopyPerBook now passes (284 states), but
Two Books still fails: instead of cycling both books,
p1 now just keeps rereading
b1. This seems to me to be a user error: the person isn’t actually trying to read
b2. What happens if we assume that people only add new books when they run out, but also can add any number at one time?
or
* Want
await wants = {};
with b in SUBSET books do
wants := b;
end with;
Two Books now passes with 328 states.
Expiring Reservations
We know the system works, under our assumptions
, if there is one person and two books or two people and one book. The next thing to try would be two people and two books, in a model I call
2P 2B:
NumCopies <- 1..1
People <- [model value] {p1, p2}
Books <- [model value] {b1, b2}
Surprisingly, this deadlocks. Someone can reserve a book they don’t care about and block everybody else from reading it. We could restrict which books you can reserve, but that’s not realistic: this is a scenario the library actually has to be able to handle. The model shows us that we cannot always rely on people to always check out the books they hold, and that this can prevent people from reading the books they want. So there must be some way to invalidate the hold.
But then doesn’t that put us back where we started? If reservations can expire, we can’t guarantee that everybody eventually reads all the books they want. It could keep expiring before they have a chance to check it out, and then somebody else grabs it first. It turns out we cannot guarantee Liveness, no matter what we do! Without a significantly more complicated system, or placing unrealistic restrictions on how the people behave, we cannot ensure that everybody eventually reads all of the books they want to read. By trying to resolve the ambiguity in the business requirements, we found that they were self-contradictory. That’s something worth knowing before we start coding this!
What if we relax the requirements? Instead of saying that everybody eventually reads every book they want, we could say that everybody eventually gets a chance to read the book(s) they want. In other words, there exists at least one state where they could take out the book. In practice, this would correspond to the library only letting you reserve for, say, five days. If you don’t decide to check out the book in that time, you had your chance and the library did everything it could.
The obvious way to relax it would be to say that for every book a person wants, either the person reads the book, or the person is at some point the next in line to reserve it.
Liveness ==
A p in People:
A b in Books:
b in wants[p] ~>
/ b
otin wants[p]
/ p = Head(reserves[b])
If you run this, though, you will get an error. If TLC evaluates
Liveness in a state where
reserves[b] is empty, then it tries to find
Head(<<>>), which is undefined. For most specs an empty sequence is a special case that has to be treated in the context of the wider system. Since we’re trying to see if someone has reservation rights, if the sequence is empty then they obviously don’t have it. We should put the two reservation clauses in a separate operator for clarity
:
NextInLineFor(p, b) ==
/ reserves[b] /= <<>>
/ p = Head(reserves[b])
Liveness ==
A p in People:
A b in Books:
b in wants[p] ~>
/ b
otin wants[p]
/ NextInLineFor(p, b)
Finally, we create an expiration process for each book, which I’ll put at the bottom of the PlusCal spec.
fair process book_reservations in Books
beg/in
Expire:
await reserves[self] /= <<>>;
reserves[self] := Tail(reserves[self]);
goto Expire;
end process;
This still fails:
p1 can want
b1 but keep reserving
b2 and never get around to taking out
b1. As an experiment, I decided to make people only reserve the books they wanted, not any book at random:
* Reserve
with b in {b in wants: self
otin PT!Range(reserves[b])} do
reserves[b] := Append(reserves[b], self);
end with;
But this didn’t work either, and in a way I completely didn’t expect. We had the following failure mode:
- 1.
- 2.
p1 reserves b1. p = Head(reserves[b1]). All that’s left to satisfy liveness is that she reserves or checks out b2.
- 3.
Her reservation for b1 expires.
- 4.
According to our system, she’s done with b1. But we only remove b1 from wants[p1] if the person actually checks out the book. We still have b1 in wants[p1]. She doesn’t care that our system works, she still wants to check out the book!
- 5.
- 6.
We can try fixing this by restricting people’s behavior, but that is, again, unrealistic. Ultimately, we’re unable to make headway on liveness because liveness is a hard problem that requires us to have control over the entire system
. But humans aren’t under our control: we can’t force them to do things for us. Any liveness conditions that depend on the users behaving properly are going to be intractable. If the library requires that “everybody who wants a book eventually gets to read it,” we can’t absolutely guarantee them that, not without unrealistic assumptions about how humans behave.
That said, we can still verify that the system works for various special cases. Before we had a problem with people using reservations to block other people from reading a book. Does our expiration system fix that? We can check by adding a state constraint to
2P 2B. As we discussed back in Chapter
, TLC will only check states that satisfy the state constraint. Let’s add one saying that a person only wants at most one book. The easiest way to do this is by extending
FiniteSets so we can use
Cardinality.
EXTENDS Integers, TLC, Sequences, FiniteSets
* This goes in Advanced Options > State Constraint
A p in People: Cardinality(wants[p]) <= 1
Now
2P 2B passes with 414 states. With this, I’m reasonably confident that if a person wants to read a book and takes steps to read it, the library reservation system guarantees they eventually have a chance to read it. Our final spec:
EXTENDS Integers, TLC, Sequences, FiniteSets
CONSTANTS Books, People, NumCopies
ASSUME NumCopies subseteq Nat
PT == INSTANCE PT
set ++ x == set union {x}
set -- x == set {x}
(*--algorithm library
variables
library in [Books -> NumCopies],
reserves = [b in Books |-> <<>>];
define
AvailableBooks == {b in Books: library[b] > 0}
BorrowableBooks(p) ==
{b in AvailableBooks:
/ reserves[b] = <<>>
/ p = Head(reserves[b])}
end define;
fair process person in People
variables
books = {},
wants in SUBSET Books;
begin
Person:
while TRUE do
either
* Checkout:
with b in (BorrowableBooks(self) intersect wants) books do
library[b] := library[b] - 1;
books := books ++ b;
wants := wants -- b;
if reserves[b] /= <<>> / self = Head(reserves[b]) then
reserves[b] := Tail(reserves[b]);
end if;
end with;
or
* Return:
with b in books do
library[b] := library[b] + 1;
books := books -- b;
end with;
or
* Reserve
with b in {b in wants: self
otin PT!Range(reserves[b])} do
reserves[b] := Append(reserves[b], self);
end with;
or
* Want
await wants = {};
with b in SUBSET books do
wants := b;
end with;
end either;
end while;
end process;
fair process book_reservations in Books
begin
Expire:
await reserves[self] /= <<>>;
reserves[self] := Tail(reserves[self]);
goto Expire;
end process;
* BEGIN TRANSLATION
* ...
* END TRANSLATION
NoDuplicateReservations ==
A b in Books:
A i, j in 1..Len(reserves[b]):
i /= j => reserves[b][i] /= reserves[b][j]
TypeInvariant ==
/ library in [Books -> NumCopies ++ 0]
/ books in [People -> SUBSET Books]
/ wants in [People -> SUBSET Books]
/ reserves in [Books -> Seq(People)]
/ NoDuplicateReservations
NextInLineFor(p, b) ==
/ reserves[b] /= <<>>
/ p = Head(reserves[b])
Liveness ==
A p in People:
A b in Books:
b in wants[p] ~>
/ b
otin wants[p]
/ NextInLineFor(p, b)
Summary
We took a couple of requirements for a library checkout system and, in trying to formally specify it, found several ambiguities. By trying to resolve these ambiguities we pinned down the semantics of what “reservation” actually means, and then showed that reasonable models could not fulfill one of the client requirements. We could, however, guarantee the properties for special cases, such as “people actually making an effort to check out the books they want to read.”
Often times we can’t match requirements perfectly. The real world adds its own complex problems and sometimes we have to settle for “good enough.” It’s better to know what these problems are – and what “good enough” means – right now rather than four months into the project.
In the next chapter, we work through another large example and verify the design of a MapReduce system.