© Hillel Wayne 2018
Hillel WaynePractical TLA+https://doi.org/10.1007/978-1-4842-3829-5_1

1. An Example

Hillel Wayne1 
(1)
Chicago, Illinois, USA
 

Let’s write our first specification! In this chapter we will take a simple problem and translate it into a specification. Then, we’ll model check the specification and see if it has a flaw. Spoiler alert, it will have a flaw. This will be a whirlwind tour: we will be using concepts we will gradually learn over the course of the book.

The Problem

Alice and Bob have accounts at Bankgroup. Each account has 0 or more dollars in it. Bankgroup wants to add a new “wire” feature, where any user can transfer money to any other user. This feature has the following requirements:
  • Each wire must be between two different people in the bank and wire at least one dollar.

  • If a wire is successful, the value of the wire is deducted from the sender account and added to the receiver account.

  • If the wire fails, the two accounts are unchanged.

  • A wire may not cause an account to have a negative balance.

  • For scaling reasons, multiple wires may happen simultaneously.

Your implementation must guarantee all properties, even if wires can take an arbitrary amount of time. In other words, even if wire A starts before B, A may still finish after wire B. Your algorithm must guarantee that, even in those cases, we satisfy all of the requirements. Given that money is on the line, you should probably design this in advance.

Boilerplate

We’re going to start with some boilerplate. Create a new project under File > Open Spec > Add New Spec, set the root-module file to /your/path/wire.tla and set the specification name to “wire”. You should see something like Figure 1-1.
../images/462201_1_En_1_Chapter/462201_1_En_1_Fig1_HTML.jpg
Figure 1-1

An empty specification

---------- MODULE wire --------
===============================

The left panel is the list of TLA+ projects you have, while the right panel is where you will write your spec. Everything above the dashes and below the equals are ignored. We will not write them in the code snippets, and you can assume that everything is happening inside those.

Warning

The name of the module must match the name of the file, or TLA+ will consider the spec invalid.

Next comes the imports we need to use. The TLA+ keyword for an import is EXTENDS. Since we want to do arithmetic, we need to add EXTENDS Integers to the top. Finally, we’ll set up the frame for the algorithm .
---------- MODULE wire --------
EXTENDS Integers
(*--algorithm wire
begin
    skip;
end algorithm;*)
===============================

Single line comments are *, comment blocks are (**). The algorithm is inside a comment: we’ll cover why later. That’s all the boilerplate we need. Now we can get around to specifying what we want.

Specifying

From a design perspective, there are two things we’re tracking in the system state: the set of people with accounts, and how much money each of them has. For simplicity, we’ll represent each person by their name and assume they all have 5 dollars in their account.
EXTENDS Integers
(*--algorithm wire
    variables
        people = {"alice", "bob"},
        acc = [p in people |-> 5];
begin
    skip;
end algorithm;*)

Note

Whitespace is not significant here. There is exactly one place in TLA+ where it is significant, which we will cover in Chapter 3.

Let’s start with people. people is a set. It’s an unordered collection of things, same as most programming languages. In this case, it has two strings as elements: “alice” and “bob”. Nothing too surprising.

Okay, what about acc? acc is a function. These aren’t like normal programming functions: they’re closer to dictionaries or mappings. For each value in a given set, it maps to some output value. Here the set is people and the element is p. This means acc["alice"] = acc["bob"] = 5. This would be equivalent, in a language like Python, to writing {"alice": 5, "bob": 5}. We could also make the function depend on each element. For example, we could write double = [x in 1..10 |-> 2*x].

While we eventually want to allow for multiple wires , we’ll start by just modeling a single wire. And we’ll say that it must be for 3 dollars, and from alice to bob.
    variables
        people = {"alice", "bob"},
        acc = [p in people |-> 5],
        sender = "alice",
        receiver = "bob",
        amount = 3;

Note that we moved the semicolon, as acc is no longer the last variable we declare.

The final thing we will add is an Invariant. That’s something we want to be true at every state of the system, no matter how it starts or where it ends. If it’s false, our specification has an error.
EXTENDS Integers
(*--algorithm wire
variables
    people = {"alice", "bob"},
    acc = [p in people |-> 5],
    sender = "alice",
    receiver = "bob",
    amount = 3;
define
    NoOverdrafts == A p in people: acc[p] >= 0
end define;
begin
    skip;
end algorithm;*)

The == here does not mean comparison . Rather, it’s the definition of an operator, which is closer to what we normally think of programming functions. We’ll discuss them more in Chapter 4. For now, we’re using it as an invariant that we’ll want to check. NoOverdrafts, in English, is “for all p in the set of people, their account must be greater than or equal to 0”. This accurately represents our property, whether we have two people in the set or two hundred.

Implementing

It’s time to add the implementation. We will add it between the begin and the end algorithm. It represents the implementation for our transfer algorithm, which we will check to see if it matches our spec.
EXTENDS Integers
(*--algorithm wire
variables
    people = {"alice", "bob"},
    acc = [p in people |-> 5],
    sender = "alice",
    receiver = "bob",
    amount = 3;
define
    NoOverdrafts == A p in people: acc[p] >= 0
end define;
begin
    Withdraw:
        acc[sender] := acc[sender] - amount;
    Deposit:
        acc[receiver] := acc[receiver] + amount;
end algorithm;*)

If you’re assigning a value to a variable for the very first time, you use =. However, if the variable already exists and you assign a new value to it, you use :=. Withdraw and Deposit are labels. They signify that everything inside them happens in the same moment of time. If we put the two assignments in the same label, they’d happen simultaneously. As it is, we let some time pass between the withdrawal and the deposit. We only allowed for one wire to happen, so this really doesn’t change much. But if we wrote the specification to allow multiple wires (which we’ll do later), this becomes more important.

Writing implementations in TLA+ can be a barrier starting out, so we wrote part of our spec in PlusCal, a special language that compiles to pure TLA+. PlusCal adds additional syntax, such as the := assignment syntax and the labels, giving us a pseudocode-like structure on top of TLA+. This is why we have to put it in a comment, as it needs to be translated first. To compile the PlusCal, we can go to File > Translate PlusCal Algorithm.

Tip

The shortcut is Ctrl-T, or Cmd-T on macOS. If you right-click in the editor, there’s an option in the context menu for “Translate PlusCal Automatically,” which does just that on every save.

You should see the translated TLA+ appear below, in Figure 1-2, what you wrote, bounded by * BEGIN TRANSLATION.
../images/462201_1_En_1_Chapter/462201_1_En_1_Fig2_HTML.jpg
Figure 1-2

The translated text

You don’t need to do anything else with it. You’re now ready to check your specification.

Verifying

To check that this spec works, we need to create a model to check it. The model is like the “simulation” we want to run. Different models may set up different initial conditions and check for different things. Since we aren’t doing anything too fancy here, all we need to do with our model is check that our one invariant holds.

To create the model, go to TLC Model Checker > New Model. Once you’ve created the model, you should see something like what is shown in Figure 1-3.
../images/462201_1_En_1_Chapter/462201_1_En_1_Fig3_HTML.jpg
Figure 1-3

The Model Overview

Find the drop-down labeled Invariants and click the Add button. Type in the name of the invariant we want to check (NoOverdrafts), save it, and run the model (the green button just under “Model Overview”).

Note

The shortcut for running the model is F11. You can bind your own shortcut in Preferences.

You should see 4 states found, 3 distinct states, and no error (Figure 1-4). From now on, if the spec completes without an error, we will list the number of states it should complete with. In this case, it would be successful (4 states).
../images/462201_1_En_1_Chapter/462201_1_En_1_Fig4_HTML.jpg
Figure 1-4

(4 states)

Warning

If you don’t see any states found, go back to Model Overview and make sure Temporal Formula is selected as the behavior with the temporal formula Spec. This is what tells TLC to specifically check the behavior of your spec.

TLC did an exhaustive search across our entire state space. Since we weren’t doing anything tricky, there was exactly one behavior to check:
  1. 1.

    Choose a possible initial state (there is only one possible initial state).

     
  2. 2.

    Check that the starting state follows the NoOverdrafts invariant. It does, so begin evaluation. Execute the Withdraw step.

     
  3. 3.

    Check the NoOverdrafts invariant. It’s true, so continue.

     
  4. 4.

    Execute the Deposit step.

     
  5. 5.

    Check the NoOverdrafts invariant . It’s true, so finish.

     

If our spec had any errors, it would have popped up an error bar. Since that didn’t happen, our spec is probably correct.

Or maybe it’s too simple. Let’s break it.

Initial Conditions

Our spec worked when Alice was sending exactly 3 dollars to Bob, but that’s only one possible input. Maybe she sends 2 dollars. Maybe she sends 20. We can expand the coverage of the spec by letting it choose how much she sends. We’ll do this by telling it to pick amount from a set of possible numbers, say 1 through 6. TLA+ provides the a..b shorthand for all integers between a and b (inclusive). Make this change in the same file as the translated TLA+.
EXTENDS Integers
(*--algorithm wire
variables
    people = {"alice", "bob"},
    acc = [p in people |-> 5],
    sender = "alice",
    receiver = "bob",
    amount in 1..6;
define
    NoOverdrafts == A p in people: acc[p] >= 0
end define;
begin
    Withdraw:
        acc[sender] := acc[sender] - amount;
    Deposit:
        acc[receiver] := acc[receiver] + amount;
end algorithm;*)

We’ve expanded the set of possible initial states for our spec. Our model checker will check every single one of them to make sure they all pass. Recompile the PlusCal and rerun the model.

Note

If you see deleteOutOfSyncFiles, don’t worry about it. It’s a cleanup thing.

This time, you should see something new (Figure 1-5).
../images/462201_1_En_1_Chapter/462201_1_En_1_Fig5_HTML.jpg
Figure 1-5

NoOverdrafts violated

TLC shows the specific invariant that was violated, which is NoOverdrafts. Below that you see the “Error-Trace.” This shows the initial condition and exact sequence of steps that lead to the violation. In this case, the error is this:
  1. 1.

    Choose amount = 6 as part of the initial state.

     
  2. 2.

    Check that the starting state follows the NoOverdrafts invariant. It does, so begin evaluation.

     
  3. 3.

    Execute the Withdraw step.

     
  4. 4.

    Check the NoOverdrafts invariant. It’s false, so raise the error.

     

How can we fix it? We could restrict the spec to only consider amount in 1..acc[sender], which would make it pass. However, this might not reflect the bank’s expected use. People in the real world might attempt to transfer more than they have, and we should be ready for that case. To continue with the example, though, we’ll assume that this is an acceptable assumption. Make the change and confirm the model passes again (20 states).

Multiple Processes

There’s one more requirement to implement: simultaneous wires. People should be allowed to start a second wire before the first finishes. In PlusCal, each algorithm happening simultaneously belongs to its own process. Each process can have its own code and its own local variables. We’ll start with two wires, so there are two processes running the exact same code.
EXTENDS Integers
(*--algorithm wire
variables
    people = {"alice", "bob"},
    acc = [p in people |-> 5];
define
    NoOverdrafts == A p in people: acc[p] >= 0
end define;
process Wire in 1..2
    variables
        sender = "alice",
        receiver = "bob",
        amount in 1..acc[sender];
begin
    Withdraw:
        acc[sender] := acc[sender] - amount;
    Deposit:
        acc[receiver] := acc[receiver] + amount;
end process;
end algorithm;*)
If we retranslate this and rerun, we get an error again. Even if some other system is stopping Alice from attempting to wire 6 dollars, that system doesn’t work across multiple wires. Then both withdraws happen, and our spec fails. We need to add a condition to our spec, so that we check we have sufficient funds before withdrawing.
EXTENDS Integers
(*--algorithm wire
variables
    people = {"alice", "bob"},
    acc = [p in people |-> 5],
define
    NoOverdrafts == A p in people: acc[p] >= 0
end define;
process Wire in 1..2
    variables
        sender = "alice",
        receiver = "bob",
        amount in 1..acc[sender];
begin
    CheckFunds:
        if amount <= acc[sender] then
            Withdraw:
                acc[sender] := acc[sender] - amount;
            Deposit:
                acc[receiver] := acc[receiver] + amount;
        end if;
end process;
end algorithm;*)
This also fails! TLC will report NoOverdrafts failed. The error is significantly more complex than what we’ve seen before. In order to understand what the error is, we need to look at the error trace (Figure 1-6).
../images/462201_1_En_1_Chapter/462201_1_En_1_Fig6_HTML.jpg
Figure 1-6

An Error Trace. Yours may look slightly different.

<Initial Predicate> is the starting state we picked. All of the variables have an assignment here. pc is the current label each process is on, and amount is the corresponding wire amount for each process. Since we let the wire amount vary, for this state TLC picked <<1, 5>>.

Every following state highlights with variables that have changed from the previous state. In state 2, the only change is that our first process finished the CheckFunds step and is ready to perform the Withdraw step. Step 3 is similar. In step 4, pc changes but so does acc: process 1 has performed Withdraw, so acc["alice"] has changed from 5 to 4. By reading the error trace, we can put together a sense of how this particular error occurs.
  1. 1.

    Choose amount[1] = 1 and amount[2] = 5.

     
  2. 2.

    Execute CheckFunds[1]. Since 1 <= acc["alice"], proceed to Withdraw[1].

     
  3. 3.

    Execute CheckFunds[2]. Since 5 <= acc["alice"], proceed to Withdraw[2].

     
  4. 4.

    Execute Withdraw[1]. Now acc["alice"] = 4.

     
  5. 5.

    Execute Withdraw[2]. Now acc["alice"] = -1.

     
You may see a slightly different error trace, but the overall pattern should be the same. Even if we check that Alice has enough money, both processes can pass the check before either withdraws . We’ve caught a race condition in our code. What if we placed the check and withdraw in the same label? Then there would be no way to sneak a concurrency bug in between the two actions because they’d happen in the same instant of time. In practice, we could potentially do this by withdrawing as the check and rolling back if that overdrafts.
(*--algorithm wire
variables
    people = {"alice", "bob"},
    acc = [p in people |-> 5],
define
    NoOverdrafts == A p in people: acc[p] >= 0
end define;
process Wire in 1..2
    variables
        sender = "alice",
        receiver = "bob",
        amount in 1..acc[sender];
begin
    CheckAndWithdraw:
        if amount <= acc[sender] then
                acc[sender] := acc[sender] - amount;
            Deposit:
                acc[receiver] := acc[receiver] + amount;
        end if;
end process;
end algorithm;*)

That seems to work (332 states). We have fewer states because we took out the label, which removed some of the concurrency. Let’s move on to the last requirement and make sure it still works if the wire fails.

Temporal Properties

“If the wire fails, the account is unchanged.” For the simple case of two people, let’s check a slightly weaker but more tractable requirement: “the total final value of the accounts is the same as the total starting value.” Unlike NoOverdrafts, this is a Temporal Property. Simple invariants check that every state of the spec is valid. Temporal properties check that every possible “lifetime” of the algorithm, from start to finish, obeys something that relates different states in the sequence to each other. Think of it like the difference between checking that a database is “always consistent” versus “eventually consistent” (although that’s just one of many things you could check). Here’s what our new spec would look like with the defined temporal property.
EXTENDS Integers
(*--algorithm wire
variables
    people = {"alice", "bob"},
    acc = [p in people |-> 5],
define
    NoOverdrafts == A p in people: acc[p] >= 0
    EventuallyConsistent == <>[](acc["alice"] + acc["bob"] = 10)
end define;
process Wire in 1..2
    variables
        sender = "alice",
        receiver = "bob",
        amount in 1..acc[sender];
begin
    CheckAndWithdraw:
        if amount <= acc[sender] then
                acc[sender] := acc[sender] - amount;
            Deposit:
                acc[receiver] := acc[receiver] + amount;
        end if;
end process;
end algorithm;*)

EventuallyConsistent looks like NoOverdrafts, but the equation starts with <>[]. <>[] is the “eventually-always” operator , and means that no matter what the algorithm does, in the end the given equation must eventually be true. It may be false for a while, and it may switch between true and false several times, but it will end true.

We need to add the temporal property under Properties in the Model Overview page. You will find it just below Invariants. After doing that, rerun the model.
../images/462201_1_En_1_Chapter/462201_1_En_1_Fig7_HTML.jpg
Figure 1-7

An error with stuttering

The error we see (Figure 1-7) is unusual. The first four steps in the trace look normal: we withdraw in the first wire, we deposit in the first wire, and we withdraw in the second wire. Then we see that the fifth state is labeled <Stuttering>. Stuttering is when a process simply stops. There’s nothing preventing the wire from making the deposit, it just doesn’t try. If this seems far-fetched, consider the case where the server crashes in between steps. Or the power goes out. Or the deposit eventually happens, but it takes so long Alice complains to the SEC. Those are all real and quite worrisome concerns, and they are all represented by the stuttering state.

How do we fix this? Unfortunately, there are no good options. At least, there’s no easy options that probably won’t blow up in practice.
  • We could make the check, withdraw, and deposit all happen in the same step. Then there’s no way for the server to crash between the withdraw and the deposit, because there is no timespan between the withdraw and the deposit. In practice, though, this means our process only has one label, which means it effectively takes zero time to run. This violates the core requirement that wires take an arbitrary amount of time.

  • We could specifically tell TLA+ that our process cannot stutter between the withdrawal and the deposit. Our spec would pass, but there’s no way we could implement it. Server faults are as certain as death and taxes.

  • We could convince the Project Manager to relax the EventuallyConsistent requirement.

  • We could try a different implementation entirely.

  • We could relax the NoOverdrafts requirement. In the end, this is how most banks do it: instead of guaranteeing that overdrafts never happen, they work to make overdrafts less likely and have procedures in place when they do happen.

Specifying safe transfers is surprisingly hard! What’s important, though, is that we can test our specification without having to write code first. This lets us explore solutions and confirm they are safe before building our system.

Summary

We wrote a specification of a bank transfer and saw how the model checker, TLC, was able to find race conditions and logic bugs in our design. In the process, we sampled all of the various features of both TLA+ and PlusCal. Over the next five chapters, we will cover all of these features in more detail. Invariants, operators, and functions are all covered in Chapter 3, using TLC better is part of Chapter 4, Chapter 5 is all about concurrency, and Chapter 6 is stuttering and temporal properties.

As for the basics of PlusCal, that will be the next chapter.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset