Does the algorithm always terminate?
Will all messages in the queue get processed?
If disrupted, will the system return to a stable state over time?
Is the database eventually consistent?
Temporal properties are very powerful but also much harder to guarantee. Systems have a whole new set of failure modes that apply to temporal properties. As always, as a system gets harder to analyze, specifying and model checking it becomes more important.
Termination
The simplest temporal property is Termination. This is the requirement that the algorithm eventually ends. If the algorithm crashes or enters an infinite loop, then it violates termination.
Create a model and, under Model Overview > What to Check? > Properties, check Termination. Before you run it, try to predict what will happen.
What does that mean?
Stuttering
TLA+ is the “Temporal Logic of Actions.” Every step of the model represents a single action in time. TLC works by looking at all of the enabled labels at each step and picking one. However, it also has another option: it can take no action at all. We call this stuttering. In most cases, stuttering doesn’t change the spec: if no action happens, then everything’s the same as before and it didn’t matter. The one special case is if the spec keeps stuttering, over and over again, and never takes any other action. It’s as if the spec is frozen in time, never able to change.
Up until now, stuttering hasn’t mattered. All of our invariants are safety checks, which checks the model can’t reach an invalid state. Since stuttering on a valid state leaves you in a valid state, TLC had no reason to try stuttering. Most temporal properties, though, are what’s called liveness checks. A liveness check is one that verifies the system eventually does what you expect it to do. Here, TLC never finishes evaluating Cycle so the spec never terminates.
Stuttering can be useful to us. It can represent a server crashing, or a process timing out, or an awaited signal never coming. It’s better that TLA+ defaults to “everything can crash” than the converse: otherwise our models may only work because of an implicit assumption. If you want to explicitly rule out stuttering, you need to add fairness.
Fairness, Weak and Strong
First, see what happens when only one process is fair. If only the car is fair, then the light might never cycle. If only the light is fair, it will eventually cycle to green, but the car might never move. Try this, and then see what happens when both processes are fair.
You should still see the spec fail! There’s one case we didn’t cover: What if the light keeps cycling between green and red? The Drive action is enabled, then disabled, then enabled again, ad infinitum. But weak fairness only guarantees the action will happen if it stays enabled. If the light is always green, the car will eventually drive through. But since it can keep cycling, the car is stuck.
This, finally, will always terminate. Even if the light keeps cycling, the Drive action is repeatedly enabled and so is guaranteed to happen. Note that this still requires the light to be weakly fair : if it’s unfair, it can simply cycle to red and stay there. In practice, people don’t often use strong fairness; it’s a much safer to assume the system is only weakly fair. However, it’s worth knowing about for the cases where it is useful.
Tip
You can also make individual actions in a process fair. For label A: in an unfair process, writing A:+ will make it weakly fair. In a weakly fair process, A:+ will make it strongly fair.
You can also make the spec globally fair by writing --fair algorithm instead of --algorithm.
The Temporal Operators
For all of these assume P and Q are Boolean statements.
[]
[] is always. []P means that for P is true for all states in all behaviors. This is useful enough that TLC is designed around it: saying P is an invariant is an optimized way of saying that []P is a temporal property, and in fact TLC uses a much faster algorithm to evaluate invariants. As such we rarely use it explicitly, except for specifying especially advanced properties.
You can also write ~[]P, which means that P will be false for at least one state.
<>
then <>(light = "red") would be an unsatisfied temporal property: TLC can find a possible execution where the light is never red.
You can also write ~<>P, which means that P is never true. Note that this is the same as saying []~P, and in fact <>P is formally defined as ~[]~P.
Note
Termination is defined as “eventually all processes are done”: Termination == <>(A self in ProcSet: pc[self] = "Done").
The current version of TLC cannot check set membership of a variable set as part of a property with <>. So you can write <>(set = {}), but if you write <>(x in set), set must either be a constant or a parameterless operator.
~>
then it would not hold. The first time the light turns red, it later turns green, which is fine. But the second time it turns red it doesn’t eventually turn green again, so the property is false.
~P ~> Q and P ~> ~Q have their expected meanings. ~(P ~> Q) makes TLC explode.
You can also do P ~> []Q. If P is true, then there is some state where Q becomes true and forever stays true.
[ ]<> and <>[ ]
[]<>P means that P is always eventually true, <>[]P means that P is eventually always true. For a finite spec, these mean the same thing: P is true at termination. For an infinite spec, <>[]P means that there is some point where P becomes true and forever stays true, while []<>P means that if P ever becomes false, it will eventually become true again. Another way to think about it is that []<>P <=> (~P ~> P): P being false leads to P being true later.
then <>[](light = "green") becomes false and []<>(light = "red") becomes true.
As with <>, TLC cannot check set membership of a variable set as part of a property with <>[] or []<>.
Limitations of Liveness
- 1.
Temporal properties can be incredibly powerful.
- 2.
Temporal properties can be incredibly confusing.
Fact is, you don’t often need them. Most often what you want can be expressed as an invariant. The rest of the time you’re usually fine with [], <>, and simple uses of ~>. As long as you’re not writing something like <>~(P ~> []Q) you’re probably fine.
From a practical perspective, the main limitation of temporal properties is that checking liveness is slow. Very slow. Invariants are checked on individual states at a time, while temporal properties have to be checked over sequences of states. TLC uses a different algorithm for this, which is slower and is not parallelizable.
When checking temporal properties, place them in a separate model from your invariants. This way you can test the invariants much more quickly before checking the slower temporal properties. Also consider testing liveness over a smaller domain. If you can check invariants with MaxFoo <- 5, it might take the same time to check liveness for MaxFoo <- 3. You can, of course, simply leave TLC running for a longer time. Having a model take a day to check is unpleasant, but it’s better than having a mistake in your design.
There’s one other, extremely important limitation of temporal properties: do not combine temporal properties and symmetry sets. Regular sets of model constants are fine, but not symmetry sets. TLC optimizes symmetry sets by skipping redundant states, which may lead to it missing a liveness error. Almost all of the mistakes you can make using TLC are false positives: the checker will report spec errors that aren’t actually in the design. This is one of the extremely few false negatives: it could potentially tell you that a spec is valid when it really has errors. TLC will warn you if you accidentally do this.
Example
Now that you know how to use temporal properties, let’s apply it to a more interesting example than a traffic light. Dekker’s Algorithm was, historically, the first successful algorithm to allow two threads to share a single resource without having a race condition. It guarantees that both threads will eventually perform their update, but not at the same time, and without using any CPU-specific features. The only thing you need is some shared memory. We will specify it in TLA+ and show it works as expected.
For any two threads, they both can’t be in CS at the same time. We need the t1 /= t2 clause in there to make sure they’re different threads. Otherwise, TLC can pick the same thread as both t1 and t2.
TLC can handle this property because the set, Threads, is a constant. This means that all threads eventually reach the cs step. If we add Liveness as a temporal property, the spec fails: both threads get endlessly stuck cycling in P2. This is called a livelock.
Warning
A common mistake is putting the temporal operator in the wrong place. If you write <>A t in Threads: pc[t] = "CS", you’re instead saying “there is a state where all the threads are simultaneously in CS”, which directly contradicts our AtMostOneCritical invariant.
This will pass (256 states). We’ve guaranteed our liveness properties for two threads.
While Dekker’s Algorithm is simple and satisfies all properties, it has a couple of problems. The first is that it only applies for two threads: if you extend it to Threads <- 1..3 it will fail. While two of the three threads will always reach CS, one thread won’t. This is called resource starvation . Try making some simple changes and seeing if you can generalize it. Remember to place every operation in a separate label, and don’t be surprised if you can’t manage it. The successful generalizations get very convoluted.
This fails. The crashing thread can reach P2_1_1 and never execute it, causing the fair thread to cycle in P2 forever. As with generalizing to three threads, fixing the resiliency bug requires major changes to the algorithm that are outside the scope of this book.
Summary
In this chapter we learned about fairness, liveness, termination, and stuttering. We also learned about temporal operators, how they are powerful, and how they can be tricky. We did an example of Dekker’s Algorithm.
With this, we have now covered all of the core material of the book. In the next chapters, we will not introduce any new syntax or rules. The rest of the book will teach you how to use TLA+ better and how to apply it to a wide variety of real-world problems.