One of the benefits of TLA+ being a specification language is that operators can be far more expressive and powerful than program functions can be. This is also a drawback: if your spec uses a “too powerful” operator, you cannot directly translate it to code. Usually this is fine: if you’re specifying a large system, you probably aren’t worrying that your sort function is correct.
If you’re directly writing a new sorting algorithm, though, you want to specify it. This chapter is about how we can write and verify algorithms with TLA+. While we will be implementing them, our focus is on the verification, not the implementation.
By “algorithm,” we mean code that is intended to terminate and produce an output, rather than run forever or continuously interact with its environment.
Single-Process Algorithms
Most single-process algorithm specifications follow a template:
---- MODULE name ----
EXTENDS \* whatever
(*--algorithm name
variables
input \in \* ...
output; \* ...
\* helper variables
begin
\* algorithm implementation
assert output = Expected(input);
end algorithm; *)
Expected is what we’re actually trying to implement: it takes some input and returns the value our algorithm should, if correct, output. Helpers are anything that the algorithm will use that is outside of our verification scope. For example, if we were specifying some code for Python, we might make a Sort operator, as Python would give us sorted() by default.
For the PlusCal algorithm, we want to specify that it works for a given range of inputs, and we will store the return value in output. Here we aren’t defining an initial value for output, since that’s something the algorithm will have to assign. TLC will create the constant defaultInitValue <- [ model value ] and initialize output to that. We also place any helper variables here, as we can’t define new variables in the middle of an algorithm (barring use of with). In the body, we write our implementation of the algorithm. Finally, we make sure that whatever output value we get matches our expectation.
Of course, this is just a starting guideline, not a strict rule. If our algorithm is complex, we might add procedures and macros to break it into parts. We might add assert statements as guards or sanity checks. Or we might want to add a global invariant to hold at every step of the spec, like we do with our larger systems.
Here’s what this might look like, all filled out:
(*--algorithm add
variables
in_a \in -5..5,
in_b \in -5..5,
output;
begin
output := in_a + in_b;
assert output = add(in_a, in_b);
end algorithm; *)
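Outside of TLA+, the same template corresponds to an exhaustive test harness: enumerate the input range and compare the implementation’s output against the expected operator. A minimal Python analogue (the function names are ours, not part of the spec):

```python
def add(a, b):
    """The implementation under test (trivial here, like the PlusCal body)."""
    return a + b

def expected_add(a, b):
    """The Expected operator: the value a correct implementation must return."""
    return a + b

# Mirror TLC exploring every in_a \in -5..5, in_b \in -5..5.
for in_a in range(-5, 6):
    for in_b in range(-5, 6):
        assert add(in_a, in_b) == expected_add(in_a, in_b)
```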
Let’s do some examples.
Max
Given a sequence of numbers, return the largest element of that sequence. For example, max(<<1, 1, 2, -1>>) = 2.
First of all, we need our expected operator. We know that for a set, we can get the maximum with CHOOSE x \in set: \A y \in set: y <= x. The maximum of a sequence would just be the maximum of its range. Putting those together:
Max(seq) ==
LET set == {seq[i]: i \in 1..Len(seq)}
IN CHOOSE x \in set: \A y \in set: y <= x
We could also find the index that gives us the largest number, and then return the number at that index. It’s some duplicated effort, but some people might find it clearer.
Max(seq) ==
LET index ==
CHOOSE x \in 1..Len(seq):
\A y \in 1..Len(seq): seq[y] <= seq[x]
IN seq[index]
Either way, here’s a full version of the algorithm:
EXTENDS Integers, Sequences, TLC
PT == INSTANCE PT
CONSTANTS IntSet, MaxSeqLen
ASSUME IntSet \subseteq Int
ASSUME MaxSeqLen > 0
Max(seq) ==
LET set == {seq[i]: i \in 1..Len(seq)}
IN CHOOSE x \in set: \A y \in set: y <= x
AllInputs == PT!SeqOf(IntSet, MaxSeqLen)
(*--algorithm max
variables seq \in AllInputs, i = 1, max;
begin
max := seq[1];
while i <= Len(seq) do
if max < seq[i] then
max := seq[i];
end if;
i := i + 1;
end while;
assert max = Max(seq);
end algorithm; *)
While AllInputs is “too powerful” to use in our algorithm, we only use it to generate inputs, not to implement the algorithm itself. Set

defaultInitValue <- [ model value ]
IntSet <- -5..5
MaxSeqLen <- 5
This fails. Looking at the error, it tried to calculate <<>>[1], which is undefined. This is, incidentally, a reason why we don’t assign output in the variables clause. Try replacing the declaration of max with max = seq[1] and comparing the two error outputs.
We can make this precondition explicit by adding assert Len(seq) > 0 to the beginning of the algorithm. That tells the reader that this implementation is only valid if you pass in a nonempty list. After that, it is fine for us to remove the empty sequence from our possible initial states, as we made it explicit that <<>> is a bad value. This means we will also remove {<<>>} from AllInputs.

AllInputs == PT!SeqOf(IntSet, MaxSeqLen) \ {<<>>}
(*--algorithm max
variables seq \in AllInputs, i = 1, max;
begin
assert Len(seq) > 0;
max := seq[1];
This should pass (1,576,685 states), meaning our implementation
of “find max value” is correct, at least for the parameters we tested.
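For comparison, the verified algorithm translates nearly line for line into ordinary code. Here’s a Python sketch (find_max is our name), keeping the nonempty precondition and checking against Python’s built-in max the way TLC checks against the Max operator:

```python
from itertools import product

def find_max(seq):
    """Return the largest element of seq, mirroring the PlusCal loop."""
    assert len(seq) > 0, "precondition: nonempty sequence"
    largest = seq[0]
    for x in seq[1:]:
        if largest < x:
            largest = x
    return largest

# Exhaustively check against Python's built-in max for small inputs,
# the way TLC enumerates AllInputs.
for n in range(1, 4):
    for candidate in product(range(-5, 6), repeat=n):
        assert find_max(list(candidate)) == max(candidate)
```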
Leftpad
Given a character, a length, and a string, return a string padded on the left with that character to length n. If n is less than the length of the string, output the original string. For example, Leftpad(" ", 5, "foo") = "  foo", while Leftpad(" ", 1, "foo") = "foo".
Leftpad is a popular milestone in learning a theorem prover. It’s a simple algorithm with a surprisingly complex complete specification. For the TLA+ version, we will use a sequence of characters instead of a string, since that works somewhat better with the TLA+ operators.
Call the operator Leftpad(c, n, str). The complete specification is the following:

1. Len(Leftpad(c, n, str)) = Max(n, Len(str)).
2. The suffix of the output matches str.
3. All of the characters before str are c.

In other words, we prepend some number of characters c to str, such that the final length is n.
Leftpad(c, n, str) ==
LET
outlength == PT!Max(Len(str), n)
padlength ==
CHOOSE padlength \in 0..n:
padlength + Len(str) = outlength
IN
[x \in 1..padlength |-> c] \o str
>> Leftpad(" ", 1, <<"f", "o", "o">>)
<<"f", "o", "o">>
>> Leftpad(" ", 5, <<"f", "o", "o">>)
<<" ", " ", "f", "o", "o">>
Since we can pad with any character, the state space explodes very quickly. For performance reasons we should not test this with all possible alphanumeric characters. Rather, we should choose some restricted subset for both c and str.
Characters == {"a", "b", "c"}
(*--algorithm leftpad
variables
in_c \in Characters \union {" "},
in_n \in 0..6,
in_str \in PT!SeqOf(Characters, 6),
output;
begin
output := in_str;
while Len(output) < in_n do
output := <<in_c>> \o output;
end while;
assert output = Leftpad(in_c, in_n, in_str);
end algorithm; *)
This passes with 125,632 states. Try adding errors to see that TLC catches them. What happens when we replace Len(output) < in_n with Len(output) <= in_n?
One odd case is if we replace in_n \in 0..6 with in_n \in -1..6. The error is that there is no padding that satisfies Leftpad. This is because 0..-1 is the empty set, so padlength is undefined in Leftpad. This means either our definition is wrong, because it doesn’t define what it means to pad to a negative length; or the spec is wrong, because we’re not supposed to be able to pad to a negative length. In other words, does Leftpad take any integer, or only nonnegative integers?
The integer case is simple enough. We just have to expand the definition of Leftpad to be str for n < 0.
Leftpad(c, n, str) ==
IF n < 0 THEN str ELSE
LET
outlength == PT!Max(Len(str), n)
padlength ==
CHOOSE padlength \in 0..n:
padlength + Len(str) = outlength
IN
[x \in 1..padlength |-> c] \o str
If Leftpad is supposed to take only nonnegative integers, then it’s correct and our spec is wrong. As with max, we need to add a precondition.
(*--algorithm leftpad
variables
in_c \in Characters \union {" "},
in_n \in 0..6,
in_str \in PT!SeqOf(Characters, 6),
output;
begin
assert in_n >= 0;
output := in_str;
while Len(output) < in_n do
output := <<in_c>> \o output;
end while;
assert output = Leftpad(in_c, in_n, in_str);
end algorithm; *)
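The three spec properties make good assertions in an ordinary implementation too. Here’s a Python sketch (names are ours) using strings rather than sequences, checking all three properties over a small input space:

```python
def leftpad(c, n, s):
    """Left-pad string s with character c to length n."""
    assert n >= 0, "precondition: nonnegative target length"
    out = s
    while len(out) < n:
        out = c + out
    return out

def check(c, n, s):
    """Assert the three properties from the complete specification."""
    out = leftpad(c, n, s)
    assert len(out) == max(n, len(s))            # 1. output length
    assert out.endswith(s)                       # 2. suffix is s
    assert set(out[:len(out) - len(s)]) <= {c}   # 3. padding is all c

for c in "ab ":
    for n in range(0, 7):
        for s in ["", "a", "abc", "bbbb"]:
            check(c, n, s)
```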
Properties of Algorithms
Verifying correctness is easy enough: just run the spec and confirm you have the right result at the end. Verifying other properties, like performance characteristics or bounds, is more difficult. We can sometimes handle this by adding auxiliary variables and asserting their values at the end.
Let’s take binary search. A correct implementation of binary search will take approximately log2(n) comparisons. Can we verify an algorithm does that?
First, let’s write a “binary search.” The only additional operator we need for a binary search is the set of all ordered sequences. We can get these by taking PT!SeqOf and filtering out all of the unordered ones.

OrderedSeqOf(set, n) ==
{ seq \in PT!SeqOf(set, n):
\A x \in 2..Len(seq):
seq[x] >= seq[x-1] }
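If you were building a similar exhaustive harness outside of TLA+, the same operator is just a filter over all sequences. A Python sketch (the function names are ours; like PT!SeqOf, seq_of generates sequences of length 0 through n):

```python
from itertools import product

def seq_of(s, n):
    """All sequences over set s with length 0..n (a sketch of PT!SeqOf)."""
    return [seq for length in range(n + 1)
                for seq in product(sorted(s), repeat=length)]

def ordered_seq_of(s, n):
    """Keep only the sequences that are sorted ascending."""
    return [seq for seq in seq_of(s, n)
                if all(seq[i] >= seq[i - 1] for i in range(1, len(seq)))]
```

For example, ordered_seq_of({1, 2, 3, 4}, 4) contains (1, 2, 2, 4) but not (2, 1).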
Putting it all together:
MaxInt == 4
Range(f) == {f[x]: x \in DOMAIN f}
(*--algorithm definitely_binary_search
variables i = 1,
seq \in OrderedSeqOf(1..MaxInt, MaxInt),
target \in 1..MaxInt,
found_index = 0;
begin
Search:
while i <= Len(seq) do
if seq[i] = target then
found_index := i;
goto Result;
else
i := i + 1;
end if;
end while;
Result:
if target \in Range(seq) then
assert seq[found_index] = target;
else
\* 0 is out of DOMAIN seq, so can represent "not found"
assert found_index = 0;
end if;
end algorithm; *)
Definitely a binary search! It works (1,666 states) and always gets the correct result, so it’s binary search, no questions asked.
Okay, maybe one question: binary search has a worst case of O(log(n)), while this looks like a worst case of O(n). While we can’t compute the exact runtime, we can count the number of times we iterate in the while loop and use that as a rough measure of runtime complexity. Instead of defining Log, let’s go the other way: if we take the number of loop iterations and exponentiate it, the result should be under the length of the sequence. We can define Pow2 in a similar way to how we defined factorial earlier, by defining a recursive function over the set 0..n.
Pow2(n) ==
LET f[x \in 0..n] ==
IF x = 0
THEN 1
ELSE 2*f[x-1]
IN f[n]
>> {Pow2(x): x \in 0..5}
{1, 2, 4, 8, 16, 32}
Our complexity assertion then becomes that for some iteration counter counter, Pow2(counter) <= Len(seq). In practice, though, we need to subtract one from counter before exponentiating it. To see why, note that a list of one element should require at most one iteration (if the single element matches target, we’re done), or 2^0. For two and three elements, we need two checks (2^1), while for four elements, we need at most three. However, 2^3 = 8, so Pow2(3) = 8 > 4 = Len(seq). If we subtract one, the invariant holds (2^(3-1) = 4). Similarly, for 10 elements, we should need four iterations, and 2^(4-1) < 10 < 2^4. This doesn’t change the complexity, though, as 2^(c-1) is just 2^c divided by a constant factor of 2, and we can ignore constants when determining algorithmic complexity.
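The same bound can be checked on an ordinary implementation by threading a counter through it. A Python sketch (names are ours), asserting Pow2(counter - 1) <= Len(seq) across a range of inputs:

```python
def pow2(n):
    """2^n via the same recursion as the Pow2 operator."""
    return 1 if n == 0 else 2 * pow2(n - 1)

def binary_search_count(seq, target):
    """Binary search over a sorted list, also returning the iteration count."""
    low, high, counter = 0, len(seq) - 1, 0
    while low <= high:
        counter += 1
        m = (low + high) // 2
        if seq[m] == target:
            return m, counter
        elif seq[m] < target:
            low = m + 1
        else:
            high = m - 1
    return None, counter  # None plays the role of found_index = 0

# The bound from the text: Pow2(counter - 1) <= Len(seq), for nonempty seq.
for n in range(1, 50):
    seq = list(range(n))
    for target in range(-1, n + 1):
        _, counter = binary_search_count(seq, target)
        assert pow2(counter - 1) <= len(seq)
```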
variables i = 1,
seq \in OrderedSeqOf(1..MaxInt, MaxInt),
target \in 1..MaxInt,
found_index = 0,
counter = 0;
begin
Search:
while i <= Len(seq) do
counter := counter + 1;
if seq[i] = target then
found_index := i;
goto Result;
end if;
i := i + 1;
end while;
Result:
if Len(seq) > 0 then
assert Pow2(counter-1) <= Len(seq);
end if;
if target \in Range(seq) then
assert seq[found_index] = target;
else
assert found_index = 0;
end if;
Now this fails, as our “binary search” is too inefficient. By contrast, this is a real binary search:
(*--algorithm binary_search
variables low = 1,
seq \in OrderedSeqOf(1..MaxInt, MaxInt),
high = Len(seq),
target \in 1..MaxInt,
found_index = 0,
counter = 0;
begin
Search:
while low <= high do
counter := counter + 1;
with
m = (high + low) \div 2
do
if seq[m] = target then
found_index := m;
goto Result;
elsif seq[m] < target then
low := m + 1;
else
high := m - 1;
end if;
end with;
end while;
Result:
if Len(seq) > 0 then
assert Pow2(counter-1) <= Len(seq);
end if;
if target \in Range(seq) then
assert seq[found_index] = target;
else
assert found_index = 0;
end if;
end algorithm; *)
This passes (1,483 states). Try again with MaxInt == 7, which also passes (141,675 states). Testing higher values of MaxInt requires us to modify Advanced Options > Cardinality of Largest Enumerable Set in our model, so let’s avoid that. We’ve demoed how to test asymptotic complexity for a worst-case scenario. Testing average- and best-case complexity is outside the scope of what we can easily do with TLA+, unfortunately; for that you should reach for another tool.
Sharp readers might have noticed a subtle bug in our implementation of binary search. While it works as an abstract algorithm, low + high might overflow the integer size on a physical machine. To see this, let’s save that computation and assert it’s under MaxInt:
while low <= high do
counter := counter + 1;
with
lh = low + high,
m = lh \div 2
do
assert lh <= MaxInt;
if seq[m] = target then
found_index := m;
goto Result;
This fails: if the sequence has MaxInt elements, then low + high = MaxInt + 1. This bug was first discovered in 2006, years after we “proved” binary search correct.
The proposed fix is to instead write
with
lh = high - low,
m = high - (lh \div 2)
do
which no longer overflows and still has the same performance and correctness characteristics. If you’re writing specs in a context where overflow is a real concern, you’d ideally want an invariant that no operation makes a variable overflow. We could do that here by making m and lh global variables and then adding an invariant over all of the variables:
NoOverflows ==
\A x \in {m, lh, low, high}:
x <= MaxInt
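Python integers are unbounded, so to see the bug concretely in code we can assert the 32-bit bound ourselves, mirroring the spec’s assert lh <= MaxInt. A sketch (names are ours):

```python
INT_MAX = 2**31 - 1  # the machine limit we're simulating

def mid_naive(low, high):
    lh = low + high
    assert lh <= INT_MAX, "low + high overflowed"
    return lh // 2

def mid_safe(low, high):
    # the proposed fix: the result stays within [low, high], so it can't overflow
    return high - (high - low) // 2

low, high = 2, INT_MAX - 1  # indices near the top of a huge array
overflowed = False
try:
    mid_naive(low, high)
except AssertionError:
    overflowed = True
assert overflowed                          # the naive midpoint trips the bound
assert low <= mid_safe(low, high) <= high  # the fixed midpoint stays in range
```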
Multiprocess Algorithms
Multiprocess algorithms are similar to single-process algorithms, except that we want to check our assertion when all of the processes terminate. Instead of hard-coding an assertion in, we should encode it as a liveness requirement. This means using the “eventually always” (<>[]) operator, which checks that the algorithm ends with a certain thing being true.
Remember to use fair processes if you don’t want to simulate your algorithm crashing midway through.
EXTENDS Integers, Sequences, TLC
(*--algorithm counter_incrementer
variables
counter = 0,
goal = 3;
define
Success == <>[](counter = goal)
end define;
fair process incrementer \in 1..3
variable local = 0;
begin
Get:
local := counter;
Increment:
counter := local + 1;
end process;
end algorithm; *)
This, unsurprisingly, fails, as our processes can increment based off stale memory. If we merge the two labels into one label, this succeeds with 22 states.
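The bad interleaving TLC finds can be replayed deterministically in ordinary code. Here’s a Python sketch (names are ours, reduced to two processes) where both processes run their Get step before either runs Increment:

```python
counter = 0
goal = 2  # two processes, so a correct run would end at 2

def get(local_state, pid):
    local_state[pid] = counter       # Get: read the shared counter

def increment(local_state, pid):
    global counter
    counter = local_state[pid] + 1   # Increment: write back stale value + 1

local = {}
get(local, 1)        # process 1 reads 0
get(local, 2)        # process 2 also reads 0
increment(local, 1)  # counter = 1
increment(local, 2)  # counter = 1 again: process 1's update is lost
assert counter == 1 and counter != goal
```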
Summary
We verified that single-process algorithms were correct, and checked additional nonfunctional properties about them, such as their worst-case performance and that they don’t overflow our computer’s maximum integer value. We also briefly covered how to extend these ideas to multiprocess algorithms, using temporal properties instead of bare assertions.
Many algorithms are defined for specific data structures. And many specs for systems are designed assuming you have your data organized in a specific way. In the next chapter, we will show how to write reusable data structures for algorithms and specifications.