Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

CHAPTER 2
Numerical Algorithms

Numerical algorithms calculate numbers. They perform such tasks as randomizing values, breaking numbers into their prime factors, finding greatest common divisors, and computing geometric areas.

All of these algorithms are useful occasionally, but they also demonstrate useful algorithmic techniques such as adaptive algorithms, Monte Carlo simulation, and using tables to store intermediate results.

Randomizing Data

Randomization plays an important role in many applications. It lets a program simulate random processes, test algorithms to see how they behave with random inputs, and search for solutions to difficult problems. Monte Carlo integration, which is described in the later section “Performing Numerical Integration,” uses randomly selected points to estimate the size of a complex geometric area.

The first step in any randomized algorithm is generating random numbers.

Generating Random Values

Even though many programmers talk about “random” number generators, any algorithm used by a computer to produce numbers is not truly random. If you knew the details of the algorithm and its internal state, you could correctly predict the “random” numbers it generates.

To get truly unpredictable randomness, you need to use a source other than a computer program. For example, you could use a radiation detector that measures particles coming out of a radioactive sample to generate random numbers. Because no one can predict exactly when the particles will emerge, this is truly random.

Other possible sources of true randomness include rolling dice, analyzing static in radio waves, and studying Brownian motion. Random.org measures atmospheric noise to generate random numbers. (You can go to https://www.random.org to get true random numbers.) You may also be able to use a hardware random number generator (HRNG). Search the Internet or look at https://en.wikipedia.org/wiki/Hardware_random_number_generator for more information on those.

Unfortunately, because these sorts of true random-number generators (TRNGs) are relatively complicated and slow, most applications use a faster pseudorandom number generator (PRNG) instead. For many applications, if the numbers are in some sense “random enough,” a program can still make use of them and get good results.

Generating Values

One simple and common method of creating pseudorandom numbers is a linear congruential generator, which uses the following relationship to generate numbers:

Here , , and are constants.

The value initializes the generator so that different values for produce different sequences of numbers. value that is used to initialize the pseudorandom number generator, such as in this case, is called the seed.

Because all the values in the number sequence are taken modulo , after at most M numbers, the generator produces a number it produced before, and the sequence of numbers repeats from that point.

As a small example, suppose , , and . If you start with , the previous equation gives you the following sequence of numbers:

Because , the sequence repeats.

The values 0, 5, 7, 10, 9, 2, 8, 6, 3, 4 appear to be fairly random. But now that you know the method that the program uses to generate the numbers, if someone tells you the method's current number, you can correctly predict those that follow.

Some PRNG algorithms use multiple linear congruential generators with different constants and then select from among the values generated at each step to make the numbers seem more random and to increase the sequence's repeat period. That can make programs produce more random-seeming results, but those methods are still not truly random.

One feature of PRNGs that is sometimes an advantage is that you can use a particular seed value to generate the same sequence of “random” values repeatedly. That may seem like a disadvantage because it means that the numbers are more predictable, but being able to use the same numbers repeatedly can make some programs much easier to debug.

Being able to repeat sequences of numbers also lets some applications store complex data in a very compact form. For example, suppose a program needs to make an object perform a long and complicated pseudorandom walk on a map. The program could generate the walk and save all of its coordinates so that it can redraw the route later. Alternatively, it could just save a seed value. Then, whenever it needs to draw the route, it can use the seed to reinitialize a PRNG so that it produces the same walk each time.

The RandomTrees and random_trees programs, shown in Figure 2.1, use seed values to represent random trees. Enter a seed and click Go to generate a random tree. If two seed values differ by even 1, they produce very different results.

Screenshot of the RandomTrees and random_trees programs. — **Figure 2.1**: Even slightly different seeds lead to very different random trees.

These programs use the seed value you enter to generate drawing parameters such as the number of branches the tree creates at each step, the angle at which the branches bend from their parent branch, and how much shorter each branch is than its parent. You can download the programs from the book's website to see the details.

If you enter the same seed number twice, you produce the same tree both times.

CRYPTOGRAPHICALLY SECURE PRNGs

Any linear congruential generator has a period over which it repeats, and that makes it unusable for cryptographic purposes.

For example, suppose you encrypt a message by using a PRNG to generate a value for each letter in the message and then add that value to the letter. For example, the letter A plus 3 would be D, because D is three letters after A in the alphabet. If you get to Z, you wrap around to A. So, for example, .

This technique works quite well as long as the sequence of numbers is random, but a linear congruential generator has a limited number of seed values. All you need to do to crack the code is to try to decrypt the message with every possible seed value. For each possible decryption, the program can look at the distribution of letters to see whether the result looks like real text. If you picked the wrong seed, every letter should appear with roughly equal frequency. If you guessed the right seed, some letters, such as E and T, will appear much more often than other letters, such as J and X. If the letters are very unevenly distributed, you have probably guessed the seed.

This may seem like a lot of work, but on a modern computer it's not very hard. If the seed value is a 32-bit integer, only about 4 billion seed values are possible. A modern computer can check every possible seed in just a few seconds or, at most, minutes.

A cryptographically secure pseudorandom number generator (CSPRNG) uses more complicated algorithms to generate numbers that are harder to predict and to produce much longer sequences without entering a loop. They typically have much larger seed values. A simple PRNG might use a 32-bit seed. A CSPRNG might use keys that are 3,072 bits long to initialize its algorithm.

CSPRNGs are interesting and very “random,” but they have a couple of disadvantages. They are complicated, so they're slower than simpler algorithms. They also may not allow you to do all of the initialization manually, so you may be unable to generate a repeatable sequence easily. If you want to use the same sequence more than once, you should use a simpler PRNG. Fortunately, most algorithms don't need a CSPRNG, so you can use a simpler algorithm.

Ensuring Fairness

Usually programs need to use a fair PRNG. A fair PRNG is one that produces all of its possible outputs with the same probability. A PRNG that is unfair is called a biased PRNG. For example, a coin that comes up heads two-thirds of the time is biased.

Many programming languages have methods that produce random numbers within any desired range. But if you need to write the code to transform the PRNG's values into a specific range, you need to be careful to do so in a fair way.

A linear congruential generator produces a number between 0 (inclusive) and (exclusive), where is the modulus used in the generator's equation:

Usually, a program needs a random number within a range other than 0 to . An obvious but bad way to map a number produced by the generator into a range to is to use the following equation:

For example, to get a value between 1 and 100, you would calculate the following:

The problem with this is that it may make some results more likely than others.

To see why, consider a small example where , , and . If the generator does a reasonable job, it produces the values 0, 1, and 2 with roughly equal probability. If you plug these three values into the preceding equation, you get the values shown in Table 2.1.

Table 2.1: PRNG Values and Results Mapped with a Modulus

GENERATOR VALUE	RESULT
0	0
1	1
2	0

The result 0 occurs twice as often as the result 1 in Table 2.1, so the final result is biased.

In a real PRNG where the modulus is very large, the problem is smaller, but it's still present.

A better approach is to convert the value produced by the PRNG into a fraction between 0 and 1 and then multiply that by the desired range, as in the following formula:

Another method of converting a pseudorandom value from one range to another is simply to ignore any results that fall outside the desired range. In the previous example, you would use the limited PRNG to generate a value between 0 and 2. If you get a 2, which is outside the desired range, you ignore it and get another number.

For a slightly more realistic example, suppose that you want to give a cookie to one of four friends, and you have a six-sided die. In that case, you could simply roll the die repeatedly until you get a value between 1 and 4.

Getting Fairness from Biased Sources

Even if a PRNG is unfair, there may be a way to generate fair numbers. For example, suppose that you think a coin is unfair. You don't know the probabilities of getting heads or tails, but you suspect the probabilities are not 0.5. In that case, the following algorithm produces a fair coin flip:

Flip the biased coin twice.
    If the result is {Heads, Tails}, return Heads.
    If the result is {Tails, Heads}, return Tails.
    If the result is something else, start over.

To see why this works, suppose the probability of the biased coin coming up heads is , and the probability of its coming up tails is . Then the probability of getting heads followed by tails is . The probability of getting tails followed by heads is . The two probabilities are the same, so the probability of the algorithm returning heads or tails is the same, and the result is fair.

If the biased coin gives you heads followed by heads or tails followed by tails, you need to repeat the algorithm. If you are unlucky or the coin is very biased, you may need to repeat the algorithm many times before you get a fair result. For example, if of the time the coin will give you heads followed by heads, and 1% of the time it will give you tails followed by tails. That means that you would fail to generate a fair flip and need to repeat the algorithm roughly 82% of the time.

You can use a similar technique to expand the range of a PRNG. For example, suppose you want to give one of your five friends a cookie and your only source of randomness is a fair coin. In that case, you can flip the coin three times and treat the results as a binary number with heads representing 1 and tails representing 0. For example, {heads, tails, heads} corresponds to the value 101 in binary, which is 5 in decimal. If you get a result that is outside the desired range (in this example, {heads, heads, heads} gives the result 111 binary or 7 decimal, which is greater than the number of friends present), you discard the result and try again.

In conclusion, the PRNG tools that come with your programming language are probably good enough for most programs. If you need better randomness, you may need to look at CSPRNGs. Using a fair coin to pick a random number between 1 and 100 or using a biased source of information to generate fair numbers are situations that are more useful under weird circumstances or as interview questions than they are in real life.

Randomizing Arrays

Randomizing the items in an array is a fairly common task in programs. In fact, it's so common that Python's random module includes a shuffle method that does exactly that.

For example, suppose a scheduling program needs to assign employees to work shifts. If the program assigns the employees alphabetically, in the order in which they appear in the database or in some other static order, the employee who always gets assigned to the late-night shift will complain.

Some algorithms can also use randomness to prevent a worst-case situation. For example, the standard quicksort algorithm usually performs well, but if the values it must sort are initially already sorted, the algorithm performs terribly. One way to avoid that situation would be to randomize the values before sorting them.

The following algorithm shows one way to randomize an array:

RandomizeArray(String: array[])
    Integer: max_i = <Upper bound of array>
    For i = 0 To max_i - 1
        // Pick the item for position i in the array.
        Integer: j = <random number between i and max_i>
        <Swap array[i] and array[j]>
    Next i
End RandomizeArray

This algorithm visits every position in the array once, so it has a run time of , which should be fast enough for most applications.

Note that repeating this algorithm does not make the array “more random.” When you shuffle a deck of cards, items that start near each other tend to remain near each other (although possibly less near each other), so you need to shuffle several times to get a reasonably random result. This algorithm completely randomizes the array in a single pass, so running it again just wastes time.

A task similar to randomizing an array is picking a certain number of random items from an array without duplication.

A FAIRLY RANDOM ARRAY

One other important consideration of this algorithm is whether it produces a fair arrangement. In other words, is the probability that an item ends up in any given position the same for all positions? For example, it would be bad if the item that started in the first position finished in the first position half of the time.

I said in the introduction that this book doesn't include long mathematical proofs, so if you want you can skip the following discussion and take my word for it that the randomness algorithm is fair. If you know some probability, however, you may find the discussion interesting.

For a particular item in the array, consider the probability of its being placed in position . To be placed in position , it must not have been placed in positions and then placed in position .

Define to be the probability of the item's not being placed in position given that it was not previously placed in positions . Also, define to be the probability of the item's being placed in position given that it was not placed in positions . Then the total probability that the item is placed in position is .

is , so is .

After the first item is assigned, items could be assigned to position 2, so is , and is .

More generally, and .

If you multiply the probabilities together, gives the following equation:

If you look at the equation, you'll see that the numerator of each term cancels out with the denominator of the following term. When you make all the cancelations, the equation simplifies to .

This means that the probability of the item being placed in position is no matter what is, so the arrangement is fair.

For example, suppose you're holding a drawing to give away five copies of your book (something I do occasionally), and you get 100 entries. One way to pick five names is to put the 100 names in an array, randomize it, and then give the books to the first five names in the randomized list. The probability that any particular name is in any of the five winning positions is the same, so the drawing is fair.

Generating Nonuniform Distributions

Some programs need to generate pseudorandom numbers that are not uniformly distributed. Often these programs simulate some other form of random-number generation. For example, a program might want to generate numbers between 2 and 12 to simulate the roll of two six-sided dice.

You can't simply pick pseudorandom numbers between 2 and 12, because you won't have the same probability of getting each number that you would get by rolling two dice.

The solution is actually to simulate the dice rolls by generating two numbers between 1 and 6 and adding them together.

Occasionally, you might want to select random items with specific probabilities. For example, you might like to pick one of the three colors. You want to pick red with probability 0.25 (25%), green with probability 0.30 (30%), and blue with probability 0.45 (45%).

One way to do that is to pick a random value between 0 and 1. Then loop through the probabilities subtracting each from the random value. When the result drops to zero or lower, you select the corresponding color. The following pseudocode shows this algorithm:

// Pick an item from a array with given probabilities.
Item: PickItemWithProbabilities(Item: items[],
        Float: probabilities[])
    Float: value = <PRNG value where 0 <= value < 1>
    For i = 0 To items.Length - 1
        value = value – probabilities[i]
        If (value <= 0) Then Return items[i]
    Next i
End PickItemWithProbabilities

For this method to work, the items and probabilities arrays must have the same lengths. The values in the probabilities arrays must also add up to 1.

Making Random Walks

As you can probably guess from the name, a random walk is a path generated at random. Usually the path consists of steps with a fixed length that move the path along some sort of lattice, such as a rectangular or hexagonal grid. The following pseudocode shows a method that generates points in a random walk on a rectangular grid:

Point[]: MakeWalk(Integer: num_points)
    Integer: x = <X coordinate of starting point>
    Integer: y = <Y coordinate of starting point>
    List Of Point: points
    points.Add(x, y)
    For i = 1 To num_points – 1
        direction = random(0, 3)
        If (direction == 0) Then        // Up
            y -= step_size
        Else If (direction == 1) Then   // Right
            x += step_size
        Else If (direction == 2) Then   // Down
            y += step_size
        Else                            // Left
            x -= step_size
        End If
 
        points.Add(x, y)
    Next i
 
    Return points
End MakeWalk

This method sets variables x and y to the coordinates of the starting point, possibly in the center of the drawing area. It then enters a loop where it picks a random integer between 0 and 3 and uses that value to move the point (x, y) up, down, left, or right. After the loop ends, the method returns the walk's points.

Figure 2.2 shows the RandomWalk example program using this algorithm to create a random walk.

Screenshot of RandomWalk example program, which uses algorithm to create a random walk. — **Figure 2.2**: This program generates random walks on a square grid.

You can make similar random walks on other grids. For example, you can make a walk on a triangular lattice by picking random directions, as shown in Figure 2.3.

Illustration of random directions that produce a walk on a triangular lattice. — **Figure 2.3**: These random directions produce a walk on a triangular lattice.

Making Self-Avoiding Walks

A random self-avoiding walk, which is also called a non-self-intersecting walk, is a random walk that is not allowed to intersect itself. Usually, the walk continues on a finite lattice until no more moves are possible.

Figure 2.4 shows the SelfAvoidingWalk example program displaying a random self-avoiding walk on a grid. The walk started at the circled node and continued until it got stuck at a point where it had no unvisited neighboring nodes.

Illustration of SelfAvoidingWalk example program depicting a random self-avoiding walk on a 6 x 6 grid, moving randomly but does not visit any node twice. — **Figure 2.4**: A self-avoiding random walk moves randomly but does not visit any node twice.

The algorithm is similar to the one used to build a random walk, except it only allows the walk to move to unvisited lattice points. The following pseudocode code shows the new algorithm:

Point[]: SelfAvoidingWalk(Integer: num_points)
    Integer: x = <X coordinate of starting point>
    Integer: y = <Y coordinate of starting point>
 
    List Of Point: points
    Points.Add(x, y)
 
    For i = 1 To num_points – 1
        List Of Point: neighbors = <unvisited neighbors of (x, y)>
        If (neighbors Is Empty) Then Return points
        <Move to a random unvisited neighboring point>
    Next i
 
    Return points
End SelfAvoidingWalk

At each step, the new algorithm makes a list of the points that are neighbors to the walk's most recent point. If the neighbor list is empty, then the walk cannot continue, so the method returns the walk so far. If the neighbor list is not empty, the algorithm moves to a random neighbor and continues.

Making Complete Self-Avoiding Walks

The self-avoiding random walk shown in Figure 2.4 eventually got stuck, so it did not visit every node in the lattice. A complete random self-avoiding walk is a walk that visits every node in a finite lattice. The CompleteSelfAvoidingWalk example program shown in Figure 2.5 draws complete self-avoiding walks.

Illustration of a complete self-avoiding random walk that visits every node in the lattice. — **Figure 2.5**: A complete self-avoiding random walk visits every node in the lattice.

Building a complete walk is a bit trickier than building any old random walk because many paths lead to dead ends where the walk cannot be extended. For example, the walk shown in Figure 2.4 starts at the circled node and then wanders around randomly until it reaches a point where it has no unvisited neighbors.

To avoid getting stuck in a dead end, the algorithm must be able to unwind bad decisions. It needs to be able to undo previous moves so that it can return to a state where a complete walk is possible. That strategy of unwinding bad decisions in a program is called backtracking.

The following pseudocode shows one possible backtracking approach to building complete walks:

Point[]: CompleteSelfAvoidingWalk(Integer: num_points)
    Integer: x = <X coordinate of starting point>
    Integer: y = <Y coordinate of starting point>
 
    List Of Point: points
    Points.Add(x, y)
 
    ExtendWalk(points, num_points)
 
    Return points
End CompleteSelfAvoidingWalk
 
// Extend a walk that we have built so far.
// Return True if we find a complete walk.
Boolean: ExtendWalk(Point[] walk, Integer: num_points)
    If (points.Length == num_points) Then Return True
 
    List Of Point: neighbors = <unvisited neighbors of (x, y)>
    If (neighbors Is Empty) Then Return False
    <Randomize neighbors>
 
    For Each neighbor In neighbors
        <Add neighbor to points>
        If (ExtendWalk(points, num_points) Then Return True
        <Remove neighbor from points>
    Next i
 
    Return False
End ExtendWalk

This CompleteSelfAvoidingWalk method creates a list of points and adds the starting point to it. It then calls the ExtendWalk method to try to extend the walk that it has created so far to a complete walk.

The ExtendWalk method first checks the length of the current walk. If the walk contains num_points steps, then it is a complete walk, so the method returns True to indicate that it found a complete walk.

If the walk is not complete, the method builds a list of points that neighbor the walk's current last point (x, y) and that have not been visited yet. If that list is empty, then the current walk cannot be extended, so the method returns False to indicate that no complete walk is possible starting from that initial walk.

If the neighbor list is not empty, the method randomizes the list and loops through them. It tries adding each neighbor to the walk and calls ExtendWalk to see whether the new partial walk can be extended to a complete walk. If any of the calls to ExtendWalk returns True, then it has found a solution, so the current call to ExtendWalk also returns True.

If a particular neighbor cannot lead to a complete walk, the method removes that point from the walk and tries again with the next neighbor.

If none of the neighbors leads to a complete walk, then no complete walk is possible from the starting walk, so the method returns False.

Finding Greatest Common Divisors

The greatest common divisor (GCD) of two integers is the largest integer that evenly divides both of the numbers. For example, GCD(60, 24) is 12 because 12 is the largest integer that evenly divides both 60 and 24. (The GCD may seem like an esoteric function, but it is actually quite useful in cryptographic routines that are widely used in business to keep such things as financial communications secure.)

The following section explains an algorithm for finding greatest common divisors. The section after that describes an extension that lets you find an equation related to greatest common divisors.

Calculating Greatest Common Divisors

One way to find the GCD is to factor the two numbers and see which factors they have in common. However, the Greek mathematician Euclid recorded a faster method in his treatise, Elements, circa 300 BC.

The following pseudocode shows the modern version of the algorithm. Because it is based on Euclid's work, this algorithm is called the Euclidian algorithm or Euclid's algorithm.

Integer: GCD(Integer: A, Integer: B)
    While (B != 0)
        Integer: remainder = A Mod B
        // GCD(A, B) = GCD(B, remainder)
        A = B
        B = remainder
    End While
    Return A
End GCD

For example, consider GCD(4851, 3003). Table 2.2 shows the values for , , and Mod at each step.

Table 2.2: Values Used to Calculate GCD(4851, 3003)

A	B	A MOD B
4,851	3,003	1,848
3,003	1,848	1,155
1,848	1,155	693
1,155	693	462
693	462	231
462	231	0
231	0

When becomes 0, the variable holds the GCD—in this example, 231. To verify the result, note that and , so 231 divides both numbers. The values 21 and 8 have no common factors (they are relatively prime), so 231 is the largest integer that divides 4,851 and 1,848.

GREAT GCDs

This is another mathematical explanation that you can skip if you really want.

The key to Euclid's algorithm is the fact that GCD(A, B) = GCD(B, A Mod B).

To understand why this is true, consider the definition of the modulus operator. If the remainder Mod , then for some integer . If is the of and , then divides evenly, so it must also divide evenly. Because divides evenly and , must also divide evenly. Because divides evenly, it must also divide evenly.

This proves that g divides both and . To say you still need to know that g is the largest integer that divides and evenly.

Suppose is an integer larger than , and divides both and . Then also divides . But , so G divides A as well. This means that is not . This contradicts the assumption that . Because the assumption that leads to a contradiction, there must be no such , and is .

This algorithm is quite fast because the value decreases by at least a factor of for every two trips through the While loop. Because the size of decreases by a factor of at least for every two iterations, the algorithm runs in time at most .

Extending Greatest Common Divisors

In addition to being the largest integer that divides evenly into two values and , the GCD also plays a role in an interesting theorem called Bézout's identity. That theorem states that, for any integers and , there are other integers and such that .

For example, . You can set and in Bézout's identity to get the equation .

You can use an extended version of Euclid's GCD algorithm to find the integers and that satisfy Bézout's identity. The extended algorithm defines four values at each step of the calculation.

The values and are the quotient and remainder that you get after dividing one number by the other. The remainder plays the role of and in Euclid's algorithm.

You calculate the values and by adding combinations of previous values multiplied by the current value of .

The following equations show how you initialize the first values for , , and :

The following equations show how you calculate , , , and for the algorithm's subsequent rounds:

Here, / represents integer division, and the algorithm discards any remainder. The % symbol represents the modulus operator.

When equals , the algorithm stops and the current values of and , which are and , are the values that satisfy Bézout's identity. The value holds the GCD.

For an example, let's work through the algorithm's calculations where and . The following table shows the values of , , , and for the algorithm's rounds:

ROUND	Q	R	X	Y
0
1
2
3
4
5

At this point, equals , so the algorithm stops. The bold value is 14, which is GCD(210, 154). The bold values and are and . The following equation shows those values plugged into Bézout's identity:

The extended GCD algorithm's pseudocode simply implements the algorithm described in the preceding paragraphs, so I won't show it here.

Performing Exponentiation

Sometimes a program needs to calculate a number raised to an integer power. That's not hard if the power is small. For example, is easy to evaluate by multiplying .

For larger powers such as , however, this would be fairly slow.

Fortunately, there's a faster way to perform this kind of operation. This method is based on the fact that you can quickly calculate powers of a number that are themselves powers of 2. For example, consider the value , which is simply . From that, you can calculate because . Similarly, you can calculate because . You can then calculate because , and so on.

Now that you know how to calculate some large powers of A quickly, you need to figure out how to assemble them to produce any large power. To do that, consider the binary representation of the exponent. Each of the digits in that representation correspond to the powers of , and so on.

For example, suppose you want to calculate . The binary representation of 18 is 10010. Reading the binary digits from right to left, the digits correspond to the values , and . You can use those special powers of A to calculate . In this case, .

That's the basis for the fast exponentiation algorithm. You build bigger and bigger powers of A and use the binary digits of the exponent to decide which of those should be multiplied into the final result. The following pseudocode shows the algorithm:

// Perform the exponentiation.
Integer: Exponentiate(Integer: value, Integer: exponent)
    Integer: result = 1
    Integer: factor = value
    While (exponent != 0)
        If (exponent Mod 2 == 1) Then result *= factor
        factor *= factor
        exponent /= 2
    End While
 
    Return result
End Exponentiate

The algorithm sets result to 1. Initially, this holds value to the 0^th power, which is 1 for any value.

The algorithm also sets factor equal to value. This will represent the powers of value. Initially, it holds value to the first power.

The code then enters a loop that executes as long as the exponent is not zero. Inside the loop, the algorithm uses the modulus operator to see whether the exponent is odd. If it is odd, then its binary representation ends with a 1. In that case, the algorithm multiplies result by the current value of factor to include that power of value in the result.

The algorithm then multiplies factor by itself so that it represents value raised to the next power of 2. It also divides the exponent by 2 to remove its least significant binary digits.

When the exponent reaches zero, the algorithm returns the result.

The algorithm's loop executes once for each binary digit in the exponent. If the exponent is , then it has binary digits, so the algorithm runs in time. That's fast enough to calculate some pretty large values. For example, if is , is about 20, so this algorithm uses about 20 steps.

One limitation of this algorithm is that values raised to large powers grow extremely large. Even a “small” value such as has 254 decimal digits. This means that multiplying the huge numbers needed to calculate large powers is slow and takes up quite a bit of space.

Fortunately, the most common applications for these kinds of huge powers are cryptographic algorithms that perform all of their operations in a modulus. The modulus is large, but it still limits the numbers' size. For example, if the modulus has 100 digits, the product of two 100-digit numbers can have no more than 200 digits. You then reduce the result with the modulus to again get a number with no more than 100 digits. Reducing each number with the modulus makes each step slightly slower, but it means you can calculate values of practically unlimited size.

Working with Prime Numbers

As you probably know, a prime number (or simply prime) is a counting number (an integer greater than 0) greater than 1 whose only factors are 1 and itself. A composite number is a counting number greater than 1 that is not prime.

Prime numbers play important roles in some applications where their special properties make certain operations easier or more difficult. For example, some kinds of cryptography use the product of two large primes to provide security. The fact that it is hard to factor a number that is the product of two large primes is what makes the algorithm secure.

The following sections discuss common algorithms that deal with prime numbers.

Finding Prime Factors

The most obvious way to find a number's prime factors is to try dividing the number by all of the numbers between 2 and 1 less than the number. When a possible factor divides the number evenly, save the factor, divide the number by it, and continue trying more possible factors. Note that you need to try the same factor again before moving on in case the number contains more than one copy of the factor.

For example, to find the prime factors of 127, you would try to divide 127 by 2, 3, 4, 5, and so on, until you reach 126.

The following pseudocode shows this algorithm:

List Of Integer: FindFactors(Integer: number)
    List Of Integer: factors
    Integer: i = 2
    While (i < number)
        // Pull out factors of i.
        While (number Mod i == 0)
            // i is a factor. Add it to the list.
            factors.Add(i)
 
            // Divide the number by i.
            number = number / i
        End While
 
        // Check the next possible factor.
        i = i + 1
    End While
 
    // If there's anything left of the number, it is a factor, too.
    If (number > 1) Then factors.Add(number)
 
    Return factors
End FindFactors

If the number is , this algorithm has run time .

You can improve this method considerably with these three key observations:

You don't need to test whether the number is divisible by any even number other than 2 because, if it is divisible by any even number, it is also divisible by 2. This means you only need to check divisibility by 2 and then by odd numbers instead of by all possible factors. Doing so cuts the run time roughly in half.
You only need to check for factors up to the square root of the number. If , then either or must be less than or equal to . (If both are greater than , then their product is greater than .) If you check possible factors up to , you will find the smaller factor, and when you divide by that factor, you will find the other one. This reduces the run time to .
Every time you divide the number by a factor, you can update the upper bound on the possible factors that you need to check.

These observations lead to the following improved algorithm:

List Of Integer: FindFactors(Integer: number)
    List Of Integer: factors
 
    // Pull out factors of 2.
    While (number Mod 2 == 0)
        factors.Add(2)
        number = number / 2
    End While
 
    // Look for odd factors.
    Integer: i = 3
    Integer: max_factor = Sqrt(number)
    While (i <= max_factor)
        // Pull out factors of i.
        While (number Mod i == 0)
            // i is a factor. Add it to the list.
            factors.Add(i)
 
            // Divide the number by i.
            number = number / i
 
            // Set a new upper bound.
            max_factor = Sqrt(number)
        End While
 
        // Check the next possible odd factor.
        i = i + 2
    End While

    // If there's anything left of the number, it is a factor, too.
    If (number > 1) Then factors.Add(number)
 
    Return factors
End FindFactors

NOTE

This prime factoring algorithm has run time , where is the number it is factoring, so it is reasonably fast for relatively small numbers. If gets really large, even isn't fast enough. For example, if is 100 digits long, has 50 digits. If happens to be prime, even a fast computer won't be able to try all of the possible factors in a reasonable amount of time. This is what makes some cryptographic algorithms secure.

The method of trying all the possible factors smaller than a number is called trial division. There are other factoring methods, such as wheel factorization and various field sieves. For information on those, search the Internet or look at such pages as the following: https://en.wikipedia.org/wiki/Integer_factorization and http://mathforum.org/library/drmath/view/65801.html. All of these methods depend on the size of the number and its factors, so a number's size gives you an idea of how hard a number might be to factor.

Finding Primes

Suppose that your program needs to pick a large prime number (yet another task required by some cryptographic algorithms). One way to find prime numbers is to use the algorithm described in the preceding section to test a bunch of numbers to see whether they are prime. For reasonably small numbers, that works, but for large numbers, it can be prohibitively slow.

The sieve of Eratosthenes is a simple method you can use to find all of the primes up to a given limit. This method works well for reasonably small numbers, but it requires a table with entries for every number that is considered. Therefore, it uses an unreasonable amount of memory if the numbers are too large.

The basic idea is to make a table with one entry for each of the numbers between 2 and the upper limit. Cross out all of the multiples of 2 (not counting 2 itself). Then, starting at 2, look through the table to find the next number that is not crossed out (3 in this case). Cross out all multiples of that value (not counting the value itself). Note that some of the values may already be crossed out because they were also a multiple of 2. Repeat this step, finding the next value that is not crossed out and crossing out its multiples until you reach the square root of the upper limit. At that point, any numbers that are not crossed out are prime.

The following pseudocode shows the basic algorithm:

// Find the primes between 2 and max_number (inclusive).
List Of Integer: FindPrimes(long max_number)
    // Allocate an array for the numbers.
    Boolean: is_composite = New Boolean[max_number + 1]
 
    // "Cross out" multiples of 2.
    For i = 4 to max_number Step 2
        is_composite[i] = true
    Next i
 
    // "Cross out" multiples of primes found so far.
    Integer: next_prime = 3
    Integer: stop_at = Sqrt(max_number)
    While (next_prime <= stop_at)
        // "Cross out" multiples of this prime.
        For i = next_prime * 2 To max_number Step next_prime Then
            is_composite[i] = true
        Next i
 
        // Move to the next prime, skipping the even numbers.
        next_prime = next_prime + 2
        While (next_prime <= max_number) And (is_composite[next_prime])
            next_prime = next_prime + 2
        End While
    End While
 
    // Copy the primes into a list.
    List Of Integer: primes
    For i = 2 to max_number
        If (Not is_composite[i]) Then primes.Add(i)
    Next i
 
    // Return the primes.
    Return primes
End FindPrimes

It can be shown that this algorithm has run time , but that is beyond the scope of this book.

Testing for Primality

The algorithm described in the earlier section “Finding Prime Factors” factors numbers. One way to determine whether a number is prime is to use that algorithm to try to factor it. If the algorithm doesn't find any factors, then the number is prime.

As that section mentioned, the algorithm works well for relatively small numbers. However, if the number has 100 digits, the number of steps the program must execute is a 50-digit number. Not even the fastest computers can perform that many operations in a reasonable amount of time. (A computer executing 1 trillion steps per second would need more than years.)

Some cryptographic algorithms need to use large prime numbers, so this method of testing whether a number is prime won't work. Fortunately, there are other methods. The Fermat primality test is one of the simpler.

Fermat's “little theorem” states that if is prime and , then . In other words, if you raise n to the power and then take the result modulo , the result is 1.

For example, suppose and . Then Mod 11. The value , so as desired.

Note that it is possible for , even if p is not prime. In that case, the value is called a Fermat liar because it incorrectly implies that is prime.

If , then n is called a Fermat witness because it proves that is not prime.

It can be shown that, for a natural number , at least half of the numbers between 1 and are Fermat witnesses. In other words, if is not prime and you pick a random number between 1 and , there is a 0.5 probability that is a Fermat witness, so .

Of course, there is also a chance you'll get unlucky and randomly pick a Fermat liar for n. If you repeat the test many times, you can increase the chances that you'll pick a witness if one exists.

It can be shown that at each test, there is a 50% chance that you'll pick a Fermat witness. So, if p passes tests, there is a chance that you got unlucky and picked Fermat liars every time. In other words, there is a chance that is actually a composite number pretending to be prime.

For example, if passes the test 10 times, there is a probability that is not prime. If you want to be even more certain, repeat the test 100 times. If p passes all 100 tests, there is only a probability that p is not prime.

The following pseudocode shows an algorithm that uses this method to decide whether a number is probably prime:

// Return true if the number p is (probably) prime.
Boolean: IsPrime(Integer: p, Integer: max_tests)
    // Perform the test up to max_tests times.
    For test = 1 To max_tests
        <Pick a pseudorandom number n between 1 and p (exclusive)>
        If (n
^p-1 Mod p != 1) Then Return false
    Next test
 
    // The number is probably prime.
    // (There is a 1/2^max_tests chance that it is not prime.)
    Return true
End IsPrime

If the number p is very large—which is the only time when this whole issue is interesting—calculating by using multiplication could take quite a while. Fortunately, you know how to do this quickly by using the fast exponentiation algorithm described in the earlier section, “Performing Exponentiation.”

Once you know how to determine whether a number is (probably) prime, you can write an algorithm to pick prime numbers.

// Return a (probable) prime with max_digits digits.
Integer: FindPrime(Integer: num_digits, Integer: max_tests)
    Repeat
        <Pick a pseudorandom number p with num_digits digits>
        If (IsPrime(p, max_tests)) Then Return p
End FindPrime

Performing Numerical Integration

Numerical integration, which is also sometimes called quadrature or numeric quadrature, is the process of using numerical techniques to approximate the area under a curve defined by a function. Often, the function has one variable so it looks like and the result is a two-dimensional area, but some applications might need to calculate the three-dimensional volume under a surface defined by a function . You could even calculate areas defined by higher-dimensional functions.

If the function is easy to understand, you may be able to use calculus to find the exact area. Unfortunately, you may not always be able to calculate the function's antiderivative. For example, the function's equation might be very complicated, or you might have data generated by some physical process, so you don't know the function's equation. In that case, you can't use calculus, but you can use numerical integration.

There are several ways to perform numerical integration. The most straightforward involve Newton-Cotes formulas, which use a series of polynomials to approximate the function. The two most basic kinds of Newton-Cotes formulas are the rectangle rule and the trapezoid rule.

The Rectangle Rule

The rectangle rule uses a series of rectangles of uniform width to approximate the area under a curve. Figure 2.6 shows the RectangleRule sample program (which is available for download on the book's website) using the rectangle rule. The program also uses calculus to find the exact area under the curve so that you can see how far the rectangle rule is from the correct result.

Illustration of RectangleRule sample program, which uses calculus to find the exact area under the curve. — **Figure 2.6**: The RectangleRule sample program uses the rectangle rule to approximate the area under the curve .

images — **Figure 2.6**: The RectangleRule sample program uses the rectangle rule to approximate the area under the curve .

The following pseudocode shows an algorithm for applying the rectangle rule:

Float: UseRectangleRule(Float: function(), Float: xmin, Float: xmax,
  Integer: num_intervals)
    // Calculate the width of a rectangle.
    Float: dx = (xmax - xmin) / num_intervals
 
    // Add up the rectangles' areas.
    Float: total_area = 0
    Float: x = xmin
    For i = 1 To num_intervals
        total_area = total_area + dx * function(x)
        x = x + dx
    Next i
 
    Return total_area
End UseRectangleRule

The algorithm simply divides the area into rectangles of constant width and with height equal to the value of the function at the rectangle's left edge. It then loops over the rectangles and adds their areas.

The Trapezoid Rule

You can see in Figure 2.6 where the rectangles don't fit the curve exactly, producing an error in the total calculated area. You can reduce the error by using more, skinnier rectangles. In this example, increasing the number of rectangles from 10 to 20 reduces the error from roughly to .

An alternative strategy is to use trapezoids to approximate the curve instead of using rectangles. Figure 2.7 shows the TrapezoidRule sample program (which is available for download on the book's website) using the trapezoid rule.

Illustration of the TrapezoidRule sample program that uses the trapezoid rule to make a better approximation than the RectangleRule program does. — **Figure 2.7**: The TrapezoidRule sample program uses the trapezoid rule to make a better approximation than the RectangleRule program does.

The following pseudocode shows an algorithm for applying the trapezoid rule:

Float: UseTrapezoidRule(Float: function(), Float: xmin, Float: xmax,
  Integer: num_intervals)
    // Calculate the width of a trapezoid.
    Float: dx = (xmax - xmin) / num_intervals
 
    // Add up the trapezoids' areas.
    Float: total_area = 0
    Float: x = xmin
    For i = 1 To num_intervals
     total_area = total_area + dx * (function(x) + function(x + dx)) / 2
     x = x + dx
    Next i
 
    Return total_area
End UseTrapezoidRule

The only difference between this algorithm and the rectangle rule algorithm is in the statement that adds the area of each slice. This algorithm uses the formula for the area of a trapezoid: area = width × average of the lengths of the parallel sides.

You can think of the rectangle rule as approximating the curve with a step function that jumps from one value to another at each rectangle's edge. The trapezoid rule approximates the curve with line segments.

Another example of a Newton-Cotes formula is Simpson's rule, which uses polynomials of degree 2 to approximate the curve. Other methods use polynomials of even higher degree to make better approximations of the curve.

Adaptive Quadrature

A variation on the numerical integration methods described so far is adaptive quadrature, in which the program detects areas where its approximation method may produce large errors and refines its method in those areas.

For example, look again at Figure 2.7. In areas where the curve is close to straight, the trapezoids approximate the curve very closely. In areas where the curve is bending sharply, the trapezoids don't fit as well.

A program using adaptive quadrature looks for areas where the trapezoids don't fit the curve well and uses more trapezoids in those areas.

The AdaptiveMidpointIntegration sample program, shown in Figure 2.8, uses the trapezoid rule with adaptive quadrature. When calculating the area of a slice, this program first uses a single trapezoid to approximate its area. It then breaks the slice into two pieces and uses two smaller trapezoids to calculate their areas. If the difference between the larger trapezoid's area and the sum of the areas of the smaller trapezoids is more than a certain percentage, the program divides the slice into two pieces and calculates the areas of the pieces in the same way.

Illustration of the AdaptiveMidpointIntegration program that uses an adaptive trapezoid rule to make a better approximation than the TrapezoidRule program. — **Figure 2.8**: The AdaptiveMidpointIntegration program uses an adaptive trapezoid rule to make a better approximation than the TrapezoidRule program.

The following pseudocode shows this algorithm:

// Integrate by using an adaptive midpoint trapezoid rule.
Float: IntegrateAdaptiveMidpoint(Float: function(),
  Float: xmin, Float: xmax, Integer: num_intervals,
Float: max_slice_error)
    // Calculate the width of the initial trapezoids.
    Float: dx = (xmax - xmin) / num_intervals
    double total = 0
 
    // Add up the trapezoids' areas.
    Float: total_area = 0
    Float: x = xmin
    For i = 1 To num_intervals
        // Add this slice's area.
        total_area = total_area +
            SliceArea(function, x, x + dx, max_slice_error)
        x = x + dx
    Next i
 
    Return total_area
End IntegrateAdaptiveMidpoint
 
// Return the area for this slice.
Float: SliceArea(Float: function(),Float: x1, Float: x2,
  Float: max_slice_error)
    // Calculate the function at the endpoints and the midpoint.
    Float: y1 = function(x1)
    Float: y2 = function(x2)
    Float: xm = (x1 + x2) / 2
    Float: ym = function(xm)
 
    // Calculate the area for the large slice and two subslices.
    Float: area12 = (x2 - x1) * (y1 + y2) / 2.0
    Float: area1m = (xm - x1) * (y1 + ym) / 2.0
    Float: aream2 = (x2 - xm) * (ym + y2) / 2.0
    Float: area1m2 = area1m + aream2
 
    // See how close we are.
    Float: error = (area1m2 - area12) / area12
 
    // See if this is small enough.
    If (Abs(error) < max_slice_error) Then Return area1m2
 
    // The error is too big. Divide the slice and try again.
    Return
        SliceArea(function, x1, xm, max_slice_error) +
        SliceArea(function, xm, x2, max_slice_error)
End SliceArea

If you run the AdaptiveMidpointIntegration program and start with only two initial slices, the program divides them into the 24 slices shown in Figure 2.8 and estimates the area under the curve with error. If you use the TrapezoidRule program with 24 slices of uniform width, the program has an error of , roughly twice as much as that produced by the adaptive program. The two programs use the same number of slices, but the adaptive program positions them more effectively.

The AdaptiveTrapezoidIntegration sample program uses a different method to decide when to break a slice into subslices. It calculates the second derivative of the function at the slice's starting value and divides the interval into one slice plus 1 per second derivative value. For example, if the second derivative is 2, the program divides the slice into three pieces. (The formula for the number of slices was chosen somewhat arbitrarily. You might get better results with a different formula.)

Of course, this technique won't work if you can't calculate the curve's second derivative. The technique used by the AdaptiveMidpointIntegration program seems to work fairly well in any case, so you can fall back on that technique.

Adaptive techniques are useful in many algorithms because they can produce better results without wasting effort in areas where it isn't needed. The AdaptiveGridIntegration program shown in Figure 2.9 uses adaptive techniques to estimate the area in the shaded region. This region includes the union of vertical and horizontal ellipses, minus the areas covered by the three circles inside the ellipses.

Illustration of the AdaptiveGridIntegration program that uses adaptive integration to estimate the area in the shaded region. — **Figure 2.9**: The AdaptiveGridIntegration program uses adaptive integration to estimate the area in the shaded region.

This program divides the whole image into a single box and defines a grid of points inside the box. In Figure 2.9, the program uses a grid with four rows and columns of points. For each point in the grid, the program determines whether the point lies inside or outside the shaded region.

If none of the points in the box lies within the shaded region, the program assumes that the box is not inside the region and ignores it.

If every point in the box lies inside the shaded region, the program considers the box to lie completely within the region and adds the box's area to the region's estimated area.

If some of the points in the box lie inside the shaded region and some lie outside the region, the program subdivides the box into smaller boxes and uses the same technique to calculate the smaller boxes' areas.

In Figure 2.9, the AdaptiveGridIntegration program has drawn the boxes it considered so that you can see them. You can see that the program considered many more boxes near the edges of the shaded region than far inside or outside the region. In total, this example considered 19,217 boxes, mostly focused on the edges of the area it was integrating.

Monte Carlo Integration

Monte Carlo integration is form of numeric integration in which the program generates a series of pseudorandom points uniformly within an area and determines whether each point lies within the target region. When it has finished, the program uses the percentage of the points that were inside the target region to estimate the region's total area.

For example, suppose the area within which the points are generated is a square, so it has an area of 400 square units, and 37% of the pseudorandom points are within the region. Then the region has an area of roughly square units.

The MonteCarloIntegration sample program shown in Figure 2.10 uses this technique to estimate the area of the same shape used by the AdaptiveGridIntegration program.

Illustration of the MonteCarloIntegration sample program that uses a technique to estimate the area of the same shape used by the AdaptiveGrid-Integration program. — **Figure 2.10**: Points inside the shaded region are black, and points outside the region are gray.

Monte Carlo integration generally is more prone to error than more methodical approaches such as trapezoid integration and adaptive integration. However, it is sometimes easier to implement because it doesn't require you to know much about the nature of the shape you're measuring. You simply throw points at the shape and see how many hit it.

Finding Zeros

Sometimes a program needs to figure out where an equation crosses the -axis. In other words, given an equation , you may want to find where . Values such as this are called the equation's roots.

Newton's method, which is sometimes called the Newton-Raphson method, is a way to approximate an equation's roots successively.

The method starts with an initial guess for the root. If is not close enough to 0, the algorithm follows a line that is tangent to the function at the point until the line hits the -axis. It uses the -coordinate at the intersection as a new guess for the root.

The algorithm then repeats the process starting from the new guess . The algorithm continues the process of following tangents to the function to find new guesses until it finds a value where is sufficiently close to 0.

The only tricky part is figuring out how to follow tangent lines. If you use a little calculus to find the derivative of the function , which is also written , then the following equation shows how the algorithm can update its guess by following a tangent line:

Figure 2.11 shows the process graphically. The point corresponding to the initial guess is labeled 1. That point's value is far from 0, so the algorithm follows the tangent line until it hits the -axis. It then calculates the function at the new guess to get the point labeled 2 in Figure 2.11. This point's -coordinate is also far from 0, so the algorithm repeats the process to find the next guess, labeled 3. The algorithm repeats the process one more time to find the point labeled 4. Point 4's -coordinate is close enough to 0, so the algorithm stops.

Illustration of Newton's method that follows a function's tangent lines to zero in on the function's roots. — **Figure 2.11**: Newton's method follows a function's tangent lines to zero in on the function's roots.

The following pseudocode shows the algorithm:

// Use Newton's method to find a root of the function f(x).
Float: NewtonsMethod(Float: f(), Float: dfdx(), Float: initial_guess,
  Float: maxError)
 
    float x = initial_guess
    For i = 1 To 100  // Stop at 100 in case something goes wrong.
        // Calculate this point.
        float y = f(x)
 
        // If we have a small enough error, stop.
        if (Math.Abs(y) < maxError) break
 
        // Update x.
        x = x − y / dfdx(x)
    Next i
 
    Return x
End NewtonsMethod

The algorithm takes as parameters a function y = f(x) , the function's derivative dfdx, an initial guess for the root's value, and a maximum acceptable error.

The code sets the variable x equal to the initial guess and then enters a For loop that repeats, at most, 100 times. Normally, the algorithm quickly finds a solution. But sometimes, if the function has the right curvature, the algorithm can diverge and not zero in on a solution. Or it can get stuck jumping back and forth between two different guesses. The maximum of 100 iterations means the program cannot get stuck forever.

Within the For loop, the algorithm calculates f(x). If the result isn't close enough to 0, the algorithm updates x and tries again.

Note that some functions have more than one root. In that case, you need to use the FindZero algorithm repeatedly with different initial guesses to find each root.

Figure 2.12 shows the NewtonsMethod sample program, which is available for download on this book's website. This program uses Newton's method three times to find the three roots of the function . Circles show the program's guesses as it searches for each root.

Illustration of the NewtonsMethod sample program that demonstrates Newton’s method. — **Figure 2.12**: The NewtonsMethod sample program demonstrates Newton's method to find the three roots of the function .

Gaussian Elimination

Root finding algorithms let you find values for that make the equation equal to zero. Gaussian elimination is a technique that does something similar for a system of linear equations. It attempts to find values for the 's in the following equations to make all of the equations true simultaneously:

Here all of the and values are numbers given by the problem. For a concrete example, consider the following system of equations:

The goal is to find numbers , , and that simultaneously satisfy all three equations.

It's easier to work with the equations if you represent them as an augmented matrix. The first entries in each row hold the equations' coefficients (the values). An extra final column holds the values. The following shows the augmented matrix for the preceding equations:

Gaussian elimination works in two stages that are sometimes called forward elimination and back substitution.

Forward Elimination

During forward elimination, you use two operations to rearrange the matrix until it has the following upper-triangular form:

The entries on the matrix's lower left are all zeros. The other entries are any numbers and may or may not be zero. (That's what the asterisks mean.)

You can use the following two row operations to manipulate the matrix during forward elimination:

Swap the positions of two rows.
Add a nonzero multiple of one row to another row.

Each of these operations preserves the truth of the equations. If the values satisfy the original equations, then they also satisfy the matrix's modified equations.

For an example, consider the following augmented matrix, which was shown earlier:

We start by using the first entry in the first row (highlighted in bold) to zero out the entries in that column in lower rows. The first entry in the first row is 2, and the first entry in the second row is 3. We can zero out the 3 by multiplying the first row by and then adding it to the second row. The following matrix shows the calculation:

Next, we perform a similar operation to zero out the first entry in the last row. This time, we multiply the first row by and add it to the final row as shown here:

Now that we've zeroed out the entries in the first column after the first row, we turn to the second column. We want to zero out the entries in the second column below the second row.

Unfortunately, to do that we would need to multiply the second row by 2/0. That's a problem because we cannot divide by zero.

In cases like this one, where the next entry that we want to use to zero out a column is already 0, we swap that row with one of the later rows that does not have a zero in that column. In this example, the third row has in its second column, so we swap the second and third rows to get the following:

Now that you have a nonzero entry in the column, you can use it to continue forward elimination. In this example, the final row already has a zero in its second column, so the matrix is in upper triangular form, and we can move on to back substitution.

Back Substitution

During back substitution, you work through the matrix from the bottom up to find the values for the x's that satisfy the equations. Each time you examine a row, you learn one more value. You can then plug in the x values that you know to find the next value in the row above.

For a concrete example, consider the previous augmented matrix in upper-triangular form:

The last row in the matrix represents the following equation:

Because the first two coefficients are zero, their terms drop out leaving . You can easily solve this by dividing both sides of the equation by to get . You have your first value!

Now you move up a row. The matrix's second row represents the following equation:

But you now know that . Plugging that value into the equation gives the following:

This simplifies to the following equation:

Now you can divide both sides of the equation by to get .

You move up a row again to the top row, which corresponds to the following equation:

Plugging in the values that you now have for x₃ and x₂ gives the following:

Rearranging this gives the following:

Now you can divide both sides of the equation by 2 to learn .

The complete solution to the original system of equations is , , . If you plug those values into the original equations, you'll see that they are correct.

The Algorithm

Gaussian elimination is a nice, straightforward, methodical way to solve a system of linear equations. Unfortunately, it requires you to perform a long sequence of relatively simple steps, so you have many chances to make small arithmetic errors. This is exactly the kind of situation that a computer handles well because it can quickly perform any number of simple arithmetic calculations without making mistakes.

The following pseudocode shows the Gaussian elimination algorithm at a high level:

// Solve a linear system of equations.
Item: GaussianEliminate(Float: As[,], Float: Cs[])
    <Use the As and Cs to build the augmented matrix.>
    <Use row operations to put the matrix in upper triangular form.>
    <Perform back substitution to find the Xs.>
End GaussianEliminate

The details involve some long but relatively straightforward sequences of loops to perform the calculations.

Least Squares Fits

A least squares fit attempts to find a function to fit a collection of data values. The result is called a least squares fit because it finds the least possible sum of the squares of the distances between the data points and the corresponding points on the function.

Figure 2.13 shows a function approximating a set of data values. The vertical lines show the distances between the data points and the corresponding points on the function. A least squares fit considers variations of the function and finds the one that minimizes the sum of the squares of those vertical distances.

Illustration of a least squares fit that minimizes the sum of the squares of the distances between the data points and the function. — **Figure 2.13**: A least squares fit minimizes the sum of the squares of the distances between the data points and the function.

Calculating a least squares fit can be intimidating, mostly because it can involve a lot of terms. Fortunately, those terms are often relatively simple. They can look scary when they're all arranged in one grand equation, but individually the terms are easy to manage. I admit that you need to use a little calculus to find a least squares fit. Fortunately, it's pretty easy calculus, so you should be able to understand the following discussion even if you haven't taken a derivative in a while.

The following sections describe two kinds of least squares fits. In the first, the function that approximates the data is a line. In the second, the function is a polynomial of any degree.

Linear Least Squares

In a linear least squares fit, the goal is to find a line that minimizes the sum of the squares of the vertical distances between the data points and the line. It's the same situation shown in Figure 2.13, except the curve is a line.

You can represent a line with the equation where m is the line's slope and is its -intercept. (The point where the line crosses the -axis.)

Suppose that you have a set of n data points . Then the vertical distance between one of the points and the line is simply . The square of that distance is . If you add up all of the squared terms for all of the points, you get the following equation:

You can write this more concisely by using the mathematical summation symbol :

Here the symbol simply means that you should add up all of the values for .

This equation looks pretty intimidating in both forms. After all, they include two variables, m and b, plus a bunch of and values. Things are simpler if you remember that the and values are part of the data—they're just numbers like 6 and –13.

Now it's time for the calculus. To find the minimum value for this equation, you take its partial derivatives with respect to the variables m and , set the derivatives equal to zero, and solve for and .

The following equation shows the error equation's derivative with respect to :

Multiplying this out and rearranging a bit gives the following:

The following equation shows the error equation's derivative with respect to :

Multiplying this out and rearranging a bit gives the following:

These new equations don't look any simpler until you remember that only the m and b terms are variables. All of the and values are numbers that you're given by the data. To make it easier to work with the equations, let's make the following substitutions:

If we plug those values into the partial derivatives and set them equal to zero, we get the following:

Now you can solve these two equations for and to get the following:

All of the terms are easy to calculate from the data values, so now you can find and to minimize the error squared.

The follow pseudocode shows the algorithm for finding a linear least squares fit at a high level:

Float[]: FindLinearLeastSquaresFit(Point[]: points)
    <Calculate the sums Sx, Sxx, Sxy, Sy, and S1.>
    <Use the S values to callculate m and b.>
End FindLinearLeastSquaresFit

This code simply performs the calculations described in the preceding paragraphs.

Polynomial Least Squares

A linear least squares fit uses a line to fit a set of data points. A polynomial least squares fit uses a polynomial of the form to fit the data points.

The degree of the polynomial is the largest power of used by the equation. The preceding equation has degree . You can pick the degree to fit the data. In general, higher degrees will fit the data points more closely, although they may imply an artificially-high accuracy.

For example, a degree polynomial can fit data points exactly, but it may need to wiggle all over the place to do so. Figure 2.14 shows a degree 5 polynomial that exactly fits six data points.

Illustration of degree 5 polynomial that exactly fits six data points. — **Figure 2.14**: A high-degree polynomial may match a set of data values very closely but misleadingly.

It's usually better to pick the smallest degree that fits the data reasonably well. Figure 2.15 shows the same data points as shown in Figure 2.14, but this time fit by a degree 3 polynomial that probably does a better job of representing the data.

Illustration of the lowest-degree polynomial that fits the data reasonably well. — **Figure 2.15**: You should use lowest-degree polynomial that fits the data reasonably well.

You find a polynomial fit in the same way that you find a linear fit: you take the partial derivatives of the error function with respect to the values, set the derivatives equal to zero, and solve for the values.

The following equation shows the error function:

Here the sum is taken over all of the data points ( , ).

The next step is to take the partial derivatives of this equation with respect to the A values. This is a pretty complicated equation, so you might think that this will be hard. Fortunately, only a few terms in the equation involve any given A value, so most of the terms become zero in the partial derivatives.

The following equation shows the partial derivative with respect to :

If you multiply the term through and add up the terms separately, you get the following:

This equation also looks intimidating, but if you look closely, you'll see that most of the terms are sums that you can calculate by using the data points ( , ). For example, in the partial derivative with respect to , the term is the sum of the x_i values raised to the power.

If you replace the sums with the values , then the equation simplifies to the following:

Now you have linear equations with unknowns through . To finish solving the problem, you set the equations equal to zero and then use Gaussian elimination to solve them for the values.

When you set a partial derivative equal to zero, you can divide both sides of the equation by 2 and then move the A terms to the other side of the equals sign to get the following equation:

This has the format used in the earlier section on Gaussian elimination.

All of this leads to the following high-level algorithm for finding a polynomial least squares fit to a set of data points:

// Find a polynomial least squares fit to a set of data points.
Float[]: FindPolynomialLeastSquaresFit(Point[]: points)
    <Calculate the sums S, S0, S1, …, Sn.>
    <Use the S values to build coefficients for
     a system of linear equations.>
    <Use Gaussian elimination to solve the system
     of equations and find the A values.>
End FindPolynomialLeastSquaresFit

This code simply performs the calculations described in the preceding text.

Summary

Some numerical algorithms, such as randomization, are useful in a wide variety of applications. Other algorithms, such as prime factoring and finding the greatest common divisor, have more limited use. If your program doesn't need to find greatest common divisors, the GCD algorithm won't be much help.

However, the techniques and concepts demonstrated by these algorithms can be useful in many other situations. For example, the idea that an algorithm can be probabilistic is very important in many applications. That idea can help you devise other algorithms that don't work with perfect certainty (and that could easily be the subject of an interview question).

This chapter explained the ideas of fairness and bias, two important concepts for any sort of randomized algorithm, such as the Monte Carlo integration algorithm, which also was described in this chapter.

This chapter also explained adaptive quadrature, a technique that lets a program focus most of its work on areas that are the most relevant and pay less attention to areas that are easy to manage. This idea of making a program adapt to spend the most time on the most important parts of the problem is applicable to many algorithms.

Many numerical algorithms, such as GCD, Fermat's primality test, the rectangle and trapezoid rules, and Monte Carlo integration, don't need complex data structures. In contrast, most of the other algorithms described in this book do require specialized data structures to produce their results.

The next chapter explains one kind of data structure: linked lists. These are not the most complicated data structures you'll find in this book, but they are useful for many other algorithms. Also, the concept of linking data is useful in other data structures, such as trees and networks.

Exercises

You can find the answers to these exercises in Appendix B. Asterisks indicate particularly difficult problems.

Write an algorithm to use a fair six-sided die to generate coin flips.
The section “Getting Fairness from Biased Sources” explains how you can use a biased coin to get fair coin flips by flipping the coin twice. However, sometimes doing two flips produces no result, so you need to repeat the process. Suppose that the coin produces heads three-fourths of the time and tails one-fourth of the time. In that case, what is the probability that you'll get no result after two flips and have to try again?
Again, consider the coin described in Exercise 2. This time, suppose you were wrong and the coin is actually fair but you're still using the algorithm to get fair flips from a biased coin. In that case, what is the probability that you'll get no result after two flips and have to try again?
Write an algorithm to use a biased six-sided die to generate fair values between 1 and 6. How efficient is this algorithm?
Write an algorithm to pick M random values from an array containing items (where ). What is its run time? How does this apply to the example described in the text where you want to give books to five people selected from 100 entries? What if you got 10,000 entries?
Write an algorithm to deal five cards to players for a poker program. Does it matter whether you deal one card to each player in turn until every player has five cards or whether you deal five cards all at once to each player in turn?
Write a program that simulates rolling two six-sided dice and draws a bar chart or graph showing the number of times each roll occurs. Compare the number of times each value occurs with the number you would expect for two fair dice in that many trials. How many trials do you need to perform before the results fit the expected distribution well?
In the complete self-avoiding random walk algorithm, what is the key backtracking step? In other words, exactly where does the backtracking occur?
When building a complete self-avoiding random walk, what happens if the algorithm does not randomize the neighbor list? Would that change the algorithm's performance?
What happens to Euclid's algorithm if initially?
The least common multiple (LCM) of integers A and B is the smallest integer that A and B both divide into evenly. How can you use the GCD to calculate the LCM?
How would you need to change the fast exponentiation algorithm to implement modular fast exponentiation?
*Write a program that calculates the GCD for a series of pairs of pseudorandom numbers and graphs the number of steps required by the GCD algorithm versus the average of the two numbers. Does the result look logarithmic?
The following pseudocode shows how the sieve of Eratosthenes crosses out multiples of the prime next_prime:
```
// "Cross out" multiples of this prime.
For i = next_prime * 2 To max_number Step next_prime Then
    is_composite[i] = true
Next i 
```
The first value crossed out is next_prime * 2. But you know that this value was already crossed out because it is a multiple of 2; the first thing the algorithm did was cross out multiples of 2. How can you modify this loop to avoid revisiting that value and many others that you have already crossed out?
*In an infinite set of odd composite numbers, called Carmichael numbers, every relatively prime smaller number is a Fermat liar. In other words, is a Carmichael number if is odd and every n where and is a Fermat liar. Write an algorithm to list the Carmichael numbers between 1 and 10,000 and their prime factors.
When you use the rectangle rule, parts of some rectangles fall above the curve, increasing the estimated area, and parts of some rectangles fall below the curve, reducing the estimated area. What do you think would happen if you used the function's value at the midpoint of the rectangle for the rectangle's height instead of the function's value at the rectangle's left edge? Write a program to check your hypothesis.
Could you make a program that uses adaptive Monte Carlo integration? Would it be effective?
Write a high-level algorithm for performing Monte Carlo integration to find the volume of a three-dimensional shape.
How could you use Newton's method to find the points where two functions intersect?
Could you use least squares with functions other than lines and polynomials?
What happens to the linear least squares calculation if you have only two data points and they have the same x-coordinate? What causes that behavior?
What line best fits two data points that have the same x-coordinate? How does that line relate to your answer to Exercise 21?

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.