Chapter 12. Shor’s Factoring Algorithm

If you’d heard about one application of quantum computing before you picked up this book, there’s a good chance it was Shor’s factoring algorithm.

Quantum computing was mostly considered to be of academic, rather than practical, interest until Peter Shor’s 1994 discovery that a sufficiently powerful quantum computer can find the prime factors of a number exponentially faster than any conventional machine. In this chapter, we take a hands-on look at one specific implementation of Shor’s QPU factoring algorithm.

Far from being a mere mathematical curiosity, the ability to quickly factorize large numbers can help break the Rivest–Shamir–Adleman (RSA) public-key cryptosystem. Anytime you spin up an ssh session you’re making use of RSA. Public-key cryptosystems like RSA work by a process wherein a freely available public key can be used by anybody to encrypt information. But once encrypted, the information can only be decrypted using a secret, privately held key. Public-key cryptosystems are often compared to an electronic version of a mailbox. Imagine a locked box with a slit allowing anyone to post (but not retrieve) a message, and a door with a lock to which only the owner has the key. It turns out that the task of finding the prime factors of a large number N works really well as part of a public-key cryptosystem. The assurance that someone can only use the public key to encrypt and not to decrypt information rests on the assumption that finding the prime factors of N is a computationally infeasible task.

A full explanation of RSA is beyond the scope of this book, but the key point is that if Shor’s algorithm provides a way to find the prime factors of a large number N, then it has implications for one of the modern internet’s backbone components.

There are other very good reasons to get to grips with Shor’s algorithm besides its cryptographic implications, as it’s the best-known example of a class of algorithms solving instances of the so-called hidden subgroup problem. In this kind of problem, we want to determine the periodicity of a given periodic function, where the periodicity can be potentially quite complicated. A number of problems in discrete mathematics are instances of the hidden subgroup problem, such as period finding, order finding (which is the underlying difficult problem in factoring), finding discrete logarithms, and many others. A similar procedure to what we’ll see in this chapter can also provide solutions to some of these other problems.1

Shor’s algorithm is another prime example (pun intended) of how our QPU primitives are put to use. We’ve seen in Chapter 7 that the QFT is perfectly suited to investigating periodic signals, and Shor’s algorithm makes heavy use of it.

An especially instructive aspect of Shor’s algorithm is that it also ends by leveraging a conventional program to retrieve the desired prime factors from a periodicity learned with the QFT. The algorithm works so well because it accepts the QPU’s role as a coprocessor, applying quantum ideas only in those parts of the problem to which they are well suited.

Let’s take a closer look at the idea and code behind the algorithm.

Hands-on: Using Shor on a QPU

In keeping with the hands-on theme of this book, the sample code in Example 12-1 will allow you to see Shor’s factoring algorithm in operation right away, using built-in functions from QCEngine.

As mentioned earlier, the QPU is doing only part of the work here. The Shor() function makes calls to two other functions. The first, ShorQPU(), leverages our QPU (or a simulation thereof) to help find the repeat period of a function, while the rest of the work in ShorLogic() is performed with conventional software running on a CPU. We’ll dive into each of these functions in more detail in the next section.

Note

The ShorLogic() implementation we use in our example is for illustrative purposes only. Although simpler to explain, it will struggle with very large numbers. Full-scale Shor algorithms are the focus of ongoing research.

The remainder of the chapter walks through Shor’s algorithm, presented in a standard and approachable way. Take note, however, that the implementation we present is not the most efficient realization of Shor’s original idea, and actual implementations “in the wild” are likely to vary across QPU hardware.

What Shor’s Algorithm Does

Let’s start with the ShorQPU() function. We’re going to assert without proof a useful fact from number theory.2 Namely, it’s possible to solve the problem of finding prime factors p and q of a number N = pq if we are able to solve the seemingly unrelated problem of finding the repeat period of the function ax mod(N), as the integer variable x is varied. Here N is still the number we want to factor; a is called the coprime. The value of the coprime can be chosen to be any prime number that we like.

If the idea of finding the repeat period of ax mod(N) sounds obscure, fear not. All it means is that as you change the value of x, eventually the sequence of numbers returned by ax mod(N) repeats itself. The repeat period is just the number of x values between repeats, as you can see in Figure 12-1.

Note

To keep things simple, we’ll choose 2 as our coprime. Besides being the smallest prime number, this choice has the advantage that our QPU implementation of ax can be implemented by simply shifting bits. It’s a good choice for the cases we cover in this chapter, but will not be appropriate in other cases.

The repeat periods for two different values of N
Figure 12-1. The repeat periods for two different values of N

Once we know the repeat period p, then one of N’s prime factors might be given by gcd(N,ap/2+1) and the other might be given by gcd(N,ap/2-1). Again, these are statements that we give here without proof, but which follow from number theory arguments. Here, gcd is a function that returns the greatest common divisor of its two arguments. The well-known Euclidean algorithm can quickly work out gcd for us on a conventional CPU (see the online sample code at http://oreilly-qc.github.io?p=12-1 for an implementation of gcd).

Tip

While we might be able to find the prime factors from these two gcd expressions, this is not guaranteed. Success depends on the chosen value of the coprime a. As previously mentioned, we have chosen 2 as our coprime for illustrative purposes, and as a result there are some numbers that this implementation will fail to factor, such as 171 or 297.

Do We Need a QPU at All?

We’ve reduced the problem of prime factoring to finding the periodicity p of ax mod(N). It’s actually possible to try to find p with a conventional CPU program. All we need do is repeatedly evaluate ax mod(N) for increasing values of x, counting how many values we have tried and keeping track of the return values we get. As soon as a return value repeats, we can stop and declare the period to be the number of values we tried.

Note

This brute-force method for finding the period of ax mod(N) assumes that if we get the same value back from the function, then we must have gone through one full repeat period p. Although not immediately obvious, a mathematical property of ax mod(N) is that it can only assume any given value once within one period. Hence the moment we get the same result twice, we know we have completed a full period.

Example 12-2 implements this nonquantum, brute-force approach to finding p.

The code shown in Example 12-2 runs quickly on a whole range of numbers. The period-finding loop only needs to run until it finds the first repetition, and then it’s done. So why do we need a QPU at all?

Although ShorNoQPU() in Example 12-2 might not look too costly to evaluate, the number of loops required to find the repeat pattern (which is given by the repeat period of the pattern itself) grows exponentially with the number of bits in N, as shown in Figure 12-2.

The maximum number of loops required to find the repeat period of an integer represented by a bitstring of length N. Each bar also displays a histogram showing the distribution of repeat periods for integers represented by a bitstring of length N.
Figure 12-2. The maximum number of loops required to find the repeat period of an integer represented by a bitstring of length N. Each bar also displays a histogram showing the distribution of repeat periods for integers represented by a bitstring of length N.
Note

There are more efficient approaches to finding prime factors on a conventional CPU (such as the General Number Field Sieve), but they all run into similar scaling problems as the size of N increases. The runtime of the most efficient algorithm for factoring on a conventional CPU scales exponentially with the input size, whereas the runtime of Shor’s algorithm scales polynomially with the input size.

So then, let’s crack open the QPU.

The Quantum Approach

Although we’ll give a step-by-step walkthrough of how ShorQPU() works in the next section, we’ll first spend a little time highlighting the creative way it makes use of the QFT.

As we’ll see shortly, thanks to some initial HAD operations, ShorQPU() leaves us with ax mod(N) represented in the magnitudes and relative phases of our QPU register. Recall from Chapter 7 that the QFT implements the Discrete Fourier Transform, and leaves our output QPU register in a superposition of the different frequencies contained in the input.

Figure 12-3 shows what a DFT of ax mod(N) looks like.

Computation performed while factoring 15
Figure 12-3. Computation performed while factoring 15

This DFT includes a spike at the correct signal frequency of 4, from which we could easily calculate p. That’s all well and good for this conventional DFT, but recall from Chapter 7 that if we perform a QFT on the signal, these output peaks occur in superposition within our QPU output register. The value we obtain from the QFT after a READ is unlikely to yield the desired frequency.

It may look like our idea for using the QFT has fallen short. But if we play around with different values of N, we find something interesting about the DFT results for this particular kind of signal. Figure 12-4 shows the DFT of ax mod(N) with N = 35.

Computation performed while factoring 35
Figure 12-4. Computation performed while factoring 35

In this case, the repeat period of ax mod(N) is 12, and there are also 12 evenly spaced highest-weighted spikes in the DFT (these are the values we’re most likely to observe in a readout after the QFT). Experimenting with different N values, we start to notice a trend: for a pattern with repeat period p, the magnitude of the Fourier Transform will have exactly p evenly spaced spikes, as shown in the figure.

Since the spikes are evenly spaced and we know the size of our register, we can estimate how many spikes there were in the QFT—even though we can’t hope to observe more than one of them. (We give an explicit algorithm for doing this shortly.) From our experimenting, we know that this number of peaks is the same as the repeat period, p, of our input signal—precisely what we’re after!

This is a more indirect use of the QFT than we covered in Chapter 7, but it demonstrates a crucial point—we shouldn’t be afraid to use all the tools at our disposal to experiment with what’s in our QPU register. It’s reassuring to see that the hallowed programmer’s mantra “Change the code and see what happens” remains fruitful even with a QPU.

Step by Step: Factoring the Number 15

Let’s take a step-by-step look at using a QPU to factor the number 15. (Spoiler alert: the answer is 3 × 5.) Our Shor program grows in complexity as we try larger numbers, but 15 is a good place to start. The walkthrough we give will use the settings in Example 12-3 to demonstrate the algorithm’s operation.

The first argument, N, in this example is set to 15, the number we want to factor. The second argument, precision_bits, is 4. A larger number of precision bits will generally be more likely to return the correct answer, but conversely requires more qubits and many more instructions to execute. The third argument, coprime, will be left at 2, which is the only value that our simplified QPU implementation of Shor’s algorithm supports.

We already know that our main Shor() function executes two smaller functions. A QPU program determines the repeat period, and then passes the result to a second function that determines the prime factors using conventional digital logic on a CPU. The steps taken by the constituent parts of Shor() are as follows:

  • Steps 1–4 create a superposition of evaluations of ax mod(N).

  • Steps 5–6 implement the QFT trick outlined previously for learning the period p of this signal.

  • Steps 7–8 use p in a conventional CPU algorithm to find our prime factors.

We’ll walk through these steps for the case of an eight-qubit register. Four of these qubits will be used to represent (superpositions of) the different values of x that we pass to ax mod(N), while the other four qubits will process and keep track of the values that this function returns.

This means that in total we will need to keep track of 28 = 256 values—256 circles in our circle notation. By arranging all these circles into a 16 × 16 square, we can visualize the 16 states from each of our 4-qubit registers separately (see Chapter 3 for a recap on reading circle notation in this way). For convenience, we also add tags to these grids of 16 × 16 circles, showing the probabilities for each value from each of the two registers. Okay, enough talk; let’s factor some numbers with a QPU!

Step 1: Initialize QPU Registers

To start the quantum part of the Shor program, we initialize the registers with the digital values 1 and 0 as shown in Figure 12-5. We’ll see shortly that starting the work register with value 1 is necessary for the way we’ll calculate ax mod(N).

QPU instructions for step 1
Figure 12-5. QPU instructions for step 1

Figure 12-6 shows the state of our two registers in circle notation after initialization.

Step 1: Work and precision registers are initialized to values of 1 and 0, respectively
Figure 12-6. Step 1: Work and precision registers are initialized to values of 1 and 0, respectively

Step 2: Expand into Quantum Superposition

The precision register is used to represent the x values that we’ll pass to the function ax mod(N). We’ll use quantum superposition to evaluate this function for multiple values of x in parallel, so we apply HAD operations as shown in Figure 12-7 to place the precision register into a superposition of all possible values.

QPU instructions for step 2
Figure 12-7. QPU instructions for step 2

This way, each row in the circle grid shown in Figure 12-8 is ready to be treated as a separate input to a parallel computation.

Step 2: A superposition of the precision register prepares us to evaluate _a_^_x_^ mod(_N_) in superposition
Figure 12-8. Step 2: A superposition of the precision register prepares us to evaluate ax mod(N) in superposition

Step 3: Conditional Multiply-by-2

We now want to perform our function ax mod(N) on the superposition of inputs we have within the precision register, and we’ll use the work register to hold the results. The question is, how do we perform ax mod(N) on our qubit register?

Recall that we have chosen a = 2 for our coprime value, and consequently the ax part of the function becomes 2x. In other words, to enact this part of the function we need to multiply the work register by 2. The number of times we need to perform this multiplication is equal to x, where x is the value represented in binary within our precision register.

Multiplication by 2 (or indeed any power of 2) can be achieved on any binary register with a simple bit shift. In our case, each qubit is exchanged with the next highest-weighted position (using the QCEngine rollLeft() method). To multiply by 2 a total of x times, we simply condition our multiplication on the qubit values contained in the precision register. Note that we will only use the two lowest-weight qubits of the precision register to represent values of x (meaning that x can take the values 0, 1, 2, 3). Consequently, we only need to condition on these two qubits.

Note

Why is precision a four-qubit register if we only need two qubits for x? Although we will never directly use the additional 0x4 and 0x8 qubits that are present in the precision register, including them in all consequent calculations effectively stretches out the circle-notation patterns that we’ll observe within our QPU registers. Shor’s algorithm would run just fine without them, but pedagogically, it would be a little harder for us to point out the patterns explaining how the algorithm works.

If the lowest-weight qubit in precision has a value 1, then we will need to include one ×2 multiplication to the work register, and so we perform a single rollLeft() on work conditioned on this qubit’s value, as shown in Figure 12-9.

QPU instructions for step 3.
Figure 12-9. QPU instructions for step 3

The result is shown in Figure 12-10.

Step 3: To begin implementing _a_^_x_^ we multiply by 2 all qubits in the work register, conditioned on the lowest-weight qubit of precision being 1
Figure 12-10. Step 3: To begin implementing ax we multiply by 2 all qubits in the work register, conditioned on the lowest-weight qubit of precision being 1
Tip

In QPU programming, using conditional gates as the equivalent of if/then operations is extremely useful, as the “condition” is effectively evaluated for all possible values at once.

Step 4: Conditional Multipy-by-4

If the next-highest-weight qubit of the precision register is 1, then that implies a binary value of x also requiring another two multiplications by 2 on the work register. Therefore, as shown in Figure 12-11, we perform a shift by two qubits—i.e., two rollLeft() operations—conditional on the value of the 0x2 qubit from the precision register.

QPU instructions for step 4
Figure 12-11. QPU instructions for step 4

We now have a value of ax in the work register, for whatever value of x is encoded in the (first two) qubits of the precision register. In our case, precision is in a uniform superposition of possible x values, so we will obtain a corresponding superposition of associated ax values in work.

Although we’ve performed all the necessary multiplications by 2, it may seem like we’ve fallen short of implementing the function ax mod(N) by taking no action to take care of the mod part of the function. In fact, for the particular example we’ve considered, our circuit manages to take care of the modulus automatically. We’ll explain how in the next section.

Figure 12-12 shows how we’ve now managed to compute ax mod(N) on every value of x from the precision register in superposition.

Step 4: The +work+ register now holds a superposition of 2^_x_ mod(15) for every possible value of $x$ in the +precision+ register
Figure 12-12. Step 4: The work register now holds a superposition of 2x mod(15) for every possible value of x in the precision register

The preceding circle notation shows a familiar pattern. Having performed the function ax mod(N) on our QPU register, the superposition amplitudes exactly match the plot of ax mod(N) that we first produced at the beginning of the chapter in Figure 12-1 (albeit with the axes rotated 90°). We highlight this in Figure 12-13.

Hey, this looks familiar!
Figure 12-13. Hey, this looks familiar!

We now have the repeating signal for ax mod(N) encoded into our QPU registers. Trying to read either register now will, of course, simply return a random value for the precision register, along with the corresponding work result. Luckily, we have the DFT-spike-counting trick up our sleeves for finding this signal’s repeat period. It’s time to use the QFT.

Step 5: Quantum Fourier Transform

By performing a QFT on the precision register as shown in Figure 12-14, we effectively perform a DFT on each column of data, transforming the precision register states (shown as the rows of our circle-notation grid) into a superposition of the periodic signal’s component frequencies.

Note

Looking at the periodic pattern shown in the circle notation of Figure 12-12, you might wonder why we don’t need to QFT both registers (after all, it was the work register that we applied ax mod(N) to!). Pick a work value with nonzero amplitude from the circle-notation grid and look up and down the circles of that column. You should clearly see that as the precision register value is varied, we get a periodic variation of amplitude (with a period of 4 in this case). This is the register for which we therefore want to find the QFT.

Reading from top to bottom, Figure 12-15 now resembles the DFT plot we originally saw toward the beginning of the chapter in Figure 12-3, as we highlight in Figure 12-16. Note in Figure 12-15 that the QFT has also affected the relative phases of the register amplitudes.

QPU instructions for step 5
Figure 12-14. QPU instructions for step 5
Step 5: Frequency spikes along the +precision+ register following application of the QFT
Figure 12-15. Step 5: Frequency spikes along the precision register following application of the QFT
Once again, this resembles something we've seen
Figure 12-16. Once again, this resembles something we’ve seen

Each column in Figure 12-16 now contains the correct number of frequency spikes (four in this case). Recall from our earlier discussion that if we can count these spikes, that’s all we need to find the factor we seek via some conventional digital logic. Let’s READ out the precision register to get the information we need.

Step 6: Read the Quantum Result

The READ operation used in Figure 12-17 returns a random digital value, weighted by the probabilities in the circle diagram. Additionally, the READ destroys all values from the superposition that are in disagreement with the observed digital result.

QPU instructions for step 6
Figure 12-17. QPU instructions for step 6

In the example readout shown in Figure 12-18, the number 4 has been randomly obtained from the four most probable options. The QPU-powered part of Shor’s algorithm is now finished, and we hand this READ result to the conventional logic function ShorLogic() used in the next step.

Step 6: After a `READ` on the +precision+ register
Figure 12-18. Step 6: After a READ on the precision register

Step 7: Digital Logic

Our work up until this point has produced the number 4, although looking back at Figure 12-15, we could equally have randomly received the results 0, 4, 8, or 12.

As noted earlier, given our knowledge that the QFT spikes are evenly distributed across the register, we can determine what periods are consistent with our READ value using conventional digital logic. The estimate_num_spikes() function in Example 12-4 explicitly shows the logic for doing this. In some cases, this function may return more than one candidate number of spikes for the DFT plot. For example, if we pass it our READ value of 4, then it returns the two values 4 and 8, either of which is a number of spikes in the DFT consistent with our READ value.

Since (in this example) we’ve ended up with two candidate results (4 and 8), we’ll need to check both to see whether they give us prime factors of 15. We introduced the method ShorLogic(), implementing the gcd equations that determine prime factors from a given number of DFT spikes back at the start of the chapter, in Example 12-1. We first try the value 4 in this expression, and it returns the values 3 and 5.

Warning

Not all of the available values will lead to a correct answer. What happens if we receive the value 0? This is 25% likely, and in this case the estimate_num_spikes() function returns no candidate values at all, so the program fails. This is a common situation with quantum algorithms, and not a problem when we can check the validity of our answer quickly. In this case, we make such a check and then, if necessary, run the program again, from the beginning.

Step 8: Check the Result

The factors of a number can be difficult to find, but once found the answer is simple to verify. We can easily verify that 3 and 5 are both prime and factors of 15 (so there’s no need to even try checking the second value of 8 in ShorLogic()).

Success!

The Fine Print

This chapter presented a simplified version of what is typically a very complicated algorithm. In our version of Shor’s algorithm, a few aspects were simplified for ease of illustration, although at the cost of generality. Without digging too deeply, we here mention some of the necessary simplifications. More information can also be found in the online sample code.

Computing the Modulus

We already mentioned that in our QPU calculation of ax mod(N) the modulus part of the function was somehow automatically taken care of for us. This was a happy coincidence specific to the particular number we wanted to factor, and sadly won’t happen in general. Recall that we multiplied the work register by powers of two through the rolling of bits. If we simply start with the value 1 and then shift it 4 times, we should get 24 = 16; however, since we only used a 4-bit number and allowed the shifted bits to wrap around, instead of 16 we get 1, which is precisely what we should get if we’re doing the multiplications followed by mod(15).

You can verify that this trick also works if we’re trying to factor 21; however, it fails on a larger (but still relatively tiny) number such as 35. What can we do in these more general cases?

When a conventional computer calculates the modulus of a number, such as 1024 % 35, it performs integer division and then returns the remainder. The number of conventional logic gates required to perform integer division is very (very) large, and a QPU implementation is well beyond the scope of this book.

Nevertheless, there is a way we can solve the problem, by using an approach to calculating the modulus that, although less sophisticated, is well suited to QPU operations. Suppose we wanted to find y mod(N) for some value y. The following code will do the trick:

y -= N;
if (y < 0) {
    y += N;
}

We simply subtract N from our value, and use the sign of the result to determine whether we should allow the value to wrap around or return it to its original value. On a conventional computer this might be considered bad practice. In this case, it gives us exactly what we need: the correct answer, using decrements, increments, and comparisons—all logic we know can be implemented with QPU operations from our discussions in Chapter 5.

A circuit for performing the modulus by this method (on some value val) is shown in Figure 12-19.

Quantum operations to perform a multiply-by-2 modulo 35
Figure 12-19. Quantum operations to perform a multiply by 2 modulo 35

While this example does use a large number of complicated operations to perform a simple calculation, it is able to perform the modulus in superposition.

Tip

The modulus implementation in Figure 12-19 actually uses one additional scratch qubit to stash away the sign bit of the work register, for use in the conditional addition.

Time Versus Space

The modulus operation described in the preceding section slows things down considerably for the general factoring case, primarily because it requires the multiply-by-2 operations to be performed one at a time. This increase in the number of operations needed (and therefore overall operation time) destroys our QPU’s advantage. This can be solved by increasing the number of qubits used, and then applying the modulus operation a logarithmic number of times. For an example of this, see http://oreilly-qc.github.io?p=12-A. Much of the challenge of QPU programming involves finding ways to balance program depth (the number of operations) against required number of qubits.

Coprimes Other Than 2

The implementation covered here will factor many numbers, but for some it returns undesired results. For example, using it to factor 407 returns [407, 1]. While this is technically correct, we would much prefer the nontrivial factors of 407, which are [37, 11].

A solution to this problem is to replace our coprime=2 with some other prime number, although the quantum operations required to perform non-power-of-2 exponentiation are outside the scope of this book. The choice of coprime=2 is a useful illustrative simplification.

1 Note that not all problems of this class are known to have solutions of this form. For example, graph isomorphism (which tests the equivalence of graphs under relabeling of vertices) also belongs to the hidden subgroup class, but it is unknown whether an efficient QPU algorithm exists for this problem.

2 See Chapter 14 for more in-depth references.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset