The classical method of multiplying two binary numbers of length n requires Θ(n^2) operations. Let the two numbers r and s of 2k bits each be of the following form:
Here, A represents the k most significant bits of r and B the k least significant bits. Similarly for C and D with respect to s. In other words, we can write r = A·2^k + B and s = C·2^k + D. From that we obtain

rs = AC·2^{2k} + (AD + BC)·2^k + BD
Instead of following this approach, Karatsuba's algorithm (Anatoly Alexeyevich Karatsuba, 1937–2008) recursively computes the three products AC, (A + B)(C + D), and BD. Then, rs can be computed performing only three multiplications of numbers with bit length at most k + 1:

rs = AC·2^{2k} + ((A + B)(C + D) − AC − BD)·2^k + BD
Let t_mult(n) be the time which this recursive method takes to multiply numbers of n bits. Since n-bit numbers can be added in time O(n), we obtain the recurrence t_mult(n) ≤ 3·t_mult(n/2 + O(1)) + O(n), and the master theorem yields the following estimate for t_mult(n):

t_mult(n) ∈ O(n^{log_2 3})
Thus, by a divide-and-conquer approach, we have reduced the exponent of the classical method from 2 to approximately 1.58496.
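The recursion can be sketched in Python as follows (a sketch, not a tuned implementation; the base-case cutoff of 2^32 is an arbitrary choice below which the built-in multiplication is used):

```python
def karatsuba(r: int, s: int) -> int:
    """Multiply two non-negative integers by Karatsuba's recursion
    rs = AC*2^(2k) + ((A+B)(C+D) - AC - BD)*2^k + BD."""
    if r < 2**32 or s < 2**32:
        return r * s                        # base case: hardware multiplication
    k = max(r.bit_length(), s.bit_length()) // 2
    a, b = r >> k, r & ((1 << k) - 1)       # r = a*2^k + b
    c, d = s >> k, s & ((1 << k) - 1)       # s = c*2^k + d
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    mid = karatsuba(a + b, c + d) - ac - bd # equals ad + bc, one multiplication
    return (ac << (2 * k)) + (mid << k) + bd
```

Only three recursive multiplications occur per level; the additions and shifts take linear time.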
Another important arithmetic operation is the computation of a mod m. Because of

a mod m = a − ⌊a/m⌋·m

this operation can be reduced to subtraction, multiplication and integer division. For subtraction, the complexity of the classical method is linear in the number of digits. For multiplication, we have just seen an algorithm with subquadratic running time. So, now division remains to be investigated. To this end, we sketch in the following a method which, using only subtractions and multiplications, is able to compute sufficiently many places of the reciprocal 1/m of a given integer m. Once we know enough bits of 1/m after the decimal point, we can compute ⌊a/m⌋ by multiplying a and 1/m. Those bit values of 1/m contributing merely to the fractional part of this product need not be computed.
To approximate 1/m, we can use the Newton method (Sir Isaac Newton, 1643–1727) to find the root of f(x) = 1/x − m. For this purpose, we compute better and better approximations x_i to this root by the rule

x_{i+1} = x_i − f(x_i)/f'(x_i)

In our particular case, this results in the rule

x_{i+1} = 2x_i − m·x_i^2
If we begin with an initial value x_0 between 0 and 1/m, the sequence (x_i)_{i∈ℕ} very quickly converges to 1/m. A good initial value is, for example, x_0 = 2^{−⌊log_2 m⌋−1}. The value ⌊log_2 m⌋ + 1 can be found easily by counting the number of binary digits of m. Without further computation, the initial value x_0 can be specified in binary by a sequence of zeros after the decimal point, followed by a 1. The problem that we have to work with noninteger numbers can be eliminated by previously multiplying by a sufficiently large power of 2; this can be achieved by a simple shift operation.
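A fixed-point sketch of this iteration, computing ⌊2^p/m⌋: all values are scaled by 2^p, so the scaling is a shift, as described above (the function name, the iteration count, and the final correction loops are our choices, not part of the text):

```python
def reciprocal_fixed(m: int, p: int) -> int:
    """Approximate 1/m in fixed-point arithmetic: the return value x
    satisfies x = floor(2**p / m), i.e. x/2**p is 1/m truncated after
    p binary digits.  Requires p > number of bits of m."""
    # initial value x0 = 2^(-floor(log2 m) - 1), scaled by 2**p;
    # m.bit_length() equals floor(log2 m) + 1
    x = 1 << (p - m.bit_length())
    # Newton rule x_{i+1} = 2 x_i - m x_i^2, scaled by 2**p; the number
    # of correct bits roughly doubles in every step
    for _ in range(p.bit_length() + 3):
        x = 2 * x - ((m * x * x) >> p)
    # the truncations above may leave x off by a few units; correct that
    while (x + 1) * m <= (1 << p):
        x += 1
    while x * m > (1 << p):
        x -= 1
    return x
```

With x = reciprocal_fixed(m, p), the quotient ⌊a/m⌋ can then be obtained from the product a·x by a shift (plus a final correction by at most one).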
The goal of the discrete Fourier transform (named after Jean Baptiste Joseph Fourier, 1768–1830) is to multiply polynomials very efficiently. Let R be a commutative ring and b ≥ 1 a natural number with an inverse in R. That is, the multiplicative inverse b^{−1} ∈ R exists. This is always the case if R is a field of characteristic zero, like ℂ. It also holds in fields of characteristic p such that gcd(b, p) = 1. And it is also true if b = 2^r is a power of two and R = ℤ/nℤ for odd n. Note that ℤ/nℤ may have various zero divisors.
The b-fold direct product R^b of R consists of all vectors (u_0, . . . , u_{b−1}) with u_i ∈ R. For R^b, we can consider two different multiplications to be defined in a natural way. First, we can perform a componentwise multiplication:

(u_0, . . . , u_{b−1})·(υ_0, . . . , υ_{b−1}) = (u_0·υ_0, . . . , u_{b−1}·υ_{b−1})
Or we interpret a vector (u_0, . . . , u_{b−1}) as a polynomial Σ_i u_i·X^i and multiply polynomials in the quotient ring R[X]/(X^b − 1). To distinguish these two kinds of multiplication, we denote the product of the polynomials f and g as f ∗ g (it is called the convolution product). By convention, u_i = 0 for all i < 0 and all i ≥ b. This eases notation because we need no explicit summation limits. In this quotient ring, the identity X^b = 1 holds, and we obtain

(Σ_i u_i·X^i) ∗ (Σ_j υ_j·X^j) = Σ_k (Σ_{i+j ≡ k mod b} u_i·υ_j)·X^k
Let now ω ∈ R be a bth root of unity, that is, an element satisfying ω^b = 1. Then, we can evaluate f(X) ∈ R[X]/(X^b − 1) at powers ω^i and

f(X) ↦ (f(1), f(ω), . . . , f(ω^{b−1}))

defines a ring homomorphism from R[X]/(X^b − 1) into R^b. A primitive bth root of unity is an element ω ∈ R \ {1} satisfying the following conditions:
(a)ω^b = 1,
(b)Σ_{j=0}^{b−1} ω^{jk} = 0 for all 1 ≤ k < b.
If ω is a primitive bth root of unity, then ω^{−1} is as well, since the equation in condition (b) can be multiplied by ω^{−k(b−1)}. If c ≥ 1 divides b, then ω^c is a primitive (b/c)th root of unity in R because, for 1 ≤ k < b/c, we have

Σ_{j=0}^{b/c−1} (ω^c)^{jk} = c^{−1}·Σ_{j=0}^{b−1} ω^{jk'} = 0

with k' = kc < b. Note that, in the middle term of the equation above, each power (ω^c)^{jk} is counted exactly c times; the inverse c^{−1} exists because c divides the invertible element b. From Corollary 1.57, the complex number e^{2πi/b} is a primitive bth root of unity in ℂ. The discrete Fourier transform is possible in each ring R in which b has a multiplicative inverse b^{−1} and where a primitive bth root of unity ω exists. For such rings R, the polynomial ring R[X]/(X^b − 1) with the identity X^b = 1 and the direct product R^b are isomorphic. The isomorphism can be explained by matrix multiplications. The {0, . . . , b − 1} × {0, . . . , b − 1} matrices F = (ω^{ij})_{i,j} and F̄ = (ω^{−ij})_{i,j} satisfy

F · F̄ = (Σ_{k=0}^{b−1} ω^{(i−j)k})_{i,j}
Here, (a_{ij})_{i,j} denotes the matrix with entry a_{ij} at the (i, j)-position. For i ≠ j, we have Σ_{k=0}^{b−1} ω^{(i−j)k} = 0, and for i = j this sum equals b. Thus, F · F̄ is a diagonal matrix with value b on the diagonal. In particular, the matrices F and F̄ are invertible and F · F̄ · b^{−1} is the unit matrix. Now, F^{−1} = b^{−1} · F̄, and we obtain the following ring isomorphism F : R[X]/(X^b − 1) → R^b, mapping each element f(X) = Σ_i a_i·X^i to

(f(1), f(ω), . . . , f(ω^{b−1}))
In particular, multiplication with the matrix F can be interpreted as evaluating a polynomial f at the points 1, ω, ω^2, . . . , ω^{b−1}. The mapping F : R[X]/(X^b − 1) → R^b is called the discrete Fourier transform. For the inverse mapping, the matrix F is replaced by F̄, and in the end the result is multiplied by the scalar b^{−1}.
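The identity F · F̄ = b·I is easy to check numerically over R = ℂ (a sketch; the dimension b = 8 and the tolerance are arbitrary choices):

```python
import cmath

b = 8
omega = cmath.exp(2j * cmath.pi / b)   # a primitive bth root of unity in C
F    = [[omega**(i * j)  for j in range(b)] for i in range(b)]
Fbar = [[omega**(-i * j) for j in range(b)] for i in range(b)]

# F * Fbar should be b times the identity matrix
prod = [[sum(F[i][k] * Fbar[k][j] for k in range(b)) for j in range(b)]
        for i in range(b)]
for i in range(b):
    for j in range(b):
        expected = b if i == j else 0
        assert abs(prod[i][j] - expected) < 1e-9
```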
Fig. 3.1. Computing f ∗ g using the discrete Fourier transform.
To compute the coefficients z_i in f(X) ∗ g(X) = Σ_i z_i·X^i according to Figure 3.1, we use the following strategy:
(1)Compute F(f(X)) and F(g(X)), that is, f(ω^i) and g(ω^i) for 0 ≤ i < b.
(2)Evaluate the products h_i = f(ω^i)·g(ω^i) in R for 0 ≤ i < b.
(3)Compute F^{−1}((h_0, . . . , h_{b−1})).
Since F is a ring isomorphism, we have

F^{−1}((h_0, . . . , h_{b−1})) = F^{−1}(F(f(X))·F(g(X))) = f(X) ∗ g(X) = Σ_i z_i·X^i
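The three steps can be sketched over R = ℂ with a naive quadratic-time transform (the function names are ours; the rounding at the end assumes integer coefficients):

```python
import cmath

def dft(coeffs, omega):
    # multiply with F = (omega^{ij}): evaluate at 1, omega, ..., omega^{b-1}
    b = len(coeffs)
    return [sum(c * omega**(i * j) for j, c in enumerate(coeffs)) for i in range(b)]

def idft(values, omega):
    # inverse transform: multiply with F-bar and the scalar b^{-1}
    b = len(values)
    return [sum(v * omega**(-i * j) for j, v in enumerate(values)) / b for i in range(b)]

def cyclic_convolution(f, g):
    # f * g in C[X]/(X^b - 1), computed by steps (1)-(3)
    b = len(f)
    omega = cmath.exp(2j * cmath.pi / b)            # primitive bth root of unity in C
    F, G = dft(f, omega), dft(g, omega)             # step (1)
    H = [x * y for x, y in zip(F, G)]               # step (2): pointwise products h_i
    return [round(z.real) for z in idft(H, omega)]  # step (3), rounded back to integers
```

For example, with f = 1 + X + X^2 and g = 2 + 3X for b = 4 (the polynomials of Exercise 3.21, over ℂ instead of 𝔽_97), this returns the coefficient vector [2, 5, 5, 3] of 2 + 5X + 5X^2 + 3X^3.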
Next, let us consider the special case that b is a power of 2, that is, b = 2^r for r ≥ 1. Then, the computation of F(f(X)) can efficiently be performed using a divide-and-conquer strategy. This leads to the fast Fourier transform (abbreviated FFT) as follows. We write polynomials f(X) of degree less than b in the form

f(X) = f_0(X^2) + X·f_1(X^2)
The polynomials f_j have degree less than b/2, and ω^2 is a primitive (b/2)th root of unity. If f = (a_0, . . . , a_{b−1}), then f_0 = (a_0, a_2, . . . , a_{b−2}) and f_1 = (a_1, a_3, . . . , a_{b−1}).
Let F' : R[X]/(X^{b/2} − 1) → R^{b/2} be the Fourier transform of dimension b/2 with the root of unity ω^2. If we have computed F'(f_0) = (u_0, . . . , u_{b/2−1}) and F'(f_1) = (υ_0, . . . , υ_{b/2−1}), then we can determine F(f) as follows. With vectors

u = (u_0, . . . , u_{b/2−1}, u_0, . . . , u_{b/2−1}) and υ = (υ_0, . . . , υ_{b/2−1}, υ_0, . . . , υ_{b/2−1})

of length b, we determine by componentwise computation

F(f) = (u_i + ω^i·υ_i)_{0 ≤ i < b}
The transforms F(f_0) and F(f_1) can be recursively computed with the same procedure. The additions in R and the multiplications with ω^i and b^{−1} are called elementary arithmetic operations. Let t(b) be the number of elementary arithmetic operations necessary in this scheme for computing F(f). Then, t(b) satisfies the recurrence t(b) ≤ 2t(b/2) + O(b), since to determine the Fourier transform, two recursive calls for f_0 and f_1 of half the size are needed. The results of these two recursive calls are combined to obtain the transform of f using a linear number of operations. The master theorem, Theorem 3.2, then yields t(b) ∈ O(b log b).
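A minimal recursive implementation of this divide-and-conquer scheme, over ℂ for concreteness (the function name and interface are our choice):

```python
import cmath

def fft(a, omega):
    """Compute F(f) for f = (a_0, ..., a_{b-1}), b a power of two:
    split f into f0 (even coefficients) and f1 (odd coefficients),
    transform both with omega^2, then combine componentwise."""
    b = len(a)
    if b == 1:
        return list(a)
    u = fft(a[0::2], omega * omega)  # F'(f0), dimension b/2 with root omega^2
    v = fft(a[1::2], omega * omega)  # F'(f1)
    half = b // 2
    # F(f)_i = u_{i mod b/2} + omega^i * v_{i mod b/2}
    return [u[i % half] + omega**i * v[i % half] for i in range(b)]
```

The result agrees with evaluating f directly at the points 1, ω, . . . , ω^{b−1}, but needs only O(b log b) elementary operations.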
If we want to compute the product f(X) ∗ g(X) of two polynomials of degree less than d in R[X], then we can choose b as the smallest power of 2 with b ≥ 2d. If a primitive bth root of unity ω is known, then using the FFT according to the scheme in Figure 3.1, the product of f(X) and g(X) can be computed with only O(d log d) elementary arithmetic operations. The naive approach, in contrast, requires O(d^2) elementary arithmetic operations.
Let b ∈ ℕ and 𝔽 be a field of characteristic zero or characteristic p with gcd(b, p) = 1. Then, b is invertible in 𝔽. Moreover, the formal derivative of the polynomial X^b − 1 is b·X^{b−1}. Therefore, the polynomial X^b − 1 has no multiple roots. In a field extension of 𝔽, the polynomial X^b − 1 splits into linear factors, and the bth roots of unity in the field extension are pairwise distinct. The group of bth roots of unity thus has b elements. It is cyclic and generated by an element ω. Then, ω^b = 1 and ω^k ≠ 1 for all 1 ≤ k < b. This implies (ω^k − 1)·Σ_{j=0}^{b−1} ω^{jk} = ω^{bk} − 1 = 0 for all 1 ≤ k < b. Since fields have no zero divisors, Σ_{j=0}^{b−1} ω^{jk} = 0 follows; and the generator ω is a primitive bth root of unity. If 𝔽 = ℂ, for example, we may choose ω = e^{2πi/b}.
In general, however, a ring has zero divisors, and therefore the properties ω^b = 1 and ω^k ≠ 1 for all 1 ≤ k < b are not sufficient to ensure Σ_{j=0}^{b−1} ω^{jk} = 0. The remaining part of this section is dedicated to the following theorem, which is crucial for Schönhage and Strassen's algorithm for the fast multiplication of large numbers.
Theorem 3.17. Let b = 2^r, let m be a multiple of b, and let n = 2^m + 1. We define ψ = 2^{m/b} and ω = ψ^2. Then the following properties hold in ℤ/nℤ:
(a)b^{−1} = −2^{m−r}
(b)ψ^b = −1
(c)ω is a primitive bth root of unity.
Proof: In ℤ/nℤ, we have 2^m = −1. The claims b^{−1} = −2^{m−r} and ψ^b = −1, as well as ω^b = 1, are therefore trivial. We have to show Σ_{j=0}^{b−1} ω^{jk} = 0 for all 1 ≤ k < b. Since b = 2^r is a power of 2, an induction on r yields

Σ_{j=0}^{b−1} X^j = Π_{i=0}^{r−1} (1 + X^{2^i})

Now let k = 2^ℓ·u with an odd number u. Using 0 ≤ ℓ < r, for X = ω^k we have a factor of the form 1 + ω^{k·2^{r−1−ℓ}} in the above product. Now, ω^{k·2^{r−1−ℓ}} = ψ^{u·2^r} = (ψ^b)^u = (−1)^u. Since u is odd, we obtain 1 + ω^{k·2^{r−1−ℓ}} = 0 and therefore Σ_{j=0}^{b−1} ω^{jk} = 0.
□
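For small parameters, the properties of Theorem 3.17 are easy to verify numerically, for example with r = 3, b = 8, m = 16, and n = 2^16 + 1 = 65537 (a sketch; the concrete values are arbitrary choices):

```python
r = 3
b = 2**r                 # b = 8
m = 2 * b                # m = 16, a multiple of b
n = 2**m + 1             # n = 65537
psi = pow(2, m // b, n)  # psi = 2^(m/b)
omega = psi * psi % n    # omega = psi^2

assert b * (n - pow(2, m - r, n)) % n == 1   # (a): b^{-1} = -2^{m-r}
assert pow(psi, b, n) == n - 1               # (b): psi^b = -1
assert pow(omega, b, n) == 1                 # (c): omega^b = 1 ...
for k in range(1, b):                        # ... and sum_j omega^{jk} = 0
    assert sum(pow(omega, j * k, n) for j in range(b)) % n == 0
```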
Multiplying two n-bit numbers requires Θ(n^2) bit operations when using the classical method. In 1960, Karatsuba noticed that it can be done much faster with a surprisingly simple divide-and-conquer approach; he showed that asymptotically fewer than n^{1.6} operations are sufficient; see Section 3.8. A major breakthrough came when Arnold Schönhage (born 1934) and Volker Strassen were able to present a method with a nearly linear number of arithmetic operations in 1971 [91]. More specifically, they showed that the multiplication of two n-bit numbers can be done in time O(n log n log log n) on a multitape Turing machine. It took 35 more years, until Martin Fürer (born 1947) could further improve this time bound to O(n log n · 2^{log* n}); see [43]. Here, log* n denotes the number of applications of the logarithm needed to have a result less than 1. The function 2^{log* n} asymptotically grows much slower than log log n; however, on all realistic inputs, log log n is smaller.
In the remainder of this section, we prove the result of Schönhage–Strassen.
Theorem 3.18. The multiplication of two natural numbers of bit size n can be done with O(n log n log log n) bit operations.
The input consists of two natural numbers u and υ. We may assume that the binary representation of the product uυ requires at most n bits and that n = 2^s is a power of 2. It is sufficient to compute uυ modulo 2^n + 1. In the corresponding quotient ring, 2^n = −1 holds. We define b = 2^{⌈s/2⌉} and ℓ = 2^{⌊s/2⌋}. As a rule of thumb, we note
–2^n + 1 is large and odd.
–n = 2^s is the input size and a power of 2.
–s ∈ ℕ is a small number.
–b ≈ ℓ ≈ √n and bℓ = n.
We decompose the input into b blocks of length ℓ and write u = Σ_i u_i·2^{ℓi} and υ = Σ_i υ_i·2^{ℓi}, where 0 ≤ u_i, υ_i < 2^ℓ for all i. By convention, u_i = υ_i = 0 holds for all i < 0 and for all i ≥ b. Let y_i = Σ_j u_j·υ_{i−j}; then y_i ≠ 0 can only hold for 0 ≤ i < 2b − 1. For 0 ≤ i < b, at most i + 1 summands u_j·υ_{i−j} in y_i are nonzero, which yields y_i < (i + 1)·2^{2ℓ}; and in y_{b+i} at most b − i − 1 summands do not vanish, resulting in y_{b+i} < (b − i − 1)·2^{2ℓ}. With 2^{bℓ} = −1, we obtain

uυ ≡ Σ_{i=0}^{b−1} w_i·2^{ℓi} with w_i = y_i − y_{b+i}
in ℤ/(2^n + 1)ℤ. Since −(b − i − 1)·2^{2ℓ} < w_i < (i + 1)·2^{2ℓ}, it is sufficient to determine the values w_i modulo b·(2^{2ℓ} + 1). Since b is a power of 2 and 2^{2ℓ} + 1 is odd, according to the Chinese remainder theorem, we can first compute the w_i modulo b, then modulo 2^{2ℓ} + 1, and finally combine these two intermediate steps for the result modulo b·(2^{2ℓ} + 1). Letting w_i' ≡ w_i mod b and w_i'' ≡ w_i mod 2^{2ℓ} + 1, we obtain w_i from w_i' and w_i'' by the computation

w_i ≡ w_i'' + (2^{2ℓ} + 1)·((w_i' − w_i'') mod b) mod b·(2^{2ℓ} + 1)
since 2^{2ℓ} ≡ 0 mod b. The resulting values w_i with −(b − i − 1)·2^{2ℓ} < w_i < (i + 1)·2^{2ℓ} each have O(ℓ + log b) bits. From w_0, . . . , w_{b−1} we compute

uυ ≡ Σ_{i=0}^{b−1} w_i·2^{ℓi} mod 2^n + 1
This requires O(b) additions or subtractions, respectively, each on operands of length O(ℓ + log b). These can be performed by the classical methods in linear time. It remains to show how to find the values w_i' and w_i''.
We first describe how the values w_i' are computed. First, we let u_i' ≡ u_i mod b and υ_i' ≡ υ_i mod b for 0 ≤ i < b. This is easy because b is a power of 2 and the u_i and υ_i are given in binary. Since w_i ≡ y_i − y_{b+i}, we have w_i' ≡ y_i' − y_{b+i}' mod b for y_i' = Σ_j u_j'·υ_{i−j}'. Therefore, it is sufficient to determine the y_i' for 0 ≤ i < 2b. We know that y_i' < b·b·b = b^3, and therefore 1 + 3 log b < 4 log b bits for each y_i' are sufficient. Let u' = Σ_i u_i'·2^{4i log b} and υ' = Σ_i υ_i'·2^{4i log b}. This means that u' can be obtained from the u_i' by inserting sufficiently long blocks of zeros between them; υ' can be constructed in the same way. Then, we have

u'·υ' = Σ_i y_i'·2^{4i log b}
The binary lengths of u' and υ' are bounded by 4b log b, so with the method of Karatsuba, we can compute the product u'·υ' in time O((b log b)^{1.6}) ⊆ O(b^2) = O(n). From the result, one can directly extract the values y_i' since the corresponding blocks do not overlap.
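The packing idea, inserting zero blocks so that all y_i' can be read off from a single integer multiplication, can be sketched as follows (the function name and the caller-chosen block width are our illustration; in the text, the width is 4 log b):

```python
def convolution_by_packing(us, vs, bits):
    """Compute all y_i = sum_j u_j * v_{i-j} with one big multiplication:
    the blocks are separated by zeros so that the y_i cannot overlap.
    'bits' must exceed the bit length of every y_i."""
    b = len(us)
    u = sum(ui << (bits * i) for i, ui in enumerate(us))  # insert zero blocks
    v = sum(vi << (bits * i) for i, vi in enumerate(vs))
    w = u * v                                             # one multiplication
    mask = (1 << bits) - 1
    return [(w >> (bits * i)) & mask for i in range(2 * b - 1)]
```

For instance, packing (3, 1, 4) and (1, 5, 9) with 16-bit blocks yields the convolution (3, 16, 36, 29, 36).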
Next, we want to determine the w_i''. We let N = 2^{2ℓ} + 1 and R = ℤ/Nℤ. In particular, 2^{2ℓ} = −1 in R. Note that N is approximately 2^{2√n}. The number N is large, but it is much smaller than 2^n. The subsequent calculations are performed in the polynomial ring R[X]/(X^b − 1). The elements ψ = 2^{2ℓ/b} and ω = ψ^2 have the properties given by Theorem 3.17. Thus, this polynomial ring is isomorphic to the direct product R^b, and multiplication of polynomials can be reduced to multiplications in R as shown in Section 3.9. We consider the following two polynomials:

f(X) = Σ_i ψ^i·u_i·X^i and g(X) = Σ_i ψ^i·υ_i·X^i
Since ψ^b = −1 and X^b = 1, we have the following:

f(X) ∗ g(X) = Σ_{i=0}^{b−1} ψ^i·(y_i − y_{b+i})·X^i

because the summands with j + k = i contribute ψ^i·u_j·υ_k and the summands with j + k = b + i contribute ψ^{b+i}·u_j·υ_k = −ψ^i·u_j·υ_k. By the factor ψ^i in the definition of f and g, it is ensured that y_{b+i} has the desired negative sign. Let h(X) = f(X) ∗ g(X) ∈ R[X]/(X^b − 1) and write h(X) = Σ_i z_i·X^i. Then, we obtain the w_i'' as follows:

w_i'' ≡ z_i·ψ^{−i} mod 2^{2ℓ} + 1
We compute the z_i using the fast discrete Fourier transform, in which we recursively use the same algorithm for multiplication in R. It remains to show how to compute the number z_i·ψ^{−i} mod 2^{2ℓ} + 1 efficiently. The time bound O(ℓ) is sufficient. Elements of R are represented by numbers z in {0, . . . , 2^{2ℓ}}. Numbers z ∈ R can efficiently be multiplied by −1 by computing 2^{2ℓ} + 1 − z with the classical method. Because of ψ^{−i} = −ψ^{b−i} in R, it suffices to compute elements z·ψ^j mod N for 0 < j < b. Now, ψ^j ≡ 2^k mod N for k = 2ℓj/b ∈ {1, . . . , 2ℓ − 1}. The value z·2^k can be computed by left-shifting z by k bits. We can split the result of the shift operation into two 2ℓ-bit blocks z', z'' ∈ {0, . . . , 2^{2ℓ} − 1} such that z·2^k = z' + z''·2^{2ℓ}. Subtracting z'' from z' modulo N now yields the desired result since z' − z'' ≡ z' + z''·2^{2ℓ} = z·2^k ≡ z·ψ^j mod N.
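The shift-and-subtract computation of z·2^k mod N can be sketched as follows (the function name is ours; the parameter plays the role of 2ℓ):

```python
def mul_pow2_mod(z, k, ell2):
    """Compute z * 2**k modulo N = 2**ell2 + 1 with one shift and one
    subtraction, using 2**ell2 = -1 mod N."""
    N = (1 << ell2) + 1
    shifted = z << k                  # z * 2^k as a plain integer
    lo = shifted & ((1 << ell2) - 1)  # z'
    hi = shifted >> ell2              # z'' with shifted = z' + z'' * 2^ell2
    return (lo - hi) % N              # z' - z'' = z * 2^k mod N
```

No multiplication or division by N occurs; everything is a shift, a mask, and one subtraction.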
Overview of the algorithm
In the following, we give a sketch of how the algorithm works. Here, x ∈ R means that the element x from the ring R is given or was computed. Arrows stand for a causal connection. In order to give a better insight into the proportions, b and ℓ are replaced by √n.
The individual steps of the algorithm can be summarized as follows:
(1)The numbers u and υ each are decomposed in blocks of √n bits. Each block is reduced modulo √n; then u' and υ' are compiled from the blocks with sufficiently many zeros inserted.
(2)The numbers u' and υ' are multiplied using the Karatsuba multiplication algorithm.
(3)Due to the shielding by sufficiently many zeros, the blocks y_i' can be reconstructed from the product u'·υ'.
(4)The blocks u_i and υ_i are viewed as numbers modulo 2^{2√n} + 1 and interpreted as coefficients of polynomials f and g.
(5)The Fourier transforms F(f) and F(g) of f and g are computed.
(6)In the computation of F(f)·F(g), for each of the multiplications in the ring ℤ/(2^{2√n} + 1)ℤ, the algorithm is called recursively.
(7)Using the inverse Fourier transform, the product of the polynomials f and g is determined.
(8)The values ψ^i·w_i appear as coefficients of the polynomial f(X) ∗ g(X).
(9)With the Chinese remainder theorem, the numbers w_i' mod √n and w_i'' mod 2^{2√n} + 1 can be combined to w_i mod √n·(2^{2√n} + 1).
(10)The product uυ is computed from the numbers w_i.
Computing F(f(X)), F(g(X)), and F^{−1}((h_0, . . . , h_{b−1})) takes O(b log b) elementary arithmetic operations, each of which can be performed with O(ℓ) bit operations. This results in O(ℓ·b log b) = O(n log n) operations. Furthermore, b products h_i = f(ω^i)·g(ω^i) modulo 2^{2ℓ} + 1 have to be computed recursively. Let M(n) be the number of bit operations needed to multiply two n-bit numbers using the Schönhage–Strassen algorithm. Then, we obtain the following recurrence:

M(n) ≤ b·M(2ℓ) + O(n log n)

Thus, M(n)/n ≤ 2·M(2ℓ)/(2ℓ) + O(log n). For n = 2^s and t(s) = M(2^s)/2^s, we obtain

t(s) ≤ 2·t(⌊s/2⌋ + 1) + O(s)
This yields t(s) ∈ O(s log s) by the master theorem (see Theorem 3.2 and Example 3.3).
Thus, we finally have M(n) ∈ O(n log n log log n).
3.1. (Master theorem II) Let and Show that f(n) ∈ (n).
3.2. Show that on input two binary numbers a, b ∈ ℕ, one can decide in polynomial time whether there exists c ∈ ℚ with c2 = a/b.
3.3. Give an extension of the binary gcd algorithm such that, on input k,ℓ ∈ ℕ, it computes a, b, t with ak + bℓ = t = gcd(k, ℓ). The algorithm should not use division and multiplications by numbers other than 2.
3.4. Show that 1729 is an Eulerian pseudoprime to base 2, but not a strong pseudoprime to base 2. In addition, use the Miller–Rabin scheme to find a nontrivial divisor of 1729.
3.5. (Lucas test; named after Édouard Lucas, 1842–1891) Let n ≥ 1 and a ∈ ℤ. Show that n is prime if the following two conditions are satisfied:
(i)a^{n−1} ≡ 1 mod n and
(ii)a^{(n−1)/q} ≢ 1 mod n for all prime divisors q of n − 1.
3.6. (Pépin test; named after Théophile Pépin, 1826–1904) Let n ≥ 1. Show that the Fermat number f_n = 2^{2^n} + 1 is prime if and only if 3^{(f_n−1)/2} ≡ −1 mod f_n.
3.7. Let p > 2 be a prime number and consider the Mersenne number n = 2^p − 1 (named after Marin Mersenne, 1588–1648).
(a)Show that if n is prime, then f(X) = X^2 − 4X + 1 ∈ 𝔽_n[X] is irreducible.
(b)Suppose that n is prime and let K = 𝔽_n[X]/f. Show that X^n = 4 − X and (X − 1)^{n+1} = −2 in K.
(c)Suppose that n is prime and let K = 𝔽_n[X]/f. Show that (X − 1)^{n+1} = 2·X^{(n+1)/2} in K.
(d)Show that n is prime if and only if X^{(n+1)/2} ≡ −1 mod X^2 − 4X + 1 in (ℤ/nℤ)[X].
3.8. (Lucas–Lehmer test; named after É. Lucas and Derrick Lehmer, 1905–1991) Let p > 2 be a prime number and n = 2^p − 1 the corresponding Mersenne number. The sequence (ℓ_j)_{j∈ℕ} is defined by ℓ_0 = 4 and ℓ_{j+1} ≡ ℓ_j^2 − 2 mod n. Show that n is prime if and only if ℓ_{p−2} = 0.
Hint: Show that ℓ_j ≡ X^{2^j} + X^{−2^j} mod X^2 − 4X + 1 in (ℤ/nℤ)[X].
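The test itself is easy to state in code, which is convenient for experimenting with the exercise (a sketch; the function name is ours):

```python
def lucas_lehmer(p: int) -> bool:
    """Lucas-Lehmer test for the Mersenne number n = 2**p - 1, p an odd
    prime: n is prime iff l_{p-2} = 0 for l_0 = 4, l_{j+1} = l_j^2 - 2 mod n."""
    n = 2**p - 1
    l = 4
    for _ in range(p - 2):
        l = (l * l - 2) % n
    return l == 0
```

For example, p = 5 gives the sequence 4, 14, 8, 0 modulo 31, certifying that 2^5 − 1 = 31 is prime.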
3.9. A linear Diophantine equation has the form a_1·X_1 + ⋅⋅⋅ + a_n·X_n = a_{n+1} with a_i ∈ ℤ. Explain how to check whether a solution (x_1, . . . , x_n) ∈ ℤ^n exists and how to compute it.
3.10. Compute the square root of 2 in the field 𝔽_41 = ℤ/41ℤ. Use Tonelli's algorithm and choose g = 3 in the first step.
3.11. Let p be prime with p ≡ 5 mod 8. We consider the field 𝔽_p and a square a ∈ 𝔽_p. Show that if a^{(p−1)/4} = 1, then a^{(p+3)/8} is a square root of a; and if a^{(p−1)/4} = −1, then 2^{−1}·(4a)^{(p+3)/8} is a square root of a.
3.12. Let p be a prime number with p ≡ −1 mod 4 and let f(X) = X2 + 1.
(a)Show that the polynomial f(X) is irreducible in 𝔽_p[X].
(b)Show how to efficiently find an element a with 1 ≤ a < p − 1 such that and
(c)Show that in 𝔽_{p^2} = 𝔽_p[X]/f it is possible to extract roots in deterministic polynomial time.
3.13. Compute the square root of 2 in the field 𝔽_23 = ℤ/23ℤ. Use Cipolla's algorithm and choose t = 0 in the first step.
3.14. Use Pollard’s p − 1 algorithm to find a nontrivial divisor of n = 253. In the first step, let B = {2, 3} and a = 2, in the second step B = {2, 3, 5} and a = 2.
3.15. Find a nontrivial divisor of n = 689 with Pollard's ρ algorithm. Use F(x) = x^2 + 1 mod n and the initial value x_0 = 12.
3.16. Compute the discrete logarithm of 3 to base 2 in the multiplicative group (ℤ/19ℤ)∗ using Shanks’ baby-step giant-step algorithm.
3.17. Let G be a finite group and g ∈ G. Show how to determine the order of g using Shanks’ baby-step giant-step algorithm.
3.18. Let G = (ℤ/23ℤ)∗ and g = 3.
(a)Determine the order of g in G.
(b)Use Pollard’s ρ algorithm to compute the discrete logarithm of 18 to base 3 in G. Decompose G with respect to the residue classes modulo 3 and use r1 = s1 = 1 as initial values.
3.19. Compute the discrete logarithm of 5 to base 2 in the multiplicative group (ℤ/19ℤ)∗. Reduce the problem with the Pohlig–Hellman method to discrete logarithms in groups of smaller order.
3.20. Let p = 2^{2^n} + 1 be a Fermat prime. Show that discrete logarithms in (ℤ/pℤ)∗ can be computed in deterministic polynomial time.
3.21. Let q = 97, b = 4 and ω = −22.
(a)Show that ω is a primitive bth root of unity in 𝔽_q.
(b)Specify the Fourier matrices F and F̄ with respect to ω.
(c)Compute the product f ∗ g of the polynomials f(X) = 1 + X + X2 and g(X) = 2 + 3X using the FFT.
3.22. Let p_1, p_2, p_3 ∈ 256ℤ + 1 be distinct primes of at most 64 bits, let ω_i be a primitive 256th root of unity in the field K_i = ℤ/p_iℤ, and let F_i be the Fourier transform in K_i with respect to ω_i. On input two numbers u = Σ_{j≥0} u_j·2^{64j} and υ = Σ_{j≥0} υ_j·2^{64j} with 0 ≤ u_j, υ_j < 2^{64}, we determine U_i(X) = Σ_{j≥0} (u_j mod p_i)·X^j and V_i(X) = Σ_{j≥0} (υ_j mod p_i)·X^j. Then, in K_i we compute the coefficients w_{i,j} of

W_i(X) = U_i(X) ∗ V_i(X) = Σ_{j≥0} w_{i,j}·X^j

using the FFT. Finally, we determine the polynomial W(X) = Σ_{j≥0} w_j·X^j with w_j ≡ w_{i,j} mod p_i by means of the Chinese remainder theorem. At last, we return the number w = W(2^{64}). What is the running time of this algorithm? What are the maximum values for u and υ to ensure that w = uυ holds?
Notions
–addition chain
–Carmichael number
–square-free
–pseudoprime
–Eulerian pseudoprime
–strong pseudoprime
–square, quadratic residue
–square root
–root extraction in groups
–factorization
–pseudo-random sequence
–B-smooth
–discrete logarithm
–primitive root of unity
–elementary arithmetic operation
–master theorem for estimation of recurrence equations
–birthday paradox: For random sequences of m events from Ω with m ≥ 1.18·√|Ω|, the probability to have two identical elements in the sequence is greater than 1/2.
–fast exponentiation for computing a^n with O(log n) multiplications
–binary gcd
–Fermat test
–properties of Carmichael numbers
–The error probability of the Solovay–Strassen primality test is at most 1/2.
–description of the Miller–Rabin primality test
–The error probability of the Miller–Rabin primality test is at most 1/4.
–Miller–Rabin scheme yields an algorithm for factorization of n if a multiple of φ(n) is known or if φ(n) has only small prime divisors
–Miller–Rabin scheme as evidence for the secret RSA key to be secure
–n Eulerian pseudoprime to base a ⇒ n pseudoprime to base a
–n strong pseudoprime to base a ⇒ n Eulerian pseudoprime to base a
–root extraction in groups of odd order
–root extraction in 𝔽_q with q ≡ −1 mod 4
–Tonelli's algorithm for extracting roots in 𝔽_q with odd number q
–Cipolla's algorithm for extracting roots in 𝔽_q with odd q
–Pollard’s p − 1 algorithm
–Pollard’s ρ algorithm for integer factorization
–quadratic sieve
–Shanks’ baby-step giant-step algorithm for discrete logarithm
–Pollard’s ρ algorithm for discrete logarithm
–Pohlig–Hellman algorithm for discrete logarithm: step 1 reduction to prime powers, step 2 reduction to prime divisors
–index calculus algorithm
–Karatsuba integer multiplication in O(n^{1.58496...})
–division using Newton method
–discrete Fourier transform: ring isomorphism R[X]/(X^b − 1) → R^b
–fast Fourier transform: computing the discrete Fourier transform with O(b log b) elementary arithmetic operations
–primitive roots of unity in ℤ/(2^m + 1)ℤ
–Schönhage–Strassen integer multiplication in O(n log n log log n)