7.3 LOGARITHMIC, EXPONENTIAL, AND TRIGONOMETRIC FUNCTIONS

Most often, the computation of functions such as logarithms, exponentials, or trigonometric functions is performed through software-implemented algorithms applied to floating-point representations. Hardware or microprogrammed implementations are mainly justified for special-purpose computing devices such as ASICs or embedded systems. As it is generally not possible to get an exact result, approximation methods have to be used together with error estimation techniques. Newton–Raphson iteration, Taylor–MacLaurin series, and polynomial approximations are the most common approaches to compute these functions. For trigonometric functions, CORDIC algorithms (linear convergence) are well suited. Arguments in the range [1, 2[ (floating-point IEEE standard) are suitable for most approximation methods, which need to limit the range of the argument. Whenever a specific range is imposed on the operands, a prescaling operation may be necessary, so an initial scaling step may be included in the algorithmic procedure. Crucial questions for approximation methods are error estimation and effective rounding techniques; these issues arise all the way from table design (first-approximation look-up tables) up to the rounding of the final result. Numerous methods, algorithms, and implementations are proposed in the literature ([SPE1965], [CHE1972], [ERC1977], [COD1980], [KOS1991], [MAR1990], [TAN1991], [ERC1995], [PAL2000], [CAO2001]). As for the basic operations, the choice depends on the speed/cost compromises and other constraints imposed on the designer. Approximation methods usually assume the availability of the four basic operations as arithmetic primitives, together with look-up tables providing a first “reasonably good” approximation to start from.

7.3.1 Taylor–MacLaurin Series

The classical Taylor expansion at a point a can be written

f(x) = \sum_{n=0}^{\infty} f^{(n)}(a) (x - a)^n / n!   (7.11)

When this expansion converges over a certain range of x, that is, when

\lim_{n \to \infty} R_n(x) = 0, \quad R_n(x) = f(x) - \sum_{k=0}^{n} f^{(k)}(a) (x - a)^k / k!,   (7.12)

the expansion is referred to as the Taylor series; if a = 0, it is called the MacLaurin series (Chapter 2).

Consider the convergence behavior of the exponential functions expressed by the following series, expanded at a = 0:

e^x = \sum_{n=0}^{\infty} x^n / n! = 1 + x + x^2/2! + x^3/3! + \cdots   (7.13)

e^{-x} = \sum_{n=0}^{\infty} (-1)^n x^n / n! = 1 - x + x^2/2! - x^3/3! + \cdots   (7.14)

a^x = e^{x \ln a} = \sum_{n=0}^{\infty} (x \ln a)^n / n!   (7.15)

The above expressions can be factorized (Hörner scheme) to suggest a simple iterative computational step; for example, (7.13) for n = 8 can be written

e^x \cong 1 + x(1 + (x/2)(1 + (x/3)(1 + (x/4)(1 + (x/5)(1 + (x/6)(1 + (x/7)(1 + x/8)))))))   (7.16)

Formula (7.16) can be computed in 8 divide-and-add steps, with an error R_8 inferior to (1 + 0.1) \cdot x^9/9!; that is, for 0 \le x \le 1/2, a maximum error \varepsilon (reached at x = 1/2) given by

\varepsilon = 1.1 \cdot (1/2)^9 / 9! \cong 5.9 \cdot 10^{-9}.   (7.17)

Actually, (7.16) computed at x = 1/2 yields (rounded down) 1.6487212650; a ten-decimal precision computation of e^{1/2} yields (rounded up) 1.6487212708, that is, an error inferior to the value computed in (7.17). Obviously, the convergence is faster as x becomes smaller, so a prescaling operation could be necessary if the number of algorithmic steps is fixed. Expression (7.14) can be factorized (Hörner scheme) as

e^{-x} \cong 1 - x(1 - (x/2)(1 - (x/3)(1 - (x/4)(1 - (x/5)(1 - (x/6)(1 - (x/7)(1 - x/8)))))))   (7.18)

Formula (7.18) is computed in the same way as (7.16), but because of the alternating signs, the absolute value of the error \varepsilon_{max} is somewhat smaller, actually x^9/9!. Expression (7.15) is quite similar to (7.14) except for the prescaling: the range of x \cdot \ln(a) has to be considered instead of just that of x.
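As an illustration, a minimal Python sketch of the divide-and-add evaluation of (7.16); the function name and the default step count are ours, not part of the original algorithms:

import math

def exp_horner(x, n=8):
    # e^x ~ 1 + x(1 + (x/2)(1 + (x/3)( ... (1 + x/n) ... ))), cf. (7.16)
    acc = 1.0
    for k in range(n, 0, -1):          # k = n, n-1, ..., 1: one divide-and-add per step
        acc = 1.0 + (x / k) * acc
    return acc

# At x = 1/2 the error stays below 1.1 * 0.5**9 / math.factorial(9) ~ 5.9e-9, cf. (7.17):
print(exp_horner(0.5))                 # 1.64872126...
print(math.exp(0.5))                   # 1.64872127...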

Logarithmic function series are also easy to compute; in particular, ln(1 + x) can be written

\ln(1 + x) = x - x^2/2 + x^3/3 - x^4/4 + \cdots, \quad -1 < x \le 1,   (7.19)

or, for n = 10 (Hörner scheme),

\ln(1 + x) \cong x(1 - x(1/2 - x(1/3 - x(1/4 - x(1/5 - x(1/6 - x(1/7 - x(1/8 - x(1/9 - x/10)))))))))   (7.20)

A close analysis of (7.19) shows that for an argument 1 + x in the range [1, 2] the convergence can be very slow; typically, for 1 + x = 2 (x = 1), the error \varepsilon_{max} after n terms is

\varepsilon_{max} = 1/(n + 1).   (7.21)

If the range of x is restricted to [0, 1/2], the maximum error after n steps will be

\varepsilon_{max} = (1/2)^{n+1}/(n + 1),   (7.22)

that is, for x = 1/2, less than 10^{-4} after 10 steps.
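The logarithmic series lends itself to the same treatment; a short Python sketch of (7.20), with hypothetical names, for x in [0, 1/2]:

import math

def ln1p_horner(x, n=10):
    # ln(1 + x) ~ x(1 - x(1/2 - x(1/3 - ... (1/(n-1) - x/n) ... ))), cf. (7.20)
    acc = 1.0 / n
    for k in range(n - 1, 0, -1):
        acc = 1.0 / k - x * acc
    return x * acc

# Error bound (7.22): less than 1/(2**11 * 11) ~ 4.4e-5 at x = 1/2
print(abs(ln1p_horner(0.5) - math.log(1.5)))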

Prescaling the argument of logarithmic functions, together with the corresponding result adjustment, does not involve a significant increase in the overall complexity. Actually, if the argument has been initially divided by k, a correction term +ln(k) has to be added to adjust the final result. For exponential functions, the prescaling problem is generally not as straightforward. The particular case of the function 2^N, with N expressed in floating point, can be processed as follows. Let m \cdot 2^k be the floating-point representation of N, where m is the mantissa and k the exponent. The scaling process consists of reducing N as

N = m \cdot 2^k = E + N_s, \quad 0 \le N_s < 1.   (7.23)

So, assuming E = \lfloor m \cdot 2^k \rfloor, one can write

2^N = 2^{N_s} \cdot 2^E,   (7.24)

where 2^{N_s} is the mantissa of 2^N and E the exponent. So the scaling process reduces the computation of 2^N to that of 2^{N_s}.
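A two-line Python sketch of this scaling step (the function name is ours):

import math

def pow2_split(N):
    # N = E + Ns, E integer, 0 <= Ns < 1, so that 2**N = 2**Ns * 2**E, cf. (7.23)-(7.24)
    E = math.floor(N)
    return E, N - E

E, Ns = pow2_split(5.3)
print(2.0**Ns * 2.0**E, 2.0**5.3)    # both ~39.4; only 2**Ns remains to be approximated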

Trigonometric functions are also readily expressed as series, in particular,

\sin x = x - x^3/3! + x^5/5! - x^7/7! + \cdots = \sum_{n=0}^{\infty} (-1)^n x^{2n+1}/(2n + 1)!   (7.25)

or, for n = 6 (Hörner scheme),

\sin x \cong x(1 - (x^2/(2 \cdot 3))(1 - (x^2/(4 \cdot 5))(1 - (x^2/(6 \cdot 7))(1 - (x^2/(8 \cdot 9))(1 - x^2/(10 \cdot 11))))))   (7.26)

To get a fast convergence rate, scaling is also required in this case. Since trigonometric functions are periodic, the scaling can be a straight reduction of the argument to, for example, the range [-\pi/4, +\pi/4] using trigonometric equivalence relations; this can thus be processed without calling for any postcorrection of the final result. If the range [-\pi/4, +\pi/4] holds, the convergence rate is fair: \varepsilon_{max} can be expressed as

\varepsilon_{max} = x^{13}/13!,

that is, for x = \pi/4, \varepsilon_{max} \cong 7 \cdot 10^{-12}.
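A Python sketch of the Hörner evaluation of the sine series (names and the n = 6 default are ours; the argument is assumed already reduced to a small range such as [-pi/4, +pi/4]):

import math

def sin_horner(x, n=6):
    # sin x ~ x(1 - x^2/(2*3)(1 - x^2/(4*5)( ... ))), n terms of (7.25)-(7.26)
    x2 = x * x
    acc = 1.0
    for k in range(n - 1, 0, -1):
        acc = 1.0 - (x2 / ((2 * k) * (2 * k + 1))) * acc
    return x * acc

print(abs(sin_horner(math.pi / 4) - math.sin(math.pi / 4)))   # ~7e-12, cf. the bound above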

7.3.2 Polynomial Approximation

Basically, polynomial approximation methods consist of building a polynomial P(x) of degree n that matches the target function at n + 1 points. The principle is general for continuous functions, and Taylor–MacLaurin series are particular cases. Neither the matching points nor the degree of the polynomial is fixed a priori, nor is the number of polynomials used (e.g., several polynomials may approximate the same function over separate ranges). A number of methods and algorithms have been proposed in the literature. Moreover, prescaling is often recommended to speed up the convergence. A classical numerical calculus technique is used to compute the coefficients of the approximation polynomial once the degree and the desired matching points {x_k} are selected; it basically consists of solving n + 1 equations, linear in the coefficients {c_i}, of the form

P_n(x_k) = \sum_{i=0}^{n} c_i x_k^i = f(x_k), \quad k = 0, 1, \ldots, n,   (7.27)

where P_n(x_k) and f(x_k) are, respectively, the degree-n polynomial and the function to approximate, both computed at x_k. Several alternative methods exist to obtain these coefficients without actually solving (7.27). The polynomial coefficients are computed once, then stored; the way the polynomial is then evaluated for a given value of the argument defines the algorithm computing the approximated function at this argument. A Hörner-type factorization is a first approach to define an iterative computation step.
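As an illustration of (7.27), a small numpy sketch (names are ours) that computes the coefficients of a degree-n polynomial matching f at n + 1 chosen points:

import numpy as np

def fit_coeffs(f, points):
    # Solve the (n+1) x (n+1) linear system P_n(x_k) = f(x_k), k = 0..n, cf. (7.27)
    V = np.vander(np.asarray(points, dtype=float), increasing=True)  # rows: 1, x_k, x_k^2, ...
    return np.linalg.solve(V, np.array([f(x) for x in points]))

# Degree-3 coefficients matching exp at 4 points of [0, 1]; computed once,
# stored, then evaluated with a Hörner scheme for any argument:
c = fit_coeffs(np.exp, [0.0, 1/3, 2/3, 1.0])
print(c)    # c[0] + c[1] x + c[2] x^2 + c[3] x^3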

Comment 7.3 Hörner Scheme Revisited In Sections 7.3.1 and 7.3.2, the Hörner scheme was mentioned as a possible iterative way to implement algorithms. Whenever pipelined or parallel arithmetic operators are available, this approach can be improved significantly; for instance, if one can take advantage of multiple multiply-and-add and squaring units, a useful generalized Hörner scheme may be defined as follows.

Let

P(x) = c_0 + x(c_1 + x(c_2 + x(c_3 + \cdots + x(c_{n-1} + x c_n) \cdots)))   (7.28)

be the Hörner expansion of the polynomial P(x).

Let

b_i(x) = c_{2i} + c_{2i+1} \cdot x, \quad i = 0, 1, \ldots   (7.29)

An extended Hörner expression can be written

P(x) = b_0(x) + x^2 (b_1(x) + x^2 (b_2(x) + \cdots)).   (7.30)

Assuming available operators computing (7.29) and squaring units, the number of steps is roughly divided by two. Expressions (7.29) and (7.30) can be generalized further; assuming

b_i(x) = c_{ki} + c_{ki+1} x + \cdots + c_{ki+k-1} x^{k-1}, \quad i = 0, 1, \ldots, m - 1,   (7.31)

one can write

P(x) = b_0(x) + x^k (b_1(x) + x^k (b_2(x) + \cdots + x^k b_{m-1}(x) \cdots)),   (7.32)

called the generalized Hörner expansion (GHE), which takes advantage of higher-level primitive polynomial operators. Expansion (7.32) corresponds to a polynomial of degree k \cdot m - 1. Therefore, thanks to the availability of parallel and/or pipelined operators, multiple-level Hörner computation schemes can be built to speed up the overall process. For instance, a degree-63 polynomial can be computed in 9 multiply-and-add steps if a 3-level scheme is used. First, 16 degree-3 polynomials are computed (3 steps); four degree-15 polynomials are then worked out using the degree-3 polynomials as primitives (3 steps); another 3 steps are finally needed to compute the degree-63 polynomial using the degree-15 ones as primitives. A practical implementation of this circuit is shown in Chapter 14.
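A Python sketch of the two-level case (7.29)-(7.30), with hypothetical names; in hardware, the b_i blocks and the squaring would be evaluated concurrently:

def horner2(c, x):
    # P(x) = b_0(x) + x^2 (b_1(x) + x^2 ( ... )), with b_i(x) = c_{2i} + c_{2i+1} x.
    # Assumes an even number of coefficients; c[i] multiplies x**i.
    x2 = x * x                                               # one squaring
    b = [c[2*i] + c[2*i + 1] * x for i in range(len(c)//2)]  # degree-1 blocks (7.29)
    acc = b[-1]
    for bi in reversed(b[:-1]):                              # half as many steps as (7.28)
        acc = bi + x2 * acc
    return acc

c = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]         # P(x) = 1 + 2x + 3x^2 + 4x^3 + 5x^4 + 6x^5
plain = 0.0
for ci in reversed(c):                      # ordinary Hörner (7.28), for comparison
    plain = ci + 0.7 * plain
print(horner2(c, 0.7), plain)               # equal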

7.3.3 Logarithm and Exponential Functions Approximation by Convergence Methods

Convergence methods consist of two parallel processes on two related sequences; typically, one sequence converges to 1 (multiplicative normalization) or to 0 (additive normalization), while the other one converges to the function to approximate. Division using Goldschmidt's algorithm is an example of multiplicative normalization: while the divisor sequence converges to 1, the dividend converges to the desired quotient.

7.3.3.1 Logarithm Function Approximation by Multiplicative Normalization

Define

C(i) = 1 + a_i \cdot 2^{-i}, \quad a_i \in \{-1, 0, 1\},   (7.33)

as the multiplicative normalizing function, where a_i is selected in such a way that the sequence

x(i + 1) = x(i) \cdot C(i) = x(i)(1 + a_i \cdot 2^{-i})   (7.34)

converges toward 1. Then the sequence

y(i + 1) = y(i) - \ln(1 + a_i \cdot 2^{-i})

can be set to converge toward the result ln(x). If y(0) and x(0) are, respectively, set to 0 and to the argument x, and assuming x(p) \cong 1, one can write

y(p) = -\sum_{i=0}^{p-1} \ln(1 + a_i \cdot 2^{-i}) = \ln x - \ln x(p) \cong \ln x.   (7.35)

To make the convergence of (7.34) possible, the argument x needs to be in a range such that

\prod_{i=1}^{\infty} (1 - 2^{-i}) \le 1/x \le \prod_{i=1}^{\infty} (1 + 2^{-i}),   (7.36)

that is,

0.4194\ldots \le x \le 3.4627\ldots   (7.37)

This means that the argument x could need to be prescaled to fall in the range (7.37). An argument x in the range [1, 2[ (such as, e.g., a floating-point mantissa) fits perfectly; otherwise a straightforward prescaling operation replaces x by x′ such that x = x′ \cdot 2^s (x′ in [1, 2[). The algorithm computes ln(x′); then a final additive correction of s \cdot ln(2) completes the result. Observe that the lower bound of (7.37) can be lowered to 0.21, as (1 + 2^0) can be accepted as a first normalizing factor for computing x(1).

In practical implementations of this algorithm, look-up tables are used to read out the successive values of ln(1 ± 2^{-i}) needed to compute y(i + 1) of (7.35). For x in [1/2, 2[, a_i can be selected according to the following rules, where x_j(i) stands for bit j of x(i) = x_0(i).x_1(i) x_2(i) … x_n(i):

x(i) = 1: \quad a_i = 0 \quad (normalization completed);   (7.38)

x(i) > 1: \quad a_i = -x_i(i), \quad that is, a_i = -1 if x_i(i) = 1, and a_i = 0 otherwise;   (7.39)

x(i) < 1: \quad a_i = x_i(i) \cdot \overline{x_{i+1}(i)}, \quad that is, a_i = +1 if x_i(i) = 1 and x_{i+1}(i) = 0, and a_i = 0 otherwise.   (7.40)

The above rules are justified by the following two lemmas, which also show that precision p is reached after p steps (linear convergence).

Lemma 7.1 Let

x(k) = 1.00 \cdots 0 1 x_{k+1} \cdots x_n \ (k - 1 \text{ zeros after the point}) = 1 + 2^{-k} + \varepsilon, \quad 0 \le \varepsilon \le 2^{-k} - 2^{-n},   (7.41)

be the n-bit auxiliary sequence vector at step k; then

1 - 2^{-2k} \le x(k)(1 - 2^{-k}) < 1 + 2^{-k}.   (7.42)

Proof The left inequality is trivial: it corresponds to \varepsilon = 0. The right inequality is deduced from the computation of x(k)(1 - 2^{-k}) for \varepsilon maximum, that is, \varepsilon = 2^{-k} - 2^{-n}.

The practical interpretation of (7.42) is the impact of rule (7.39) on x(k + 1) whenever x(k) is greater than one with a fractional part made up of a (k − 1)-zero string followed by a one at position k. Then x(k + 1) will be either greater than one, exhibiting a similar pattern with at least one more zero, or inferior to one (x_0(k + 1) = 0) with at least 2k ones as the header of the fractional part. In both cases, the target value x(p) = 1 is approximated by x(k + 1) with at least one bit more.

Lemma 7.2 Let

x(k) = 0.11 \cdots 1 x_{k+1} \cdots x_n \ (k \text{ ones after the point}) = 1 - 2^{-k} + \varepsilon, \quad 0 \le \varepsilon \le 2^{-k} - 2^{-n},   (7.43)

be the n-bit auxiliary sequence vector at step k; then

1 - 2^{-2k} \le x(k)(1 + 2^{-k}) < 1 + 2^{-k}.   (7.44)

Proof The left inequality is trivial: it corresponds to \varepsilon = 0. The right inequality is deduced from the computation of x(k)(1 + 2^{-k}) for \varepsilon maximum, that is, \varepsilon = 2^{-k} - 2^{-n}.

The practical interpretation of (7.44) is the impact of rule (7.40) on x(k + 1) whenever x(k) is less than one with a fractional part made up of a k-one string followed by a zero at position k + 1. Then x(k + 1) will be either less than one, exhibiting a similar pattern with at least 2k ones as the header of the fractional part, or greater than one (x_0(k + 1) = 1) with at least k + 1 zeros as the header of the fractional part. In both cases, the target value x(p) = 1 is approximated by x(k + 1) with at least one bit more.

Comment 7.4

  1. The selection (7.38) is justified by the fact that a decision about multiplying by (1 + a_i \cdot 2^{-i}) (7.33) cannot be made before knowing the next bit. Actually, considering bit x_0(i) only (either 1 or 0), one cannot know whether the sequence x(i) has already reached 1 (end of the convergence process) or not.
  2. When x(i) > 1, the strategy described by (7.39) consists of detecting the first nonzero fractional bit of x(i), then multiplying by (1 - 2^{-i}). Lemma 7.1 shows that, at step i, bits x_k(i), 0 < k < i, are all zeros.
  3. When x(i) < 1, the strategy described by (7.40) consists of detecting the last one of the leading one-string of x(i), then multiplying by (1 + 2^{-i}). Lemma 7.2 shows that, at step i, bits x_k(i), 0 < k < i, are all ones.

Algorithm 7.8 Logarithm Computation by Multiplicative Normalization

The argument x is in [1/2, 2[: x = x(0).x(1) x(2) … x(n). Let xx(i, j) be component j of xx(i) = xx(i, 0).xx(i, 1) xx(i, 2) … xx(i, n). Let lut(i) = ln(1 + a(i) \cdot 2^{-i}) be read from the table.

a(0):=0; c(0):=1; xx(1):=x; yy(1):=0;
for i in 1..p-1 loop
  if xx(i)=1 then exit; end if;                  -- normalization completed
  if xx(i)>1 then a(i):=-xx(i,i);                -- rule (7.39)
  else a(i):=xx(i,i)*not(xx(i,i+1)); end if;     -- rule (7.40)
  c(i):=1+a(i)*2**(-i); xx(i+1):=xx(i)*c(i);
  yy(i+1):=yy(i)-lut(i);
end loop;
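A runnable Python transcription of Algorithm 7.8 (a sketch only: it works on floats instead of bit vectors, extracts bit i of the fractional part arithmetically, and calls math.log where hardware would read lut(i)):

import math

def ln_mult_norm(x, p=32):
    # Multiplicative normalization: xx is driven to 1, yy to ln(x); x in [1/2, 2[.
    xx, yy = x, 0.0
    for i in range(1, p):
        if xx == 1.0:
            break                                  # normalization completed
        bit_i  = int(xx * 2**i) & 1                # bit i of the fractional part
        bit_i1 = int(xx * 2**(i + 1)) & 1          # bit i + 1
        if xx > 1.0:
            a = -bit_i                             # rule (7.39)
        else:
            a = bit_i & (1 - bit_i1)               # rule (7.40)
        if a != 0:
            xx *= 1.0 + a * 2.0**(-i)
            yy -= math.log(1.0 + a * 2.0**(-i))    # lut(i) in hardware
    return yy

print(ln_mult_norm(1.71875), math.log(1.71875))    # both ~0.5416, cf. Example 7.8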

Example 7.8 In this example the auxiliary sequence x(i) is computed in the binary system, while, for readability, the sequence y(i) is computed in decimal; the precision is then readily verified. The functional values ln(1 ± 2^{-i}) are assumed given by look-up tables. x is in [1, 2[.

Let

x = 1.1011100 (binary) = 1.71875 (decimal).

Compute ln (x) with precision p = 8.

[step-by-step computation trace not reproduced]

The actual decimal value of ln(1.71875) is 0.541597282 ± 10^{-9}; the difference from the computed result is less than 8 \cdot 10^{-4} < 2^{-10}.

As appears in the preceding example, whenever a_i = 0, the only effect of step i on the computation process is to increment the step number; both sequences x(i) and y(i) remain unchanged. So, by detecting strings of 0's or 1's in x(i), one can readily jump to the next nontrivial computation step. The following example illustrates this feature.

Example 7.9 As in the preceding example, the auxiliary sequence x(i) is computed in the binary system, while the sequence y(i) is computed in decimal. The functional values ln(1 ± 2^{-i}) are given by look-up tables. x is now in [1/2, 2[. Strings 00… or 11… are highlighted by bold type.

Let

x = 0.1001100 (binary) = 0.59375 (decimal).

Compute ln (x) with precision p = 10 (see Table 7.3).

The actual decimal value of ln(0.59375) is −0.521296923 ± 10^{-9}; the difference from the computed result is less than 4 \cdot 10^{-6} < 2^{-10}.

7.3.3.2 Exponential Function Approximation by Additive Normalization

Define

C(i) = \ln(1 + a_i \cdot 2^{-i}), \quad a_i \in \{-1, 0, 1\},   (7.45)

as the additive normalizing function, where a_i is selected in such a way that the sequence

x(i + 1) = x(i) - C(i) = x(i) - \ln(1 + a_i \cdot 2^{-i})   (7.46)

converges toward 0. Then the sequence (main sequence)

y(i + 1) = y(i) \cdot (1 + a_i \cdot 2^{-i})

can be set to converge toward the result e^x. If y(0) and x(0) are, respectively, set to 1 and to the argument x, and assuming x(p) \cong 0, one can write

y(p) = \prod_{i=0}^{p-1} (1 + a_i \cdot 2^{-i}) = e^{x - x(p)} \cong e^x.

To make the convergence of (7.46) possible, the argument x needs to be in a range such that

\sum_{i=1}^{\infty} \ln(1 - 2^{-i}) \le x \le \sum_{i=1}^{\infty} \ln(1 + 2^{-i}),   (7.47)

that is,

-1.2421\ldots \le x \le 0.8688\ldots,   (7.48)

TABLE 7.3

[table not reproduced]

or

x \in\ ]-1.242, 0.869].   (7.49)

Observe that the upper bound of (7.48) can be raised to 1.562, as ln(1 + 2^0) can be accepted as the first normalizing factor for computing x(1). This extends the range (7.49) to ]−1.242, 1.562]. Here again, the argument x could need to be prescaled to fall in the (extended) range (7.49). The range [1, 2[ (floating-point mantissa) doesn't fit any more; the range [−1, 1[ does. A possible prescaling operation replaces x by x′ such that x = x′ \cdot 2^s (x′ in [−1, 1[). The algorithm first computes e^{x′}; then e^x is obtained through an s-time squaring (square rooting if s < 0) correction on e^{x′}. In particular, a single shift reduces the range [1, 2[ to [1/2, 1[: x = x′ \cdot 2, x′ in [1/2, 1[; squaring e^{x′} then provides e^x.

In practical implementations of this algorithm, look-up tables are used to read out the successive values of ln(1 ± 2^{-i}) needed to compute x(i + 1) of (7.46). For x in [−1, 1[, a_i can be selected according to the following rule:

whenever nonzero, a_i holds the same sign as x(i); then

a_i = \operatorname{sign}(x(i)) \ \text{if} \ |\ln(1 + \operatorname{sign}(x(i)) \cdot 2^{-i})| \le 2 \cdot |x(i)|; \quad \text{otherwise} \ a_i = 0.   (7.50)

This simple rule ensures that, at each step, |x(i)| will either decrease or stay unchanged; a_i = 0 prevents |x(i)| from increasing. Precision p is reached after p steps.

An interesting feature of the method comes from the following relation:

\ln(1 \pm 2^{-i}) = \pm 2^{-i} - (2^{-i})^2/2 \pm (2^{-i})^3/3 - \cdots;   (7.51)

±2^{-i} is the first term of the Taylor–MacLaurin series (7.19). The approximation ln(1 ± 2^{-i}) \cong ±2^{-i} produces an error bounded by 2^{-2i}, that is, less than 2^{-20} after 10 steps. This very fast convergence behavior (see Table 7.3) can be used to speed up the algorithm. If the desired precision p is great enough, the algorithmic procedure can be stopped after p/2 iteration steps; at this point one can write

x(p) = x(p/2) - \sum_{i=p/2+1}^{p} x_i \cdot 2^{-i} = 0,   (7.52)

corresponding to the selection a_i = x_i, where x_i stands for bit i of x(p/2).

Actually (7.52), the convergence of x(p) to zero, is settled by subtracting \sum x_i \cdot 2^{-i} = x(p/2); for p/2 great enough, each x_i \cdot 2^{-i} can be approximated by ln(1 + x_i \cdot 2^{-i}).

On the other hand, for p great enough (2^{-i} small enough),

\prod_{i=p/2+1}^{p} (1 + x_i \cdot 2^{-i}) \cong 1 + \sum_{i=p/2+1}^{p} x_i \cdot 2^{-i} = 1 + x(p/2).   (7.53)

As a consequence, the final p/2 iteration steps can be replaced by

y(p) = y(p/2) \cdot (1 + x(p/2)).   (7.54)

The auxiliary sequence is built up to x(p/2), then settled to x(p) = 0 by (7.52); the main sequence value y(p) can then be computed from y(p/2) in just one computation step (7.54). Algorithm 7.9 summarizes the general procedure. Chapter 14 describes an implementation including the above-mentioned one-step simplification for the last p/2 steps.

Algorithm 7.9 Exponential Computation by Additive Normalization

The argument x is in [−1, 1[. Let lutn(i) = (1 − 2^{-i}), lutp(i) = (1 + 2^{-i}), lutlnn(i) = ln(1 − 2^{-i}), and lutlnp(i) = ln(1 + 2^{-i}) be read from look-up tables.

xx(0):=x; yy(0):=1;
for i in 0..p-1 loop
  if xx(i)=0 then exit; end if;                  -- normalization completed
  if xx(i)<0 then
    if abs(lutlnn(i))<=abs(2*xx(i)) then         -- selection rule (7.50)
      xx(i+1):=xx(i)-lutlnn(i); yy(i+1):=yy(i)*lutn(i);
    else xx(i+1):=xx(i); yy(i+1):=yy(i); end if;
  else
    if abs(lutlnp(i))<=abs(2*xx(i)) then
      xx(i+1):=xx(i)-lutlnp(i); yy(i+1):=yy(i)*lutp(i);
    else xx(i+1):=xx(i); yy(i+1):=yy(i); end if;
  end if;
end loop;
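A Python sketch of Algorithm 7.9 (floats instead of bit vectors; math.log stands for the look-up tables). The optional early_stop flag, ours, implements the final one-step simplification (7.54):

import math

def exp_add_norm(x, p=32, early_stop=True):
    # Additive normalization: xx is driven to 0, yy to e^x; x in [-1, 1[.
    xx, yy = x, 1.0
    steps = p // 2 if early_stop else p
    for i in range(steps):
        if xx == 0.0:
            break
        a = 1 if xx > 0 else -1                  # a_i gets the sign of xx
        step = math.log(1.0 + a * 2.0**(-i))     # lutlnp(i) / lutlnn(i)
        if abs(step) <= abs(2 * xx):             # selection rule (7.50)
            xx -= step
            yy *= 1.0 + a * 2.0**(-i)            # lutp(i) / lutn(i)
    if early_stop:
        yy *= 1.0 + xx                           # single final step (7.54)
    return yy

print(exp_add_norm(0.65625), math.exp(0.65625))  # both ~1.9275504, cf. Example 7.10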

Example 7.10 For the sake of readability, the following example is treated in decimal. Table 7.4 exhibits the values of ln(1 + 2^{-i}) and ln(1 − 2^{-i}) for i = 0, 1, …, 15. This table is used to compute the successive values of x(i).

The problem at hand is to compute e^{0.65625} with precision 2^{-32}. The first 16 steps are displayed in Table 7.5. The desired precision is reached through a final single step computing (7.54) with p/2 = 16. The result is 1.92755045011, while the exact result rounded to the 11th decimal is e^{0.65625} = 1.92755045016. This makes a difference around 5 \cdot 10^{-11} < 2^{-32}.

Comment 7.5 The general exponentiation function x^y may be computed as

x^y = e^{y \ln x},   (7.55)

which may be calculated by logarithm computation, multiplication, and exponential computation. The particular cases of integer powers (squaring, cubing, etc.) may be calculated through customized multiplication schemes. Convergence methods, such as the ones described above, are most often used to implement exponentiation algorithms. An important particular case is the square root (power 1/2), for which a classical digit recurrence algorithm, very similar to division, can easily be adapted. Square rooting is reviewed in Section 7.4.
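Combining the two preceding sketches gives a purely illustrative x^y along the lines of (7.55); prescaling is omitted, so the arguments must already lie in the convergence ranges of the two routines:

def power(x, y):
    # x**y = e^(y ln x), cf. (7.55); assumes x in [1/2, 2[ and y*ln(x)
    # inside the range ]-1.242, 0.869] required by exp_add_norm above.
    return exp_add_norm(y * ln_mult_norm(x))

print(power(1.5, 1.2), 1.5 ** 1.2)    # both ~1.6267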

TABLE 7.4 Look-Up Table ln(1 ± 2^{-i})

[table not reproduced]

TABLE 7.5 e^{0.65625}

[table not reproduced]

7.3.4 Trigonometric Functions—CORDIC Algorithms

CORDIC algorithms ([VOL1959], [ERC1987], [DEL1989], [VOL2000]) belong to the class of linear convergence methods. They are particularly well suited to the evaluation of trigonometric functions, although they apply to a number of other functions such as logarithms, exponentials, or square roots. Historically, the Coordinate Rotation Digital Computer was designed as a special-purpose digital computer for real-time airborne computation. The CORDIC techniques were then extended to solve the trigonometric relationships involved in plane coordinate rotation and in conversion from Cartesian to polar coordinates. CORDIC algorithms converge linearly (one bit per cycle) and are rather simple to implement in practice.

Conceptually, the method consists of computing the new coordinates of a given vector after rotation by a given angle α; after rotation, the unit vector (0, 0) − (1, 0) would get its new end point coordinates as (cos α, sin α).

Let (0, 0) − (x_1, y_1) be the vector (0, 0) − (x_0, y_0) rotated by an angle α (Figure 7.1); the basic rotation formulas are given by

x_1 = x_0 \cos \alpha - y_0 \sin \alpha   (7.56)

y_1 = y_0 \cos \alpha + x_0 \sin \alpha   (7.57)

The CORDIC algorithm defines, as the auxiliary sequence, the residual rotation remaining to be completed after step i, namely,

a_{i+1} = a_i - \alpha_i, \quad \text{where } \alpha_i \text{ is the rotation applied at step } i.   (7.58)

If a_0 is the desired rotation angle, the auxiliary sequence will be set to converge to zero while the main sequence converges toward the function to evaluate. The method replaces rotations, as defined by formulas (7.56) and (7.57), by pseudorotations as illustrated in Figure 7.2. Pseudorotations expand the coordinates by a factor of 1/\cos \alpha_i = (1 + \tan^2 \alpha_i)^{1/2}; (7.56) and (7.57) for step i are replaced by

x_{i+1} = x_i - y_i \tan \alpha_i   (7.59)

and

y_{i+1} = y_i + x_i \tan \alpha_i.   (7.60)

Assuming x_0 and y_0 to be the initial coordinates x and y, after p rotations by angles \alpha_i the recursive use of (7.59) and (7.60) leads to

x_p = \prod_{i=0}^{p-1} (1 + \tan^2 \alpha_i)^{1/2} \cdot \left( x_0 \cos \left( \sum_{i=0}^{p-1} \alpha_i \right) - y_0 \sin \left( \sum_{i=0}^{p-1} \alpha_i \right) \right)   (7.61)

and

y_p = \prod_{i=0}^{p-1} (1 + \tan^2 \alpha_i)^{1/2} \cdot \left( y_0 \cos \left( \sum_{i=0}^{p-1} \alpha_i \right) + x_0 \sin \left( \sum_{i=0}^{p-1} \alpha_i \right) \right).   (7.62)

On the other hand, assuming a to be the overall rotation angle, that is, the argument of the trigonometric function to be evaluated, the auxiliary sequence is written

a_p = a - \sum_{i=0}^{p-1} \alpha_i.   (7.63)

The set {\alpha_i} is selected in order to make (7.61) and (7.62) easy to calculate and, in particular, to allow the factor \prod_{i=0}^{p-1} (1 + \tan^2 \alpha_i)^{1/2} to be precomputed. For this sake, {\alpha_i} is preset except for the sign; convergence is then achieved by a suitable sign selection strategy. In the following, an algorithm is developed for computing sine and cosine functions.

Let the set {\alpha_i} be such that

\tan \alpha_i = s_i \cdot 2^{-i}, \quad s_i \in \{-1, +1\}, \quad \text{that is,} \quad \alpha_i = s_i \tan^{-1}(2^{-i}).   (7.64)

Equations (7.59) and (7.60) can then be written

x_{i+1} = x_i - s_i \cdot y_i \cdot 2^{-i}   (7.65)

and

y_{i+1} = y_i + s_i \cdot x_i \cdot 2^{-i},   (7.66)

while (7.58) becomes

a_{i+1} = a_i - s_i \tan^{-1}(2^{-i}).   (7.67)

After p iterations, a_p is assumed close enough to zero, so (7.63) gives \sum_{i=0}^{p-1} \alpha_i \cong a; the values x_p and y_p, computed iteratively by (7.65) and (7.66), are then introduced in (7.61) and (7.62), respectively, leading to

x_p = k \cdot (x_0 \cos a - y_0 \sin a), \quad k = \prod_{i=0}^{p-1} (1 + 2^{-2i})^{1/2},   (7.68)

and

y_p = k \cdot (y_0 \cos a + x_0 \sin a),   (7.69)

Figure 7.1 Basic rotation formulas. [figure not reproduced]

Figure 7.2 Pseudorotations. [figure not reproduced]

where the factor k can be precomputed, since \tan^2 \alpha_i = 2^{-2i} does not depend on the sign of \alpha_i. Then, setting the initial values x_0 = x = 1/k and y_0 = y = 0, the cosine and sine functions are given by x_p and y_p, respectively.

Precomputed values of \tan^{-1}(2^{-i}) are stored in a look-up table. The size of the look-up table sets the precision p. Table 7.6 shows the first 25 values of \tan^{-1}(2^{-i}), together with the fast-converging values of k(p) = \prod_{i=0}^{p-1} (1 + 2^{-2i})^{1/2}: k = 1.646760258121 and x_0 = x = 1/k = 0.607252935009. The strategy for selecting the sign s_i is simple: since each angle \alpha_i (Table 7.6) is greater than half the preceding angle \alpha_{i-1}, the residual rotation can always be compensated by the remaining steps, and convergence is achieved by selecting s_i = sign(a_i). The domain of convergence is greater than [−90°, 90°], so any trigonometric value can be computed (with a prospective use of trigonometric relations for angle reduction). More precisely, for 25-step precision, the range is [−99.88°, +99.88°], that is, ± the sum of the \alpha_i listed in Table 7.6. Algorithm 7.10 describes the CORDIC procedure for computing sin a and cos a. Without loss of generality, the decimal system has been used for better readability.

Algorithm 7.10 Sine and Cosine CORDIC Procedure

Let lut(i) be \alpha_i = \tan^{-1}(2^{-i}), read from the table.

x(0):=0.607252935009; y(0):=0; a(0):=a;          -- x(0) = 1/k
if a(0)=0 then x(0):=1; y(0):=0; exit; end if;
for i in 0..p-1 loop
  if a(i)=0 then exit; end if;
  if a(i)>0 then s(i):=1; else s(i):=-1; end if;  -- sign of the residual angle
  x(i+1):=x(i)-(y(i)*s(i)*2**(-i));               -- pseudorotation (7.65)
  y(i+1):=y(i)+(x(i)*s(i)*2**(-i));               -- pseudorotation (7.66)
  a(i+1):=a(i)-(s(i)*lut(i));                     -- residual angle (7.67)
end loop;
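A direct Python transcription of Algorithm 7.10 (a sketch; the angle is in radians, and the scale factor k is recomputed here rather than read from a table):

import math

def cordic_sin_cos(a, p=25):
    # Rotation-mode CORDIC: the residual angle a(i) is driven to 0 while
    # (x, y) pseudorotates; lut(i) = atan(2**-i), cf. Algorithm 7.10.
    lut = [math.atan(2.0**(-i)) for i in range(p)]
    k = 1.0
    for i in range(p):
        k *= math.sqrt(1.0 + 2.0**(-2 * i))       # k = prod (1 + 2^-2i)^(1/2), cf. (7.68)
    x, y = 1.0 / k, 0.0                           # 1/k = 0.607252935...
    for i in range(p):
        s = 1 if a >= 0 else -1                   # s(i) = sign of the residual angle
        x, y = x - s * y * 2.0**(-i), y + s * x * 2.0**(-i)   # pseudorotation (7.65)-(7.66)
        a -= s * lut[i]                           # residual angle (7.67)
    return x, y                                   # (cos a, sin a)

c, s = cordic_sin_cos(math.radians(75.0))
print(c, s)                                       # ~0.2588 and ~0.9659, cf. Example 7.11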

TABLE 7.6 Look-Up Table for \tan^{-1}(2^{-i})

[table not reproduced]

In binary, the operations involved in one step of Algorithm 7.10 are two i-bit shifts, one table look-up, and three signed additions. The computations of x(i + 1), y(i + 1), and a(i + 1) are independent; hardware implementations can thus profit from the potential parallelism. In this case, the elementary step delay is that of a table look-up (or a shift, whichever is slower) plus a signed addition. Algorithm 7.10 is illustrated in Example 7.11.

Example 7.11 Let a = 75°; compute the binary expressions of cos a and sin a. Table 7.7 displays the first steps of the algorithm. Since the binary system is at hand, the initial value of x, namely x(0) = 0.607252935009 (decimal), has to be expressed in binary with the appropriate precision. In this example, 13 bits have been used (x(0) = 0.1001101101110), setting the procedure for a 12-bit precision. The results after 5 steps (Table 7.7) are cos 75° = x(5) = 0.0100001011011 and sin 75° = y(5) = 0.1111011011111. The first seven bits of each are correct, but this precision is fortuitous: too few steps have been completed to guarantee it. The average convergence rate (one bit per step) is reached after a sufficient number of steps, namely, whenever \tan^{-1}(2^{-i}) is close enough to 2^{-i}; Table 7.6 shows that for i = 10, \tan^{-1}(2^{-i}) may be set to 2^{-i} with an error inferior to 4 \cdot 10^{-7}.

TABLE 7.7 cos 75°, sin 75°: First Five Computational Steps

[table not reproduced]

The next decimal example runs 28 computational steps (Table 7.8). In this example, the results are cos 30° = x(28) = 0.8660254031 and sin 30° = y(28) = 0.5000000012. In both cases, the error is O(10^{-9}), that is, O(2^{-28}).
