16.1 FLOATING-POINT SYSTEM DEFINITION

Assume that a set of real numbers $x$ belonging to the interval

$$-x_{\max} \le x \le x_{\max}$$

is represented in such a way that the following specifications are satisfied:

$d_1$ is the maximum distance between small exactly-represented nonzero numbers;

$d_2$ is the maximum distance between large exactly-represented numbers;

$x_{\min}$ is the maximum distance between 0 and the smallest nonzero exactly-represented numbers,

where the adjectives small and large refer to the absolute value of the corresponding numbers.

Every number $x$ will be represented in the form $\pm s \cdot b^{e}$, with $b \ge 2$, $s$ being the significand and $e$ the exponent.

In order to make the implementation of the arithmetic operations easier (Section 16.2), the following two conditions must be satisfied:

  1. The significand $s$ is represented in base $B = b$.
  2. The significand belongs to the interval $1 \le s < B$.
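As a sketch of condition 2, assuming it means the normalized range $1 \le s < B$, the function below splits a nonzero rational number into a sign, a significand in that range and an exponent. The name `normalize` is hypothetical; `fractions.Fraction` keeps the arithmetic exact, so no rounding is introduced.

```python
from fractions import Fraction


def normalize(x, base):
    """Return (sign, s, e) with x == sign * s * base**e and 1 <= s < base.

    Hypothetical helper: x is converted to an exact Fraction and must be nonzero.
    """
    x = Fraction(x)
    if x == 0:
        raise ValueError("zero has no normalized form")
    sign = -1 if x < 0 else 1
    s = abs(x)
    e = 0
    # Divide by the base while the significand is too large...
    while s >= base:
        s /= base
        e += 1
    # ...or multiply by the base while it is below 1.
    while s < 1:
        s *= base
        e -= 1
    return sign, s, e
```

For example, `normalize(12, 2)` gives the significand $3/2$ and the exponent 3, since $12 = 1.5 \cdot 2^{3}$.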

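Thus $x$ is expressed in the form

$$x = \pm\,(x_0.x_{-1}x_{-2}\cdots x_{-p})_B \cdot B^{e}, \qquad x_0 \neq 0, \quad e_{\min} \le e \le e_{\max},$$

where each $x_i$ is a base-$B$ digit.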
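The following sketch evaluates a number written in this form exactly, assuming `digits` lists $x_0, x_{-1}, \ldots, x_{-p}$ in that order (so $p$ is `len(digits) - 1`). The function name `decode` and its argument order are illustrative assumptions, not part of the original text.

```python
from fractions import Fraction


def decode(sign, digits, e, base, e_min, e_max):
    """Exact value of sign * (x0.x_{-1}...x_{-p})_base * base**e.

    Hypothetical helper: digits = [x0, x_{-1}, ..., x_{-p}].
    """
    if sign not in (1, -1):
        raise ValueError("sign must be +1 or -1")
    if not all(0 <= d < base for d in digits):
        raise ValueError("each digit must lie in 0..base-1")
    if digits[0] == 0:
        raise ValueError("leading digit x0 must be nonzero")
    if not e_min <= e <= e_max:
        raise ValueError("exponent outside [e_min, e_max]")
    # Digit i (counting x0 as i = 0) carries the weight base**(-i).
    s = sum(Fraction(d, base ** i) for i, d in enumerate(digits))
    return sign * s * Fraction(base) ** e
```

For instance, with $B = 2$ the digits `[1, 1, 0, 1]` and $e = 2$ encode $1.101_2 \cdot 2^{2} = 13/2$.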

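The values of $p$, $e_{\min}$, and $e_{\max}$ are chosen so that the specifications above, together with the range bound $x_{\max}$, are met:

$$B^{e_{\min}} \le x_{\min} \tag{16.2}$$

$$B^{e_{\min}-p} \le d_1 \tag{16.3}$$

$$(B - B^{-p})\,B^{e_{\max}} \ge x_{\max} \tag{16.4}$$

$$B^{e_{\max}-p} \le d_2 \tag{16.5}$$

Here $B^{e_{\min}}$ is the smallest positive representable number, $B^{e-p}$ is the distance between consecutive representable numbers with exponent $e$, and $(B - B^{-p})\,B^{e_{\max}}$ is the largest representable number.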
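A minimal search for parameters is sketched below, assuming the four conditions take the form $B^{e_{\min}} \le x_{\min}$, $B^{e_{\min}-p} \le d_1$, $(B - B^{-p})B^{e_{\max}} \ge x_{\max}$ and $B^{e_{\max}-p} \le d_2$. The function `choose_parameters` and its `p_limit` cutoff are hypothetical; it picks the largest admissible $e_{\min}$, then the smallest $p$ for which the smallest sufficient $e_{\max}$ also meets the spacing bound $d_2$.

```python
from fractions import Fraction


def choose_parameters(x_min, d1, d2, x_max, base=2, p_limit=200):
    """Return (p, e_min, e_max) satisfying the four assumed conditions.

    Hypothetical helper; all specifications must be positive.
    """
    x_min, d1, d2, x_max = map(Fraction, (x_min, d1, d2, x_max))
    if min(x_min, d1, d2, x_max) <= 0:
        raise ValueError("specifications must be positive")
    B = Fraction(base)
    # Condition on x_min: largest e_min with B**e_min <= x_min.
    e_min = 0
    while B ** e_min > x_min:
        e_min -= 1
    while B ** (e_min + 1) <= x_min:
        e_min += 1
    for p in range(p_limit + 1):
        # Condition on d1: spacing between the smallest numbers.
        if B ** (e_min - p) > d1:
            continue
        # Condition on x_max: smallest e_max whose largest number reaches x_max.
        largest_significand = B - B ** (-p)
        e_max = e_min
        while largest_significand * B ** e_max < x_max:
            e_max += 1
        # Condition on d2: spacing between the largest numbers.
        if B ** (e_max - p) <= d2:
            return p, e_min, e_max
    raise ValueError("no solution with p <= p_limit")
```

Increasing $p$ shrinks both spacings and never raises the required $e_{\max}$, so the first $p$ that passes is the smallest one. With the hypothetical specification $x_{\min} = 10^{-3}$, $d_1 = 10^{-6}$, $d_2 = 1$, $x_{\max} = 10^{6}$ and $B = 2$, the search returns $p = 19$, $e_{\min} = -10$, $e_{\max} = 19$.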

Example 16.1 Define a floating-point representation system where

image

Choose B = 2. A straightforward solution of the system (16.2)–(16.5) is

image

The smallest nonzero exactly-represented positive number is $2^{-30}$; the distance between small exactly-represented numbers is

image

the largest exactly-represented positive number is

image

the distance between large exactly-represented numbers is

image
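The four quantities of an example like this one can be computed from $B$, $p$, $e_{\min}$ and $e_{\max}$, assuming the closed forms smallest $= B^{e_{\min}}$, $d_1 = B^{e_{\min}-p}$, largest $= (B - B^{-p})B^{e_{\max}}$ and $d_2 = B^{e_{\max}-p}$. Since enumerating a realistic system is impractical, the sketch below cross-checks these forms by listing every positive number of a tiny, hypothetical system; all function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def characteristics(base, p, e_min, e_max):
    """Closed-form smallest/largest values and spacings (assumed forms)."""
    B = Fraction(base)
    return {
        "smallest": B ** e_min,                   # 1.00...0 * B**e_min
        "d1": B ** (e_min - p),                   # last-digit weight at e_min
        "largest": (B - B ** (-p)) * B ** e_max,  # all digits B-1, exponent e_max
        "d2": B ** (e_max - p),                   # last-digit weight at e_max
    }


def enumerate_positive(base, p, e_min, e_max):
    """Every positive value x0.x_{-1}...x_{-p} * base**e, sorted."""
    values = set()
    for e in range(e_min, e_max + 1):
        for x0 in range(1, base):
            for frac in product(range(base), repeat=p):
                s = x0 + sum(Fraction(d, base ** (i + 1)) for i, d in enumerate(frac))
                values.add(s * Fraction(base) ** e)
    return sorted(values)


def closed_forms_match(base, p, e_min, e_max):
    """Compare closed forms with brute force (needs at least two values)."""
    c = characteristics(base, p, e_min, e_max)
    v = enumerate_positive(base, p, e_min, e_max)
    return (v[0] == c["smallest"] and v[1] - v[0] == c["d1"]
            and v[-1] == c["largest"] and v[-1] - v[-2] == c["d2"])
```

For the toy system $B = 2$, $p = 2$, $e_{\min} = -1$, $e_{\max} = 1$, the twelve positive numbers run from $1/2$ to $7/2$, with spacing $1/8$ at the small end and $1/2$ at the large end.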
