Chapter 1. Introduction

1–1 Notation

This book distinguishes between mathematical expressions of ordinary arithmetic and those that describe the operation of a computer. In “computer arithmetic,” operands are bit strings, or bit vectors, of some definite fixed length. Expressions in computer arithmetic are similar to those of ordinary arithmetic, but the variables denote the contents of computer registers. The value of a computer arithmetic expression is simply a string of bits with no particular interpretation. An operator, however, interprets its operands in some particular way. For example, a comparison operator might interpret its operands as signed binary integers or as unsigned binary integers; our computer arithmetic notation uses distinct symbols to make the type of comparison clear.

The main difference between computer arithmetic and ordinary arithmetic is that in computer arithmetic, the results of addition, subtraction, and multiplication are reduced modulo 2^n, where n is the word size of the machine. Another difference is that computer arithmetic includes a large number of operations. In addition to the four basic arithmetic operations, computer arithmetic includes logical and, exclusive or, compare, shift left, and so on.

Unless specified otherwise, the word size is 32 bits, and signed integers are represented in two’s-complement form.

Expressions of computer arithmetic are written similarly to those of ordinary arithmetic, except that the variables that denote the contents of computer registers are in bold face type. This convention is commonly used in vector algebra. We regard a computer word as a vector of single bits. Constants also appear in bold-face type when they denote the contents of a computer register. (This has no analogy with vector algebra because in vector algebra the only way to write a constant is to display the vector’s components.) When a constant denotes part of an instruction, such as the immediate field of a shift instruction, light-face type is used.

If an operator such as “+” has bold-face operands, then that operator denotes the computer’s addition operation (“vector addition”). If the operands are light-faced, then the operator denotes the ordinary scalar arithmetic operation. We use a light-faced variable x to denote the arithmetic value of a bold-faced variable x under an interpretation (signed or unsigned) that should be clear from the context. Thus, if registers x and y each contain the bit string 0x80000000, then, under signed integer interpretation, the scalar values are x = y = −2^31, so the scalar sum x + y = −2^32, whereas the computer (register) sum x + y = 0. Here, 0x80000000 is hexadecimal notation for a bit string consisting of a 1-bit followed by 31 0-bits.

Bits are numbered from the right, with the rightmost (least significant) bit being bit 0. The terms “bits,” “nibbles,” “bytes,” “halfwords,” “words,” and “doublewords” refer to lengths of 1, 4, 8, 16, 32, and 64 bits, respectively.

Short and simple sections of code are written in computer algebra, using its assignment operator (left arrow) and occasionally an if statement. In this role, computer algebra is serving as little more than a machine-independent way of writing assembly language code.

Programs too long or complex for computer algebra are written in the C programming language, as defined by the ISO 1999 standard.

A complete description of C would be out of place in this book, but Table 1–1 contains a brief summary of most of the elements of C [H&S] that are used herein. This is provided for the benefit of the reader who is familiar with some procedural programming language, but not with C. Table 1–1 also shows the operators of our computer-algebraic arithmetic language. Operators are listed from highest precedence (tightest binding) to lowest. In the Precedence column, L means left-associative; that is,

TABLE 1–1. EXPRESSIONS OF C AND COMPUTER ALGEBRA

[Table 1–1 appears as an image in the original and is not reproduced here.]

a • b • c = (a • b) • c

and R means right-associative. Our computer-algebraic notation follows C in precedence and associativity.

In addition to the notations described in Table 1–1, those of Boolean algebra and of standard mathematics are used, with explanations where necessary.

Our computer algebra uses other functions in addition to “abs,” “rem,” and so on. These are defined where introduced.

In C, the expression x < y < z means to evaluate x < y to a 0/1-valued result, and then compare that result to z. In computer algebra, the expression x < y < z means (x < y) & (y < z).

C has three loop control statements: while, do, and for. The while statement is written:

while (expression) statement

First, expression is evaluated. If true (nonzero), statement is executed and control returns to evaluate expression again. If expression is false (0), the while-loop terminates.

The do statement is similar, except the test is at the bottom of the loop. It is written:

do statement while (expression)

First, statement is executed, and then expression is evaluated. If true, the process is repeated, and if false, the loop terminates.

The for statement is written:

for (e1; e2; e3) statement

First, e1, usually an assignment statement, is executed. Then e2, usually a comparison, is evaluated. If false, the for-loop terminates. If true, statement is executed. Finally, e3, usually an assignment statement, is executed, and control returns to evaluate e2 again. Thus, the familiar “do i = 1 to n” is written:

for (i = 1; i <= n; i++)

(This is one of the few contexts in which we use the postincrement operator.)

The ISO C standard makes the result of right shifts (the “>>” operator) of negative signed quantities implementation-defined, so they may be either 0-propagating or sign-propagating. In the C code herein, it is assumed that if the left operand is signed, then a sign-propagating shift results (and if it is unsigned, then a 0-propagating shift results, as ISO requires). Most modern C compilers work this way.

It is assumed here that left shifts are “logical.” (Some machines, mostly older ones, provide an “arithmetic” left shift, in which the sign bit is retained.)

Another potential problem with shifts is that the ISO C standard specifies that if the shift amount is negative or is greater than or equal to the width of the left operand, the result is undefined. But, nearly all 32-bit machines treat shift amounts modulo 32 or 64. The code herein relies on one of these behaviors; an explanation is given when the distinction is important.

1–2 Instruction Set and Execution Time Model

To permit a rough comparison of algorithms, we imagine them being coded for a machine with an instruction set similar to that of today’s general purpose RISC computers, such as the IBM RS/6000, the Oracle SPARC, and the ARM architecture. The machine is three-address and has a fairly large number of general purpose registers—that is, 16 or more. Unless otherwise specified, the registers are 32 bits long. General register 0 contains a permanent 0, and the others can be used uniformly for any purpose.

In the interest of simplicity there are no “special purpose” registers, such as a condition register or a register to hold status bits, such as “overflow.” The machine has no floating-point instructions. Floating-point is only a minor topic in this book, being mostly confined to Chapter 17.

We recognize two varieties of RISC: a “basic RISC,” having the instructions shown in Table 1–2, and a “full RISC,” having all the instructions of the basic RISC, plus those shown in Table 1–3.

TABLE 1–2. BASIC RISC INSTRUCTION SET

[Table 1–2 appears as an image in the original and is not reproduced here.]

TABLE 1–3. ADDITIONAL INSTRUCTIONS FOR THE “FULL RISC”

[Table 1–3 appears as an image in the original and is not reproduced here.]

In Tables 1–2, 1–3, and 1–4, RA and RB appearing as source operands really mean the contents of those registers.

A real machine would have branch and link (for subroutine calls), branch to the address contained in a register (for subroutine returns and “switches”), and possibly some instructions for dealing with special purpose registers. It would, of course, have a number of privileged instructions and instructions for calling on supervisor services. It might also have floating-point instructions.

Some other computational instructions that a RISC computer might have are identified in Table 1–3. These are discussed in later chapters.

It is convenient to provide the machine’s assembler with a few “extended mnemonics.” These are like macros whose expansion is usually a single instruction. Some possibilities are shown in Table 1–4.

TABLE 1–4. EXTENDED MNEMONICS

[Table 1–4 appears as an image in the original and is not reproduced here.]

The load immediate instruction expands into one or two instructions, as required by the immediate value I. For example, if 0 ≤ I < 2^16, an or immediate (ori) from R0 can be used. If −2^15 ≤ I < 0, an add immediate (addi) from R0 can be used. If the rightmost 16 bits of I are 0, add immediate shifted (addis) can be used. Otherwise, two instructions are required, such as addis followed by ori. (Alternatively, in the last case, a load from memory could be used, but for execution time and space estimates we assume that two elementary arithmetic instructions are used.)

Of course, which instructions belong in the basic RISC and which belong in the full RISC is very much a matter of judgment. Quite possibly, divide unsigned and the remainder instructions should be moved to the full RISC category. Conversely, possibly load byte signed should be in the basic RISC category. It is in the full RISC set because it is probably of rather low frequency of use, and because in some technologies it is difficult to propagate a sign bit through so many positions and still make cycle time.

The distinction between basic and full RISC involves many other such questionable judgments, but we won’t dwell on them.

The instructions are limited to two source registers and one target, which simplifies the computer (e.g., the register file requires no more than two read ports and one write port). It also simplifies an optimizing compiler, because the compiler does not need to deal with instructions that have multiple targets. The price paid for this is that a program that wants both the quotient and remainder of two numbers (not uncommon) must execute two instructions (divide and remainder). The usual machine division algorithm produces the remainder as a by-product, so many machines make them both available as a result of one execution of divide. Similar remarks apply to obtaining the doubleword product of two words.

The conditional move instructions (e.g., moveq) ostensibly have only two source operands, but in a sense they have three. Because the result of the instruction depends on the values in RT, RA, and RB, a machine that executes instructions out of order must treat RT in these instructions as both a use and a set. That is, an instruction that sets RT, followed by a conditional move that sets RT, must be executed in that order, and the result of the first instruction cannot be discarded. Thus, the designer of such a machine may elect to omit the conditional move instructions to avoid having to consider an instruction with (logically) three source operands. On the other hand, the conditional move instructions do save branches.

Instruction formats are not relevant to the purposes of this book, but the full RISC instruction set described above, with floating-point and a few supervisory instructions added, can be implemented with 32-bit instructions on a machine with 32 general purpose registers (5-bit register fields). By reducing the immediate fields of compare, load, store, and trap instructions to 14 bits, the same holds for a machine with 64 general purpose registers (6-bit register fields).

Execution Time

We assume that all instructions execute in one cycle, except for the multiply, divide, and remainder instructions, for which we do not assume any particular execution time. Branches take one cycle whether they branch or fall through.

The load immediate instruction is counted as one or two cycles, depending on whether one or two elementary arithmetic instructions are required to generate the constant in a register.

Although load and store instructions are not often used in this book, we assume they take one cycle and ignore any load delay (time lapse between when a load instruction completes in the arithmetic unit and when the requested data is available for a subsequent instruction).

However, knowing the number of cycles used by all the arithmetic and logical instructions is often insufficient for estimating the execution time of a program. Execution can be slowed substantially by load delays and by delays in fetching instructions. These delays, although very important and increasing in importance, are not discussed in this book. Another factor, one that improves execution time, is what is called “instruction-level parallelism,” which is found in many contemporary RISC chips, particularly those for “high-end” machines.

These machines have multiple execution units and sufficient instruction-dispatching capability to execute instructions in parallel when they are independent (that is, when neither uses a result of the other, and they don’t both set the same register or status bit). Because this capability is now quite common, the presence of independent operations is often pointed out in this book. Thus, we might say that such and such a formula can be coded in such a way that it requires eight instructions and executes in five cycles on a machine with unlimited instruction-level parallelism. This means that if the instructions are arranged in the proper order (“scheduled”), a machine with a sufficient number of adders, shifters, logical units, and registers can, in principle, execute the code in five cycles.

We do not make too much of this, because machines differ greatly in their instruction-level parallelism capabilities. For example, an IBM RS/6000 processor from ca. 1992 has a three-input adder and can execute two consecutive add-type instructions in parallel even when one feeds the other (e.g., an add feeding a compare, or the base register of a load). As a contrary example, consider a simple computer, possibly for low-cost embedded applications, that has only one read port on its register file. Normally, this machine would take an extra cycle to do a second read of the register file for an instruction that has two register input operands. However, suppose it has a bypass so that if an instruction feeds an operand of the immediately following instruction, then that operand is available without reading the register file. On such a machine, it is actually advantageous if each instruction feeds the next—that is, if the code has no parallelism.

Exercises

1. Express the loop

for (e1; e2; e3) statement

in terms of a while loop.

Can it be expressed as a do loop?

2. Code a loop in C in which the unsigned integer control variable i takes on all values from 0 to and including the maximum unsigned number, 0xFFFFFFFF (on a 32-bit machine).

3. For the more experienced reader: The instructions of the basic and full RISCs defined in this book can be executed with at most two register reads and one write. What are some common or plausible RISC instructions that either need more source operands or need to do more than one register write?
