Chapter 6. Controlling Execution Flow

When the processor runs your programs, it is unlikely that it will start with the first instruction and proceed sequentially through all the instructions in the program until the last instruction. Instead, your program will most likely use branches and loops to perform the necessary logic to implement the functions it needs.

Similar to high-level languages, assembly language provides instructions to help the programmer code logic into applications. By jumping to different sections of the program, or looping through sections multiple times, you can alter the way the program handles data.

This chapter describes the different assembly language instructions used to do jumps and loops. Because both of these functions manipulate the instruction pointer, the first section provides a brief refresher on how the instruction pointer is used to keep track of the next instruction to process, and what instructions can alter the instruction pointer. The next section discusses unconditional branches and demonstrates how they are used in assembly language programs. After that, conditional branches are presented, showing how they can be used to implement logic functions in the application. The next two sections describe loops, special instructions that enable the program to loop through data for a predetermined number of times. Finally, you will learn some tips for optimizing programs that utilize jumps and loops.

The Instruction Pointer

Before diving into the details of changing the course of the program, it is a good idea to first understand how the program is executed on the processor. The instruction pointer is the traffic cop for the processor. It determines which instruction in the program is the next in line to be executed. It proceeds in a sequential manner through the instruction codes programmed in the application.

However, as described in Chapter 2, "The IA-32 Platform," it is not always an easy task to determine when and where the next instruction is. With the invention of the instruction prefetch cache, many instructions are preloaded into the processor cache before they are actually ready to be executed. With the invention of the out-of-order engine, many instructions are even executed ahead of time in the application, and the results are placed in order as required by the application by the retirement unit.

With all of this confusion, it can be difficult to determine what exactly is the "next instruction." While there is a lot of work going on behind the scenes to speed up execution of the program, the processor still needs to sequentially step through the program logic to produce the proper results. Within that framework, the instruction pointer is crucial to determining where things are in the program. This is shown in Figure 6-1.

Figure 6-1

Figure 6.1. Figure 6-1

An instruction is considered executed when the processor retirement unit executes the result from the out-of-order engine from the instruction. After that instruction is executed, the instruction pointer is incremented to the next instruction in the program code. That instruction may or may not have already been performed by the out-of-order engine, but either way, the retirement unit does not process its results until it is time to do so in the program logic.

As the instruction pointer moves through the program instructions, the EIP register is incremented. Remember, instructions can be multiple bytes in length, so pointing to the next instruction is more than just incrementing the instruction pointer by one each time.

Your program cannot directly modify the instruction pointer. You do not have the capability to directly change the value of the EIP register to point to a different location in memory with a MOV instruction. You can, however, utilize instructions that alter the value of the instruction pointer. These instructions are called branches.

A branch instruction can alter the value of the EIP register either unconditionally (unconditional branches), or based on a value of a condition (conditional branches). The following sections describe unconditional and conditional branches and show how they affect the instruction pointer and the course of the program logic.

Unconditional Branches

When an unconditional branch is encountered in the program, the instruction pointer is automatically routed to a different location. You can use three types of unconditional branches:

  • Jumps

  • Calls

  • Interrupts

Each of these unconditional branches behaves differently within the program, and it is up to you to decide which one to use within the program logic. The following sections describe the differences between each of these types of unconditional branches and how to implement them in your assembly language programs.

Jumps

The jump is the most basic type of branch used in assembly language programming. If you are familiar with the BASIC programming language, you have most likely seen GOTO statements. Jump statements are the assembly language equivalent of the BASIC GOTO statement.

In structured programming, GOTO statements are considered to be a sign of bad coding. Programs are supposed to be compartmentalized and flow in a sequential manner, calling functions instead of jumping around the program code. In assembly language programs, jump instructions are not considered bad programming, and in fact are required to implement many functions. However, they can have a detrimental impact on your program's performance (see "Optimizing Branch Instructions" later in this chapter).

The jump instruction uses a single instruction code:

jmp location

where location is the memory address to jump to. In assembly language, the location value is declared as a label within the program code. When the jump occurs, the instruction pointer is changed to the memory address of the instruction code located immediately after the label. This is shown in Figure 6-2.

Figure 6-2

Figure 6.2. Figure 6-2

The JMP instruction changes the instruction pointer value to the memory location specified in the JMP instruction.

Behind the scenes, the single assembly jump instruction is assembled into one of three different types of jump opcodes:

  • Short jump

  • Near jump

  • Far jump

The three jump types are determined by the distance between the current instruction's memory location and the memory location of the destination point (the "jump to" location). Depending on the number of bytes jumped, the different jump types are used. A short jump is used when the jump offset is less than 128 bytes. A far jump is used in segmented memory models when the jump goes to an instruction in another segment. The near jump is used for all other jumps.

When using the assembly language mnemonic instruction, you do not need to worry about the length of the jump. A single jump instruction is used to jump anywhere in the program code (although there may be a performance difference, as described later in the "Optimizing Branch Instructions" section).

The following jumptest.s program demonstrates the operation of an unconditional jump instruction:

# jumptest.s – An example of the jmp instruction
.section .text
.globl _start
_start:
   nop
   movl $1, %eax
   jmp overhere
   movl $10, %ebx
   int $0x80
overhere:
   movl $20, %ebx
   int $0x80

The jumptest.s program simply assigns the value 1 to the EAX register for the exit Linux system call. Then, a jump instruction is used to jump over the part where the value 10 is assigned to the EBX register, and the Linux system call is made. Instead, the program jumps to where the value 20 is assigned to the EBX register, and the Linux system call is made. You can ensure that the jump occurred by running the program and checking the result code generated in Linux:

$ as -o jumptest.o jumptest.s
$ ld -o jumptest jumptest.o
$ ./jumptest
$ echo $?
20
$

Indeed, the expected result code was produced due to the jump. That in itself might not be too exciting. Instead, you can watch the actual memory locations used in the program with the debugger and the objdump programs.

First, you can see how the instruction codes are arranged in memory by using the objdump program to disassemble the assembled code:

$ objdump -D jumptest

jumptest:     file format elf32-i386

Disassembly of section .text:

08048074 <_start>:
 8048074:       90                      nop
 8048075:       b8 01 00 00 00          mov    $0x1,%eax
 804807a:       eb 07                   jmp    8048083 <overhere>
 804807c:       bb 0a 00 00 00          mov    $0xa,%ebx
 8048081:       cd 80                   int    $0x80

08048083 <overhere>:
 8048083:       bb 14 00 00 00          mov    $0x14,%ebx
 8048088:       cd 80                   int    $0x80
$

The disassembler output shows the memory locations that will be used by each instruction (the value shown in the first column). Now, you can run the jumptest program in the debugger and watch what happens:

$ as -gstabs -o jumptest.o jumptest.s
$ ld -o jumptest jumptest.o
$ gdb -q jumptest
(gdb) break *_start+1
Breakpoint 1 at 0x8048075: file jumptest.s, line 5.
(gdb) run
Starting program: /home/rich/palp/chap06/jumptest

Breakpoint 1, _start () at jumptest.s:5
5          movl $1, %eax
Current language:  auto; currently asm
(gdb) print/x $eip
$1 = 0x8048075
(gdb)

After assembling the code for the debugger (using the -gstabs parameter), and setting the breakpoint at the start of the program, run the program and view the first memory location used (shown in the EIP register). The value is 0x8048075, which corresponds to the same memory location shown in the objdump output. Next, step through the debugger until the jump instruction has been executed, and then display the EIP register value again:

(gdb) step
_start () at jumptest.s:6
6          jmp overhere
(gdb) step
overhere () at jumptest.s:10
10         movl $20, %ebx
(gdb) print $eip
$2 = (void *) 0x8048083
(gdb)

As expected, the program jumped to the 0x8048083 location, which is exactly where the overhere label pointed to, as shown in the objdump output.

Calls

The next type of unconditional branch is the call. A call is similar to the jump instruction, but it remembers where it jumped from and has the capability to return there if needed. This is used when implementing functions in assembly language programs.

Functions enable you to write compartmentalized code; that is, you can separate different functions into different sections of the text. If multiple areas of the program use the same functions, the same code does not need to be written multiple times. The single function can be referenced using call statements.

The call instruction has two parts. The first part is the actual CALL instruction, which requires a single operand, the address of the location to jump to:

call address

The address operand refers to a label in the program, which is converted to the memory address of the first instruction in the function.

The second part of the call instruction is the return instruction. This enables the function to return to the original part of the code, immediately after the CALL instruction. The return instruction has no operands, just the mnemonic RET. It knows where to return to by looking at the stack. This is demonstrated in Figure 6-3.

Figure 6-3

Figure 6.3. Figure 6-3

When the CALL instruction is executed, it places the EIP register onto the stack and then modifies the EIP register to point to the called function address. When the called function is completed, it retrieves the old EIP register value from the stack and returns control back to the original program.

How functions return to the main program is possibly the most confusing part of using functions in assembly language. It is not as simple as just using the RET instruction at the end of a function. Instead, it relates to how information is passed to functions and how functions read and store that information.

This is all done using the stack. As mentioned in Chapter 5, "Moving Data," not only can the data on the stack be referenced using PUSH and POP instructions, it can also be directly referenced using the ESP register, which points to the last entry in the stack. Functions normally copy the ESP register to the EBP register, and then use the EBP register value to retrieve information passed on the stack before the CALL instruction, and to place variables on the stack for local data storage (see Chapter 11, "Using Functions"). This complicates how the stack pointer is manipulated within the function.

This is a somewhat simplified explanation of how functions use the stack. Chapter 11, "Using Functions," describes this in great detail, explaining and demonstrating how functions use the stack for data storage.

The return address is added to the stack when the CALL instruction is executed. When the called function begins, it must store the ESP register somewhere it can be restored to its original form before the RET instruction attempts to return to the calling location. Because the stack may also be manipulated within the function, the EBP register is often used as a base pointer to the stack. Thus, the ESP register is usually copied to the EBP register at the beginning of the function as well.

While this seems confusing, it is not too difficult if you create a standard template to use for all of your function calls. The form to remember for functions is as follows:

function_label:
    pushl %ebp
    movl %esp, %ebp
    < normal function code goes here>
    movl %ebp, %esp
    popl %ebp
    ret

Once the EBP register has been saved, it can be used as the base pointer to the stack for all access to the stack within the function. Before returning to the calling program, the ESP register must be restored to the location pointing to the calling memory location.

A simple call example is demonstrated in the calltest.s program:

# calltest.s - An example of using the CALL instruction
.section .data
output:
   .asciz "This is section %d
"
.section .text
.globl _start
_start:
   pushl $1
   pushl $output
   call printf
   add  $8, %esp           # should clear up stack
   call overhere
   pushl $3
   pushl $output
   call printf
   add  $8, %esp           # should clear up stack
   pushl $0
   call exit
overhere:
   pushl %ebp
movl %esp, %ebp
   pushl $2
   pushl $output
   call printf
   add  $8, %esp           # should clear up stack
   movl %ebp, %esp
   popl %ebp
   ret

The calltest.s program begins by using the printf C function to display the first text line, showing where the program is. Next, the CALL instruction is used to transfer control to the overhere label. At the overhere label, the ESP register is copied to the EBP pointer so it can be restored at the end of the function. The printf function is used again to display a second line of text, and then the ESP and EBP registers are restored.

Control of the program returns to the instruction immediately after the CALL instruction, and the third text line is displayed, again using the printf function. The output should look like this:

$ ./calltest
This is section 1
This is section 2
This is section 3
$

Be careful when calling external functions from your application. There is no guarantee that the function will return your registers in the same way that you left them before the call. This is discussed in more detail in Chapter 11.

Interrupts

The third type of unconditional branch is the interrupt. An interrupt is a way for the processor to "interrupt" the current instruction code path and switch to a different path. Interrupts come in two varieties:

  • Software interrupts

  • Hardware interrupts

Hardware devices generate hardware interrupts. They are used to signal events happening at the hardware level (such as when an I/O port receives an incoming signal). Programs generate software interrupts. They are a signal to hand off control to another program.

When a program is called by an interrupt, the calling program is put on hold, and the called program takes over. The instruction pointer is transferred to the called program, and execution continues from within the called program. When the called program is complete, it can return control back to the calling program (using an interrupt return instruction).

Software interrupts are provided by the operating system to enable applications to tap into functions within the operating system, and in some cases even the underlying BIOS system. In the Microsoft DOS operating system, many functions are provided with the 0x21 software interrupt. In the Linux world, the 0x80 interrupt is used to provide low-level kernel functions.

You have already seen several examples of using the software interrupt in many of the example programs presented so far. Simply using the INT instruction with the 0x80 value transfers control to the Linux system call program. The Linux system call program has many subfunctions that can be used. The subfunctions are performed based on the value of the EAX register at the time of the interrupt. For example, placing the value of 1 in the EAX register before the interrupt calls the exit Linux system call function.

Chapter 12, "Using Linux System Calls," describes all of the functions available with the 0x80 interrupt.

When debugging an application that contains software interrupts, it is difficult to see what is happening within the interrupt section, as the debugging information is not compiled into the functions. You may have noticed when running the debugger that it performs the interrupt instruction but then immediately returns back to the normal program.

Conditional Branches

Unlike unconditional branches, conditional branches are not always taken. The result of the conditional branch depends on the state of the EFLAGS register at the time the branch is executed.

There are many bits in the EFLAGS register, but the conditional branches are only concerned with five of them:

  • Carry flag (CF) - bit 0 (lease significant bit)

  • Overflow flag (OF) - bit 11

  • Parity flag (PF) - bit 2

  • Sign flag (SF) - bit 7

  • Zero flag (ZF) - bit 6

Each conditional jump instruction examines specific flag bits to determine whether the condition is proper for the jump to occur. With five different flag bits, several jump combinations can be performed. The following sections describe the individual jump instructions.

Conditional jump instructions

The conditional jumps determine whether or not to jump based on the current value of the EFLAGS register. Several different conditional jump instructions use different bits of the EFLAGS register. The format of the conditional jump instruction is

jxx address

where xx is a one- to three-character code for the condition, and address is the location within the program to jump to (usually denoted by a label). The following table describes all of the conditional jump instructions available.

Instruction

Description

EFLAGS

JA

Jump if above

CF=0 and ZF=0

JAE

Jump if above or equal

CF=0

JB

Jump if below

CF=1

JBE

Jump if below or equal

CF=1 or ZF=1

JC

Jump if carry

CF=1

JCXZ

Jump if CX register is 0

 

JECXZ

Jump if ECX register is 0

 

JE

Jump if equal

ZF=1

JG

Jump if greater

ZF=0 and SF=OF

JGE

Jump if greater or equal

SF=OF

JL

Jump if less

SF<>OF

JLE

Jump if less or equal

ZF=1 or SF<>OF

JNA

Jump if not above

CF=1 or ZF=1

JNAE

Jump if not above or equal

CF=1

JNB

Jump if not below

CF=0

JNBE

Jump if not below or equal

CF=0 and ZF=0

JNC

Jump if not carry

CF=0

JNE

Jump if not equal

ZF=0

JNG

Jump if not greater

ZF=1 or SF<>OF

JNGE

Jump if not greater or equal

SF<>OF

JNL

Jump if not less

SF=OF

JNLE

Jump if not less or equal

ZF=0 and SF=OF

JNO

Jump if not overflow

OF=0

JNP

Jump if not parity

PF=0

JNS

Jump if not sign

SF=0

JNZ

Jump if not zero

ZF=0

JO

Jump if overflow

OF=1

JP

Jump if parity

PF=1

JPE

Jump if parity even

PF=1

JPO

Jump if parity odd

PF=0

JS

Jump if sign

SF=1

JZ

Jump if zero

ZF=1

You may notice that many of the conditional jump instructions seem redundant (such as JA for jump if above and JG jump if greater). The difference is when working with signed and unsigned values. The jump instructions using the above and lower keywords are used for evaluating unsigned integer values. You will learn more about the different types of integers in Chapter 7, "Using Numbers."

The conditional jump instructions take a single operand in the instruction code—the address to jump to. While usually a label in an assembly language program, the operand is converted into an offset address in the instruction code. Two types of jumps are allowed for conditional jumps:

  • Short jumps

  • Near jumps

A short jump uses an 8-bit signed address offset, whereas a near jump uses either a 16-bit or 32-bit signed address offset. The offset value is added to the instruction pointer.

Conditional jump instructions do not support far jumps in the segmented memory model. If you are programming in the segmented memory model, you must use programming logic to determine whether the condition exists, and then implement an unconditional jump to the instructions in the separate segment.

To be able to use a conditional jump, you must have an operation that sets the EFLAGS register before the conditional jump. The next section shows some examples of using conditional jumps in assembly language programs.

The compare instruction

The compare instruction is the most common way to evaluate two values for a conditional jump. The compare instruction does just what its name says, it compares two values and sets the EFLAGS registers accordingly.

The format of the CMP instruction is as follows:

cmp operand1, operand2

The CMP instruction compares the second operand with the first operand. It performs a subtraction operation on the two operands behind the scenes (operand2operand1). Neither of the operands is modified, but the EFLAGS register is set as if the subtraction took place.

When using the GNU assembler, remember that operand1 and operand2 are the reverse from what is described in the Intel documentation for the CMP instruction. This little feature has caused many hours of debugging nightmares for many an assembly language programmer.

The following cmptest.s program is shown as a quick demonstration of how the compare and conditional jump instructions work together:

# cmptest.s - An example of using the CMP and JGE instructions
.section .text
.globl _start
_start:
   nop
   movl $15, %eax
   movl $10, %ebx
   cmp %eax, %ebx
   jge greater
   movl $1, %eax
   int $0x80
greater:
   movl $20, %ebx
   movl $1, %eax
   int $0x80

The cmptest.s program first assigns to immediate data values: the value 15 to the EAX register, and the value 10 to the EBX register. Next, the CMP instruction is used to compare the two registers, and the JGE instruction is used to branch, depending on the values:

cmp %eax, %ebx
jge greater

Because the value of the EBX register is less than the value of EAX, the conditional branch is not taken. The instruction pointer moves on to the next instruction, which moves the immediate value of 1 to the EAX register, and then calls the Linux exit system call. You can test this by running the program and displaying the result code:

$ ./cmptest
$ echo $?
10
$

Indeed, the branch was not taken, and the value in the EBX register remained at 10.

The preceding example compared the values of two registers. Following are some other examples of the CMP instruction:

cmp $20, %ebx     ; compare EBX with the immediate value 20
cmp data, %ebx   ; compare EBX with the value in the data memory location
cmp (%edi), %ebx ; compare EBX with the value referenced by the EDI pointer

Examples of using the flag bits

Trying to code conditional jump instructions can be tricky. It helps to have an understanding of each of the flag bits that must be present in order for the different conditions to be met. The following sections demonstrate how each of the flag bits affect the conditional jumps, so you can get a feeling for what to look for when coding your programming logic.

Using the Zero flag

The Zero flag is the easiest thing to check when doing conditional jumps. The JE and JZ instructions both branch if the Zero flag is set (the two operands are equal). The Zero flag can be set by either a CMP instruction or a mathematical instruction that evaluates to Zero, as shown in the following example:

movl $30, %eax
subl $30, %eax
jz overthere

The JZ instruction would be performed, as the result of the SUB instruction would be Zero (the SUB instruction is covered in Chapter 8, "Basic Math Functions").

You can also use the Zero flag when decreasing the value of a register to determine whether it has reached Zero:

movl $10, %edi
loop1:
   < other code instructions>
   dec %edi
   jz out
   jmp loop1
out:

This code snippet uses the EDI register as an indexed counter that counts back from 10 to 1 (when it reaches zero, the JZ instruction will exit the loop).

Using the overflow flag

The overflow flag is used specifically when working with signed numbers (see Chapter 7). It is set when a signed value is too large for the data element containing it. This usually happens during arithmetic operations that overflow the size of the register holding the data, as shown in the following example:

movl $1, %eax      ; move 1 to the EAX register
   movb $0x7f, %bl    ; move the signed value 127 to the 8-bit BL register
   addb $10, %bl      ; Add 10 to the BL register
   jo overhere
   int $0x80          ; call the Linux system call
overhere:
   movl $0, %ebx      ; move 0 to the EBX register
   int $0x80          ; call the Linux system call

This code snippet adds 10 to a signed byte value of 127. The result would be 137, which is a valid value in the byte, but not within a signed byte number (which can only use −127 to 127). Because the signed value is not valid, the overflow flag is set, and the JO instruction is implemented.

Using the parity flag

The parity flag indicates the number of bits that should be one in a mathematical answer. This can be used as a crude error-checking system to ensure that the mathematical operation was successful.

If the number of bits set to one in the resultant is even, the parity bit is set (one). If the number of bits set to one in the resultant is odd, the parity bit is not set (zero).

To test this concept, you can create the paritytest.s program:

# paritytest.s - An example of testing the parity flag
.section .text
.globl _start
_start:
   movl $1, %eax
   movl $4, %ebx
   subl $3,  %ebx
   jp overhere
   int $0x80
overhere:
   movl $100, %ebx
   int $0x80

In this snippet, the result from the subtraction is 1, which in binary is 00000001. Because the number of one bits is odd, the parity flag is not set, so the JP instruction should not branch; instead, the program should exit with a result code of the result of the subtraction, 1:

$ ./paritytest
$ echo $?
1
$

To test the opposite, change the SUB instruction line to create a result with an even number of one bits:

subl $1, %ebx

The result of this subtraction will be 3, which in binary is 00000011. Because this is an even number of one bits, the parity flag will be set, and the JP instruction should branch to the overhere label, setting the result code to 100:

$ ./paritytest
$ echo $?
100
$

Yes, it worked as expected!

Using the sign flag

The sign flag is used in signed numbers to indicate a sign change in the value contained in the register. In a signed number, the last (highest order) bit is used as the sign bit. It indicates whether the numeric representation is negative (set to 1) or positive (set to 0).

This is handy when counting within a loop and watching for zero. You saw with the Zero flag that the flag was set when the value that was being decreased reached zero. However, if you are dealing with arrays, most likely you also need to stop after the zero value, not at the zero value (because the first offset is 0).

Using the sign flag, you can tell when a value has passed from 0 to −1, as shown in the following signtest.s program:

# signtest.s - An example of using the sign flag
.section .data
value:
   .int 21, 15, 34, 11, 6, 50, 32, 80, 10, 2
output:
   .asciz "The value is: %d
"
.section .text
.globl _start
_start:
   movl $9, %edi
loop:
   pushl value(, %edi, 4)
   pushl $output
   call printf
   add $8, $esp
   dec %edi
   jns loop
   movl $1, %eax
   movl $0, %ebx
   int $0x80

The signtest.s program walks backward through a data array using the EDI register as an index, decreasing it for each array element. The JNS instruction is used to detect when the value of the EDI register becomes negative, and loops back to the beginning if not.

Because the signtest.s program uses the printf C function, remember to link it with the dynamic loader (as described in Chapter 4, "A Sample Assembly Language Program"). The output from the program should look like this:

$ ./signtest
The value is: 2
The value is: 10
The value is: 80
The value is: 32
The value is: 50
The value is: 6
The value is: 11
The value is: 34
The value is: 15
The value is: 21
$

Using the carry flag

The carry flag is used in mathematical expressions to indicate when an overflow has occurred in an unsigned number (remember that signed numbers use the overflow flag). The carry flag is set when an instruction causes a register to go beyond its data size limit.

Unlike the overflow flag, the DEC and INC instructions do not affect the carry flag. For example, this code snippet will not set the carry flag:

movl $0xffffffff, %ebx
inc %ebx
jc overflow

However, this code snippet will set the carry flag, and the JC instruction will jump to the overflow location:

movl $0xffffffff, %ebx
addl $1,  %ebx
jc overflow

The carry flag will also be set when an unsigned value is less than zero. For example, this code snippet will also set the carry flag:

movl $2, %eax
subl $4, %eax
jc overflow

The resulting value in the EAX register is 254, which represents −2 as a signed number, the correct answer. This means that the overflow flag would not be set. However, because the answer is below zero for an unsigned number, the carry flag is set.

Unlike the other flags, there are instructions that can specifically modify the carry flag. These are described in the following table.

Instruction

Description

CLC

Clear the carry flag (set it to zero)

CMC

Complement the carry flag (change it to the opposite of what is set)

STC

Set the carry flag (set it to one)

Each of these instructions directly modifies the carry flag bit in the EFLAGS register.

Loops

Loops are another way of altering the instruction path within the program. Loops enable you to code repetitive tasks with a single loop function. The loop operations are performed repeatedly until a specific condition is met.

The following sections describe the different loop instructions available for you to use and show an example of using loops within an assembly language program.

The loop instructions

In the signtest.s example program shown in the section "Using the sign flag," you created a loop by using a jump instruction and decreasing a register value. The IA-32 platform provides a simpler mechanism for performing loops in assembly language programs: the loop instruction family.

The loop instructions use the ECX register as a counter and automatically decrease its value as the loop instruction is executed. The following table describes the instructions in the loop family.

Instruction

Description

LOOP

Loop until the ECX register is zero

LOOPE/LOOPZ

Loop until either the ECX register is zero, or the ZF flag is not set

LOOPNE/LOOPNZ

Loop until either the ECX register is zero, or the ZF flag is set

The LOOPE/LOOPZ and LOOPNE/LOOPNZ instructions provide the additional benefit of monitoring the Zero flag.

The format for each of these instructions is

loop address

where address is a label name for a location in the program code to jump to. Unfortunately, the loop instructions support only an 8-bit offset, so only short jumps can be performed.

Before the loop starts, you must set the value for the number of iterations to perform in the ECX register. This usually looks something like the following:

< code before the loop >
   movl $100, %ecx
loop1:
   < code to loop through >
   loop loop1
   < code after the loop >

Be careful with the code inside the loop. If the ECX register is modified, it will affect the operation of the loop. Use extra caution when implementing function calls within the loop, as functions can easily trash the value of the ECX register without you knowing it.

An added benefit of the loop instructions is that they decrease the value of the ECX register without affecting the EFLAGS register flag bits. When the ECX register reaches zero, the Zero flag is not set.

A loop example

As a simple example to demonstrate how the LOOP instruction works, here's the loop.s program:

# loop.s - An example of the loop instruction
.section .data
output:
   .asciz "The value is: %d
"
.section .text
.globl _start
_start:
   movl $100, %ecx
   movl $0, %eax
loop1:
   addl %ecx, %eax
   loop loop1
   pushl %eax
   pushl $output
   call printf
   add $8, %esp
   movl $1, %eax
   movl $0, %ebx
   int $0x80

The loop.s program computes the arithmetic series of the number stored in the ECX register and then displays it on the console (using the printf function, so remember to link with the C library and the dynamic linker). The LOOP instruction is used to continually loop through the ADD function until the value of ECX is zero.

Preventing LOOP catastrophes

There is one common problem with the LOOP instruction that sometimes bites assembly language programmers. What if you used the loop.s example program and set the ECX register to zero? Try it and see what happens. Here's the output when I tried it:

$ ./loop
The value is: −2147483648
$

That is obviously not the correct answer. So what happened? The answer lies in the way the LOOP instruction behaves. When the LOOP instruction is executed, it first decreases the value in ECX by one, and then it checks to see whether it is zero. Using this logic, if the value of ECX is already zero before the LOOP instruction, it will be decreased by one, making it −1. Because this value is not zero, the LOOP instruction continues on its way, looping back to the defined label. The loop will eventually exit when the register overflows, and the incorrect value is displayed.

To correct this problem, you need to check for the special condition when the ECX register contains a zero value. Fortunately, Intel has provided a special instruction just for that purpose. If you remember from the "Conditional Branches" section earlier, the JCXZ instruction is used to perform a conditional branch if the ECX register is zero. This is exactly what we need to solve this problem.

The betterloop.s program uses the JCXZ instruction to provide some rudimentary error-checking for the application:

# betterloop.s - An example of the loop and jcxz instructions
.section .data
output:
   .asciz "The value is: %d
"
.section .text
.globl _start
_start:
   movl $0, %ecx
movl $0, %eax
   jcxz done
loop1:
   addl %ecx, %eax
   loop loop1
done:
   pushl %eax
   pushl $output
   call printf
   movl $1, %eax
   movl $0, %ebx
   int $0x80

The betterloop.s program adds a single instruction, the JCXZ instruction, before the loop starts, and a single label to reference the ending instruction codes. Now if the ECX register contains a zero value, the JCXZ instruction catches it, and immediately goes to the output section. Running the program demonstrates that indeed it solves the problem:

$ ./betterloop
The value is: 0
$

Duplicating High-Level Conditional Branches

If you program in C, C++, Java, or any other high-level language, you probably use a lot of conditional statements that look completely different from the assembly language ones. You can mimic the high-level language functions using the assembly language code you learned in this chapter.

The easiest way to learn how to code high-level functions in assembly language is to see how the assembler does it. The following sections walk through disassembling C language functions to show how they are performed using assembly language.

if statements

The most common conditional statement used in high-level languages is the if statement. The following program, ifthen.c, demonstrates how this is commonly used in C programs:

/* ifthen.c – A sample C if-then program */
#include <stdio.h>

int main()
{
   int a = 100;
   int b = 25;
   if (a > b)
   {
      printf("The higher value is %d
", a);
   } else
      printf("The higher value is %d
", b);
return 0;
}

Because the purpose of this exercise is to see how the code will be converted to assembly language, the actual C program is pretty trivial—just a simple comparison of two known values. You can view the generated assembly language code by using the -S parameter of the GNU compiler:

$ gcc -S ifthen.c
$ cat ifthen.s
        .file   "ifthen.c"
        .section        .rodata
.LC0:
        .string "The higher value is %d
"
        .text
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        andl    $-16, %esp
        movl    $0, %eax
        subl    %eax, %esp
        movl    $100, −4(%ebp)
        movl    $25, −8(%ebp)
        movl    −4(%ebp), %eax
        cmpl    −8(%ebp), %eax
        jle     .L2
        movl    −4(%ebp), %eax
        movl    %eax, 4(%esp)
        movl    $.LC0, (%esp)
        call    printf
        jmp     .L3
.L2:
movl    −8(%ebp), %eax
        movl    %eax, 4(%esp)
        movl    $.LC0, (%esp)
        call    printf
.L3:
        movl    $0, (%esp)
        call    exit
        .size   main, .-main
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.3.2 (Debian)"
$

That's a lot of assembly code for a simple C function! Now you can see why I wanted to keep the C code simple. Now we can walk through the code step by step to see what it is doing.

The first section of the code:

pushl   %ebp
movl    %esp, %ebp
subl    $24, %esp
andl    $-16, %esp
movl    $0, %eax
subl    %eax, %esp

stores the EBP register so it can be used as a pointer to the local stack area in the program. The stack pointer, ESP, is then manually manipulated to make room for putting local variables on the stack.

The next section of the code creates the two variables used in the If statement:

movl    $100, −4(%ebp)
movl    $25, −8(%ebp)

The first instruction manually moves the value for the a variable into a location on the stack (4 bytes in front of the location pointed to by the EBP register). The second instruction manually moves the value for the b variable into the next location on the stack (8 bytes in front of the location pointed to by the EBP register). This technique, commonly used in functions, is discussed in Chapter 11. Now that both variables are stored on the stack, it's time to execute the if statement:

movl    −4(%ebp), %eax
cmpl    −8(%ebp), %eax
jle     .L2

First, the value for the a variable is moved to the EAX register, and then that value is compared to the value for the b variable, still in the local stack. Instead of looking for the if condition a > b, the assembly language code is looking for the opposite, a <= b. If the statement evaluates to "true," the jump to the .L2 label is made, which is the "else" part of the If statement:

.L2:
    movl    −8(%ebp), %eax
    movl    %eax, 4(%esp)
    movl    $.LC0, (%esp)
    call    printf

This is the code to print the answer for the b variable, which was contained in the else part of the If statement. First the b variable value is retrieved and manually placed on the stack, and then the location of the output text (located at the .LC0 label) is placed on the stack. With both elements on the stack, the printf C function is called to display the answer. The code then proceeds to the ending instructions.

If the JLE instruction was false, then a is not less than or equal to b, and the jump is not performed. Instead, the "then" part of the If statement is performed:

movl    −4(%ebp), %eax
movl    %eax, 4(%esp)
movl    $.LC0, (%esp)
call    printf
jmp     .L3

Here, the a variable is loaded onto the stack, along with the output text. Then the printf C function is called to display the answer, and execution jumps to the .L3 label. Finally, all roads load to the exit C function:

.L3:
    movl    $0, (%esp)
    call    exit
    .size   main, .-main
    .section        .note.GNU-stack,"",@progbits
    .ident  "GCC: (GNU) 3.3.2 (Debian)"

It is pretty easy to see the if-then logic contained in the assembly language code. You can apply the same logic to any if-then situation you need in your assembly language programs.

At first it may appear that the implementation of the if-then logic in assembly language is backwards. It might seem to be easier for the jump instruction to evaluate for the "true" condition to jump to the "then" section. There is a reason why the opposite situation is used, which is shown in the "Optimizing Branch Instructions" section later in this chapter.

The assembly language code used to implement an if statement looks like the following:

if:
   <condition to evaluate>
   jxx  else        ; jump to the else part if the condition is false
<code to implement the "then" statements>
jmp end             ;jump to the end
else:
   < code to implement the "else" statements>
end:

Of course, this was a trivial example of an If statement. In a real production program, the condition to evaluate becomes much more complicated. In these situations, evaluating the condition for the if statement becomes just as crucial as the If statement code itself.

Instead of a single conditional jump instruction, there may be several, with each one evaluating a separate part of the if condition. For example, the C language if statement

if (eax < ebx) || (eax == ecx) then

creates the following assembly language code:

if:
   cmpl %eax, %ebx
   jle else
   cmpl %eax, %ecx
   jne else
then:
   < then logic code>
   jmp end
else:
   < else logic code >
end:

This If statement condition required two separate CMP instructions. Because the logical operator is an OR, if either CMP instruction evaluates to true, the program jumps to the else label. If the logical operator is an AND, you would need to use an intermediate label to ensure that both CMP instructions evaluate to true.

for loops

The next statement to tackle is for loops. Here's the sample C program used to start us off, the for.c program:

/* for.c – A sample C for program */
#include <stdio.h>

int main()
{
   int i = 0;
   int j;
   for (i = 0; i < 1000; i++)
   {
      j = i * 5;
      printf("The answer is %d
", j);
   }
return 0;
}

Again, this uses a pretty simplistic C program to demonstrate how for-next loops are implemented in assembly language code. Here's the assembly code generated by the GNU compiler:

$ gcc -S for.c
$ cat for.s
        .file   "for.c"
        .section        .rodata
.LC0:
        .string "The answer is %d
"
        .text
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        andl    $-16, %esp
        movl    $0, %eax
        subl    %eax, %esp
        movl    $0, −4(%ebp)
        movl    $0, −4(%ebp)
.L2:
        cmpl    $999, −4(%ebp)
        jle     .L5
        jmp     .L3
.L5:
        movl    −4(%ebp), %edx
        movl    %edx, %eax
        sall    $2, %eax
        addl    %edx, %eax
        movl    %eax, −8(%ebp)
        movl    −8(%ebp), %eax
        movl    %eax, 4(%esp)
        movl    $.LC0, (%esp)
        call    printf
        leal    −4(%ebp), %eax
        incl    (%eax)
        jmp     .L2
.L3:
        movl    $0, (%esp)
        call    exit
        .size   main, .-main
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.3.2 (Debian)"
$

Similar to the if statement code, the for statement code first does some housekeeping with the ESP and EBP registers, manually setting the EBP register to the start of the stack, and making room for the variables used in the function. The for statement starts with the .L2 label:

.L2:
    cmpl    $999, −4(%ebp)
    jle     .L5
    jmp     .L3

The condition set in the for statement is set at the beginning of the loop. In this case, the condition is to determine whether the variable is less than 1,000. If the condition is true, execution jumps to the .L5 label, where the for loop code is. When the condition is false, execution jumps to the .L3 label, which is the ending code.

The For loop code is as follows:

.L5:
    movl    −4(%ebp), %edx
    movl    %edx, %eax
sall    $2, %eax
    addl    %edx, %eax
    movl    %eax, −8(%ebp)
    movl    −8(%ebp), %eax
    movl    %eax, 4(%esp)
    movl    $.LC0, (%esp)
    call    printf

The first variable location (the i variable in the C code) is moved to the EDX register, and then moved to the EAX register. The next two instructions are mathematical operations (which are covered in detail in Chapter 8). The CALL instruction performs a left shift of the EAX register two times. This is equivalent to multiplying the number in the EAX register by 4. The next instruction adds the EDX register value to the EAX register value. Now the EAX register contains the original value multiplied by 5 (tricky).

After the value has been multiplied by 5, it is stored in the location reserved for the second variable (the j variable in the C code). Finally, the value is placed on the stack, along with the location of the output text, and the printf C function is called.

The next part of the code gets back to the for statement function:

leal    −4(%ebp), %eax
incl    (%eax)
jmp     .L2

The LEA instruction has not been discussed yet. It loads the effective memory address of the declared variable into the register specified. Thus, the memory location of the first variable (i) is loaded into the EAX register. The next instruction uses the indirect addressing mode to increment the value pointed to by the EAX register by one. This in effect adds one to the i variable. After that, execution jumps back to the start of the for loop, where the I value will be tested to determine whether it is less than 1,000, and the whole process is performed again.

From this example you can see the framework for implementing for loops in assembly language. The pseudocode looks something like this:

for:
   <condition to evaluate for loop counter value>
   jxx forcode    ; jump to the code of the condition is true
   jmp end        ; jump to the end if the condition is false
forcode:
  < for loop code to execute>
  <increment for loop counter>
   jmp for        ; go back to the start of the For statement
end:

The while loop code uses a format similar to the For loop code. Try creating a test while loop in a C program and viewing the generated assembly code. It will look similar to the for loop code shown here.

Optimizing Branch Instructions

Branch instructions greatly impact the performance of applications. Most modern processors (including ones in the IA-32 family) utilize instruction prefetch caches to increase performance. As the program is run, the instruction prefetch cache is filled with sequential instructions.

As described in Chapter 2, the out-of-order engine attempts to execute instructions as soon as possible, even if earlier instructions in the program have not yet been executed. Branch instructions, however, create great havoc in the out-of-order engine. The following sections describe how modern Pentium processors handle branches, and what you can do to improve the performance of your assembly language programs.

Branch prediction

When a branch instruction is encountered, the processor out-of-order engine must determine the next instruction to be processed. The out-of-order unit utilizes a separate unit called the branch prediction front end to determine whether or not a branch should be followed. The branch prediction front end employs different techniques in its attempt to predict branch activity. When creating assembly language code that includes conditional branches, you should be aware of this processor feature.

Unconditional branches

With unconditional branches, the next instruction is not difficult to determine, but depending on how long of a jump there was, the next instruction may not be available in the instruction prefetch cache. This is demonstrated in Figure 6-4.

Figure 6-4

Figure 6.4. Figure 6-4

When the new instruction location is determined in memory, the out-of-order engine must first determine if the instruction is available in the prefetch cache. If not, the entire prefetch cache must be cleared, and reloaded with instructions from the new location. This can be costly to the performance of the application.

Conditional branches

Conditional branches present an even greater challenge to the processor. For each unconditional branch, the branch prediction unit must determine if the branch should be taken or not. Usually, when the out-of-order engine is ready to execute the conditional branch, not enough information is available to know for certain which direction the branch will take.

Instead, the branch prediction algorithms attempt to guess which path a particular conditional branch will take. This is done using rules and learned history. Three main rules are implemented by the branch prediction algorithms:

  • Backward branches are assumed to be taken.

  • Forward branches are assumed to be not taken.

  • Branches that have been previously taken are taken again.

Using normal programming logic, the most often seen use of backward branches (branches that jump to previous instruction codes) is in loops. For example, the code snippet

movl $100, $ecx
loop1:
   addl %cx, %eax
   decl %ecx
   jns loop1

will jump 100 times back to the loop1 label, but fall through to the next instruction only once. The first branching rule will always assume that the backwards branch will be taken. Out of the 101 times the branch is executed, it will only be wrong once.

Forward branches are a little trickier. The branch prediction algorithm assumes that most of the time conditional branches that go forward are not taken. In programming logic, this assumes that the code immediately following the jump instruction is most likely to be taken, rather than the jump that moves over the code. This situation is seen in the following code snippet:

movl    −4(%ebp), %eax
    cmpl    −8(%ebp), %eax
    jle     .L2
    movl    −4(%ebp), %eax
    movl    %eax, 4(%esp)
    movl    $.LC0, (%esp)
    call    printf
    jmp     .L3
.L2:
    movl    −8(%ebp), %eax
    movl    %eax, 4(%esp)
    movl    $.LC0, (%esp)
call    printf
.L3:

Does this look familiar? It is the code snippet from the analysis of the C program If statement. The code following the JLE instruction handles the "then" part of the If statement. From a branch prediction point of view, we can now see the reason why the JLE instruction was used instead of a JG instruction. When the compiler created the assembly language code, it attempted to maximize the code's performance by guessing that the "then" part of the If statement would be more likely to be taken than the "else" part. Because the processor branch prediction unit assumes forward jumps to not be taken, the "then" code would already be in the instruction prefetch cache, ready to be executed.

The final rule implies that branches that are performed multiple times are likely to follow the same path the majority of the time. The Branch Target Buffer (BTB) keeps track of each branch instruction performed by the processor, and the outcome of the branch is stored in the buffer area.

The BTB information overrides either of the two previous rules for branches. For example, if a backward branch is not taken the first time it is encountered, the branch prediction unit will assume it will not be taken any subsequent times, rather than assume that the backwards branch rule would apply.

The problem with the BTB is that it can become full. As the BTB becomes full, looking up branch results takes longer, and performance for executing the branch decreases.

Optimizing tips

While the processor tries its best to optimize how it handles branches, you can incorporate a few tricks into your assembly language programs to help it along. The following sections describe some of the branch optimization tricks recommended by Intel for use on the Pentium family of processors.

Eliminate branches

The most obvious way to solve branch performance problems is to eliminate the use of branches whenever possible. Intel has helped in this by providing some specific instructions.

In Chapter 5, "Moving Data," the CMOV instructions were discussed. These instructions were specifically designed to help the assembly language programmer avoid using branches to set data values. An example of using a CMOV instruction is as follows:

movl value, %ecx
cmpl %ebx, %ecx
cmova %ecx, %ebx

The CMOVA instruction checks the results from the CMP instruction. If the unsigned integer value in the ECX register is above the unsigned integer value in the EBX register, the value in the ECX register is placed in the EBX register. This functionality enabled us to create the cmovtest.s program, which determined the largest number in a series without a bunch of jump instructions.

Sometimes duplicating a few extra instructions can eliminate a jump. This small instruction overhead will easily fit into the instruction prefetch cache, and make up for the performance hit of the jump itself. A classic example of this is the situation in which a branch can occur within a loop:

loop:
   cmp data(, %edi, 4), %eax
   je part2
   call function1
   jmp looptest
part2:
   call function2
looptest:
   inc %edi
   cmpl $10, %edi
   jnz loop

The loop calls one of two functions, depending on the value read from the data array. After the function is called, a jump is made to the end of the loop to increase the index value of the array and loop back to the start of the loop. Each time the first function is called, the JMP instruction must be evaluated to jump forward to the looptest label. Because this is a forward branch, it will not be predicted to be taken, and a performance penalty will result.

To change this, you can modify the code snippet to look like the following:

loop:
   cmp data(, %edi, 4), %eax
   je part2
   call function1
   inc %edi
   cmp $10, %edi
   jnz loop
   jmp end
part2:
   call function2
   inc %edi
   cmp $10, %edi
   jnz loop
end:

Instead of using the forward branch within the loop, the looptest code was duplicated within the first function code section, eliminating one forward jump from the code.

Code predictable branches first

You can exploit the branch prediction unit rules to increase the performance of your application. As seen in the If statement code presented, placing code that is most likely to be taken as the fall-through of the jump forward statement increases the likelihood that it will be in the instruction prefetch cache when needed. Allow the jump instruction to jump to the less likely used code segments.

For code using backward branches, try to use the backward branch path as the most taken path. When implementing loops this is not usually a problem. However, in some cases you may have to alter the program logic to accomplish this.

Figure 6-5 sums up both of these scenarios.

Figure 6-5

Figure 6.5. Figure 6-5

Unroll loops

While loops are generally covered by the backward branch rule, there is still a performance penalty even when they are predicted correctly. A better rule-of-thumb to use is to eliminate small loops whenever possible.

The problem appears in loop overhead. Even a simple loop requires a counter that must be checked for each iteration, and a jump instruction that must be evaluated. Depending on the number of program logic instructions within the loop, this can be a huge overhead.

For smaller loops, unrolling the loop can solve this problem. Unrolling the loop means to manually code each of the instructions multiple times instead of using the loop to go back over the same instruction. The following code is an example of a small loop that can be unrolled:

movl values, %ebx
   movl $1, %edi
loop:
   movl values(, %edi, 4), %eax
   cmp %ebx, %eax
   cmova %eax, %ebx
   inc %edi
   cmp $4, %edi
   jne loop

This was the main loop from the cmovtest.s program in Chapter 5. Instead of looping through the instructions to look for the largest value four times, you can unroll the loop into a series of four moves:

movl values, %ebx
movl $values, %ecx
movl (%ecx), %eax
cmp %ebx, %eax
cmova %eax, %ebx
movl 4(%ecx), %eax
cmp %ebx, %eax
cmova %eax, %ebx
movl 8(%ecx), %eax
cmp %ebx, %eax
cmova %eax, %ebx
movl 12(%ecx), %eax
cmp %ebx, %eax
cmova %eax, %ebx

While the number of instructions has greatly increased, the processor will be able to feed all of them directly into the instruction prefetch cache, and zip through them in no time.

Be careful when unrolling loops though, as it is possible to unroll too many instructions, and fill the prefetch cache. This will force the processor to constantly fill and empty the prefetch cache.

Summary

This chapter presented instructions to help you program logic into your assembly language programs. Just about every assembly language program needs the capability to branch to other parts of the instruction code depending on data values, or to loop through sections of code a specific number of times.

The IA-32 platform provides several instructions for coding branching and looping functions. There are two different types of branch functions: unconditional branches and conditional branches. Unconditional branches are performed regardless of external values or events, whereas conditional branches rely on an external value or event to branch.

There are three types of unconditional branches: jumps, calls, and interrupts. The unconditional jump is the most basic form of execution control. The JMP instruction forces the instruction pointer to change to the destination location, and the processor executes the next instruction at that spot. Calls are similar to jumps, but support the capability to return to the location after the call. The return location is stored in the stack area, and the called function must restore the stack back to its original condition before returning to the calling area. Software interrupts are used to provide access to low-level kernel functions in the operating system. Both Microsoft Windows and Linux provide system calls via software interrupts. The Linux system calls are available at software interrupt 0x80.

Conditional branches rely on the values of the EFLAGS register. Specifically, the carry, overflow, parity, sign, and Zero flags are used to affect the conditional branch. Specific branch instructions monitor specific flag bits, such as the JC instruction jumps when the carry flag is set, and the JZ instruction jumps when the Zero flag is set. The CMP instruction is helpful in comparing two values and setting EFLAGS bits for the conditional jump instruction.

Loops provide a method for you to easily replicate code functions without having to duplicate a lot of code. Just as in high-level languages, loops enable you to perform tasks a specific number of times by using a counter value that is decreased for each iteration. The LOOP instruction automatically uses the ECX register as the counter, and decreases and tests it during each iteration.

You can duplicate high-level language conditional functions such as If-then and For statements using common assembly language jumps and loops. To see how C functions are coded, you can use the -S parameter of the GNU compiler to view the generated assembly language code.

When working with Pentium processors, you can use some optimization techniques to increase the performance of your assembly language code. Pentium processors use the instruction prefetch cache, and attempt to load instructions as quickly as possible into the cache. Unfortunately, branch instructions can have a detrimental effect on the prefetch cache. As branches are detected in the cache, the out-of-order engine attempts to predict what path the branch is most likely to take. If it is wrong, the instruction prefetch cache is loaded with instructions that are not used, and processor time is wasted. To help solve this problem, you should be aware of how the processor predicts branches, and attempt to code your branches in the same manner. Also, eliminating branches whenever possible will greatly speed things up. Finally, examining loops and converting them into a sequential series of operations enables the processor to load all of the instructions into the prefetch cache, and not have to worry about branching for the loop.

The next chapter discusses how the processor deals with various types of numbers. Just as in high-level languages, there are multiple ways to represent numbers, and there are multiple ways that those numbers are handled in assembly language programs. Knowing all the different types of number formats can help you when working with mathematically intensive operations.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset