Chapter 4. A Sample Assembly Language Program

With all of your development tools in order, it's time to start learning about assembly language programming. Assembly language programs use a common template and format (specific to the assembler used), which you can develop and use for all of your applications.

This chapter walks you through a basic assembly language program template for the GNU assembler. The first section of the chapter describes the common items found in assembly language programs, and how they can be used to define a common template. The next section shows a sample program, and how to assemble and run it. Next you will learn how to debug the sample program using the GNU debugger. The last section of this chapter demonstrates how to incorporate C library functions into your assembly language programs.

The Parts of a Program

As shown in Chapter 1, "What Is Assembly Language?," an assembly language program consists of defined sections, each of which has a different purpose. The three most commonly used sections are as follows:

  • The data section

  • The bss section

  • The text section

The text section is required in all assembly language programs. It is where the instruction codes are declared within the executable program. The data and bss sections are optional, but are often used within a program. The data section declares data elements that have an initial value. These data elements are used as variables within the assembly language program. The bss section declares data elements that are initialized to zero (or null) when the program starts. These data elements are most often used as buffer areas within the assembly language program.

The following sections describe how to declare the different sections in an assembly language program written for the GNU assembler, which is the assembler used throughout this book.

Defining sections

The GNU assembler declares sections using the .section declarative statement. The .section statement takes a single argument, the type of section it is declaring. Figure 4-1 shows the layout of an assembly language program.

Figure 4-1. Layout of an assembly language program

Figure 4-1 demonstrates the normal way the sections are placed in the program. The bss section should always be placed before the text section, but the data section can be moved to follow the text section, although that is not the standard. Besides being functional, your assembly language programs should also be easily readable. Keeping all of the data definitions together at the beginning of the source code makes it easier for other programmers to pick up your work and understand it.

Defining the starting point

When the assembly language program is converted to an executable file, the linker must know what the starting point is in your instruction code. For simple programs with only a single instruction path, finding the starting point is not usually a problem. However, in more complex programs that use several functions scattered throughout the source code, finding where the program starts can be an issue.

To solve this problem, the GNU assembler declares a default label, or identifier, that should be used for the entry point of the application. The _start label is used to indicate the instruction from which the program should start running. If the linker cannot find this label, it will produce a warning message:

$ ld -o badtest badtest.o
ld: warning: cannot find entry symbol _start; defaulting to 08048074
$

As you can see from the linker output, if the linker cannot find the _start label, it will attempt to find the starting point of the program, but for complex programs there is no guarantee that it will guess correctly.

You are not required to use _start as the starting point. To use a different label, specify its name with the -e parameter of the linker.
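As a minimal sketch of the -e parameter, the program below uses a hypothetical label named begin (a name chosen only for illustration) as its entry point and simply exits. It assumes a 32-bit Linux target like the rest of this chapter; on an x86-64 host, the assembler and linker would likely also need the --32 and -m elf_i386 options, respectively.

```asm
# entry.s - sketch: use a label other than _start as the entry point
# Assumed build commands (the file and label names are illustrative):
#   as -o entry.o entry.s
#   ld -e begin -o entry entry.o
.section .text
.globl begin
begin:
    movl $1, %eax        # system call 1: exit
    movl $0, %ebx        # exit code 0
    int $0x80
```

Without the -e begin parameter, the linker would warn that it cannot find _start, just as in the badtest example.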

Besides declaring the starting label in the application, you also need to make the entry point available for external applications. This is done with the .globl directive.

The .globl directive declares program labels that are accessible from external programs. If you are writing a bunch of utilities that are being used by external assembly or C language programs, each function section label should be declared with a .globl directive.
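The following sketch shows what an exported function label might look like. The function name answer is hypothetical, and the convention of leaving the return value in the EAX register is the one C compilers use on IA-32, so a C caller could plausibly declare it as int answer(void). To keep the sketch self-contained, it also includes a _start label that calls the function itself.

```asm
# answer.s - sketch: a label exported with .globl and called as a function
.section .text
.globl answer            # make the label visible to other object files
answer:
    movl $42, %eax       # the return value is left in EAX
    ret

.globl _start
_start:
    call answer          # EAX now holds 42
    movl %eax, %ebx      # use the returned value as the exit code
    movl $1, %eax        # system call 1: exit
    int $0x80
```

If the .globl answer line were removed, the program would still link on its own, but a separately assembled or compiled file would no longer be able to reference the answer label.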

Armed with this information, you can create a basic template for all your assembly language programs. The template should look something like this:

.section .data

       <initialized data here>

.section .bss

       <uninitialized data here>

.section .text
.globl _start
_start:

       <instruction code goes here>

With this template in hand, you are ready to start coding assembly language programs. The next section walks through a simple application that shows how to build an application from the assembly language program source code.
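As a quick illustration, here is one way the template might look with each section filled in. The label names value and scratch are made up for this sketch, and the exit system call it uses is explained later in this chapter. It assumes a 32-bit Linux target.

```asm
# template.s - sketch: the three-section template with placeholder contents
.section .data
value:
    .long 7              # initialized data: one 32-bit integer

.section .bss
    .lcomm scratch, 16   # uninitialized data: a 16-byte buffer

.section .text
.globl _start
_start:
    movl value, %ebx     # load the initialized value (7)
    movl %ebx, scratch   # store it in the first 4 bytes of the buffer
    movl $1, %eax        # system call 1: exit, with EBX as the exit code
    int $0x80
```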

Creating a Simple Program

Now it is time to create a simple assembly language application to demonstrate how all of the pieces fit together. To start off, a simple application that centers on a single instruction code is created. The CPUID instruction code is used to gather information about the processor on which the program is running. You can extract vendor and model information from the processor and display it for your customers to see.

The following sections describe the CPUID instruction and show how to implement an assembly language program to utilize it.

The CPUID instruction

The CPUID instruction is one assembly language instruction that is not easily performed from a high-level language application. It is a low-level instruction that queries the processor for specific information, and returns the information in specific registers.

The CPUID instruction uses a single register value as input. The EAX register is used to determine what information is produced by the CPUID instruction. Depending on the value of the EAX register, the CPUID instruction will produce different information about the processor in the EBX, ECX, and EDX registers. The information is returned as a series of bit values and flags, which must be interpreted to their proper meaning.

The following table shows the different output options available for the CPUID instruction.

EAX Value                CPUID Output
0                        Vendor ID string, and the maximum CPUID option value supported
1                        Processor type, family, model, and stepping information
2                        Processor cache configuration
3                        Processor serial number
4                        Cache configuration (number of threads, number of cores, and physical properties)
5                        Monitor information
80000000h                Extended vendor ID string and supported levels
80000001h                Extended processor type, family, model, and stepping information
80000002h - 80000004h    Extended processor name string

The sample program created in this chapter utilizes the zero option to retrieve the simple Vendor ID string from the processor. When the value of zero is placed in the EAX register, and the CPUID instruction is executed, the processor returns the Vendor ID string in the EBX, EDX, and ECX registers as follows:

  • EBX contains the first 4 bytes of the string.

  • EDX contains the middle 4 bytes of the string.

  • ECX contains the last 4 bytes of the string.

The string values are placed in the registers in little-endian format; thus, the first part of the string is placed in the lower bits of the register. Figure 4-2 shows how this works.

Figure 4-2. Little-endian placement of the Vendor ID string in the EBX, EDX, and ECX registers
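The little-endian layout described above can be checked without running CPUID at all. The sketch below stores three 32-bit constants with the .long directive and writes the resulting 12 bytes to the display. The constants are the EBX, EDX, and ECX values that appear for an Intel processor in the debugger session later in this chapter; the label vendor is made up for this example, and a 32-bit Linux target is assumed.

```asm
# endian.s - sketch: 32-bit values laid out in memory in little-endian order
.section .data
vendor:
    .long 0x756e6547     # stored as bytes 47 65 6e 75 = "Genu"
    .long 0x49656e69     # stored as bytes 69 6e 65 49 = "ineI"
    .long 0x6c65746e     # stored as bytes 6e 74 65 6c = "ntel"
    .byte 0x0a           # newline
.section .text
.globl _start
_start:
    movl $4, %eax        # system call 4: write
    movl $1, %ebx        # file descriptor 1: standard output
    movl $vendor, %ecx   # start of the bytes to write
    movl $13, %edx       # 12 characters plus the newline
    int $0x80
    movl $1, %eax        # system call 1: exit
    movl $0, %ebx
    int $0x80
```

Because the lowest byte of each value is stored first, the bytes come out in reading order and the program displays GenuineIntel.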

The sample program takes the register values and displays the information to the customer in a human-readable format. The next section presents the sample program.

Not all processors in the IA-32 platform utilize the CPUID instruction the same way. In a real application, you should perform a few tests to ensure that the processor supports the CPUID instruction. To keep things simple, the example program presented in this chapter does not perform any of these tests. It's possible that you may be using a processor that does not support the CPUID instruction, although most modern processors do support it (including Intel Pentium processors, Cyrix processors, and AMD processors).
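One commonly documented test is to check whether software can change the ID flag (bit 21 of the EFLAGS register); if the bit can be toggled, the processor supports the CPUID instruction. The following is only a sketch of that idea, assuming a 32-bit Linux target. The label done and the choice of exit codes (0 for supported, 1 for not supported) are arbitrary.

```asm
# cpuidtest.s - sketch: check whether bit 21 (the ID flag) of EFLAGS can be toggled
.section .text
.globl _start
_start:
    pushfl                     # copy EFLAGS to the stack
    popl %eax                  # EAX = original EFLAGS
    movl %eax, %ecx            # keep a copy in ECX
    xorl $0x00200000, %eax     # flip bit 21
    pushl %eax
    popfl                      # try to load the modified value into EFLAGS
    pushfl
    popl %eax                  # EAX = EFLAGS as the processor kept it
    pushl %ecx
    popfl                      # restore the original EFLAGS
    xorl %ecx, %eax            # bits that actually changed
    movl $1, %ebx              # exit code 1: CPUID not supported
    testl $0x00200000, %eax    # did bit 21 change?
    jz done                    # no: keep exit code 1
    movl $0, %ebx              # yes: exit code 0
done:
    movl $1, %eax              # system call 1: exit
    int $0x80
```

A real application would run a test like this first and execute CPUID only when the test succeeds.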

The sample program

Armed with your knowledge about how the CPUID instruction works, it's time to start writing a simple program to utilize that information. This program is a simple application to check the Vendor ID string that is produced by the CPUID instruction. Here's the sample program, cpuid.s:

#cpuid.s Sample program to extract the processor Vendor ID
.section .data
output:
   .ascii "The processor Vendor ID is 'xxxxxxxxxxxx'\n"
.section .text
.globl _start
_start:
   movl $0, %eax
   cpuid
   movl $output, %edi
   movl %ebx, 28(%edi)
   movl %edx, 32(%edi)
   movl %ecx, 36(%edi)
   movl $4, %eax
   movl $1, %ebx
   movl $output, %ecx
   movl $42, %edx
   int $0x80
   movl $1, %eax
   movl $0, %ebx
   int $0x80

This program uses quite a few different assembly language instructions. For now, don't worry too much about what they are; that will be described in detail in subsequent chapters. For now, concentrate on how the instructions are placed in the program, the flow of how they operate, and how the source code file is converted into an executable program file. So that you're not totally lost, here's a brief explanation of what's going on in the source code.

First, in the data section, a string value is declared:

output:
   .ascii "The processor Vendor ID is 'xxxxxxxxxxxx'\n"

The .ascii declarative is used to declare a text string using ASCII characters. The string elements are predefined and placed in memory, with the starting memory location denoted by the label output. The x's are used as placeholders in the memory area reserved for the data variable. When the vendor ID string is extracted from the processor, it will be placed in the data at those memory locations.

You should recognize the next section of the program from the template. It declares the instruction code section, and the normal starting label of the application:

.section .text
.globl _start
_start:

The first thing the program does is load the EAX register with a value of zero, and then run the CPUID instruction:

movl $0, %eax
cpuid

The zero value in EAX defines the CPUID output option (the Vendor ID string in this case). After the CPUID instruction is run, you must collect the response that is divided up between the three output registers:

movl $output, %edi
movl %ebx, 28(%edi)
movl %edx, 32(%edi)
movl %ecx, 36(%edi)

The first instruction creates a pointer to use when working with the output variable declared in memory. The memory location of the output label is loaded into the EDI register. Next, the contents of the three registers containing the Vendor ID string pieces are placed in the appropriate locations in the data memory, based on the EDI pointer. The numbers outside the parentheses represent the location relative to the output label where the data is placed. This number is added to the address in the EDI register to determine what address the register's value is written to. This process replaces the x's that were used as placeholders with the actual Vendor ID string pieces (note that the Vendor ID string was divided into the registers in the strange order EBX, EDX, and ECX).
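The same addressing technique can be tried on a smaller string. In this sketch (a made-up example, assuming a 32-bit Linux target), the label text and the string contents are illustrative. The four x placeholders start at offset 5 from the label, so 5(%edi) is the address where the new bytes are written.

```asm
# offset.s - sketch: overwrite placeholder bytes using an offset from a pointer
.section .data
text:
    .ascii "Tag: xxxx\n"       # the x's start at offset 5; 10 bytes in all
.section .text
.globl _start
_start:
    movl $text, %edi           # EDI points to the start of the string
    movl $0x44434241, 5(%edi)  # write 4 bytes at text+5: 41 42 43 44 = "ABCD"
    movl $4, %eax              # system call 4: write
    movl $1, %ebx              # file descriptor 1: standard output
    movl $text, %ecx           # start of the string
    movl $10, %edx             # length of the string
    int $0x80
    movl $1, %eax              # system call 1: exit
    movl $0, %ebx
    int $0x80
```

The program displays Tag: ABCD, because the 4-byte value is written starting at the sixth byte of the string, lowest byte first.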

When all of the Vendor ID string pieces are placed in memory, it's time to display the information:

movl $4, %eax
movl $1, %ebx
movl $output, %ecx
movl $42, %edx
int $0x80

This program uses a Linux system call (int $0x80) to access the console display from the Linux kernel. The Linux kernel provides many preset functions that can be easily accessed from assembly applications. To access these kernel functions, you must use the int instruction code, which generates a software interrupt, with a value of 0x80. The specific function that is performed is determined by the value of the EAX register. Without this kernel function, you would have to send each output character yourself to the proper I/O address of the display. The Linux system calls are a great time-saver for assembly language programmers.

The complete list of Linux system calls, and how to use them, is discussed in Chapter 12, "Using Linux System Calls."

The Linux write system call is used to write bytes to a file. Following are the parameters for the write system call:

  • EAX contains the system call value.

  • EBX contains the file descriptor to write to.

  • ECX contains the start of the string.

  • EDX contains the length of the string.

If you are familiar with UNIX, you know that just about everything is handled as a file. The standard output (STDOUT) represents the display terminal of the current session, and has a file descriptor of 1. Writing to this file descriptor displays the information on the console screen.

The bytes to display are defined as a memory location to read the information from, and the number of bytes to display. The ECX register is loaded with the memory location of the output label, which defines the start of the string. Because the size of the output string is always the same, we can hard-code the size value in the EDX register.
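Hard-coding the length works, but it must be recounted whenever the string changes. As an alternative sketch, the GNU assembler can calculate the length for you with the .equ directive and the dot symbol, which stands for the current location. The names msg and msglen are made up for this example, and a 32-bit Linux target is assumed.

```asm
# length.s - sketch: let the assembler compute the string length
.section .data
msg:
    .ascii "Length computed by the assembler\n"
    .equ msglen, . - msg       # current location minus the start of the string
.section .text
.globl _start
_start:
    movl $4, %eax              # system call 4: write
    movl $1, %ebx              # file descriptor 1: standard output
    movl $msg, %ecx            # start of the string
    movl $msglen, %edx         # length calculated by the assembler
    int $0x80
    movl $1, %eax              # system call 1: exit
    movl $0, %ebx
    int $0x80
```

The .equ line must come directly after the string, before any other data is declared, so that the dot symbol still points to the end of the string.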

After the Vendor ID information is displayed, it's time to cleanly exit the program. Again, a Linux system call can help. By using system call 1 (the exit function), the program is properly terminated, and returns to the command prompt. The EBX register contains the exit code value returned by the program to the shell. This can be used to produce different results in a shell script program, depending on situations within the assembly language program. A value of zero indicates the program executed successfully.
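To see the exit code in action, you could try a sketch like the following, which does nothing but exit with a nonzero value. The value 3 is arbitrary, and a 32-bit Linux target is assumed.

```asm
# exitcode.s - sketch: return a nonzero exit code to the shell
.section .text
.globl _start
_start:
    movl $1, %eax        # system call 1: exit
    movl $3, %ebx        # exit code 3 (an arbitrary value for illustration)
    int $0x80
```

After running the program, the shell command echo $? should display 3, which is the value a shell script would test.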

Building the executable

With the assembly language source code program saved as cpuid.s, you can build the executable program using the GNU assembler and GNU linker as follows:

$ as -o cpuid.o cpuid.s
$ ld -o cpuid cpuid.o
$

The output from these commands is not too exciting (unless of course you had some typos in your code). The first step uses the as command to assemble the assembly language source code into the object code file cpuid.o. The second step uses ld to link that object code file into the executable file cpuid.

If you did have a typo in the source code, the assembler will indicate the line in which the typo is located:

$ as -o cpuid.o cpuid.s
cpuid.s: Assembler messages:
cpuid.s:15: Error: no such instruction: `mavl %edx,32(%edi)'
$

Running the executable

After the linker generates the executable program file, it is ready to be run. Here's a sample output from my MEPIS system running on a Pentium 4 processor:

$ ./cpuid
The processor Vendor ID is 'GenuineIntel'
$

Excellent! The program ran as expected! One of the benefits of Linux is that some distributions will run on most any old piece of junk you might have sitting around. Here's the output from an old 200MHz PC with a Cyrix 6x86MX processor on which I ran Mandrake Linux 6.0:

$ ./cpuid
The processor Vendor ID is 'CyrixInstead'
$

You gotta love the humor of system engineers.

Assembling using a compiler

Because the GNU Compiler Collection (gcc) uses the GNU assembler when compiling C code, you can also use it to assemble and link your assembly language program in a single step. While this is not a common method to use, it is available when necessary.

There is one problem when using gcc to assemble your programs. While the GNU linker looks for the _start label to determine the beginning of the program, gcc looks for the main label (you might recognize that from C or C++ programming). You must change both the _start label and the .globl directive defining the label in your program to look like the following:

.section .text
.globl main
main:

After doing that, it is a snap to assemble and link programs:

$ gcc -o cpuid cpuid.s
$ ./cpuid
The processor Vendor ID is 'GenuineIntel'
$
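Because gcc links in the C startup code, which calls main like an ordinary function, a program built this way can also finish by returning from main instead of using the exit system call. The following is a sketch of that approach, not a replacement for the cpuid.s program. The names msg and msglen are made up, and on an x86-64 host gcc would likely need the -m32 option (and possibly -no-pie) to build it.

```asm
# hello_main.s - sketch: entry at main so that gcc can assemble and link it
.section .data
msg:
    .ascii "Started from main\n"
    .equ msglen, . - msg
.section .text
.globl main
main:
    pushl %ebx           # the C calling convention expects EBX to be preserved
    movl $4, %eax        # system call 4: write
    movl $1, %ebx        # file descriptor 1: standard output
    movl $msg, %ecx      # start of the string
    movl $msglen, %edx   # length of the string
    int $0x80
    popl %ebx            # restore the caller's EBX
    movl $0, %eax        # value returned from main, used as the exit code
    ret
```

Saving and restoring EBX matters here because the startup code that called main expects that register to be unchanged when main returns.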

Debugging the Program

In this simple example, unless you introduced some typing errors in the source code, the program should have run with the expected results. Unfortunately, that is not always the case in assembly language programming.

In more complicated programs, it is easy to make a mistake when assigning registers and memory locations, or trying special instruction codes to handle complex data issues. When this happens, it is good to have a debugger handy to step through the program and watch how the data is handled.

This section shows how to use the GNU debugger to walk through the sample program, watching how the registers and memory location are changed throughout the process.

Using gdb

In order to debug the assembly language program, you must first reassemble the source code using the -gstabs parameter:

$ as -gstabs -o cpuid.o cpuid.s
$ ld -o cpuid cpuid.o
$

As with the first time it was assembled, the source code assembles with no error or warning messages. By specifying the -gstabs parameter, extra information is assembled into the executable program file to help gdb walk through the source code. While the executable program file created with the -gstabs parameter still runs and behaves just like the original program, it is not a wise idea to use the -gstabs parameter unless you are specifically debugging an application.

Because the -gstabs parameter adds additional information to the executable program file, the resulting file becomes larger than it needs to be just to run the application. For this example program, assembling without the -gstabs parameter produces the following file:

-rwxr-xr-x    1 rich     rich          771 2004-07-13 07:32 cpuid

When assembling with the -gstabs parameter, the program file becomes the following:

-rwxr-xr-x    1 rich     rich         1099 2004-07-13 07:20 cpuid

Notice that the file size went from 771 bytes to 1,099 bytes. Although the difference is trivial for this example, imagine what happens with a 10,000-line assembly language program! Again, it is best to not use the debugging information if it is not necessary.

Stepping through the program

Now that the executable program file contains the necessary debugging information, you can run the program within gdb:

$ gdb cpuid
GNU gdb 6.0-debian
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-linux"...
(gdb)

The GNU debugger starts, with the program loaded into memory. You can run the program from within gdb using the run command:

(gdb) run
Starting program: /home/rich/palp/chap04/cpuid
The processor Vendor ID is 'GenuineIntel'

Program exited normally.
(gdb)

As you can see from the output, the program ran within the debugger just as it did from the command line. That's not especially exciting. Now it's time to freeze the program as it starts, and step through each line of source code individually.

To do that, you must set a breakpoint. Breakpoints are places in the program code where you want the debugger to stop running the program and let you look at things. There are several different options you can use when setting a breakpoint. You can choose to stop execution at any of the following:

  • A label

  • A line number in the source code

  • A data value when it reaches a specific value

  • A function after it is performed a specific number of times

For this simple example, we will set a breakpoint at the beginning of the instruction codes, and watch the program as it progresses through the source code.

When specifying breakpoints in assembly language programs, you must specify the location relative to the nearest label. Because this sample program has only one label in the instruction code section, every breakpoint must be specified from _start. The format of the break command is

break *label+offset

where label is the label in the source code to reference, and offset is the number of bytes (not the number of source code lines) from that label's address to the instruction where execution should stop.

To set a breakpoint at the first instruction, and then start the program, you would use the following commands:

(gdb) break *_start
Breakpoint 1 at 0x8048075: file cpuid.s, line 11.
(gdb) run
Starting program: /home/rich/palp/chap04/cpuid
The processor Vendor ID is 'GenuineIntel'

Program exited normally.
(gdb)

The breakpoint was specified using the *_start parameter, which specifies the first instruction code after the _start label. Unfortunately, when the program is run, it ignores the breakpoint, and runs through the entire program. This is a well-known bug in the current version of gdb. It has been around for a while, but hopefully it will be fixed soon.

To work around this problem, you have to include a dummy instruction as the first instruction code element after the _start label. In assembly, the dummy instruction is called NOP, for no operation.

If you modify the cpuid.s source code by adding a NOP instruction immediately after the _start label, it should look like this:

_start:
   nop
   movl $0, %eax
   cpuid

After adding the NOP instruction, you can create a breakpoint at that location, signified as _start+1. Now, after assembling with the -gstabs parameter (and don't forget to link the new object code file), you can try out the debugger again:

(gdb) break *_start+1
Breakpoint 1 at 0x8048075: file cpuid.s, line 12.
(gdb) run
Starting program: /home/rich/palp/chap04/cpuid

Breakpoint 1, _start () at cpuid.s:12
12         movl $0, %eax
Current language:  auto; currently asm
(gdb)

Perfect! The program started and then paused at (what used to be) the first instruction code. Now you can step your way through the program using either the next or step commands:

(gdb) next
_start () at cpuid.s:13
13         cpuid
(gdb) next
_start () at cpuid.s:14
14         movl $output, %edi
(gdb) step
_start () at cpuid.s:15
15         movl %ebx, 28(%edi)
(gdb) step
_start () at cpuid.s:16
16         movl %edx, 32(%edi)

Each next or step command executes the next line of source code (and tells you what line number that is). Once you have walked through the section you were interested in seeing, you can continue to run the program as normal using the cont command:

(gdb) cont
Continuing.
The processor Vendor ID is 'GenuineIntel'

Program exited normally.
(gdb)

The debugger picks up from where it was stopped and finishes running the program as normal.

While it is good to walk through the program slowly, it is even better to be able to examine data elements as you are walking. The debugger provides a method for you to do that, as described in the next section.

Viewing the data

Now that you know how to stop the program at specific locations, it's time to examine the data elements at each stop. Several different gdb commands are used to examine the different types of data elements.

The two most common data elements to examine are registers and memory locations used for variables. The commands used for displaying this information are shown in the following table.

Data Command       Description
info registers     Display the values of all registers
print              Display the value of a specific register or variable from the program
x                  Display the contents of a specific memory location

The info registers command is great for seeing how all of the registers are affected by an instruction:

(gdb) s
_start () at cpuid.s:13
13         cpuid
(gdb) info registers
eax            0x0      0
ecx            0x0      0
edx            0x0      0
ebx            0x0      0
esp            0xbffffd70       0xbffffd70
ebp            0x0      0x0
esi            0x0      0
edi            0x0      0
eip            0x804807a        0x804807a
eflags         0x346    838
cs             0x23     35
ss             0x2b     43
ds             0x2b     43
es             0x2b     43
fs             0x0      0
gs             0x0      0
(gdb) s
_start () at cpuid.s:14
14         movl $output, %edi
(gdb) info registers
eax            0x2      2
ecx            0x6c65746e       1818588270
edx            0x49656e69       1231384169
ebx            0x756e6547       1970169159
esp            0xbffffd70       0xbffffd70
ebp            0x0      0x0
esi            0x0      0
edi            0x0      0
eip            0x804807c        0x804807c
eflags         0x346    838
cs             0x23     35
ss             0x2b     43
ds             0x2b     43
es             0x2b     43
fs             0x0      0
gs             0x0      0
(gdb)

This output shows that before the CPUID instruction is executed, the EBX, ECX, and EDX registers all contain zero. After the CPUID instruction, they contain the values from the Vendor ID string.

The print command can also be used to display individual register values. Adding a format modifier changes the output format of the print command:

  • print/d to display the value in decimal

  • print/t to display the value in binary

  • print/x to display the value in hexadecimal

An example of the print command would be the following:

(gdb) print/x $ebx
$9 = 0x756e6547
(gdb) print/x $edx
$10 = 0x49656e69
(gdb) print/x $ecx
$11 = 0x6c65746e
(gdb)

The x command is used to display the values of specific memory locations. As with the print command, the output of the x command can be changed with a modifier. The format of the x command is

x/nyz

where n is the number of fields to display, y is the format of the output, and can be

  • c for character

  • d for decimal

  • x for hexadecimal

and z is the size of the field to be displayed:

  • b for byte

  • h for 16-bit word (half-word)

  • w for 32-bit word

The following example uses the x command to display the memory locations at the output label:

(gdb) x/42cb &output
0x80490ac <output>:     84 'T'  104 'h' 101 'e' 32 ' '  112 'p' 114 'r' 111 'o' 99 'c'
0x80490b4 <output+8>:   101 'e' 115 's' 115 's' 111 'o' 114 'r' 32 ' '  86 'V'  101 'e'
0x80490bc <output+16>:  110 'n' 100 'd' 111 'o' 114 'r' 32 ' '  73 'I'  68 'D'  32 ' '
0x80490c4 <output+24>:  105 'i' 115 's' 32 ' '  39 '\'' 71 'G'  101 'e' 110 'n' 117 'u'
0x80490cc <output+32>:  105 'i' 110 'n' 101 'e' 73 'I'  110 'n' 116 't' 101 'e' 108 'l'
0x80490d4 <output+40>:  39 '\'' 10 '\n'
(gdb)

This command displays the first 42 bytes of the output variable (the ampersand sign is used to indicate that it is a memory location) in character mode (which also shows the decimal value of each byte). This feature is invaluable when tracking instructions that manipulate memory locations.

Using C Library Functions in Assembly

The cpuid.s program used the Linux system calls to display the Vendor ID string information on the console. There are other ways to perform this function without using the system calls.

One method is to use the standard C library functions that are well known to C programmers. It is easy to tap into that resource to utilize many common C functions.

This section describes how to utilize C library functions within your assembly language programs. First, the common printf C function is described, and a new version of the cpuid.s program is shown using the printf function. Then, the next section shows how to assemble and link programs that use C library functions.

Using printf

The original cpuid.s program used Linux system calls to display the results. If you have the GNU C compiler installed on your system, you can just as easily use the common C functions that you are probably already familiar with.

The C libraries contain many of the functions that are common to C programs, such as printf and exit. For this version of the program, the Linux system calls are replaced with equivalent C library calls. Here's the cpuid2.s program:

#cpuid2.s View the CPUID Vendor ID string using C library calls
.section .data
output:
    .asciz "The processor Vendor ID is '%s'\n"
.section .bss
    .lcomm buffer, 12
.section .text
.globl _start
_start:
    movl $0, %eax
    cpuid
    movl $buffer, %edi
    movl %ebx, (%edi)
    movl %edx, 4(%edi)
    movl %ecx, 8(%edi)
    pushl $buffer
    pushl $output
    call printf
    addl $8, %esp
    pushl $0
    call exit

The printf function uses multiple input parameters, depending on the variables to be displayed. The first parameter is the output string, with the proper codes used to display the variables:

output:
    .asciz "The processor Vendor ID is '%s'\n"

Notice that this uses the .asciz directive instead of .ascii. The printf function expects a null-terminated string as the output string. The .asciz directive adds the null character to the end of the defined string.
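The difference between the two directives can be measured directly. In this sketch (assuming a 32-bit Linux target, with label and symbol names made up for illustration), the assembler calculates the size of the same text declared both ways, and the program returns the difference as its exit code.

```asm
# asciz.s - sketch: .asciz stores one more byte than .ascii for the same text
.section .data
plain:
    .ascii "abc"               # 3 bytes, no terminator
    .equ plainlen, . - plain
terminated:
    .asciz "abc"               # 4 bytes: 'a', 'b', 'c', and a null byte
    .equ termlen, . - terminated
.section .text
.globl _start
_start:
    movl $termlen, %ebx        # 4
    subl $plainlen, %ebx       # 4 - 3 = 1: the extra null byte
    movl $1, %eax              # system call 1: exit, with EBX as the exit code
    int $0x80
```

Running the program and then entering echo $? in the shell should display 1, the single null byte that .asciz adds.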

The next parameter used is the buffer that will contain the Vendor ID string. Because the value of the buffer does not need to be defined, it is declared in the bss section as a 12-byte buffer area using the .lcomm directive:

.section .bss
    .lcomm buffer, 12

After the CPUID instruction is run, the registers containing the Vendor ID string pieces are placed in the buffer variable in the same way that they were in the original cpuid.s program.

To pass the parameters to the printf C function, you must push them onto the stack. This is done using the PUSHL instruction. The parameters are placed on the stack in reverse order from how the printf function retrieves them, so the buffer value is placed first, followed by the output string value. After that, the printf function is called using the CALL instruction:

pushl $buffer
pushl $output
call printf
addl $8, %esp

The ADDL instruction is used to clear the parameters placed on the stack for the printf function. The same technique is used to place a zero return value on the stack for the C exit function to use.
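The same pattern extends to more parameters. The sketch below is a hypothetical variation of cpuid2.s (called cpuid3.s here) that also prints the value CPUID leaves in the EAX register, which for option zero is the maximum CPUID option value supported. Two details are assumptions of this sketch rather than part of the original program: the buffer is declared one byte larger so that a zero byte always follows the 12 characters, and the program relies on the 32-bit Linux environment and C library linking steps described in this chapter.

```asm
# cpuid3.s - sketch: pass three parameters to printf
.section .data
output:
    .asciz "Vendor ID '%s', highest basic CPUID option %d\n"
.section .bss
    .lcomm buffer, 13          # 12 characters plus a zero byte to end the string
.section .text
.globl _start
_start:
    movl $0, %eax
    cpuid                      # EAX = highest option; EBX, EDX, ECX = Vendor ID
    movl $buffer, %edi
    movl %ebx, (%edi)
    movl %edx, 4(%edi)
    movl %ecx, 8(%edi)
    pushl %eax                 # third parameter (%d), pushed first
    pushl $buffer              # second parameter (%s)
    pushl $output              # first parameter: the format string
    call printf
    addl $12, %esp             # remove three 4-byte parameters from the stack
    pushl $0
    call exit
```

Each parameter pushed with PUSHL takes 4 bytes, so the value added to ESP after the call is 4 times the number of parameters: 8 for two parameters, 12 for three.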

Linking with C library functions

When you use C library functions in your assembly language program, you must link the C library files with the program object code. If the C library functions are not available, the linker will fail:

$ as -o cpuid2.o cpuid2.s
$ ld -o cpuid2 cpuid2.o
cpuid2.o: In function `_start':
cpuid2.o(.text+0x3f): undefined reference to `printf'
cpuid2.o(.text+0x46): undefined reference to `exit'
$

In order to link the C function libraries, they must be available on your system. On Linux systems, there are two ways to link C functions to your assembly language program. The first method is called static linking. Static linking links function object code directly into your application executable program file. This can create much larger executable programs, and wastes memory if multiple instances of the program are run at the same time (each instance has its own copy of the same functions).

The second method is called dynamic linking. Dynamic linking uses libraries that enable programmers to reference the functions in their applications, but not link the function codes in the executable program file. Instead, dynamic libraries are called at the program's runtime by the operating system, and can be shared by multiple programs.

On Linux systems, the standard C dynamic library is located in the file libc.so.x, where x is a value representing the version of the library. On my MEPIS system, this is the file libc.so.5. This library file contains the standard C functions, including printf and exit.

This file is automatically linked to C programs when using gcc. You must manually link it to your program object code for the C functions to operate. To link the libc.so file, you must use the -l parameter of the GNU linker. When using the -l parameter, you do not need to specify the complete library name. The linker assumes that the library will be in a file:

/lib/libx.so

where the x is the library name specified on the command-line parameter—in this case, the letter c. Thus, the command to link the program would be as follows:

$ ld -o cpuid2 -lc cpuid2.o
$ ./cpuid2
bash: ./cpuid2: No such file or directory
$

Well, that's interesting. The program object code linked with the standard C functions library file just fine, but when I tried to run the resulting executable file, the preceding error message was generated.

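The problem is that the linker was able to resolve the C functions, but the functions themselves were not included in the final executable program (remember that we used a dynamically linked library). Instead, the executable records the path of a separate loader program that must find and load the library at runtime. The linker filled in a default loader path that does not exist on this system, so the "No such file or directory" message refers to the missing loader, not to the cpuid2 file itself.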

To solve this problem, you must also specify the program that will load the dynamic library at runtime. For Linux systems, this program is ld-linux.so.2, normally found in the /lib directory. To specify this program, you must use the -dynamic-linker parameter of the GNU linker:

$ ld -dynamic-linker /lib/ld-linux.so.2 -o cpuid2 -lc cpuid2.o
$ ./cpuid2
The processor Vendor ID is 'GenuineIntel'
$

There, that's much better. Now when the executable program is run, it uses the ld-linux.so.2 dynamic loader program to find the libc.so library, and the program runs just fine.
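
If you want to confirm which loader program an executable will ask for, one option is the readelf utility, which ships in the same binutils package as the GNU assembler and linker. The following is a minimal sketch rather than part of the original example: the parse_interp and interp_of function names are made up for illustration, and it assumes readelf prints its usual "Requesting program interpreter" line in the program header listing.

```shell
# parse_interp: read `readelf -l` output on stdin and print only the
# loader path from the "Requesting program interpreter" line.
# (Hypothetical helper name, used only for this illustration.)
parse_interp() {
    sed -n 's/.*program interpreter: \(.*\)]/\1/p'
}

# interp_of: print the loader path recorded in the ELF file given as $1.
# Prints nothing if the file has no loader entry or readelf is unavailable.
interp_of() {
    readelf -l "$1" 2>/dev/null | parse_interp
}

# Example: inspect the program linked above, if it exists here.
if [ -f ./cpuid2 ]; then
    interp_of ./cpuid2
fi
```

For the executable linked with the -dynamic-linker parameter, this should report the same /lib/ld-linux.so.2 path that was given on the ld command line.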

It is also possible to use the gcc compiler to assemble and link the assembly language program and C library functions. In fact, in this case it's a lot easier. The gcc compiler automatically links in the necessary C libraries without you having to do anything special.

First, remember that to compile assembly language programs with gcc, you must change the _start label to main. After that, all you need to do is compile the source code with a single command:

$ gcc -o cpuid2 cpuid2.s
$ ./cpuid2
The processor Vendor ID is 'GenuineIntel'
$

The GNU compiler automatically linked the proper C library functions for you.
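
The label change can be scripted rather than made by hand. The sketch below is a plain text substitution, so it assumes the only places _start appears in your source file are the .globl line and the label itself; check your own code before relying on it. The start_to_main function and the cpuid2_main.s file name are invented for this example.

```shell
# start_to_main: copy assembly source from stdin to stdout, renaming the
# _start entry label to main so that gcc's startup code can call it.
# (Hypothetical helper; a plain text substitution, so it would also
# rewrite any other occurrence of _start in the file.)
start_to_main() {
    sed 's/_start/main/g'
}

# Demonstration on a two-line fragment:
printf '.globl _start\n_start:\n' | start_to_main

# Typical use with the sample program (not run here):
#   start_to_main < cpuid2.s > cpuid2_main.s
#   gcc -o cpuid2 cpuid2_main.s
```

On a 64-bit Linux host, gcc may also need the -m32 option, along with the 32-bit C libraries installed, to build this 32-bit code.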

Summary

When creating your assembly language programs, it is a good idea to have a common program template for the assembler you are using. The template can be used as a starting point for all programs that are created with the assembler.

The template used with the GNU assembler requires specific sections to be defined. The GNU assembler uses sections to divide the different data areas within the program. The data section contains data that is placed in specific memory locations, referenced by labels. The program can refer to the data memory area by the label, and modify the memory locations as necessary. The bss section is used to contain uninitialized data elements, such as working buffers. This is ideal for creating large buffer areas. The text section is used to hold the actual instruction codes for the program. Once this area is created, it cannot be changed by the program.

The final piece of the template should define the starting point in your programs. The GNU assembler uses the _start label to declare the location of the first instruction to process. You can use a different label, but the label must then be specified with the -e parameter in the linker command. To make the _start label visible to the linker, you must also define it as a global label. This is done using the .globl directive in the source code.
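
Put together, the template described above might look like the following sketch. It writes a skeleton source file from the shell. The file name template.s, the scratch directory, and the commented-out buffer are illustrative choices, and the exit sequence assumes the 32-bit Linux int $0x80 system call interface.

```shell
# Write a skeleton GNU assembler source file to a scratch location.
src="${TMPDIR:-/tmp}/template.s"

cat > "$src" <<'EOF'
# template.s - skeleton for a GNU assembler program (32-bit Linux assumed)
.section .data
    # initialized data elements go here

.section .bss
    # zero-filled buffers go here, for example:
    # .lcomm buffer, 64

.section .text
.globl _start
_start:
    nop                 # gives the debugger a place to stop before real code
    movl $1, %eax       # system call number 1: exit
    movl $0, %ebx       # exit status 0
    int $0x80           # hand control to the Linux kernel
EOF

echo "wrote $src"

# To assemble and link it (not run here):
#   as -o template.o "$src"
#   ld -o template template.o
```

On a 64-bit host, the as --32 and ld -m elf_i386 options select 32-bit output for the two commented commands.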

With a template ready, you can start creating programs. This chapter created a simple test program using the CPUID instruction to extract the Vendor ID string from the processor. The program was assembled using the GNU assembler, and linked using the GNU linker.

After the program was tested, the GNU debugger was used to show how to debug assembly language programs. The programs must be assembled using the -gstabs parameter, so the debugger can match instruction codes with source code lines. Also remember to include a NOP instruction immediately after the _start label if you need to stop the program execution before the first instruction code.

The GNU debugger enables you to walk through the program code line by line, watching the values of registers and memory locations along the way. This is an invaluable tool when trying to hunt down logic problems in algorithms, or even typos where the wrong register is used in an instruction.

Finally, the sample program was modified to show how to utilize C functions within assembly language programs. The printf and exit functions were used to display data and cleanly exit the program. To use C functions, the assembly language program must be linked with the C libraries on the host system. The best way to do that is to use the C dynamic libraries. Linking using dynamic libraries requires another command-line parameter for the linker, the -dynamic-linker parameter. This specifies the program used by the operating system to dynamically find and load the library files.

This ends the introduction to the assembly language section. It is hoped that you now have a good idea of what assembly language is, and how it will be beneficial to your high-level language applications. The next section of the book shows the basics of assembly language programming. The next chapter tackles the sometimes difficult task of manipulating data within assembly language programs.
