With all of your development tools in order, it's time to start learning about assembly language programming. Assembly language programs use a common template and format (specific to the assembler used), which you can develop and use for all of your applications.
This chapter walks you through a basic assembly language program template for the GNU assembler. The first section of the chapter describes the common items found in assembly language programs, and how they can be used to define a common template. The next section shows a sample program, and how to assemble and run it. Next you will learn how to debug the sample program using the GNU debugger. The last section of this chapter demonstrates how to incorporate C library functions into your assembly language programs.
As shown in Chapter 1, "What Is Assembly Language?," an assembly language program consists of defined sections, each of which has a different purpose. The three most commonly used sections are as follows:
The data section
The bss section
The text section
The text section is required in all assembly language programs. It is where the instruction codes are declared within the executable program. The data and bss sections are optional, but often used within a program. The data section declares data elements that are declared with an initial value. These data elements are used as variables within the assembly language program. The bss section declares data elements that are instantiated with a zero (or null) value. These data elements are most often used as buffer areas within the assembly language program.
The following sections describe how to declare the different sections in an assembly language program written for the GNU assembler, which is the assembler used throughout this book.
The GNU assembler declares sections using the .section declarative statement. The .section statement takes a single argument, the type of section it is declaring. Figure 4-1 shows the layout of an assembly language program.
Figure 4-1 demonstrates the normal way the sections are placed in the program. The bss section should always be placed before the text section, but the data section can be moved to follow the text section, although that is not the standard. Besides being functional, your assembly language programs should also be easily readable. Keeping all of the data definitions together at the beginning of the source code makes it easier for other programmers to pick up your work and understand it.
When the assembly language program is converted to an executable file, the linker must know what the starting point is in your instruction code. For simple programs with only a single instruction path, finding the starting point is not usually a problem. However, in more complex programs that use several functions scattered throughout the source code, finding where the program starts can be an issue.
To solve this problem, the GNU assembler declares a default label, or identifier, that should be used for the entry point of the application. The _start label is used to indicate the instruction from which the program should start running. If the linker cannot find this label, it will produce a warning message:
    $ ld -o badtest badtest.o
    ld: warning: cannot find entry symbol _start; defaulting to 08048074
    $
As you can see from the linker output, if the linker cannot find the _start label, it falls back to a default starting address, but for complex programs there is no guarantee that this default is the correct starting point.
You can use a label other than _start as the starting point. Use the -e parameter of the linker to define what the new starting point is called.
Besides declaring the starting label in the application, you also need to make the entry point available for external applications. This is done with the .globl directive.
The .globl directive declares program labels that are accessible from external programs. If you are writing a bunch of utilities that are being used by external assembly or C language programs, each function section label should be declared with a .globl directive.
Armed with this information, you can create a basic template for all your assembly language programs. The template should look something like this:
    .section .data
        <initialized data here>
    .section .bss
        <uninitialized data here>
    .section .text
    .globl _start
    _start:
        <instruction code goes here>
With this template in hand, you are ready to start coding assembly language programs. The next section walks through a simple application that shows how to build an application from the assembly language program source code.
Now it is time to create a simple assembly language application to demonstrate how all of the pieces fit together. To start off, a simple application that centers on a single instruction code is created. The CPUID instruction code is used to gather information about the processor on which the program is running. You can extract vendor and model information from the processor and display it for your customers to see.
The following sections describe the CPUID instruction and show how to implement an assembly language program to utilize it.
The CPUID instruction is one assembly language instruction that is not easily performed from a high-level language application. It is a low-level instruction that queries the processor for specific information, and returns the information in specific registers.
The CPUID instruction uses a single register value as input. The EAX register is used to determine what information is produced by the CPUID instruction. Depending on the value of the EAX register, the CPUID instruction will produce different information about the processor in the EBX, ECX, and EDX registers. The information is returned as a series of bit values and flags, which must be interpreted to their proper meaning.
The following table shows the different output options available for the CPUID instruction.
EAX Value | CPUID Output |
---|---|
0 | Vendor ID string, and the maximum CPUID option value supported |
1 | Processor type, family, model, and stepping information |
2 | Processor cache configuration |
3 | Processor serial number |
4 | Cache configuration (number of threads, number of cores, and physical properties) |
5 | Monitor information |
80000000h | Extended vendor ID string and supported levels |
80000001h | Extended processor type, family, model, and stepping information |
80000002h - 80000004h | Extended processor name string |
The sample program created in this chapter utilizes the zero option to retrieve the simple Vendor ID string from the processor. When the value of zero is placed in the EAX register, and the CPUID instruction is executed, the processor returns the Vendor ID string in the EBX, EDX, and ECX registers as follows:
EBX contains the first 4 bytes of the string.
EDX contains the middle 4 bytes of the string.
ECX contains the last 4 bytes of the string.
The string values are placed in the registers in little-endian format; thus, the first part of the string is placed in the lower bits of the register. Figure 4-2 shows how this works.
The sample program takes the register values and displays the information to the customer in a human-readable format. The next section presents the sample program.
Not all processors in the IA-32 platform utilize the CPUID instruction the same way. In a real application, you should perform a few tests to ensure that the processor supports the CPUID instruction. To keep things simple, the example program presented in this chapter does not perform any of these tests. It's possible that you may be using a processor that does not support the CPUID instruction, although most modern processors do support it (including Intel Pentium processors, Cyrix processors, and AMD processors).
Armed with your knowledge about how the CPUID instruction works, it's time to start writing a simple program to utilize that information. This program is a simple application to check the Vendor ID string that is produced by the CPUID instruction. Here's the sample program, cpuid.s:
    #cpuid.s Sample program to extract the processor Vendor ID
    .section .data
    output:
        .ascii "The processor Vendor ID is 'xxxxxxxxxxxx'\n"
    .section .text
    .globl _start
    _start:
        movl $0, %eax
        cpuid
        movl $output, %edi
        movl %ebx, 28(%edi)
        movl %edx, 32(%edi)
        movl %ecx, 36(%edi)
        movl $4, %eax
        movl $1, %ebx
        movl $output, %ecx
        movl $42, %edx
        int $0x80
        movl $1, %eax
        movl $0, %ebx
        int $0x80
This program uses quite a few different assembly language instructions. For now, don't worry too much about what they are; that will be described in detail in subsequent chapters. For now, concentrate on how the instructions are placed in the program, the flow of how they operate, and how the source code file is converted into an executable program file. So that you're not totally lost, here's a brief explanation of what's going on in the source code.
First, in the data section, a string value is declared:
    output:
        .ascii "The processor Vendor ID is 'xxxxxxxxxxxx'\n"
The .ascii declarative is used to declare a text string using ASCII characters. The string elements are predefined and placed in memory, with the starting memory location denoted by the label output. The x's are used as placeholders in the memory area reserved for the data variable. When the vendor ID string is extracted from the processor, it will be placed in the data at those memory locations.
You should recognize the next section of the program from the template. It declares the instruction code section, and the normal starting label of the application:
    .section .text
    .globl _start
    _start:
The first thing the program does is load the EAX register with a value of zero, and then run the CPUID instruction:
        movl $0, %eax
        cpuid
The zero value in EAX defines the CPUID output option (the Vendor ID string in this case). After the CPUID instruction is run, you must collect the response that is divided up between the three output registers:
        movl $output, %edi
        movl %ebx, 28(%edi)
        movl %edx, 32(%edi)
        movl %ecx, 36(%edi)
The first instruction creates a pointer to use when working with the output variable declared in memory. The memory location of the output label is loaded into the EDI register. Next, the contents of the three registers containing the Vendor ID string pieces are placed in the appropriate locations in the data memory, based on the EDI pointer. The numbers outside the parentheses represent the location relative to the output label where the data is placed. This number is added to the address in the EDI register to determine what address the register's value is written to. This process replaces the x's that were used as placeholders with the actual Vendor ID string pieces (note that the Vendor ID string was divided into the registers in the strange order EBX, EDX, and ECX).
When all of the Vendor ID string pieces are placed in memory, it's time to display the information:
        movl $4, %eax
        movl $1, %ebx
        movl $output, %ecx
        movl $42, %edx
        int $0x80
This program uses a Linux system call (int $0x80) to access the console display from the Linux kernel. The Linux kernel provides many preset functions that can be easily accessed from assembly applications. To access these kernel functions, you must use the int instruction code, which generates a software interrupt, with a value of 0x80. The specific function that is performed is determined by the value of the EAX register. Without this kernel function, you would have to send each output character yourself to the proper I/O address of the display. The Linux system calls are a great time-saver for assembly language programmers.
The complete list of Linux system calls, and how to use them, is discussed in Chapter 12, "Using Linux System Calls."
The Linux write system call is used to write bytes to a file. Following are the parameters for the write system call:
EAX contains the system call value.
EBX contains the file descriptor to write to.
ECX contains the start of the string.
EDX contains the length of the string.
If you are familiar with UNIX, you know that just about everything is handled as a file. The standard output (STDOUT) represents the display terminal of the current session, and has a file descriptor of 1. Writing to this file descriptor displays the information on the console screen.
The bytes to display are defined as a memory location to read the information from, and the number of bytes to display. The ECX register is loaded with the memory location of the output label, which defines the start of the string. Because the size of the output string is always the same, we can hard-code the size value in the EDX register.
After the Vendor ID information is displayed, it's time to cleanly exit the program. Again, a Linux system call can help. By using system call 1 (the exit function), the program is properly terminated, and returns to the command prompt. The EBX register contains the exit code value returned by the program to the shell. This can be used to produce different results in a shell script program, depending on situations within the assembly language program. A value of zero indicates the program executed successfully.
With the assembly language source code program saved as cpuid.s, you can build the executable program using the GNU assembler and GNU linker as follows:
    $ as -o cpuid.o cpuid.s
    $ ld -o cpuid cpuid.o
    $
The output from these commands is not too exciting (unless of course you had some typos in your code). The first step uses the as command to assemble the assembly language source code into the object code file cpuid.o. The second step uses ld to link that object code file into the executable file cpuid.
If you did have a typo in the source code, the assembler will indicate the line in which the typo is located:
    $ as -o cpuid.o cpuid.s
    cpuid.s: Assembler messages:
    cpuid.s:15: Error: no such instruction: `mavl %edx,32(%edi)'
    $
After the linker generates the executable program file, it is ready to be run. Here's a sample output from my MEPIS system running on a Pentium 4 processor:
    $ ./cpuid
    The processor Vendor ID is 'GenuineIntel'
    $
Excellent! The program ran as expected! One of the benefits of Linux is that some distributions will run on most any old piece of junk you might have sitting around. Here's the output from an old 200MHz PC with a Cyrix 6x86MX processor on which I ran Mandrake Linux 6.0:
    $ ./cpuid
    The processor Vendor ID is 'CyrixInstead'
    $
You gotta love the humor of system engineers.
Because the GNU Compiler Collection (gcc) uses the GNU assembler to compile C code, you can also use it to assemble and link your assembly language program in a single step. While this is not a common method to use, it is available when necessary.
There is one problem when using gcc to assemble your programs. While the GNU linker looks for the _start label to determine the beginning of the program, gcc looks for the main label (you might recognize that from C or C++ programming). You must change both the _start label and the .globl directive defining the label in your program to look like the following:
    .section .text
    .globl main
    main:
After doing that, it is a snap to assemble and link programs:
    $ gcc -o cpuid cpuid.s
    $ ./cpuid
    The processor Vendor ID is 'GenuineIntel'
    $
In this simple example, unless you introduced some typing errors in the source code, the program should have run with the expected results. Unfortunately, that is not always the case in assembly language programming.
In more complicated programs, it is easy to make a mistake when assigning registers and memory locations, or trying special instruction codes to handle complex data issues. When this happens, it is good to have a debugger handy to step through the program and watch how the data is handled.
This section shows how to use the GNU debugger to walk through the sample program, watching how the registers and memory locations change throughout the process.
In order to debug the assembly language program, you must first reassemble the source code using the -gstabs parameter:
    $ as -gstabs -o cpuid.o cpuid.s
    $ ld -o cpuid cpuid.o
    $
As with the first time it was assembled, the source code assembles with no error or warning messages. By specifying the -gstabs parameter, extra information is assembled into the executable program file to help gdb walk through the source code. While the executable program file created with the -gstabs parameter still runs and behaves just like the original program, it is not a wise idea to use the -gstabs parameter unless you are specifically debugging an application.
Because the -gstabs parameter adds additional information to the executable program file, the resulting file becomes larger than it needs to be just to run the application. For this example program, assembling without the -gstabs parameter produces the following file:
    -rwxr-xr-x 1 rich rich 771 2004-07-13 07:32 cpuid
When assembling with the -gstabs parameter, the program file becomes the following:
    -rwxr-xr-x 1 rich rich 1099 2004-07-13 07:20 cpuid
Notice that the file size went from 771 bytes to 1,099 bytes. Although the difference is trivial for this example, imagine what happens with a 10,000-line assembly language program! Again, it is best to not use the debugging information if it is not necessary.
Now that the executable program file contains the necessary debugging information, you can run the program within gdb:
    $ gdb cpuid
    GNU gdb 6.0-debian
    Copyright 2003 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "i386-linux"...
    (gdb)
The GNU debugger starts, with the program loaded into memory. You can run the program from within gdb using the run command:
    (gdb) run
    Starting program: /home/rich/palp/chap04/cpuid
    The processor Vendor ID is 'GenuineIntel'
    Program exited normally.
    (gdb)
As you can see from the output, the program ran within the debugger just as it did from the command line. That's not especially exciting. Now it's time to freeze the program as it starts, and step through each line of source code individually.
To do that, you must set a breakpoint. Breakpoints are places in the program code where you want the debugger to stop running the program and let you look at things. There are several different options you can use when setting a breakpoint. You can choose to stop execution at any of the following:
A label
A line number in the source code
A data value when it reaches a specific value
A function after it is performed a specific number of times
For this simple example, we will set a breakpoint at the beginning of the instruction codes, and watch the program as it progresses through the source code.
When specifying breakpoints in assembly language programs, you must specify the location relative to the nearest label. Because this sample program has only one label in the instruction code section, every breakpoint must be specified from _start. The format of the break command is
    break *label+offset
where label is the label in the source code to reference, and offset is the number of bytes past the label's address at which execution should stop (the expression after the asterisk is evaluated as a memory address, not as a source line number).
To set a breakpoint at the first instruction, and then start the program, you would use the following commands:
    (gdb) break *_start
    Breakpoint 1 at 0x8048075: file cpuid.s, line 11.
    (gdb) run
    Starting program: /home/rich/palp/chap04/cpuid
    The processor Vendor ID is 'GenuineIntel'
    Program exited normally.
    (gdb)
The breakpoint was specified using the *_start parameter, which specifies the first instruction code after the _start label. Unfortunately, when the program is run, it ignores the breakpoint, and runs through the entire program. This is a well-known bug in the current version of gdb. It has been around for a while, but hopefully it will be fixed soon.
To work around this problem, you have to include a dummy instruction as the first instruction code element after the _start label. In assembly, the dummy instruction is called NOP, for no operation.
If you modify the cpuid.s source code by adding a NOP instruction immediately after the _start label, it should look like this:
    _start:
        nop
        movl $0, %eax
        cpuid
After adding the NOP instruction, you can create a breakpoint at that location, signified as _start+1. Now, after assembling with the -gstabs parameter (and don't forget to link the new object code file), you can try out the debugger again:
    (gdb) break *_start+1
    Breakpoint 1 at 0x8048075: file cpuid.s, line 12.
    (gdb) run
    Starting program: /home/rich/palp/chap04/cpuid
    Breakpoint 1, _start () at cpuid.s:12
    12          movl $0, %eax
    Current language: auto; currently asm
    (gdb)
Perfect! The program started and then paused at (what used to be) the first instruction code. Now you can step your way through the program using either the next or step commands:
    (gdb) next
    _start () at cpuid.s:13
    13          cpuid
    (gdb) next
    _start () at cpuid.s:14
    14          movl $output, %edi
    (gdb) step
    _start () at cpuid.s:15
    15          movl %ebx, 28(%edi)
    (gdb) step
    _start () at cpuid.s:16
    16          movl %edx, 32(%edi)
Each next or step command executes the next line of source code (and tells you what line number that is). Once you have walked through the section you were interested in seeing, you can continue to run the program as normal using the cont command:
    (gdb) cont
    Continuing.
    The processor Vendor ID is 'GenuineIntel'
    Program exited normally.
    (gdb)
The debugger picks up from where it was stopped and finishes running the program as normal.
While it is good to walk through the program slowly, it is even better to be able to examine data elements as you are walking. The debugger provides a method for you to do that, as described in the next section.
Now that you know how to stop the program at specific locations, it's time to examine the data elements at each stop. Several different gdb commands are used to examine the different types of data elements.
The two most common data elements to examine are registers and memory locations used for variables. The commands used for displaying this information are shown in the following table.
Data Command | Description |
---|---|
info registers | Display the values of all registers |
print | Display the value of a specific register or variable from the program |
x | Display the contents of a specific memory location |
The info registers command is great for seeing how all of the registers are affected by an instruction:
    (gdb) s
    _start () at cpuid.s:13
    13          cpuid
    (gdb) info registers
    eax            0x0          0
    ecx            0x0          0
    edx            0x0          0
    ebx            0x0          0
    esp            0xbffffd70   0xbffffd70
    ebp            0x0          0x0
    esi            0x0          0
    edi            0x0          0
    eip            0x804807a    0x804807a
    eflags         0x346        838
    cs             0x23         35
    ss             0x2b         43
    ds             0x2b         43
    es             0x2b         43
    fs             0x0          0
    gs             0x0          0
    (gdb) s
    _start () at cpuid.s:14
    14          movl $output, %edi
    (gdb) info registers
    eax            0x2          2
    ecx            0x6c65746e   1818588270
    edx            0x49656e69   1231384169
    ebx            0x756e6547   1970169159
    esp            0xbffffd70   0xbffffd70
    ebp            0x0          0x0
    esi            0x0          0
    edi            0x0          0
    eip            0x804807c    0x804807c
    eflags         0x346        838
    cs             0x23         35
    ss             0x2b         43
    ds             0x2b         43
    es             0x2b         43
    fs             0x0          0
    gs             0x0          0
    (gdb)
This output shows that before the CPUID instruction is executed, the EBX, ECX, and EDX registers all contain zero. After the CPUID instruction, they contain the values from the Vendor ID string.
The print command can also be used to display individual register values. Adding a modifier changes the output format of the print command:
print/d to display the value in decimal
print/t to display the value in binary
print/x to display the value in hexadecimal
An example of the print command would be the following:
    (gdb) print/x $ebx
    $9 = 0x756e6547
    (gdb) print/x $edx
    $10 = 0x49656e69
    (gdb) print/x $ecx
    $11 = 0x6c65746e
    (gdb)
The x command is used to display the values of specific memory locations. Similar to the print command, the x command output can be modified by a modifier. The format of the x command is
    x/nyz
where n is the number of fields to display, y is the format of the output, and can be
c for character
d for decimal
x for hexadecimal
and z is the size of the field to be displayed:
b for byte
h for 16-bit word (half-word)
w for 32-bit word
The following example uses the x command to display the memory locations at the output label:
    (gdb) x/42cb &output
    0x80490ac <output>:     84 'T'  104 'h' 101 'e' 32 ' '  112 'p' 114 'r' 111 'o' 99 'c'
    0x80490b4 <output+8>:   101 'e' 115 's' 115 's' 111 'o' 114 'r' 32 ' '  86 'V'  101 'e'
    0x80490bc <output+16>:  110 'n' 100 'd' 111 'o' 114 'r' 32 ' '  73 'I'  68 'D'  32 ' '
    0x80490c4 <output+24>:  105 'i' 115 's' 32 ' '  39 '\'' 71 'G'  101 'e' 110 'n' 117 'u'
    0x80490cc <output+32>:  105 'i' 110 'n' 101 'e' 73 'I'  110 'n' 116 't' 101 'e' 108 'l'
    0x80490d4 <output+40>:  39 '\'' 10 '\n'
    (gdb)
This command displays the first 42 bytes of the output variable (the ampersand sign is used to indicate that it is a memory location) in character mode, which also shows the decimal value of each byte. This feature is invaluable when tracking instructions that manipulate memory locations.
The cpuid.s program used the Linux system calls to display the Vendor ID string information on the console. There are other ways to perform this function without using the system calls.
One method is to use the standard C library functions that are well known to C programmers. It is easy to tap into that resource to utilize many common C functions.
This section describes how to utilize C library functions within your assembly language programs. First, the common printf C function is described, and a new version of the cpuid.s program is shown using the printf function. Then, the next section shows how to assemble and link programs that use C library functions.
The original cpuid.s program used Linux system calls to display the results. If you have the GNU C compiler installed on your system, you can just as easily use the common C functions that you are probably already familiar with.
The C libraries contain many of the functions that are common to C programs, such as printf and exit. For this version of the program, the Linux system calls are replaced with equivalent C library calls. Here's the cpuid2.s program:
    #cpuid2.s View the CPUID Vendor ID string using C library calls
    .section .data
    output:
        .asciz "The processor Vendor ID is '%s'\n"
    .section .bss
        .lcomm buffer, 12
    .section .text
    .globl _start
    _start:
        movl $0, %eax
        cpuid
        movl $buffer, %edi
        movl %ebx, (%edi)
        movl %edx, 4(%edi)
        movl %ecx, 8(%edi)
        pushl $buffer
        pushl $output
        call printf
        addl $8, %esp
        pushl $0
        call exit
The printf function uses multiple input parameters, depending on the variables to be displayed. The first parameter is the output string, with the proper codes used to display the variables:
    output:
        .asciz "The processor Vendor ID is '%s'\n"
Notice that this uses the .asciz directive instead of .ascii. The printf function expects a null-terminated string as the output string. The .asciz directive adds the null character to the end of the defined string.
The next parameter used is the buffer that will contain the Vendor ID string. Because the value of the buffer does not need to be defined, it is declared in the bss section as a 12-byte buffer area using the .lcomm directive:
    .section .bss
        .lcomm buffer, 12
After the CPUID instruction is run, the registers containing the Vendor ID string pieces are placed in the buffer variable in the same way that they were in the original cpuid.s program.
To pass the parameters to the printf C function, you must push them onto the stack. This is done using the PUSHL instruction. The parameters are placed on the stack in reverse order from how the printf function retrieves them, so the buffer value is placed first, followed by the output string value. After that, the printf function is called using the CALL instruction:
        pushl $buffer
        pushl $output
        call printf
        addl $8, %esp
The ADDL instruction is used to clear the parameters placed on the stack for the printf function. The same technique is used to place a zero return value on the stack for the C exit function to use.
When you use C library functions in your assembly language program, you must link the C library files with the program object code. If the C library functions are not available, the linker will fail:
    $ as -o cpuid2.o cpuid2.s
    $ ld -o cpuid2 cpuid2.o
    cpuid2.o: In function `_start':
    cpuid2.o(.text+0x3f): undefined reference to `printf'
    cpuid2.o(.text+0x46): undefined reference to `exit'
    $
In order to link the C function libraries, they must be available on your system. On Linux systems, there are two ways to link C functions to your assembly language program. The first method is called static linking. Static linking links function object code directly into your application executable program file. This creates huge executable programs, and wastes memory if multiple instances of the program are run at the same time (each instance has its own copy of the same functions).
The second method is called dynamic linking. Dynamic linking uses libraries that enable programmers to reference the functions in their applications, but not link the function codes in the executable program file. Instead, dynamic libraries are called at the program's runtime by the operating system, and can be shared by multiple programs.
On Linux systems, the standard C dynamic library is located in the file libc.so.x, where x is a value representing the version of the library. On my MEPIS system, this is the file libc.so.5. This library file contains the standard C functions, including printf and exit.
This file is automatically linked to C programs when using gcc. You must manually link it to your program object code for the C functions to operate. To link the libc.so file, you must use the -l parameter of the GNU linker. When using the -l parameter, you do not need to specify the complete library name. The linker assumes that the library will be in a file:
    /lib/libx.so
where the x is the library name specified on the command-line parameter, in this case the letter c. Thus, the command to link the program would be as follows:
    $ ld -o cpuid2 -lc cpuid2.o
    $ ./cpuid2
    bash: ./cpuid2: No such file or directory
    $
Well, that's interesting. The program object code linked with the standard C functions library file just fine, but when I tried to run the resulting executable file, the preceding error message was generated.
The problem is that the linker was able to resolve the C functions, but the functions themselves were not included in the final executable program (remember that we used a dynamically linked library). The linker assumed that the necessary library files would be found at runtime. Obviously, that was not the case in this instance.
To solve this problem, you must also specify the program that will load the dynamic library at runtime. For Linux systems, this program is ld-linux.so.2, normally found in the /lib directory. To specify this program, you must use the -dynamic-linker parameter of the GNU linker:
    $ ld -dynamic-linker /lib/ld-linux.so.2 -o cpuid2 -lc cpuid2.o
    $ ./cpuid2
    The processor Vendor ID is 'GenuineIntel'
    $
There, that's much better. Now when the executable program is run, it uses the ld-linux.so.2 dynamic loader program to find the libc.so library, and the program runs just fine.
It is also possible to use the gcc compiler to assemble and link the assembly language program and C library functions. In fact, in this case it's a lot easier. The gcc compiler automatically links in the necessary C libraries without you having to do anything special.
First, remember that to compile assembly language programs with gcc, you must change the _start label to main. After that, all you need to do is compile the source code with a single command:
    $ gcc -o cpuid2 cpuid2.s
    $ ./cpuid2
    The processor Vendor ID is 'GenuineIntel'
    $
The GNU compiler automatically linked the proper C library functions for you.
When creating your assembly language programs, it is a good idea to have a common program template for the assembler you are using. The template can be used as a starting point for all programs that are created with the assembler.
The template used with the GNU assembler requires specific sections to be defined. The GNU assembler uses sections to divide the different data areas within the program. The data section contains data that is placed in specific memory locations, referenced by labels. The program can refer to the data memory area by the label, and modify the memory locations as necessary. The bss section is used to contain uninitialized data elements, such as working buffers. This is ideal for creating large buffer areas. The text section is used to hold the actual instruction codes for the program. Once this area is created, it cannot be changed by the program.
The final piece of the template should define the starting point in your programs. The GNU assembler uses the _start label to declare the location of the first instruction to process. You can use a different label, but the label must then be specified with the -e parameter in the linker command. To make the _start label accessible to run, you must also define it as a global label. This is done using the .globl directive in the source code.
With a template ready, you can start creating programs. This chapter created a simple test program using the CPUID instruction to extract the Vendor ID string from the processor. The program was assembled using the GNU assembler, and linked using the GNU linker.
After the program was tested, the GNU debugger was used to show how to debug assembly language programs. The programs must be assembled using the -gstabs parameter, so the debugger can match instruction codes with source code lines. Also remember to include a NOP instruction immediately after the _start label if you need to stop the program execution before the first instruction code.
The GNU debugger enables you to walk through the program code line by line, watching the values of registers and memory locations along the way. This is an invaluable tool when trying to hunt down logic problems in algorithms, or even typos where the wrong register is used in an instruction.
Finally, the sample program was modified to show how to utilize C functions within assembly language programs. The printf and exit functions were used to display data and cleanly exit the program. To use C functions, the assembly language program must be linked with the C libraries on the host system. The best way to do that is to use the C dynamic libraries. Linking using dynamic libraries requires another command-line parameter for the linker, the -dynamic-linker parameter. This specifies the program used by the operating system to dynamically find and load the library files.
This ends the introduction to the assembly language section. It is hoped that you now have a good idea of what assembly language is, and how it will be beneficial to your high-level language applications. The next section of the book shows the basics of assembly language programming. The next chapter tackles the sometimes difficult task of manipulating data within assembly language programs.