Chapter 13. Using Inline Assembly

Now that you know the basics of assembly language programming, it's time to start putting those concepts to practical use. One very common use of assembly language programming is to code assembly functions within higher-level languages, such as C and C++. There are a couple of different ways to do this. This chapter describes how to place assembly language functions directly within C and C++ language programs. This technique is called inline assembly.

The chapter begins by describing how C and C++ programs use functions, and how the functions are converted to assembly language code by the compiler. Next, the basic inline assembly format is discussed, including how to incorporate simple assembly functions. After that, the extended inline assembly format is described. This format enables you to incorporate more complex assembly language functions within the C or C++ programs. Finally, the chapter explains how to define macros using complex inline assembly language functions within C programs.

What Is Inline Assembly?

In a standard C or C++ program, code is entered in the C or C++ syntax in a text source code file. The source code file is then compiled into assembly language code using the compiler. After that step, the assembly language code is assembled into object code, which is linked with any required libraries to produce an executable program (see Chapter 3, "The Tools of the Trade").

In the Linux world, the GNU compiler (gcc) is used to create the executable program from the text source code file. Normally, the step of converting the code to assembly language is hidden from the programmer. But as shown in Chapter 3, you can use the -S option of the GNU compiler to view the actual assembly language code generated from the source code.

A common programming technique in C and C++ programming is to create separate standalone functions within the source code file. These functions perform individual processes that can be called multiple times from the main program. When a C or C++ program is divided into functions, the compiler compiles each function into separate assembly functions (see Chapter 11, "Using Functions"). The functions are still contained within the same assembly language file, but as separate functions. To see what is produced, you can still use the -S option to compile the program and view the generated assembly language code.

To demonstrate this, the cfunctest.c program uses separate functions within a simple C language program:

/* cfunctest.c – An example of functions in C */
#include <stdio.h>

float circumf(int a)
{
        return 2 * a * 3.14159;
}

float area(int a)
{
        return a * a * 3.14159;
}

int main()
{
        int x = 10;
        printf("Radius: %d\n", x);
        printf("Circumference: %f\n", circumf(x));
        printf("Area: %f\n", area(x));
        return 0;
}

The two functions are defined as having a single integer value for the input, and producing a single-precision floating-point value (a C float) as the output. The mathematical calculations are performed within the individual functions, separate from the main program code. The functions can be called as many times as required within the main program without having to write additional code.

To view the assembly language code generated by the compiler, compile using the -S option:

$ gcc -S cfunctest.c

This command creates the file cfunctest.s, which looks like this:

        .file   "cfunctest.c"
        .version        "01.01"
gcc2_compiled.:
        .section        .rodata
        .align 8
.LC0:
        .long   0xf01b866e,0x400921f9
.text
        .align 16
.globl circumf
        .type    circumf,@function
circumf:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $4, %esp
        movl    8(%ebp), %eax
        addl    %eax, %eax
        pushl   %eax
        fildl   (%esp)
        popl    %eax
        fldl    .LC0
        fmulp   %st, %st(1)
        fstps   -4(%ebp)
        flds    -4(%ebp)
        movl    %ebp, %esp
        popl    %ebp
        ret
.Lfe1:
        .size    circumf,.Lfe1-circumf
        .section        .rodata
        .align 8
.LC2:
        .long   0xf01b866e,0x400921f9
.text
        .align 16
.globl area
        .type    area,@function
area:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $4, %esp
        movl    8(%ebp), %eax
        imull   8(%ebp), %eax
        pushl   %eax
        fildl   (%esp)
        popl    %eax
        fldl    .LC2
        fmulp   %st, %st(1)
        fstps   -4(%ebp)
        flds    -4(%ebp)
        movl    %ebp, %esp
        popl    %ebp
        ret
.Lfe2:
        .size    area,.Lfe2-area
        .section        .rodata
.LC4:
        .string "Radius: %d\n"
.LC5:
        .string "Circumference: %f\n"
.LC6:
        .string "Area: %f\n"
.text
        .align 16
.globl main
        .type    main,@function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movl    $10, -4(%ebp)
        subl    $8, %esp
        pushl   -4(%ebp)
        pushl   $.LC4
        call    printf
        addl    $16, %esp
        subl    $4, %esp
        subl    $8, %esp
        pushl   -4(%ebp)
        call    circumf
        addl    $12, %esp
        leal    -8(%esp), %esp
        fstpl   (%esp)
        pushl   $.LC5
        call    printf
        addl    $16, %esp
        subl    $4, %esp
        subl    $8, %esp
        pushl   -4(%ebp)
        call    area
        addl    $12, %esp
        leal    -8(%esp), %esp
        fstpl   (%esp)
        pushl   $.LC6
        call    printf
        addl    $16, %esp
        movl    $0, %eax
        movl    %ebp, %esp
        popl    %ebp
        ret
.Lfe3:
        .size    main,.Lfe3-main
        .ident  "GCC: (GNU) 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)"

By now you should be able to understand the assembly language code generated by the compiler. The two C functions were created as separate assembly language functions, set apart from the main program code. The main program uses the standard C style function format to pass the input parameter to the functions (by placing the input value onto the top of the stack). The CALL instruction is used to invoke the functions from the main program.

In this simple example, the assembly code generated to implement the functions was fairly trivial. However, in a more complex application, you may not want the compiler to generate the assembly language code, or you may want to use assembly language instructions that the compiler is incapable of producing (such as the CPUID instruction).

If you want to directly control what assembly language code is generated to implement a function, you can do one of three things:

  • Implement the function from scratch in assembly language code and call it from the C program.

  • Create the assembly language version of the C code using the -S option, modify the assembly language code as necessary, and then link the assembly code to create the executable.

  • Create the assembly language code for the functions within the original C code and compile it using the standard C compiler.

The first option is discussed in Chapter 14, "Calling Assembly Libraries." The second option is discussed in Chapter 15, "Optimizing Routines." The third option is exactly how inline assembly language programming works. This method enables you to create assembly language functions within the C or C++ source code itself, without having to link additional libraries or programs. It gives you greater control over how certain functions are implemented at the assembly language level of the final program.

Basic Inline Assembly Code

Creating inline assembly code is not much different from creating assembly functions, except that it is done within a C or C++ program. This section describes how to create basic inline assembly code functions that can implement simple assembly language code within C or C++ programs.

The asm format

The GNU C compiler uses the asm keyword to denote a section of source code that is written in assembly language. The basic format of the asm section is as follows:

asm( "assembly code" );

The assembly code contained within the parentheses must be in a specific format:

  • The instructions must be enclosed in quotation marks.

  • If more than one instruction is included, the newline character must be used to separate each line of assembly language code. Often, a tab character is also included to help indent the assembly language code to make lines more readable.

The second rule is required because the compiler takes the assembly code in the asm section verbatim and places it within the assembly code generated for the program. Each assembly language instruction must be on a separate line—thus, the requirement to include the newline character.

Some assemblers also require instructions to be indented by a tab character to distinguish them from labels. The GNU assembler does not require this, but many programmers include the tab character for consistency.

These requirements can create some confusing-looking assembly code in the source code, but it helps make things sane in the generated assembly language code.

A sample basic inline assembly section could look like this:

asm ("movl $1, %eax\n\tmovl $0, %ebx\n\tint $0x80");

This example uses three instructions: two MOVL instructions to place a one value in the EAX register and a zero value in the EBX register, and the INT instruction to perform the Linux system call.

This format can get somewhat messy when using a lot of assembly instructions. Most programmers place instructions on separate lines. When doing this, each instruction must be enclosed in quotation marks:

asm ( "movl $1, %eax\n\t"
      "movl $0, %ebx\n\t"
      "int $0x80");

This format is much easier to read when trying to debug an application. The asm section can be placed anywhere within the C or C++ source code. The following asmtest.c program demonstrates how the asm section would look in an actual program:

/* asmtest.c - An example of using an asm section in a program*/
#include <stdio.h>

int main()
{
   int a = 10;
   int b = 20;
   int result;
   result = a * b;
   asm ( "nop");
   printf("The result is %d\n", result);
   return 0;
}

The assembly language instruction used in the asm statement (the NOP instruction) does not do anything in the C program, but will appear in the assembly language code generated by the compiler. To generate the assembly language code for this program, use the -S option of the gcc command. The generated assembly code file should look like this:

        .file   "asmtest.c"
        .section        .rodata
.LC0:
        .string "The result is %d\n"
        .text
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        andl    $-16, %esp
        movl    $0, %eax
        subl    %eax, %esp
        movl    $10, -4(%ebp)
        movl    $20, -8(%ebp)
        movl    -4(%ebp), %eax
        imull   -8(%ebp), %eax
        movl    %eax, -12(%ebp)
#APP
        nop

#NO_APP
        movl    -12(%ebp), %eax
        movl    %eax, 4(%esp)
        movl    $.LC0, (%esp)
        call    printf
        movl    $0, %eax
        leave
        ret
        .size   main, .-main
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.3.2 (Debian)"

The generated code uses the normal C style function prologue and the LEAVE instruction to implement the standard epilogue (see Chapter 11). Within the prologue and epilogue code is the code generated by the C source code, and within that is a section identified by the #APP and #NO_APP symbols. This section contains the inline assembly code specified by the asm section. Note how the code is placed using the newline and tab characters specified.

Using global C variables

Just implementing assembly language code itself won't be able to accomplish much. To do any real work, there must be a way to pass data into and out of the inline assembly language function.

The basic inline assembly code can utilize global C variables defined in the application. The word to remember here is "global." Only globally defined variables can be used within the basic inline assembly code. The variables are referenced by the same names used within the C program.

The globaltest.c program demonstrates how to do this:

/* globaltest.c - An example of using C variables */
#include <stdio.h>

int a = 10;
int b = 20;
int result;

int main()
{
        asm ( "pusha\n\t"
              "movl a, %eax\n\t"
              "movl b, %ebx\n\t"
              "imull %ebx, %eax\n\t"
              "movl %eax, result\n\t"
              "popa");
        printf("The result is %d\n", result);
        return 0;
}

The a, b, and result variables are defined as global variables in the C program, and are used within the asm section of the code. Note that the values are used as memory locations within the assembly language code, and not as immediate data values. The variables can also be used elsewhere in the C program as normal.

Remember that the data variables must be declared as global. You cannot use local variables within the asm section.

The generated assembly code from the compiler looks like this:

        .file   "globaltest.c"
.globl a
        .data
        .align 4
        .type   a, @object
        .size   a, 4
a:
        .long   10
.globl b
        .align 4
        .type   b, @object
        .size   b, 4
b:
        .long   20
        .section        .rodata
.LC0:
        .string "The result is %d\n"
        .text
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        andl    $-16, %esp
        movl    $0, %eax
        subl    %eax, %esp
#APP
        pusha
        movl a, %eax
        movl b, %ebx
        imull %ebx, %eax
        movl %eax, result
        popa
#NO_APP
        movl    result, %eax
        movl    %eax, 4(%esp)
        movl    $.LC0, (%esp)
        call    printf
        movl    $0, %eax
        leave
        ret
        .size   main, .-main
        .comm   result,4,4
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.3.2 (Debian)"

Notice how the a and b variables are declared in the .data section and assigned the proper values. The result variable, because it is not initialized in the C code, is declared as a .comm value.

One other important feature is shown in this example program. Notice the PUSHA instruction at the start of the assembly language code, and the POPA instruction at the end. It is important to remember to store the initial values of the registers before entering your code, and then restore them when you are done. It's quite possible that the compiler will use those registers for other values within the compiled C source code. If you modify them in your asm section, unpredictable things may occur.

Using the volatile modifier

When creating inline assembly code in your application, you must be aware of what the compiler may do to it during the compile operation. In a normal C or C++ application, the compiler may attempt to optimize the generated assembly code to increase performance. This is usually done by eliminating functions that are not used, sharing registers between values that are not concurrently used, and rearranging code to facilitate better flow of the program.

Sometimes optimization is not a good thing with inline assembly functions. It is possible that the compiler may look at the inline code and attempt to optimize it as well, possibly producing undesirable effects.

If you want the compiler to leave your hand-coded inline assembly function alone, you can just say so! The volatile modifier can be placed in the asm statement to indicate that no optimization is desired on that section of code. The format of the asm statement using the volatile modifier is as follows:

asm volatile ("assembly code");

The assembly code within the statement uses the standard rules it would use without the volatile modifier. Nor does the addition of the volatile modifier change the requirement to store and retrieve the register values within the inline assembly code.

Using an alternate keyword

The asm keyword used to identify the inline assembly code section may be altered if necessary. The ANSI C standard does not define asm as a keyword, so when gcc compiles in strict ANSI mode (for example, with the -ansi option) it does not recognize asm, preventing you from using it for your inline assembly statements. If you are writing code using the ANSI C conventions, you must use the __asm__ keyword instead of the normal asm keyword.

The assembly code section within the statement does not change, just the asm keyword, as shown in the following example:

__asm__ ("pusha\n\t"
         "movl a, %eax\n\t"
         "movl b, %ebx\n\t"
         "imull %ebx, %eax\n\t"
         "movl %eax, result\n\t"
         "popa");

The __asm__ keyword can also be modified using the __volatile__ modifier.
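As a minimal sketch of the alternate keywords, the hypothetical spin_nops() helper below (it is not one of this chapter's programs) wraps a NOP instruction in __asm__ __volatile__, so it should still compile when gcc is run with the -ansi option. The preprocessor guard is an assumption added here so that the sketch also builds on processors other than IA-32 and x86-64.

```c
/* spin_nops: hypothetical helper, not one of the chapter's programs.
   Executes one NOP per loop pass and returns the number of passes. */
int spin_nops(int count)
{
    int executed = 0;
    int i;

    for (i = 0; i < count; i++) {
#if defined(__i386__) || defined(__x86_64__)
        /* __asm__ and __volatile__ are accepted even with gcc -ansi */
        __asm__ __volatile__ ("nop");
#endif
        executed++;
    }
    return executed;
}
```

Calling spin_nops(5) executes five NOP instructions and returns 5. The GCC documentation describes asm statements that have no operands as implicitly volatile, so the modifier matters most with the extended format.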

Extended ASM

The basic asm format provides an easy way to create assembly code, but it has its limitations. For one, all input and output values have to use global variables from the C program. In addition, you have to be extremely careful not to change the values of any registers within the inline assembly code.

The GNU compiler provides an extended format for the asm section that helps solve these problems. The extended format provides additional options that enable you to more precisely control how the inline assembly language code is generated within the C or C++ language program. This section describes the extended asm format.

Extended ASM format

Because the extended asm format provides additional features to use, they must be included in the new format. The format of the extended version of asm looks like this:

asm ("assembly code" : output locations : input operands : changed registers);

This format consists of four parts, each separated by a colon:

  • Assembly code: The inline assembly code using the same syntax used for the basic asm format

  • Output locations: A list of registers and memory locations that will contain the output values from the inline assembly code

  • Input operands: A list of registers and memory locations that contain input values for the inline assembly code

  • Changed registers: A list of any additional registers that are changed by the inline code

Not all of the sections are required to be present in the extended asm format. If no output values are associated with the assembly code, the section must be blank, but two colons must still separate the assembly code from the input operands. If no registers are changed by the inline assembly code, the last colon may be omitted.

The following sections describe how to use the extended asm format.

Specifying input and output values

In the basic asm format, input and output values are incorporated using the C global variable name within the assembly code. Things are a little different when using the extended format.

In the extended format, you can assign input and output values from both registers and memory locations. The format of the input and output values list is

"constraint"(variable)

where variable is a C variable declared within the program. In the extended asm format, both local and global variables can be used. The constraint defines where the variable is placed (for input values) or moved from (for output values). This is what defines whether the value is placed in a register or a memory location.

The constraint is a single-character code. The constraint codes are shown in the following table.

Constraint   Description
a            Use the %eax, %ax, or %al registers.
b            Use the %ebx, %bx, or %bl registers.
c            Use the %ecx, %cx, or %cl registers.
d            Use the %edx, %dx, or %dl registers.
S            Use the %esi or %si registers.
D            Use the %edi or %di registers.
r            Use any available general-purpose register.
q            Use either the %eax, %ebx, %ecx, or %edx register.
A            Use the %eax and the %edx registers for a 64-bit value.
f            Use a floating-point register.
t            Use the first (top) floating-point register.
u            Use the second floating-point register.
m            Use the variable's memory location.
o            Use an offset memory location.
V            Use only a direct memory location.
i            Use an immediate integer value.
n            Use an immediate integer value with a known value.
g            Use any register or memory location available.

In addition to these constraints, output values include a constraint modifier, which indicates how the output value is handled by the compiler. The output modifiers that can be used are shown in the following table.

Output Modifier   Description
+                 The operand can be both read from and written to.
=                 The operand can only be written to.
%                 The operand can be switched with the next operand if necessary.
&                 The operand may be written before the inline code has finished using its input values, so it is not placed in a register shared with an input.

The easiest way to see how the input and output values work is to see some examples. This example:

asm ("assembly code" : "=a"(result) : "d"(data1), "c"(data2));

places the C variable data1 into the EDX register, and the variable data2 into the ECX register. The result of the inline assembly code will be placed into the EAX register, and then moved to the result variable.
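The CPUID instruction mentioned earlier in the chapter is a natural fit for these register constraints, because it takes its input in EAX and returns its results in EAX, EBX, ECX, and EDX. The following is only a sketch: cpu_vendor() is a hypothetical helper name, and the preprocessor guard is an assumption added so the code also compiles on non-x86 processors, where it simply returns 0 and an empty string.

```c
#include <string.h>

/* cpu_vendor: hypothetical helper. Stores the 12-character CPUID vendor
   string plus a terminating null in vendor, and returns the highest
   basic CPUID function number reported in EAX. */
unsigned int cpu_vendor(char vendor[13])
{
    unsigned int a = 0, b = 0, c = 0, d = 0;

#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("cpuid"
             : "=a"(a), "=b"(b), "=c"(c), "=d"(d)   /* output values */
             : "a"(0));                             /* input: function 0 */
#endif
    /* The vendor string is returned in EBX, EDX, ECX order */
    memcpy(vendor,     &b, 4);
    memcpy(vendor + 4, &d, 4);
    memcpy(vendor + 8, &c, 4);
    vendor[12] = '\0';
    return a;
}
```

Because the instruction has no explicit operands, the assembly code itself never names a register; the constraints alone tell the compiler where the values go in and come out.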

Using registers

If the input and output variables are assigned to registers, the registers can be used within the inline assembly code almost as normal. I use the word "almost" because there is one oddity to deal with.

In extended asm format, to reference a register in the assembly code you must use two percent signs instead of just one (the reason for this will be discussed a little later). This makes the code a little odd looking, but not too much different.

The regtest1.c program demonstrates using registers within the extended asm format:

/* regtest1.c - An example of using registers */
#include <stdio.h>

int main()
{
   int data1 = 10;
   int data2 = 20;
   int result;

   asm ("imull %%edx, %%ecx\n\t"
        "movl %%ecx, %%eax"
        : "=a"(result)
        : "d"(data1), "c"(data2));

   printf("The result is %d\n", result);
   return 0;
}

This time, the C variables are declared as local variables, which you couldn't do with the basic asm format. Each C variable is assigned to a specific register. The output register is modified with the equal sign to indicate that it can only be written to by the assembly code (every output value in the inline code requires an output modifier, either = or +).

When the C program is compiled, the compiler automatically generates the assembly code necessary to place the C variables in the appropriate registers to implement the inline assembly code. You can see what is generated again by using the -S option. The inline code generated looks like this:

        movl    $10, -4(%ebp)
        movl    $20, -8(%ebp)
        movl    -4(%ebp), %edx
        movl    -8(%ebp), %ecx
#APP
        imull %edx, %ecx
        movl %ecx, %eax
#NO_APP
        movl    %eax, -12(%ebp)

The compiler moved the data1 and data2 values onto the stack spaces reserved for the C variables. The values were then loaded into the EDX and ECX registers required by the inline assembly code. The resulting output in the EAX register was then moved to the result variable location on the stack.

You don't always need to specify the output value in the inline assembly section. Some assembly instructions already assume that the input values contain the output values.

The MOVS instructions include the output location within the input values. The movstest.c program demonstrates this:

/* movstest.c - An example of instructions with only input values */
#include <stdio.h>

int main()
{
   char input[30] = {"This is a test message.\n"};
   char output[30];
   int length = 25;

   asm volatile ("cld\n\t"
       "rep movsb"
      :
      : "S"(input), "D"(output), "c"(length));

   printf("%s", output);
   return 0;
}

The movstest.c program specifies the required three input values for the MOVS instruction as input values. The location of the string to copy is placed in the ESI register, the location of the destination is placed in the EDI register, and the length of the string to copy is placed in the ECX register (remember to include the terminating null character in the string length).

The output value is already defined as one of the input values, so no output values are specifically defined in the extended format. Because no specific output values are defined, it is important to use the volatile keyword; otherwise, the compiler may remove the asm section as unnecessary, as it doesn't produce an output.
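The MOVS instructions change ESI, EDI, and ECX as they run, and the GCC documentation says that input-only operands must not be modified by the inline code. A more defensive sketch, assuming a hypothetical copy_bytes() helper, declares all three values as read-write operands with the + modifier and lists "memory" in the changed registers list so the compiler knows that memory was written. The plain C fallback is an assumption added so the sketch builds on non-x86 processors.

```c
#include <stddef.h>

/* copy_bytes: hypothetical helper; copies n bytes from src to dst. */
void copy_bytes(char *dst, const char *src, size_t n)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("cld\n\t"
                          "rep movsb"
                          : "+S"(src), "+D"(dst), "+c"(n) /* read and updated */
                          :                               /* no input-only values */
                          : "memory");                    /* destination is written */
#else
    while (n--)
        *dst++ = *src++;
#endif
}
```

A call such as copy_bytes(output, input, 25) copies the same 25 bytes as movstest.c, including the terminating null character.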

Using placeholders

In the regtest1.c example, the input values were placed in specific registers declared in the inline assembly section, and the registers were specifically utilized in the assembly instructions. While this worked fine for just a few input values, for functions that require a lot of input values this is a somewhat tedious way in which to use them.

To help you out, the extended asm format provides placeholders that can be used to reference input and output values within the inline assembly code. This enables you to declare input and output values in any register or memory location that is convenient for the compiler.

The placeholders are numbers, preceded by a percent sign. Each input and output value listed in the inline assembly code is assigned a number based on its location in the listing, starting with zero. The placeholders can then be used in the assembly code to represent the values.

For example, the following inline code:

asm ("assembly code"
     : "=r"(result)
     : "r"(data1), "r"(data2));

will produce the following placeholders:

  • %0 will represent the register containing the result variable value.

  • %1 will represent the register containing the data1 variable value.

  • %2 will represent the register containing the data2 variable value.

Notice that the placeholders provide a method for utilizing both registers and memory locations within the inline assembly code. The placeholders are used in the assembly code just as the original data types would be:

imull %1, %2
movl %2, %0

Remember that you must declare the input and output values as the proper storage elements (registers or memory) required by the assembly instructions in the inline code. In this example, both of the input values were required to be loaded into registers for the IMULL instruction.

To demonstrate using placeholders, the regtest2.c program performs the same function as the regtest1.c program, but enables the compiler to choose which registers to use:

/* regtest2.c - An example of using placeholders */
#include <stdio.h>

int main()
{
   int data1 = 10;
   int data2 = 20;
   int result;

   asm ("imull %1, %2\n\t"
        "movl %2, %0"
        : "=r"(result)
        : "r"(data1), "r"(data2));

   printf("The result is %d\n", result);
   return 0;
}

The regtest2.c program uses the r constraint when defining the input and output values, using registers for all of the data requirements. The compiler selects the registers used when the assembly language code for the program is generated. You can see this by viewing the generated assembly code with the -S option:

        movl    $10, -4(%ebp)
        movl    $20, -8(%ebp)
        movl    -4(%ebp), %edx
        movl    -8(%ebp), %eax
#APP
        imull %edx, %eax
        movl %eax, %eax
#NO_APP
        movl    %eax, -12(%ebp)

My compiler elected to do something interesting when the assembly code was generated. It used the EDX register to hold the data1 value, and the EAX register to hold the data2 value, as we would normally expect. The interesting part is that it noticed that the result was generated after the input values were finished being used, so it assigned the result variable to the EAX register as well. My poorly constructed inline assembly code still performed the MOVL instruction, but it just moved the EAX register to itself.

You can watch the running program in the debugger to see if the MOVL instruction is really executed. To generate an executable that can be used in the debugger, you can use the -gstabs option with the gcc compiler:

$ gcc -gstabs -o regtest2 regtest2.c

When the executable is created, it can then be run in the debugger:

$ gdb -q regtest2
(gdb) break *main
Breakpoint 1 at 0x8048364: file regtest2.c, line 4.
(gdb) run
Starting program: /home/rich/palp/chap13/regtest2

Breakpoint 1, main () at regtest2.c:4
4       {
(gdb) s
5          int data1 = 10;
(gdb) s
6          int data2 = 20;
(gdb) s
9          asm ("imull %1, %2\n\t"
(gdb) s
14         printf("The result is %d\n", result);
(gdb) info reg
eax            0xc8     200
ecx            0x1      1
edx            0xa      10

To set a breakpoint in a C program, you can specify either the line number to start or a function label. This example set the breakpoint at the main() function label, or the start of the program.

One thing you may notice as you are stepping through the program is that the asm section is considered a single statement by the debugger. You can step into the asm section using the stepi debugger command and execute each instruction separately.

The registers listing shows that after the asm section, the data1 value was loaded into the EDX register, and the EAX register was used as the result variable.
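The regtest2.c code also writes to %2, a value that was declared only as an input, which the GCC documentation advises against. One possible way around this, sketched below with a hypothetical mul_inline() helper, is to copy the input into the output register first and mark the output with the & modifier, so the compiler does not place it in a register that still holds an input. The plain C fallback is an assumption added so the sketch builds on non-x86 processors.

```c
/* mul_inline: hypothetical helper showing the & output modifier. */
int mul_inline(int data1, int data2)
{
    int result;

#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %2, %0\n\t"     /* result = data2, written early */
             "imull %1, %0"        /* result = result * data1 */
             : "=&r"(result)       /* & keeps result out of the input registers */
             : "r"(data1), "r"(data2));
#else
    result = data1 * data2;
#endif
    return result;
}
```

Without the & modifier, the compiler would be free to give result the same register as data1, and the first MOVL would then destroy data1 before the IMULL instruction read it.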

Referencing placeholders

As you saw in the regtest2.c program, I needlessly used a MOVL instruction to produce the output value in the proper variable. Sometimes it is beneficial to use the same variable as both an input value and an output value. To do this, you must define the input and output values differently in the extended asm section.

If an input and output value in the inline assembly code share the same C variable from the program, you can specify that using the placeholders as the constraint value. This can create some odd-looking code, but it comes in handy to reduce the number of registers required in the code.

To fix the inline code from the regtest2.c program, you could write the following:

asm ("imull %1, %0"
     : "=r"(data2)
     : "r"(data1), "0"(data2));

The 0 tag signals the compiler to place the data2 input value in the same register as operand 0, which is the output value declared in the output list. This ensures that the same register will be used to hold the input and output values. Of course, the result will be placed in the data2 value when the inline code is complete.

The regtest3.c program demonstrates this:

/* regtest3.c - An example of using placeholders for a common value */
#include <stdio.h>

int main()
{
   int data1 = 10;
   int data2 = 20;

   asm ("imull %1, %0"
        : "=r"(data2)
        : "r"(data1), "0"(data2));

   printf("The result is %d\n", data2);
   return 0;
}

The regtest3.c program uses the data2 value as both an input value and the output value.
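The + output modifier from the modifier table offers another way to express the same idea: a single operand that is both read and written, with no separate matching input. The following is a sketch only; scale() is a hypothetical helper name, and the plain C fallback is an assumption added so the code builds on non-x86 processors.

```c
/* scale: hypothetical helper; returns value multiplied by factor. */
int scale(int value, int factor)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("imull %1, %0"
             : "+r"(value)        /* operand 0: read and written */
             : "r"(factor));      /* operand 1: read only */
#else
    value *= factor;
#endif
    return value;
}
```

With the values from regtest3.c, scale(20, 10) returns 200.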

Alternative placeholders

If you are working with a lot of input and output values, the numeric placeholders can quickly become confusing. To help keep things sane, the GNU compiler (starting with version 3.1) enables you to declare alternative names as placeholders.

The alternative name is defined within the sections in which the input and output values are declared. The format is as follows:

[name] "constraint"(variable)

The name value defined becomes the new placeholder identifier for the variable, and is referenced in the inline assembly code as %[name], as shown in the following example:

asm ("imull %[value1], %[value2]"
     : [value2] "=r"(data2)
     : [value1] "r"(data1), "0"(data2));

The alternative placeholder names are used in the same way as the normal placeholders were, as demonstrated in the following alttest.c program:

/* alttest.c - An example of using alternative placeholders */
#include <stdio.h>

int main()
{
   int data1 = 10;
   int data2 = 20;

   asm ("imull %[value1], %[value2]"
        : [value2] "=r"(data2)
        : [value1] "r"(data1), "0"(data2));

   printf("The result is %d\n", data2);
   return 0;
}
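Named placeholders pay off as the number of values grows. As a rough sketch, the hypothetical mul_add() helper below computes a * b + c using three named values. The names acc, factor, and addend are illustrative choices, and the plain C fallback is an assumption added so the code builds on non-x86 processors.

```c
/* mul_add: hypothetical helper computing a * b + c. */
int mul_add(int a, int b, int c)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("imull %[factor], %[acc]\n\t"   /* acc = acc * factor */
             "addl %[addend], %[acc]"        /* acc = acc + addend */
             : [acc] "+&r"(a)                /* read-write, written early */
             : [factor] "r"(b), [addend] "r"(c));
#else
    a = a * b + c;
#endif
    return a;
}
```

The & modifier is combined with + here because acc is changed by the first instruction while addend is still waiting to be read; it keeps the compiler from sharing one register between them when a and c happen to hold the same value.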

Changed registers list

You may have noticed in the examples presented so far that I have not used the changed registers list in the extended asm format, even though it is obvious that each of the programs contained registers that were changed.

The compiler assumes that registers used in the input and output values will change, and handles that accordingly. You do not need to include these values in the changed registers list. In fact, if you do, it will produce an error message, as demonstrated in the following badregtest.c program:

/* badregtest.c - An example of incorrectly using the changed registers list */
#include <stdio.h>

int main()
{
   int data1 = 10;
   int result = 20;

   asm ("addl %1, %0"
        : "=d"(result)
        : "c"(data1), "0"(result)
        : "%ecx", "%edx");

   printf("The result is %d\n", result);
   return 0;
}

The badregtest.c program specifies that the result variable should be loaded into the EDX register and the data1 variable into the ECX register. The changed registers list incorrectly specifies that the ECX and EDX registers change within the inline code. Note that the registers are listed in the changed registers list using the full register names, not just a single letter as with the input and output register definitions. Using the percent sign with the register name is optional.

When you try to compile this program, an error will be produced:

$ gcc -o badregtest badregtest.c
badregtest.c: In function 'main':
badregtest.c:8: error: can't find a register in class 'DREG' while reloading 'asm'
$

The compiler already knew that the EDX register was used as an output register, and it could not properly handle the request for the changed register list.

The proper use of the changed register list is to notify the compiler if your inline assembly code uses any additional registers that were not initially declared as input or output values. The compiler must know about these registers so it knows to avoid using them, as demonstrated in the changedtest.c program:

/* changedtest.c - An example of setting registers in the changed registers list */
#include <stdio.h>

int main()
{
   int data1 = 10;
   int result = 20;

   asm ("movl %1, %%eax\n\t"
        "addl %%eax, %0"
        : "=r"(result)
        : "r"(data1), "0"(result)
        : "%eax");

   printf("The result is %d\n", result);
   return 0;
}

In the changedtest.c program, the inline assembly code uses the EAX register as an intermediate location to store a data value. Because the register was not declared as an input or output value, it must be included in the changed registers list.

Now that the compiler knows that the EAX register is not available, it will work around that. The input and output values were declared using the r constraint, which enables the compiler to select the registers to use. Looking at the generated assembly language code, you can see which registers were selected:

        movl    $10, -4(%ebp)
        movl    $20, -8(%ebp)
        movl    -4(%ebp), %ecx
        movl    -8(%ebp), %edx
#APP
        movl %ecx, %eax
        addl %eax, %edx
#NO_APP
        movl    %edx, %eax

The code for moving the C variables into registers uses the ECX and EDX registers (remember that in the regtest2.c program it used the EAX and EDX registers). The compiler purposely avoided using the EAX register, as it was declared as being used in the inline assembly code.
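
Naming a fixed register in the changed registers list is one option. Another, sketched here as an assumption-labeled alternative rather than the chapter's method, is to declare a dummy output with the & (early clobber) modifier and let the compiler choose the scratch register itself. The function add_with_scratch and the operand names are hypothetical, and the plain C branch is only there for non-x86 builds.

```c
/* A sketch of letting the compiler pick the intermediate register.
 * add_with_scratch() is a hypothetical helper, not from changedtest.c. */
int add_with_scratch(int data1, int result)
{
#if defined(__i386__) || defined(__x86_64__)
   int scratch;   /* receives whatever register the compiler selects */

   /* "=&r" marks scratch as written before the inputs are finished,
    * so it never shares a register with data1. */
   __asm__ ("movl %[in], %[tmp]\n\t"
            "addl %[tmp], %[out]"
            : [out] "+r"(result), [tmp] "=&r"(scratch)
            : [in] "r"(data1)
            : "cc");
#else
   /* Assumption: plain C fallback when not building for x86. */
   result += data1;
#endif
   return result;
}
```

With the values from changedtest.c, add_with_scratch(10, 20) returns 30, and no register name appears in the changed registers list.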

There is one oddity with the changed registers list: If you use any memory locations within the inline assembly code that are not defined in the input or output values, they must be tagged as being corrupted as well. The word "memory" is used in the changed registers list to flag the compiler that memory locations were altered within the inline assembly code.
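
As a rough illustration of the "memory" entry, the sketch below writes through a pointer inside the inline code. The pointer is only an input value, so the location it points to is not declared as an output, and the "memory" tag is what tells the compiler the contents changed. The function store_through_pointer is hypothetical, and the plain C branch is an assumption for non-x86 builds.

```c
/* A sketch of the "memory" tag in the changed registers list.
 * store_through_pointer() is a hypothetical helper. */
void store_through_pointer(int *dest, int value)
{
#if defined(__i386__) || defined(__x86_64__)
   /* The store lands in *dest, which is not an output operand,
    * so "memory" warns the compiler that memory was altered. */
   __asm__ volatile ("movl %1, (%0)"
                     :
                     : "r"(dest), "r"(value)
                     : "memory");
#else
   /* Assumption: plain C fallback when not building for x86. */
   *dest = value;
#endif
}
```

Without the "memory" tag, the compiler would be free to assume that the variable behind dest still holds its old value after the asm section.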

Using memory locations

Although using registers in the inline assembly language code is faster, you can also directly use the memory locations of the C variables. The m constraint is used to reference memory locations in the input and output values. Remember that you still have to use registers for the assembly instructions that require them, so you may have to define intermediate registers to hold the data. The memtest.c program demonstrates this:

/* memtest.c - An example of using memory locations as values */
#include <stdio.h>

int main()
{
   int dividend = 20;
   int divisor = 5;
   int result;

   asm("divb %2\n\t"
       "movl %%eax, %0"
       : "=m"(result)
       : "a"(dividend), "m"(divisor));

   printf("The result is %d\n", result);
   return 0;
}

The asm section loads the dividend value into the EAX register as required by the DIV instruction. The divisor is kept in a memory location, as is the output value. The generated assembly code looks like the following:

        movl    $20, -4(%ebp)
        movl    $5, -8(%ebp)
        movl    -4(%ebp), %eax
#APP
        divb -8(%ebp)
        movl %eax, -12(%ebp)
#NO_APP

The values are loaded into memory locations (in the stack), with the dividend value also moved to the EAX register. When the result is determined, it is moved into its memory location on the stack, instead of to a register.

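Because this example uses the DIVB instruction, which divides the 16-bit AX register by an 8-bit operand, it will only work with dividend values less than 65,536 and divisor values less than 256, and the quotient itself must fit in the 8-bit AL register. DIVB also leaves the remainder in AH, and because the example copies all of EAX to the result, the answer is only correct when the division has no remainder. If you want to use larger values, you must modify the inline assembly language code to use the DIVW or DIVL instructions.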
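
A DIVL version might look like the following sketch. It is not the memtest.c code: the function divide_u32 is hypothetical, it works on unsigned values, it assumes the caller never passes a zero divisor, and the plain C branch is only there for non-x86 builds. The quotient and remainder are declared as separate outputs so neither one leaks into the other.

```c
/* A sketch of 32-bit unsigned division with DIVL.
 * divide_u32() is a hypothetical helper; divisor must not be zero. */
unsigned int divide_u32(unsigned int dividend, unsigned int divisor,
                        unsigned int *remainder)
{
   unsigned int quotient, rem;
#if defined(__i386__) || defined(__x86_64__)
   /* DIVL divides the 64-bit value in EDX:EAX by its operand and
    * leaves the quotient in EAX and the remainder in EDX. EDX is
    * loaded with zero so only the 32-bit dividend is used. */
   __asm__ ("divl %4"
            : "=a"(quotient), "=d"(rem)
            : "0"(dividend), "1"(0u), "rm"(divisor)
            : "cc");
#else
   /* Assumption: plain C fallback when not building for x86. */
   quotient = dividend / divisor;
   rem = dividend % divisor;
#endif
   *remainder = rem;
   return quotient;
}
```

The "rm" constraint lets the compiler keep the divisor in either a register or a memory location, whichever is more convenient.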

Using floating-point values

Because of the way the FPU uses registers as a stack, things are a little different when using floating-point values in inline assembly language coding. You must be more careful about how the FPU registers are handled by the inline code.

You may have noticed that three different constraints dealt with the FPU register stack:

  • f references any available floating-point register

  • t references the top floating-point register

  • u references the second floating-point register

When retrieving output values from the FPU, you cannot use the f constraint; you must declare the t or u constraints to specify the FPU register in which the output value will be, as shown in the following example:

asm( "fsincos"
     : "=t"(cosine), "=u"(sine)
     : "0"(radian));

The FSINCOS instruction places the output in the first two registers in the FPU stack. You must be sure to specify the correct register for the correct output value. Because the input value must also be in the ST(0) register, it uses the same register as the first output value, and is declared using the placeholder. The sincostest.c program demonstrates using this inline assembly code:

/* sincostest.c - An example of using two FPU registers */
#include <stdio.h>

int main()
{
   float angle = 90;
   float radian, cosine, sine;

   radian = angle / 180 * 3.14159;

   asm("fsincos"
       :"=t"(cosine), "=u"(sine)
       :"0"(radian));

   printf("The cosine is %f, and the sine is %f\n", cosine, sine);
   return 0;
}

The assembly language code generated by the compiler for this function looks like this:

        flds    -8(%ebp)
#APP
        fsincos
#NO_APP
        fstps   -24(%ebp)
        movl    -24(%ebp), %eax
        movl    %eax, -12(%ebp)
        fstps   -24(%ebp)
        movl    -24(%ebp), %eax
        movl    %eax, -16(%ebp)

The radian variable is loaded into the FPU stack from the program stack using the FLDS instruction. After the FSINCOS instruction, the two output values are popped from the FPU stack using the FSTPS instruction and moved to their appropriate C variable location.

In the preceding example, because the compiler knows the output values are in the first two FPU registers, it pops the values, restoring the FPU stack to its previous condition. If you perform any operations within the FPU stack that are not cleared, you must specify the appropriate FPU registers in the changed registers list. The areatest.c program demonstrates this:

/* areatest.c - An example of using floating point regs */
#include <stdio.h>

int main()
{
   int radius = 10;
   float area;

   asm("fild %1\n\t"
       "fimul %1\n\t"
       "fldpi\n\t"
       "fmul %%st(1), %%st(0)"
       : "=t"(area)
       :"m"(radius)
       : "%st(1)");

   printf("The result is %f\n", area);
   return 0;
}

The areatest.c program places the radius value into a memory location, and then loads that value into the top of the FPU stack with the FILD instruction. That value is multiplied by itself, with the result still in the ST(0) register. The pi value is then placed on top of the FPU stack, shifting the squared radius value down to the ST(1) position. The FMUL instruction is then used to multiply the two values within the FPU.

The output value is taken from the top of the FPU stack and assigned to the area C variable. Because the ST(1) register was used, but not assigned as an output value, it must be listed in the changed registers list so the compiler knows to clean it up afterward.

Handling jumps

The inline assembly language code can also contain labels to define locations in the inline assembly code. Normal assembly conditional and unconditional branches can be implemented to jump to the defined labels.

The jmptest.c program demonstrates this:

/* jmptest.c - An example of using jumps in inline assembly */
#include <stdio.h>

int main()
{
   int a = 10;
   int b = 20;
   int result;

   asm("cmp %1, %2\n\t"
       "jge greater\n\t"
       "movl %1, %0\n\t"
       "jmp end\n"
       "greater:\n\t"
       "movl %2, %0\n"
       "end:"
       :"=r"(result)
       :"r"(a), "r"(b));

   printf("The larger value is %d\n", result);
   return 0;
}

The inline assembly code defines two labels within the instructions. The JGE instruction is used along with the CMP instruction to compare the two input values loaded into registers. The JMP instruction is used to unconditionally jump to the end of the inline assembly code.

The assembly code generated by the compiler contains the labels as well as the instructions:

        movl    -4(%ebp), %edx
        movl    -8(%ebp), %eax
#APP
        cmp %edx, %eax
        jge greater
        movl %edx, %eax
        jmp end
greater:
        movl %eax, %eax
end:
#NO_APP
        movl    %eax, -12(%ebp)

There are two restrictions when using labels in inline assembly code. The first one is that you can only jump to a label within the same asm section. You cannot jump from one asm section to a label in another asm section.

The second restriction is somewhat more complicated. The jmptest.c program uses the labels greater and end. However, there is a potential problem with this. As you saw from the assembled code listing, the inline assembly labels are encoded into the final assembled code. This means that if you have another asm section in your C code, you cannot use the same labels again, or an error message will result due to duplicate use of labels. In addition, if you try to use labels that match C identifiers already in the program, such as function names or global variables, you will also generate errors.

There are two ways to solve this. The easiest solution is to just use different labels within different asm sections. If you are hand-coding each of the asm sections, this is a viable alternative.

If you are using the same asm sections (such as if you declare macros as explained in the "Using Inline Assembly Code" section later), you cannot alter the labels within the inline assembly code. The solution is to use local labels.

Both conditional and unconditional branches allow you to specify a number as a label, along with a directional flag to indicate which way the processor should look for the numerical label. The first occurrence of the label found will be taken. To demonstrate this, the jmptest2.c program can be used:

/* jmptest2.c - An example of using generic jumps in inline assembly */
#include <stdio.h>

int main()
{
   int a = 10;
   int b = 20;
   int result;

   asm("cmp %1, %2\n\t"
       "jge 0f\n\t"
       "movl %1, %0\n\t"
       "jmp 1f\n"
       "0:\n\t"
       "movl %2, %0\n"
       "1:"
       :"=r"(result)
       :"r"(a), "r"(b));

   printf("The larger value is %d\n", result);
   return 0;
}

The labels have been replaced with 0: and 1:. The JGE and JMP instructions use the f modifier to indicate the label is forward from the jump instruction. To move backward, you must use the b modifier.
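
The b modifier is easiest to see in a loop, where the branch target sits before the jump. The sketch below is illustrative only: the function sum_to_n is hypothetical, it adds the integers from n down to 1 using a backward jump to the numeric label 1, and the plain C branch is an assumption for non-x86 builds.

```c
/* A sketch of a backward jump to a numeric local label.
 * sum_to_n() is a hypothetical helper returning n + (n-1) + ... + 1. */
int sum_to_n(int n)
{
   int total = 0;
   int count = n;

   if (n <= 0)
      return 0;   /* the loop below assumes at least one pass */
#if defined(__i386__) || defined(__x86_64__)
   /* "1b" means the nearest label 1: found looking backward. */
   __asm__ ("1:\n\t"
            "addl %1, %0\n\t"
            "decl %1\n\t"
            "jnz 1b"
            : "+r"(total), "+r"(count)
            :
            : "cc");
#else
   /* Assumption: plain C fallback when not building for x86. */
   while (count != 0) {
      total += count;
      count--;
   }
#endif
   return total;
}
```

Because the label is numeric, the same asm section can appear more than once in a program without producing duplicate label errors. Here sum_to_n(4) returns 10.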

Using Inline Assembly Code

While you can place inline assembly code anywhere within the C program, most programmers utilize inline assembly code as macro functions. The C macro functions enable you to declare a single macro that contains a function. When the macro is referenced in the main program, the macro is expanded to the full function defined by the macro. This section shows how to create inline assembly macros in your C programs.

What are macros?

In C and C++ programs, macros are used to define anything from a constant value to complex functions. A macro is defined using the #define statement. The format of the #define statement is as follows:

#define NAME expression

By convention, the macro name NAME is always defined using uppercase letters (this is to ensure it will not conflict with C library functions). The expression value can be a numeric or string value that is constant.

If you have done much coding in C or C++, you are most likely familiar with defining constant macros. The constant macro assigns a specific value to a macro name. The macro name can then be used throughout the program to represent the value.

A macro can be defined as a numeric value, such as the following:

#define MAX_VALUE 1024

Whenever the macro MAX_VALUE is used in the program code, the compiler substitutes the value associated with it:

data = MAX_VALUE;
if (size > MAX_VALUE)

The macro value is not treated like a variable in that it cannot be altered. It remains a constant value throughout the program. It can, however, be used in numeric equations:

data = MAX_VALUE / 4;

Another aspect of the C macro is the macro functions. These are described in the next section.

C macro functions

While constant macros come in handy for defining values, macro functions can be utilized to save typing time throughout the program. An entire function can be assigned to a macro at the beginning of the program and used everywhere in it.

The macro function defines input and output values, and then defines the function that processes the input values and produces the output values. The format of the macro function is as follows:

#define NAME(input values, output value) (function)

The input values are a comma-separated list of variables used for input to the function. The function defines how the input values are processed to produce the output value.

Macros are defined as a single line of text. With macro functions, that can create a very long line of text. To help make the macro more readable, a line continuation character (the backslash) can be used to split the function. Here's an example of a simple C macro function:

#define SUM(a, b, result) \
        ((result) = (a) + (b))

The macro SUM is defined as requiring two input values, and producing a single output value, which is the result of the addition of the two input values. Whenever the SUM() macro function is used in a program, the compiler expands it to the full macro function definition.

It is important to note that this is the opposite of standard C functions, which are used to save coding space. The compiler expands the full macro function before the code is assembled, creating larger code.

An example of a C macro function is shown in the mactest1.c program:

/* mactest1.c - An example of a C macro function */
#include <stdio.h>

#define SUM(a, b, result) \
            ((result) = (a) + (b))

int main()
{
   int data1 = 5, data2 = 10;
   int result;
   float fdata1 = 5.0, fdata2 = 10.0;
   float fresult;

   SUM(data1, data2, result);
   printf("The result is %d\n", result);
   SUM(1, 1, result);
   printf("The result is %d\n", result);
   SUM(fdata1, fdata2, fresult);
   printf("The floating result is %f\n", fresult);
   SUM(fdata1, fdata2, result);
   printf("The mixed result is %d\n", result);
   return 0;
}

There are a few things of interest to note in the mactest1.c example program. First, note that the variables defined in the macro function are completely independent of the result variables defined in the program. You can use any variables in the SUM() macro function.

Second, note that the same SUM() macro function worked for integer variables, numeric constants, floating-point values, and even mixed input and output values! You can see how versatile macro functions can be. Now it's time to apply that to the inline assembly functions.

If you want to see the code with the expanded macro lines, you can use the -E command-line option when compiling.
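
Because a macro function is expanded as text, the parentheses around each value in the SUM() definition matter. The sketch below shows why, using two hypothetical macros, SQUARE_UNSAFE and SQUARE, and two hypothetical wrapper functions that exist only to make the expansions easy to call. None of these names come from the example programs in this chapter.

```c
/* A sketch of why macro values are wrapped in parentheses.
 * All names here are hypothetical and for illustration only. */
#define SQUARE_UNSAFE(x) x * x
#define SQUARE(x) ((x) * (x))

/* Expands to: a + b * a + b, which is not the square of the sum. */
int square_unsafe_of_sum(int a, int b)
{
   return SQUARE_UNSAFE(a + b);
}

/* Expands to: ((a + b) * (a + b)), the intended value. */
int square_of_sum(int a, int b)
{
   return SQUARE(a + b);
}
```

With the values 1 and 2, the first function returns 5 (1 + 2 * 1 + 2), while the second returns 9. Running the compiler with the -E option on this code shows both expansions side by side.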

Creating inline assembly macro functions

Just as you can with the C macro functions, you can declare macro functions that include inline assembly code. The inline assembly code must use the extended asm format, so the proper input and output values can be defined. Because the macro function can be used multiple times in a program, you should also use numeric labels for any branches required in the assembly code.

An example of defining an inline assembly macro function is as follows:

#define GREATER(a, b, result) ({ \
    asm("cmp %1, %2\n\t" \
        "jge 0f\n\t" \
        "movl %1, %0\n\t" \
        "jmp 1f\n" \
        "0:\n\t" \
        "movl %2, %0\n" \
        "1:" \
        :"=r"(result) \
        :"r"(a), "r"(b)); })

The a and b input variables are assigned to registers so they can be used in the CMP instruction. The JGE and JMP instructions use numeric labels so the macro function can be used multiple times in the program without duplicating assembly labels. The result variable is copied from the register that contains the greater of the two input values. Note that the asm statement must be in a set of curly braces to indicate the start and end of the statement. Without them, the compiler will generate an error each time the macro is used in the C code.

The mactest2.c program demonstrates using this macro function in a C program:

/* mactest2.c - An example of using inline assembly macros in a program */
#include <stdio.h>

#define GREATER(a, b, result) ({ \
    asm("cmp %1, %2\n\t" \
        "jge 0f\n\t" \
        "movl %1, %0\n\t" \
        "jmp 1f\n\t" \
        "0:\n\t" \
        "movl %2, %0\n\t" \
        "1:" \
        :"=r"(result) \
        :"r"(a), "r"(b)); })

int main()
{
   int data1 = 10;
   int data2 = 20;
   int result;

   GREATER(data1, data2, result);
   printf("a = %d, b = %d    result: %d\n", data1, data2, result);
   data1 = 30;
   GREATER(data1, data2, result);
   printf("a = %d, b = %d    result: %d\n", data1, data2, result);
   return 0;
}
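
A macro is not the only way to reuse an asm section. As a hedged alternative, and not something the mactest2.c program does, the same comparison can be wrapped in a static inline function, which gives the compiler type checking on the values passed in. The function greater_of and its operand names are hypothetical, and the plain C branch is an assumption for non-x86 builds.

```c
/* A sketch of the GREATER logic as an inline function instead of a
 * macro. greater_of() is a hypothetical helper. */
static inline int greater_of(int a, int b)
{
   int result;
#if defined(__i386__) || defined(__x86_64__)
   /* Numeric labels keep repeated inlined copies from clashing. */
   __asm__ ("cmpl %[a], %[b]\n\t"
            "jge 0f\n\t"
            "movl %[a], %[res]\n\t"
            "jmp 1f\n"
            "0:\n\t"
            "movl %[b], %[res]\n"
            "1:"
            : [res] "=r"(result)
            : [a] "r"(a), [b] "r"(b)
            : "cc");
#else
   /* Assumption: plain C fallback when not building for x86. */
   result = (b >= a) ? b : a;
#endif
   return result;
}
```

Called with the values from mactest2.c, greater_of(10, 20) returns 20 and greater_of(30, 20) returns 30.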

Summary

This chapter discussed how to use assembly language code inside of C and C++ programs. The technique of inline assembly code enables you to place assembly language functions inside C or C++ programs, pass program variables to the assembly language code, and place output from the assembly language code into C program variables.

The C asm statement contains assembly language code that is transferred to the compiled assembly language program from the C program code. The asm statement has two formats. The basic asm format enables you to code assembly language instructions directly, using C global variables as input and output values.

The extended asm format provides advanced techniques for passing input values to the assembly code, and moving output values to the C program code. Any type of C data, such as local variables, can be passed to either registers or memory locations using the extended asm format. The input values can be assigned to specific registers or you can allow the compiler to assign the registers as necessary. Similarly, output values can be assigned to either registers or memory locations. Numerous features can be used to control how the variables are used within the inline assembly language code.

Inline assembly language code in the asm section is often defined using C macro functions. The C macro function uses a format that defines a function name, the input values used, and the output values used, along with the asm section function. Each time the macro function is called in the main program, the compiler expands the inline assembly language code.

The next chapter digs deeper into using assembly language in mixed programming environments. Besides inline assembly language code, you can create complete assembly language libraries that can be utilized by C and C++ programs. This technique is discussed and demonstrated in the next chapter.
