Tuesday, August 14. 2007

Gcc inline assembler howto summary

I found the Howto on the gcc inline assembler difficult to understand, so I wrote up the major parts here and created some kind of summary.

Basic asm in C/C++ program

Inlining asm allows you to insert machine instructions ("assembler code") into C or C++ code that you compile with gcc. In the basic syntax version you do this with

asm("<asm instruction>");

Here, <asm instruction> stands for the assembler instructions that the GNU assembler (as) understands. Separate multiple instructions by \n\t or ;. If your program uses asm as a name (e.g. for a function), use __asm__ instead of asm.
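As a minimal sketch of the basic syntax, the following function emits two instructions from a single statement. It assumes gcc (or a compatible compiler); the function name run_two_nops is made up for this illustration.

```c
/* Hypothetical helper: emits two nop instructions with basic asm. */
int run_two_nops(void)
{
    /* Two instructions in one statement, separated by \n\t.
       __asm__ is used so the code still compiles if asm is taken as a name. */
    __asm__("nop\n\t"
            "nop");
    return 0; /* only signals that the function ran to completion */
}
```

nop has no side effects, so nothing needs to be restored or declared here.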

gcc will just insert <asm instruction> at the position you placed them in the C/C++ code [1]. gcc won't be aware of any side effects <asm instruction> might have. For example, if you overwrite a register that gcc is using for some variable, you either intended this side effect, or you restore the original value in that register within <asm instruction>, or you use the extended syntax version to inform gcc of the clobbered registers.

Besides telling gcc which registers were clobbered by your <asm instruction>, the extended syntax version also allows you to refer to variables in your C/C++ program from within your assembler code [2]. The extended syntax is described below.

Assembler syntax

as uses AT&T syntax for its assembler instructions, which differs from the Intel syntax you may have seen. The major differences are the following:

  • Source-destination ordering.
    In AT&T syntax, the first operands are the source, the last operand is the destination (while in Intel syntax it is the other way round). To move the value from the register 'eax' to 'ebx', you write asm("movl %eax, %ebx")

  • Register naming.
    In AT&T syntax, register names are prefixed with %. So if you mean the register eax, you write %eax. (Intel syntax does not use a prefix.)

  • Immediate Operand.
    In AT&T syntax, immediate operands ("constants") are preceded by $, for hexadecimal constants $ is followed by 0x. So the number 13 is $13 in decimal notation and $0xD in hexadecimal notation. (Intel syntax does not use the $ prefix and uses an h suffix for numbers in hexadecimal notation.)

  • Operand size.
    In AT&T syntax, the operand size is noted as a suffix to the op-code. The suffixes are b, w, and l for byte (8-bit), word (16-bit) and long (32-bit) memory references. (Intel syntax uses the prefixes byte ptr, word ptr, and dword ptr to the operand.)

  • Memory operands.
    In AT&T syntax, the base register (the register whose value is used as a pointer) is enclosed in parentheses ( and ). The offset is written in front of the opening parenthesis, and the index register and the scale follow the base register inside the parentheses, as in offset(base,index,scale). See the examples below for clarification.

Examples

What you want to do...                              | AT&T syntax                   | Intel syntax
Move the value 1 into eax                           | movl $1, %eax                 | mov eax, 1
Move the value 0xff into ebx                        | movl $0xff, %ebx              | mov ebx, 0ffh
Move the value at address ecx into eax              | movl (%ecx), %eax             | mov eax, [ecx]
Move the value at address ecx+3 into eax            | movl 3(%ecx), %eax            | mov eax, [ecx+3]
Move the value at address ebx+2*ecx into eax        | movl (%ebx,%ecx,2), %eax      | mov eax, [ebx+ecx*2h]
Move the value at address ebx+2*ecx-13 into eax     | movl -13(%ebx,%ecx,2), %eax   | mov eax, [ebx+ecx*2h-13]
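To try the AT&T operand forms from a C program, here is a small sketch. It assumes an x86 or x86-64 target and already uses the extended syntax (explained in the next section) so that gcc picks the registers. The function names load_hex_constant and load_element are made up for this illustration.

```c
#include <stddef.h>

/* Immediate operand: $ prefix, 0x for hexadecimal; source first, destination last. */
int load_hex_constant(void)
{
    int value;
    __asm__("movl $0xff, %0"
            : "=r"(value));
    return value;
}

/* Memory operand of the form (base,index,scale): reads the int at base + 4*index. */
int load_element(const int *base, size_t index)
{
    int value;
    __asm__("movl (%1,%2,4), %0"
            : "=r"(value)
            : "r"(base), "r"(index)
            : "memory"); /* the instruction reads memory gcc cannot see in the operands */
    return value;
}
```

The scale of 4 matches sizeof(int) on x86, so load_element(array, 2) should yield array[2].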

Extended asm in C/C++ program

With the extended asm syntax you can refer to C/C++ variables from within your assembler code and, as mentioned above, tell gcc which registers are clobbered. The general format is:

asm("<asm template>"  : <output> : <input> : <clobbered>);

Here, <asm template> is a template for one or more assembler instructions, <output> is a comma-separated list of output operands, <input> is a comma-separated list of input operands, and <clobbered> is a comma-separated list of registers that are clobbered (used and overwritten) by <asm template>, so that gcc will not assume it still knows their values afterwards.

The <asm template> is similar to the format string you use in printf: it looks like the assembler instruction(s) you want to emit, but contains the placeholders %0, %1, %2, ... to refer to the first, second, third, ... operand listed in <output> and <input> (counted in the order they are listed, outputs first). Just as in printf, if you need a literal %, you have to write %% in <asm template>, for example %%eax to refer to the register eax.

The syntax for the comma-separated elements of the <output> and <input> list is

"<constraint>" ( <variable> )

<constraint> refers to some constraint on the variable, and <variable> is the name of some C/C++ variable. The next subsection explains the constraints.

The comma-separated list of clobbered registers may contain the word memory, which indicates that <asm template> changed the memory and gcc should not assume that some value that it has written to memory previously is still the same, or the word cc, which indicates that the condition flags will be modified.
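The pieces of the extended format can be put together as in the following sketch. It assumes an x86 or x86-64 target; the function name add_via_eax is made up, and the constraint r used here is explained in the next subsection.

```c
/* Hypothetical helper: adds two ints, using eax as a scratch register. */
int add_via_eax(int a, int b)
{
    int sum;
    __asm__("movl %1, %%eax\n\t"  /* %1 is the first input (a); %%eax is a literal register */
            "addl %2, %%eax\n\t"  /* %2 is the second input (b) */
            "movl %%eax, %0"      /* %0 is the output (sum) */
            : "=r"(sum)           /* <output> */
            : "r"(a), "r"(b)      /* <input> */
            : "eax", "cc");       /* <clobbered>: eax and the condition flags */
    return sum;
}
```

Because eax is named in the clobber list, gcc should not place any of the three operands in it.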

Constraints

The most frequently used constraints are:

Character | Meaning
`r` | Use a general purpose register (GPR) for this variable.
`a` | Use `%eax` (or, depending on operand size, `%ax` or `%al`)
`b` | Use `%ebx` (or, depending on operand size, `%bx` or `%bl`)
`c` | Use `%ecx` (or, depending on operand size, `%cx` or `%cl`)
`d` | Use `%edx` (or, depending on operand size, `%dx` or `%dl`)
`S` | Use `%esi` (or, depending on operand size, `%si`)
`D` | Use `%edi` (or, depending on operand size, `%di`)
`m` | Operate directly in memory, don't use a register
`0`, `1`, ..., `9` | Use the same location as the first, second, ..., tenth operand.

Use a digit constraint if a variable is both an output and an input variable, as in asm("addl %0,%0" : "=a" (var) : "0" (var)), which calculates var = var + var. Here 0 refers to the first operand, which is var in the <output> list.
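Wrapped in a function, the matching-constraint idiom might look like the sketch below. It assumes an x86 or x86-64 target, and the function name double_value is made up for this illustration.

```c
/* Hypothetical helper: computes var + var in eax. */
int double_value(int var)
{
    __asm__("addl %0, %0"
            : "=a"(var)   /* operand 0: output, placed in eax */
            : "0"(var)    /* input: must use the same location as operand 0 */
            : "cc");      /* addl changes the condition flags */
    return var;
}
```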

Other constraints are o for memory locations with offsettable addresses, V for memory locations which are not offsettable, i for an immediate integer operand (a constant) including symbolic constants known only at assembler time, n for an immediate integer operand (a constant) with a known numeric value, g for any general purpose register, memory location, or immediate integer operand (a constant).

x86 specific constraints include q for registers a, b, c, or d, I and J for constants in the range 0-31 and 0-63 (for 32-bit and 64-bit shifts), K and L for the constants 0xff and 0xffff, M for 0, 1, 2, or 3 (shifts for lea instruction), N for a constant 0-255 (for out instruction), f for a floating point register, t for the top of stack of floating point registers, u for the second floating point register, and A for a or d registers (for 64-bit integer values stored in d (most significant bits) and a (least significant bits)).
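Two of the x86 specific constraints can be tried out with the sketch below, which assumes an x86 or x86-64 target. The function names is_zero and shift_left_by_3 are made up for this illustration.

```c
/* q: sete needs a byte register, so the flag is placed in one of a, b, c, d. */
int is_zero(int x)
{
    unsigned char flag;
    __asm__("testl %1, %1\n\t"  /* sets the zero flag if x == 0 */
            "sete %0"           /* flag = 1 if the zero flag is set, else 0 */
            : "=q"(flag)
            : "r"(x)
            : "cc");
    return flag;
}

/* I: a constant in the range 0-31, emitted as an immediate shift count. */
unsigned shift_left_by_3(unsigned value)
{
    __asm__("shll %1, %0"
            : "=r"(value)
            : "I"(3), "0"(value)
            : "cc");
    return value;
}
```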

The constraints (for output variables) may be prefixed by the constraint modifiers = and &. = marks a variable as write-only, that is, the variable is never read, only written. & marks a variable as "earlyclobber", that is the variable will be modified before all input operands are used. Thus, the earlyclobber (output) variable cannot lie at a location used by any input operand (because the input variable will be overwritten by the earlyclobber variable before being read).
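The following sketch shows a case where the earlyclobber modifier matters. It assumes an x86 or x86-64 target, and the function name sum_plus_one is made up for this illustration.

```c
/* Hypothetical helper: returns a + b + 1. */
int sum_plus_one(int a, int b)
{
    int result;
    __asm__("movl $1, %0\n\t"  /* result is written before a and b are read */
            "addl %1, %0\n\t"
            "addl %2, %0"
            : "=&r"(result)    /* & keeps result out of the registers holding a and b */
            : "r"(a), "r"(b)
            : "cc");
    return result;
}
```

Without the &, gcc would be free to put result into the same register as a or b, and the first instruction would then destroy that input before it is used.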

Examples

#include <stdio.h>

int main(void)
{
        int foo = 10, bar = 15;
        __asm__ __volatile__("addl  %%ebx,%%eax"
                             :"=a"(foo)
                             :"a"(foo), "b"(bar)
                             );
        printf("foo+bar=%d\n", foo);
        return 0;
}

Here we instruct gcc to store foo in eax (constraint a) and bar in ebx (constraint b). We further declare that foo is only written (constraint modifier =). No registers are clobbered besides the declared output registers.

__asm__ __volatile__( "   lock       ;\n"
                      "   addl %1,%0 ;\n"
                      : "=m"  (my_var)
                      : "ir"  (my_int), "m" (my_var)
                      : /* no clobber-list */
                      );

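This executes an atomic addition (because of the prepended lock). my_var is declared as write-only (constraint modifier =) and resides at a memory location (constraint m); it is listed again as an input because the instruction also reads it. my_int may either be an immediate integer constant (constraint i) or reside in a register (constraint r); when several constraint letters are given, gcc picks one of the alternatives.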
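To make the fragment above testable, it can be wrapped in a function, as in this sketch. It assumes an x86 or x86-64 target; the function name atomic_add and its parameters are made up, and cc is added to the clobber list because addl changes the condition flags.

```c
/* Hypothetical helper: atomically adds amount to *target. */
void atomic_add(int *target, int amount)
{
    __asm__ __volatile__("lock; addl %1, %0"
                         : "=m"(*target)                /* written in memory */
                         : "ir"(amount), "m"(*target)   /* immediate or register; old value is read */
                         : "cc");
}
```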

__asm__ __volatile__( "decl %0; sete %1"
                      : "=m" (my_var), "=q" (cond)
                      : "m" (my_var) 
                      : "memory"
                      );

This decrements my_var (instruction decl) and sets cond to 1 if the resulting value is zero, and to 0 otherwise (instruction sete). my_var resides in memory, while cond is in one of the registers a, b, c, or d (constraint q).
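As a function, the decrement-and-test fragment might look like the following sketch. It assumes an x86 or x86-64 target; the function name dec_and_test is made up, and cc is added to the clobber list because decl changes the condition flags.

```c
/* Hypothetical helper: decrements *counter, returns 1 if it reached zero. */
int dec_and_test(int *counter)
{
    unsigned char reached_zero;  /* sete writes a single byte */
    __asm__ __volatile__("decl %0; sete %1"
                         : "=m"(*counter), "=q"(reached_zero)
                         : "m"(*counter)
                         : "memory", "cc");
    return reached_zero;
}
```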

__asm__ __volatile__( "btsl %1,%0"
                      : "=m" (ADDR)
                      : "Ir" (pos), "m" (ADDR)
                      : "cc"
                      );

This sets a bit (instruction bts) in ADDR at position pos. ADDR is in memory (constraint m), and pos is either in a register (constraint r) or an integer constant with value 0-31 (x86 constraint I). The instruction clobbers the condition flags (word cc in clobber list).
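A self-contained variant of the bit-setting fragment is sketched below. It assumes an x86 or x86-64 target; the function name set_bit and its parameters are made up, and the word is also passed as an input because bts reads it before writing it back.

```c
/* Hypothetical helper: sets bit number pos (0-31) in *word. */
void set_bit(unsigned *word, int pos)
{
    __asm__ __volatile__("btsl %1, %0"
                         : "=m"(*word)
                         : "Ir"(pos), "m"(*word)  /* constant 0-31 or register */
                         : "cc");
}
```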

Further links and resources

As mentioned above, this document is a summary of the gcc inline assembler howto. Other sources on gcc inline assembler include Brennan’s Guide to Inline Assembly. The Linux Assembly project shows how to make syscalls with assembler (the examples use the definitions mentioned in linasm-src.html) and also has a list of tutorials for assembler programs.


[1] gcc does that before optimization, that is, your assembler code may be moved around or even deleted during optimization. To avoid this, add volatile (or __volatile__, if volatile is a name in your program) after asm (or __asm__).

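[2] You can refer to global static variables by their assembler symbol name within the assembler instruction. This is _varname on targets that prefix C symbols with an underscore and plain varname on others (for example Linux/ELF). For all other variables, use the extended syntax.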