A guide to inline assembly code in GCC

Posted May 3, 2016 15:16 UTC (Tue) by adobriyan (subscriber, #30858)
In reply to: A guide to inline assembly code in GCC by daney
Parent article: A guide to inline assembly code in GCC

> The compiler has knowledge about a subset of the hardware capabilities.

Well, compiler generates correct code somehow. This "somehow" should be enough for managing register interdependencies and stuff.

> Almost by definition, we write inline assembly to do things outside of what the compiler has knowledge about, because if the compiler knew, we could just write our code using the compiler input language.

> To do things outside of the set of things the compiler understands, we have to communicate some aspects of what we need and what we are doing back to the compiler so that it can interact with our inline assembly in a sensible manner. For this there are the constraints.

Constraints will be there and supplying additional information could be arranged. But for this? It's 2016.

static __always_inline unsigned long long rdtsc(void)
{
    DECLARE_ARGS(val, low, high);

    asm volatile("rdtsc" : EAX_EDX_RET(val, low, high));

    return EAX_EDX_VAL(val, low, high);
}

If you dig through one level of macros, "low" and "high" are "unsigned long", but they were "unsigned int" earlier, and gcc generated a dummy MOV instruction. Not much of a loss, of course, but knowledge about RDTSC clearing the upper half of a register would make this code efficient automatically. It would look something like:

#ifdef CONFIG_X86_64
    uint64_t rdx, rax;
    asm ((rdx, rax) = rdtsc();)
    return (rdx << 32) | rax;
#else
    uint32_t edx, eax;
    asm ((edx, eax) = rdtsc();)
    return ((uint64_t)edx << 32) | eax;
#endif

A guide to inline assembly code in GCC

Posted May 3, 2016 17:33 UTC (Tue) by nybble41 (subscriber, #55106)

> Well, compiler generates correct code somehow.

The compiler generates correct code for the opcodes it uses. It knows nothing about any other opcodes which may appear in inline asm statements—to the compiler these are just opaque strings to be inserted directly into the listing passed to the assembler, which in turn only knows how to convert the opcodes into machine code, not what they actually do. Everything the compiler knows about how any given inline asm statement interacts with the rest of the program is based on the constraints specified by the programmer.
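To make that concrete, here is a minimal sketch of RDTSC written with explicit constraints, assuming GCC or Clang on x86. The name read_tsc is made up for illustration. The "=a" and "=d" outputs are the only thing that tells the compiler EAX and EDX get overwritten:

```c
#include <stdint.h>

/* Hypothetical helper: read the time-stamp counter with extended asm.
 * The compiler treats "rdtsc" as an opaque string; the output constraints
 * "=a" (EAX) and "=d" (EDX) describe which registers the instruction writes. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;

    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));

    /* Combine the two 32-bit halves into one 64-bit value. */
    return ((uint64_t)hi << 32) | lo;
}
```

Drop either constraint and the compiler is free to assume that register still holds whatever it put there before the asm statement.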

For RDTSC there is an inline intrinsic which is portable to at least GCC, Clang, and Visual C++: __rdtsc(). For GCC and Clang you need to #include <x86intrin.h> and for Visual C++ you need:

> #include <intrin.h>
> #pragma intrinsic(__rdtsc)

In all cases you simply write __rdtsc() and get a 64-bit integer, no inline asm required. Similar intrinsics exist for many other useful opcodes.
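A minimal sketch of what that looks like, with the header selection described above folded into one wrapper. The name tsc_now is made up for illustration, and this assumes an x86 target:

```c
#include <stdint.h>

#ifdef _MSC_VER
#include <intrin.h>
#pragma intrinsic(__rdtsc)
#else
#include <x86intrin.h>   /* GCC and Clang */
#endif

/* Hypothetical wrapper: return the current time-stamp counter value.
 * No inline asm and no constraints; the compiler already knows
 * what the intrinsic reads and writes. */
static uint64_t tsc_now(void)
{
    return __rdtsc();
}
```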

A guide to inline assembly code in GCC

Posted May 3, 2016 18:48 UTC (Tue) by khim (subscriber, #9252)

> Well, compiler generates correct code somehow.

Well, yeah.

> This "somehow" should be enough for managing register interdependencies and stuff.

Sure. But it's not called "somehow". It's called… drumroll… constraints. All the things which you could use are quite literally described in the same language you are using to describe your inline assembly. Here is a description of the x86 CPU. And here is one for ARM.

The compiler's CPU model does not have instructions like rdtsc or cpuid, thus to generate correct code the compiler must obtain that information… "somehow".
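For cpuid, that "somehow" could look like the sketch below, assuming GCC or Clang on x86. The name cpuid_query and its signature are made up for illustration. The constraints spell out what the compiler's model lacks: the leaf goes in through EAX, the subleaf through ECX, and all four registers come back overwritten.

```c
#include <stdint.h>

/* Hypothetical helper: run CPUID for the given leaf and subleaf.
 * regs[] receives EAX, EBX, ECX, EDX in that order. */
static inline void cpuid_query(uint32_t leaf, uint32_t subleaf, uint32_t regs[4])
{
    __asm__ volatile("cpuid"
                     : "=a"(regs[0]), "=b"(regs[1]),
                       "=c"(regs[2]), "=d"(regs[3])   /* outputs */
                     : "a"(leaf), "c"(subleaf));      /* inputs  */
}
```

Leaf 0 returns the highest supported basic leaf in EAX, so calling cpuid_query(0, 0, regs) and looking at regs[0] is a quick sanity check.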

> If you dig through one level of macros, "low" and "high" are "unsigned long", but they were "unsigned int" earlier, and gcc generated a dummy MOV instruction. Not much of a loss, of course, but knowledge about RDTSC clearing the upper half of a register would make this code efficient automatically. It would look something like:
>
> #ifdef CONFIG_X86_64
>     uint64_t rdx, rax;
>     asm ((rdx, rax) = rdtsc();)
>     return (rdx << 32) | rax;
> #else
>     uint32_t edx, eax;
>     asm ((edx, eax) = rdtsc();)
>     return ((uint64_t)edx << 32) | eax;
> #endif

Sure. But that's not called "assembler" at this point. These are intrinsics. There are many of them, and when they work you should use them. But if they don't… then they don't, and the compiler must be taught to use the capabilities it knows nothing about.

If the constraints which so irritate you had been invented for the sole purpose of writing asm, that would have been strange. But the fact that you must use the same method which is used to teach the compiler about the CPU in the first place does not really look surprising to me. I mean: what else would you use?