LFCS: Building the kernel with Clang

Posted May 5, 2011 2:28 UTC (Thu) by nevets (subscriber, #11875)
Parent article: LFCS: Building the kernel with Clang

Here's a heads up on -pg -mfentry. Adding -mfentry on top of -pg switches to a different mechanism than mcount. The compiler emits a call to __fentry__ instead of mcount, and places that call at the very beginning of the function:

000000000000009e <atomic_long_add>:
      9e:       e8 00 00 00 00          callq  a3 <atomic_long_add+0x5>
                        9f: R_X86_64_PC32       __fentry__-0x4
      a3:       55                      push   %rbp
      a4:       48 89 e5                mov    %rsp,%rbp

Instead of:

00000000000000c4 <atomic_long_add>:
      c4:       55                      push   %rbp
      c5:       48 89 e5                mov    %rsp,%rbp
      c8:       e8 00 00 00 00          callq  cd <atomic_long_add+0x9>
                        c9: R_X86_64_PC32       mcount-0x4
      cd:       f0 48 01 3e             lock add %rdi,(%rsi)

It is currently supported only by gcc 4.6.0 and higher on x86, and not on the other platforms (that I know of). I will be converting Ftrace to use it when available, which will also add a lot more features to function tracing.

This is just forward looking, but if LLVM is to be a competitor of gcc, it will definitely need to support this.
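
For anyone who wants to look at the two prologues on their own machine, here is a minimal sketch of a standalone file to feed to the compiler. The function is only a hypothetical stand-in for the kernel's atomic_long_add (the struct and the __sync builtin are assumptions, not the kernel source), the file name is made up, and the build commands in the comment assume an x86-64 gcc that accepts -mfentry.

```c
/*
 * Hypothetical stand-in for the kernel's atomic_long_add(); the struct
 * and the builtin used here are assumptions, not the kernel source.
 *
 * Assumed build steps (x86-64, gcc 4.6.0 or later for -mfentry):
 *   gcc -pg -c fentry_demo.c          && objdump -dr fentry_demo.o
 *   gcc -pg -mfentry -c fentry_demo.c && objdump -dr fentry_demo.o
 */
typedef struct {
    long counter;
} atomic_long_t;

/* Atomically add i to l->counter. */
void atomic_long_add(long i, atomic_long_t *l)
{
    __sync_fetch_and_add(&l->counter, i);
}
```

Comparing the two objdump listings should show where the profiling call lands relative to the push of %rbp; the exact offsets and instructions will depend on the compiler version and flags.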

LFCS: Building the kernel with Clang

Posted May 6, 2011 1:35 UTC (Fri) by jzbiciak (guest, #5246)

If I read this correctly, the only real difference is that the __fentry__ call happens before pushing %rbp and establishing a stack frame, whereas with mcount, it's a true nested call.

Aside from a shorter stack depth and a different label, could you expand on what the advantages are? You mention that it will allow you to add a lot more features to Ftrace. (A link is fine if you have one handy.)

Thanks!

LFCS: Building the kernel with Clang

Posted May 6, 2011 2:07 UTC (Fri) by nevets (subscriber, #11875)

The first main benefit is that you no longer need to have stack frames enabled. The -pg option with mcount requires stack frames. With -pg and -mfentry you no longer need them, which gives a bit of a performance boost.

The next part is that the callbacks to the function tracer can now get access to the registers. Because the stack frame is set up before mcount is called, the stack and the registers holding the function parameters may no longer be as the caller left them by the time mcount runs. With fentry right at the beginning of the function, you have full access to the registers and the stack exactly as they were handed to the function, which means we now have the possibility of tracing the data in the function parameters as well.

The third part, and the most extreme, is that because the call to fentry is the very first instruction of the function, we could possibly now "hijack" the function completely! That is, we could call a different function and return to the original caller without any issue. I could imagine crazy things with this feature.

Perhaps, combining points 2 and 3 above, instead of a full hijack we could also have the ability to modify the parameters. Not sure how useful that is outside of rootkits and academia. But who knows?
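
To make points 2 and 3 a bit more concrete, here is a plain C model of the control flow only. None of this is Ftrace's API or how the call is wired up at the machine level: the entry_regs struct, the hook type, call_with_hook and the demo functions are all hypothetical names invented for the illustration. The idea it sketches is that a hook running before anything else in the function can read the arguments, rewrite them, or pick a different function to run.

```c
#include <stddef.h>

/* Hypothetical snapshot of two x86-64 argument registers at function entry. */
struct entry_regs {
    unsigned long rdi;  /* first integer argument */
    unsigned long rsi;  /* second integer argument */
};

typedef unsigned long (*traced_fn)(unsigned long, unsigned long);

/*
 * Hypothetical hook type: may inspect or rewrite the arguments, and may
 * return a replacement function (NULL means "run the original").
 */
typedef traced_fn (*entry_hook)(struct entry_regs *regs);

/* Models a traced call: the hook runs before any of the function body. */
unsigned long call_with_hook(traced_fn fn, entry_hook hook,
                             unsigned long a, unsigned long b)
{
    struct entry_regs regs = { a, b };
    traced_fn target = fn;

    if (hook) {
        traced_fn replacement = hook(&regs);
        if (replacement)
            target = replacement;
    }
    return target(regs.rdi, regs.rsi);
}

/* Demo functions and hooks, all made up for this sketch. */
unsigned long demo_add(unsigned long a, unsigned long b) { return a + b; }
unsigned long demo_mul(unsigned long a, unsigned long b) { return a * b; }

unsigned long last_seen_first_arg;

/* Point 2: trace the parameter data without changing anything. */
traced_fn hook_trace_only(struct entry_regs *regs)
{
    last_seen_first_arg = regs->rdi;
    return NULL;
}

/* Points 2 and 3 combined: modify a parameter, keep the original function. */
traced_fn hook_double_first(struct entry_regs *regs)
{
    regs->rdi *= 2;
    return NULL;
}

/* Point 3: full hijack, run a different function instead. */
traced_fn hook_hijack(struct entry_regs *regs)
{
    (void)regs;
    return demo_mul;
}
```

In this model, calling call_with_hook(demo_add, hook_hijack, 3, 4) runs demo_mul on the original arguments and returns straight to the caller, which is the shape of the "hijack" described above.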

As for a link to documentation of what Ftrace could do with this? Sorry, but I don't know of a URL that points into my head ;)

LFCS: Building the kernel with Clang

Posted May 6, 2011 3:11 UTC (Fri) by jzbiciak (guest, #5246)

Ah, ok. I guess the advantages weren't immediately obvious with the example you posted, in part because the __fentry__ version still had a stack frame, and the stack frame in both cases wasn't terribly exciting.

With a beefier function and beefier stack frame, the differences would become more noticeable. And if you compile with -fomit-frame-pointer in the __fentry__ version, I can see the differences growing further still, as you note.
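
As a rough sketch of what a "beefier" test case might look like, the function below keeps a local array on the stack so that the prologue has more to do than push %rbp. The function, the file name, and the build commands in the comment are assumptions for illustration (x86-64 gcc with -mfentry support), not something taken from the kernel.

```c
/*
 * Hypothetical function with a real stack frame; the volatile array is
 * there only to keep the compiler from optimizing the locals away.
 *
 * Assumed build steps (x86-64, gcc 4.6.0 or later for -mfentry):
 *   gcc -O2 -pg -c frame_demo.c && objdump -dr frame_demo.o
 *   gcc -O2 -pg -mfentry -fomit-frame-pointer -c frame_demo.c \
 *       && objdump -dr frame_demo.o
 */
long sum_of_squares(long n)
{
    volatile long squares[16];
    long total = 0;
    long i;

    /* Clamp so the loops never run past the local array. */
    if (n > 16)
        n = 16;

    for (i = 0; i < n; i++)
        squares[i] = i * i;
    for (i = 0; i < n; i++)
        total += squares[i];

    return total;
}
```

Comparing the two listings should make it easier to see how much work happens before the mcount call versus before the __fentry__ call; what the compiler actually emits will vary with version and options.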

In the atomic_long_add example, it wasn't obvious that mcount wouldn't let you do the things you say you might want to do with __fentry__. Your explanation makes the limitations of mcount clearer.

Thanks!