Finding a profiler that works, damnit
Posted Mar 24, 2010 0:30 UTC (Wed) by chantecode
In reply to: Finding a profiler that works, damnit
Parent article: KVM, QEMU, and kernel project management
but it *also* only seems to output proper callgraphs when programs (and all libraries...) are compiled with -fno-omit-frame-pointer.
Sure, how could it be otherwise? Without frame pointers you can't get reliable stack traces. If you have a tip for getting around this requirement, I would be happy to implement it. The only arch I know of that can walk the stack correctly without frame pointers is PowerPC.
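To show why the frame pointer matters, here is a toy simulation of a frame-pointer walk. It is only a sketch, not perf or kernel code: the memory is a plain dict, the addresses are made up, and the layout (saved frame pointer at [fp], return address one word above it) is an assumption modelled on the usual x86-64 convention.

```python
WORD = 8  # assumed word size in bytes (x86-64 style)

def walk_frame_pointers(memory, fp, max_depth=64):
    """Follow the chain of saved frame pointers and collect return addresses.

    `memory` maps addresses to values. Each frame is assumed to store the
    caller's frame pointer at [fp] and the return address at [fp + WORD].
    A saved frame pointer of 0 marks the outermost frame.
    """
    return_addresses = []
    while fp != 0 and len(return_addresses) < max_depth:
        return_addresses.append(memory[fp + WORD])
        fp = memory[fp]  # step to the caller's frame
    return return_addresses

# Made-up memory image with three nested frames.
memory = {
    0x1000: 0x2000, 0x1008: 0xAAAA,  # innermost frame
    0x2000: 0x3000, 0x2008: 0xBBBB,
    0x3000: 0x0,    0x3008: 0xCCCC,  # outermost frame
}
trace = walk_frame_pointers(memory, 0x1000)
```

If a function is compiled without a frame pointer, the register holds some unrelated value, the slot at [fp] no longer links to the caller's frame, and the walk goes wrong from that point on.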
Furthermore, even when having done that, I can't make heads or tails of the callgraph output report. It is almost 100% unintelligible. It looks nothing like any callgraph profile I've ever seen before.
The default output is a fractal statistical profile of the stack traces, starting from the innermost caller (the origin of the event) and going to the outermost.
Let's look at a sample: http://tglx.de/~fweisbec/callchain_sample.txt
What this profile tells you is that the ls process, while entering the kernel, accounts for 3.34% of the total overhead, and that this overhead originates in the __lock_acquire function. Among all the callers of __lock_acquire() when it caused this overhead, lock_acquire() was its caller 97% of the time, _raw_spin_lock() was the caller of lock_acquire() 48% of the time, and so on.
This is why it is called fractal profiling: each branch is a new profile of its own. _raw_spin_lock() is profiled relative to its parent lock_acquire().
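The "relative to its parent" part can be sketched in a few lines of Python. This is only an illustration of the idea, not how perf computes it: the function name fractal_children, the sample stacks, and the caller other_locker are all made up, and the percentages are not the ones from the sample above.

```python
from collections import Counter

def fractal_children(stacks, prefix):
    """Return {caller: percent} for the callers found directly above `prefix`.

    Each stack is a tuple ordered from the innermost function (where the
    event hit) to the outermost caller. Percentages are relative to the
    samples that pass through `prefix`, not to the whole profile.
    """
    depth = len(prefix)
    matching = [s for s in stacks if s[:depth] == prefix]
    if not matching:
        return {}
    counts = Counter(s[depth] for s in matching if len(s) > depth)
    return {caller: 100.0 * n / len(matching) for caller, n in counts.items()}

# Invented samples, innermost function first.
stacks = [
    ("__lock_acquire", "lock_acquire", "_raw_spin_lock"),
    ("__lock_acquire", "lock_acquire", "_raw_spin_lock"),
    ("__lock_acquire", "lock_acquire", "other_locker"),
    ("__lock_acquire", "lock_acquire", "other_locker"),
]

level1 = fractal_children(stacks, ("__lock_acquire",))
level2 = fractal_children(stacks, ("__lock_acquire", "lock_acquire"))
```

Here level2 is computed only over the samples that went through lock_acquire(), so each level is a small profile of its own.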
Maybe it doesn't look like other kinds of callgraph profilers, as you said. I just don't know, as I haven't looked much at what other projects do. I may have taken some inspiration from sysprof callgraphs, except that sysprof draws its callgraph from outermost to innermost. The other direction we took for perf (from inner to outer) seems much more natural to me, as we start from the hottest, deepest, most relevant origin and end up at the highest-level origin.
But I can implement the other direction; it shouldn't be that hard. I'm just not sure it will give us nice and useful results, but it's worth a try. I think I'll get my hands on it soon.
BTW, we have a newt-based TUI that can display the callgraph in a collapsed/expanded fashion (toggleable at will). Maybe that would better fit your needs.
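To make the two directions concrete, here is a rough Python sketch (again not perf code; build_tree, the stacks, and hypothetical_caller are invented for the example). The same samples are folded into a tree either from the innermost function outwards or from the outermost caller inwards.

```python
def build_tree(stacks, outer_first=False):
    """Fold stack samples into a nested dict of {func: {"hits", "children"}}.

    Stacks are tuples ordered innermost first. With outer_first=True each
    stack is reversed, so the tree is rooted at the outermost callers.
    """
    root = {}
    for stack in stacks:
        frames = reversed(stack) if outer_first else stack
        node = root
        for func in frames:
            entry = node.setdefault(func, {"hits": 0, "children": {}})
            entry["hits"] += 1
            node = entry["children"]
    return root

# Invented samples, innermost function first.
stacks = [
    ("__lock_acquire", "lock_acquire", "_raw_spin_lock"),
    ("__lock_acquire", "lock_acquire", "hypothetical_caller"),
]

inner_to_outer = build_tree(stacks)                    # single hot root
outer_to_inner = build_tree(stacks, outer_first=True)  # one root per entry point
```

With the inner-to-outer tree the hot function stays in one place at the root; with the outer-to-inner tree the same function is scattered under every path that reaches it.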
An example here: http://tglx.de/~acme/perf-newt-callgraph.png
But you need to fetch the -tip tree for that, as it's scheduled for 2.6.35.
But please, if you have suggestions to make our callgraphs better, tell us.
It would be much appreciated; we lack feedback in this area and it's still a young feature (although pretty functional).
I also found no way to convert the output to callgrind format.
You're right. I'll try to get to this too once I have more time.