Why trace the kernel?

Posted Jul 24, 2008 23:26 UTC (Thu) by pr1268 (subscriber, #24648)
Parent article: Tracing: no shortage of options

Forgive me for sounding ignorant, but why all the fuss about tracing the kernel? And more specifically, why is there DTrace "envy" (referring to a previous week's LWN article)?

My impression is that the Linux kernel code is already well-optimized and (relatively) error-free, so why is there such a demand of late to insert all these probes like stuffing pins in a pincushion? Granted, I'm familiar with using tools like strace(1) (and similar tools, both open-source and proprietary), but I'm still unsure why this has become such an important issue with regard to the kernel lately.

And, what's the envy all about over Sun Microsystems' DTrace? Linux kernel devs are certainly capable of creating their own tracing tools (as this article explains).

Thanks in advance for any comments--I'm just trying to feed my curiosity by asking here.

Why trace the kernel?

Posted Jul 25, 2008 1:33 UTC (Fri) by corbet (editor, #1) [Link]

This kind of tracing is wanted by people trying to track down problems which happen in production environments. You need to see why the system is behaving poorly (or crashing) while operating under its real workload. That requires the ability to hook into the system almost anywhere without perturbing its operation. Tracing can be nice for normal kernel debugging, but it's the folks wondering why their monster trading systems are bogging down that really want it.

The "DTrace envy" is there (1) because Sun has been using it as one of its primary anti-Linux marketing tools, and (2) because what we have isn't as good. We'll get there.

Why trace the kernel?

Posted Jul 29, 2008 21:01 UTC (Tue) by oak (guest, #2786) [Link]

Another point is that, unlike strace, SystemTap is system-wide[1]. Because it 
runs kernel-side instead of going through the fragile (threads/signals...) 
and slow ptrace() interface, it can be much faster, especially once it 
starts using the static markers / tracepoints.

[1] One can attach strace to multiple processes, but that really slows down 
the system and is pretty inconvenient (ptrace has side effects).  If one 
wants to do full-system tracing, though, LTT is currently a much better 
alternative than SystemTap due to its much smaller overhead (especially 
with multiple probes).  LTT and SystemTap serve somewhat different purposes.
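
To make the ptrace() overhead mentioned above concrete, here is a minimal sketch (not strace's actual implementation) of a tracer that counts how often a traced program is stopped for system calls. The function name count_ptrace_stops is made up for illustration; the ptrace requests used are the standard Linux ones. Each system call stops the child twice (entry and exit), and each stop costs a round trip through the tracer process.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] under ptrace and return the number of
 * syscall-entry and syscall-exit stops seen, or -1 on error. */
long count_ptrace_stops(char *const argv[])
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: ask to be traced, then exec the target program. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127);
    }

    int status;
    /* The first stop is the SIGTRAP delivered after a successful exec. */
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    /* Mark syscall stops with bit 0x80 so they differ from real signals. */
    ptrace(PTRACE_SETOPTIONS, child, NULL, (void *)PTRACE_O_TRACESYSGOOD);

    long stops = 0;
    long deliver = 0;   /* signal to pass on to the child, if any */
    for (;;) {
        /* Resume the child until its next syscall entry or exit. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, (void *)deliver) < 0)
            break;
        if (waitpid(child, &status, 0) < 0)
            break;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        if (WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            stops++;        /* one tracer round trip per stop */
            deliver = 0;
        } else {
            deliver = WSTOPSIG(status);   /* forward ordinary signals */
        }
    }
    return stops;
}
```

Calling it with something like { "true", NULL } shows that even a trivial program goes through many stops, which is the per-process cost a kernel-side tracer avoids.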

Currently neither LTT nor SystemTap provides user-space tracing like DTrace 
does, but I think Frysk does some of that.  Frysk doesn't do kernel 
tracing, though, and it would also profit from utrace, as it currently uses 
ptrace().  (Note: the ltrace tool is pretty useless for user-space tracing, 
as it doesn't trace library->library calls and it doesn't support dlopen(), 
which is used by almost anything a bit more complicated on the Linux desktop.)

Once one has system wide monitoring, one could have something like this:
- Or the RedHat Google SOC about getting BootChart type of visuals from 

PS. This is not just DTrace envy, it would be nice to have a GUIs like 
(Instruments uses DTrace, Shark doesn't) :-)

Btw. This is more about performance analysis framework stuff than tracing, 
but what is happening with perfmon2?  Is that going to be used e.g. by 

Why trace the kernel?

Posted Jul 30, 2008 1:05 UTC (Wed) by nix (subscriber, #2304) [Link]

Regarding your ltrace comments, yes, it's a kludge. Thankfully glibc 
provides LD_AUDIT which is much more capable, because the dynamic linker 
really can tell what dynamic calls are being made. It would be even better 
if there was even the slightest hint of documentation about it, but 
thankfully we don't need that as Jiri Olsa has done it for us: 

Why trace the kernel?

Posted Jul 30, 2008 14:37 UTC (Wed) by oak (guest, #2786) [Link]

Thanks, this looks useful!

(Although dynamic-linker-based things are faster than ptrace, they require 
restarting the monitored process to enable the tracing, so they are not 
very suitable for ad-hoc system monitoring.)

Why trace the kernel?

Posted Jul 30, 2008 22:22 UTC (Wed) by nix (subscriber, #2304) [Link]

That's true. (Particularly true in this case, where quite a bit of the 
magic is handled at startup time.)
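
For readers who have not seen the LD_AUDIT interface discussed in this sub-thread, a minimal audit library might look like the sketch below. It assumes a 64-bit glibc system and uses the hook names documented in rtld-audit(7); it is an illustration only, not the tool referred to above. It logs each symbol binding the dynamic linker performs, including library->library bindings.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdint.h>
#include <stdio.h>

/* Called once by the dynamic linker; returning LAV_CURRENT tells it which
 * version of the audit interface this library was built against. */
unsigned int la_version(unsigned int version)
{
    (void)version;
    return LAV_CURRENT;
}

/* Called for every object that gets loaded. Ask to be told about symbol
 * bindings both to and from each object. */
unsigned int la_objopen(struct link_map *map, Lmid_t lmid, uintptr_t *cookie)
{
    (void)map;
    (void)lmid;
    (void)cookie;
    return LA_FLG_BINDTO | LA_FLG_BINDFROM;
}

/* Called when a symbol reference is bound to its definition. Log the name
 * and return the unchanged address so the program behaves as usual. */
uintptr_t la_symbind64(Elf64_Sym *sym, unsigned int ndx,
                       uintptr_t *refcook, uintptr_t *defcook,
                       unsigned int *flags, const char *symname)
{
    (void)ndx;
    (void)refcook;
    (void)defcook;
    (void)flags;
    fprintf(stderr, "bind: %s\n", symname);
    return sym->st_value;
}
```

Built with something like gcc -shared -fPIC -o audit.so audit.c and run as LD_AUDIT=./audit.so some_program, this reports bindings as they are resolved. Note that it sees each binding once rather than every call; per-call tracing needs the architecture-specific PLT hooks described in rtld-audit(7). As pointed out above, the variable has to be set when the process starts.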

Why trace the kernel?

Posted Aug 5, 2008 17:01 UTC (Tue) by compudj (subscriber, #43335) [Link]

Currently, LTTng implements support for user-space markers on 32-bit and 64-bit x86. It's a bit
slow, since it goes through a system call each time an event must be recorded, and the
API is subject to change, but one can currently add markers to a user-space program or
library. See the package : It
depends on the LTTng patchset to enable/disable markers and to record data. The tarball
contains examples telling how to modify the makefiles and linker scripts to use markers in
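
As a rough illustration of the "one system call per event" cost described above, here is a hypothetical marker macro. This is not the LTTng API: the names MARKER, marker_enabled, marker_fd, and marker_record are invented for this sketch, and a plain write() stands in for whatever system call the real markers use. A disabled marker costs one flag test; an enabled marker formats the event and crosses into the kernel every time.

```c
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical global switch and output descriptor for the markers. */
static volatile int marker_enabled = 0;
static int marker_fd = 2;   /* stderr by default */

/* Format one event and hand it to the kernel with a single write(). */
static ssize_t marker_record(const char *name, const char *fmt, ...)
{
    char buf[256];
    va_list ap;

    /* Reserve one byte for the trailing newline. */
    int n = snprintf(buf, sizeof buf - 1, "%s: ", name);
    if (n < 0)
        return -1;
    if (n > (int)sizeof buf - 2)
        n = (int)sizeof buf - 2;

    va_start(ap, fmt);
    int m = vsnprintf(buf + n, sizeof buf - 1 - (size_t)n, fmt, ap);
    va_end(ap);
    if (m < 0)
        return -1;
    n += m;
    if (n > (int)sizeof buf - 2)
        n = (int)sizeof buf - 2;

    buf[n++] = '\n';
    /* One system call per recorded event: this is the slow part. */
    return write(marker_fd, buf, (size_t)n);
}

/* A disabled marker only costs the test of marker_enabled. */
#define MARKER(name, ...)                          \
    do {                                           \
        if (marker_enabled)                        \
            marker_record((name), __VA_ARGS__);    \
    } while (0)
```

A call site would then read MARKER("request", "id=%d", id); batching events in a user-space buffer and flushing them together would be one way to reduce the per-event cost.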

Being the LTTng project lead, I dream about a simple in-kernel API to manage the performance
counters, which would aim at managing these limited resources for the various users (watchdog,
user-space perfmon-like API, in-kernel LTTng). The tracer is itself easily extensible and can
record new events which include performance counters either in an interrupt mode or at
specific events occurring on the system (system call entry/exit, interrupt handler entry/exit,
trap entry/exit...). I just need something to set up these counters and let them run free on
the system without changing them when switching from one task to another: this is something
really annoying when gathering system-wide information. I haven't looked at the perfmon code
lately, but I think that most of the user-space system call API is useless to an in-kernel
user like LTTng. A minimalistic perfmon would be welcome.


Why trace the kernel?

Posted Mar 18, 2009 5:57 UTC (Wed) by xoddam (subscriber, #2322) [Link]

Forgive me if it's an obtuse question, but if it's performance metrics you're after and not investigating a particularly thorny race condition, isn't oprofile enough?

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds