LWN.net

On the value of static tracepoints

Posted Apr 28, 2009 19:45 UTC (Tue) by compudj (subscriber, #43335)
Parent article: On the value of static tracepoints

As an answer to Andrew Morton (which I should probably have posted on LKML rather than here), I would say that one of the primary strengths of a system-wide kernel tracer is to give Linux users the ability to answer this simple question: "Why am I not getting the expected performance or latency when I run this application or use this device on my system?"

The answer to this question is rather easy when parts of the system are _actively_ eating up CPU time (oprofile is very good at system-wide profiling), but it becomes less clear when the issue is "worst-case latency" or involves delays caused by process "waiting time". A trace of wakeup dependencies, the identity of each thread consuming CPU time, and the scheduler's decisions is incredibly valuable for getting an overall view of the system's behavior.
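To make the "waiting time" point concrete, here is a minimal sketch, assuming a trace that has already been decoded into simple events. The event kinds `wakeup` and `switch_in` and the `Event` record are hypothetical simplifications; in the kernel, the corresponding information comes from the scheduler's sched_wakeup and sched_switch tracepoints. The sketch measures, per thread, how long it sat runnable between being woken and actually getting the CPU. A sampling profiler cannot show this, because the thread consumes no CPU while it waits.

```python
from dataclasses import dataclass

@dataclass
class Event:
    ts_us: int   # timestamp in microseconds
    kind: str    # hypothetical event kinds: "wakeup" or "switch_in"
    tid: int     # thread the event refers to

def wakeup_latencies(events):
    """Return {tid: [latency_us, ...]}: time from each wakeup to the next switch-in."""
    pending = {}     # tid -> timestamp of its last unmatched wakeup
    latencies = {}
    for ev in sorted(events, key=lambda e: e.ts_us):
        if ev.kind == "wakeup":
            pending[ev.tid] = ev.ts_us
        elif ev.kind == "switch_in" and ev.tid in pending:
            woke = pending.pop(ev.tid)
            latencies.setdefault(ev.tid, []).append(ev.ts_us - woke)
    return latencies

def worst_case(latencies):
    """Worst-case wakeup latency over all threads, paired with the thread that hit it."""
    return max(((max(v), tid) for tid, v in latencies.items()), default=None)

trace = [
    Event(100, "wakeup", 42),
    Event(130, "switch_in", 42),
    Event(200, "wakeup", 7),
    Event(950, "switch_in", 7),
    Event(1000, "wakeup", 42),
    Event(1010, "switch_in", 42),
]
lat = wakeup_latencies(trace)
print(lat)              # {42: [30, 10], 7: [750]}
print(worst_case(lat))  # (750, 7)
```

Thread 7 never shows up as a CPU hog, yet it waited 750 µs to run; only a trace of wakeups and scheduling decisions reveals that.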

If this has not been expressed clearly enough in the many presentations many of us have given over the past years, then I guess we are simply unable to reach the right audience. A good case study of the value of static tracepoints was presented two years ago in the following paper, which shows how static tracing has been used to debug problems at Google, IBM and Autodesk:

"Linux Kernel Debugging on Google-sized clusters", Ottawa Linux Symposium 2007

In addition, I have personally been involved in, and helped with, static tracer deployments at Google, IBM, Autodesk, Nokia, Ericsson, Siemens, Novell (SUSE Linux Enterprise Real Time), Wind River, and MontaVista (Carrier Grade Linux distribution).

And if kernel developers still think that a kernel tracer is only valuable to kernel developers, then we have a big marketing job to do, because they are just not getting the message: kernel tracing is _very_ valuable to Linux *users*.

P.S.: I did not reply on LKML because I think I have done my share of explaining over the past 4 years, and I would just be repeating myself. *Linux users* have to speak up, not me.


On the value of static tracepoints

Posted Apr 29, 2009 2:22 UTC (Wed) by k8to (subscriber, #15413)

LKML is scary and forbidding.
You're suggesting I wander into a contentious topic and start opinionating?

I am a Linux user who does, among other things, systems performance tuning. I do this ad hoc and also as a significant portion of my job. Luckily, I have the freedom to do portions of my work on Solaris, where DTrace is available. Unfortunately, Solaris internals are generally less well documented than Linux's, or at least less familiar to me.

On the value of static tracepoints

Posted Apr 29, 2009 5:22 UTC (Wed) by rahulsundaram (subscriber, #21946)

Andrew Morton has said before that he would like to hear from more users, so perhaps you should just do that, with some good information on why and how these features are useful. The worst you will get is some flames from other people. Not a huge deal, really.

On the value of static tracepoints

Posted Apr 29, 2009 9:46 UTC (Wed) by dunlapg (subscriber, #57764)

"kernel tracing is _very_ valuable to Linux *users*."

Isn't this exactly what the article says Andrew Morton is resistant to? If it's ultimately exposed to end users, then it will need a set of stable user-space tools to gather the information, which means it will essentially become part of the ABI: something that has to be maintained, and that people will complain about if it breaks.

The Xen hypervisor has a binary-only static tracing facility that I use extensively for my development. The particular traces change regularly as the code evolves; trying to maintain the same interface for user-land tools would be basically impossible. As it is, before each release I have to go through and make sure that all of the traces I need are still there and haven't been broken by someone else. As a developer, I think dealing with that instability is worth my time. But I wouldn't want to make that interface a promise to end users.

On the value of static tracepoints

Posted Apr 29, 2009 13:08 UTC (Wed) by compudj (subscriber, #43335)

I think Andrew has no problem with exposing this information to trace analysis tools, and that trace analysis tool developers will eventually have to adapt their tools to follow kernel revisions. We just have to make sure we add version identifiers to the exported data and make the tools flexible enough that they don't break with each kernel version. As an example, maintaining LTTV for about 7 years has not required any tremendous effort to follow new kernel releases, because we made it flexible enough.
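One way to get that flexibility, sketched here under stated assumptions: the trace format below (a JSON header line carrying a version number and each event type's field names, followed by comma-separated records) is hypothetical and is not LTTV's or LTTng's actual format. The tool rejects an unknown major version, skips event types it does not know about, and looks fields up by name, so fields added in a later kernel do not break it.

```python
import json

SUPPORTED_MAJOR = 2  # hypothetical: the major format version this tool understands

def parse_trace(text):
    """Parse a hypothetical self-describing trace: JSON header line, then CSV records."""
    lines = text.strip().splitlines()
    header = json.loads(lines[0])
    major, minor = header["version"]
    if major != SUPPORTED_MAJOR:
        # A major bump signals an incompatible layout change: refuse rather than misparse.
        raise ValueError(f"unsupported trace format {major}.{minor}")
    layouts = header["events"]  # event name -> ordered list of field names
    records = []
    for line in lines[1:]:
        name, *values = line.split(",")
        fields = layouts.get(name)
        if fields is None:
            continue  # event type not described in the header: skip it, don't crash
        records.append((name, dict(zip(fields, (int(v) for v in values)))))
    return records

def wakeup_pids(records):
    # Fields are looked up by name, so extra fields from a newer kernel are ignored.
    return [f["pid"] for name, f in records if name == "sched_wakeup"]

old = '{"version": [2, 0], "events": {"sched_wakeup": ["ts", "pid"]}}\nsched_wakeup,100,42'
new = ('{"version": [2, 1], "events": {"sched_wakeup": ["ts", "pid", "target_cpu"],'
       ' "irq_entry": ["ts", "irq"]}}\n'
       'sched_wakeup,100,42,3\nirq_entry,120,19\nmystery_event,130,1')
print(wakeup_pids(parse_trace(old)))  # [42]
print(wakeup_pids(parse_trace(new)))  # [42]
```

The same analysis code handles both the older and the newer layout; only a genuinely incompatible change (a new major version) forces a tool update.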

I think the main issue he raises here is that Ftrace looks like a collection of single-purpose tracers which will be useful only to kernel developers (and probably only once, as he says). Maybe Andrew exaggerates a bit, but his main concern, which I think is valid, is whether the Ftrace approach is useful to Linux end users.

Kernel developers can replace some of the static tracepoints discussed above with dynamic instrumentation, because they usually don't face the low-overhead requirements that users doing system-wide tracing on heavily loaded production systems face (yes, people do this with LTTng). So adding such tracepoints, either for a special-purpose tracer or for a tracer that does not care much about slowing the system down because it only collects a specific subset of data, is clearly debatable. I think the main answer to this is to bring a high-performance, system-wide tracer available to users into Linux, so that those tracepoints have an in-tree user that uses them extensively. LTTng has been providing this out of tree for a few years now.
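To illustrate why a static tracepoint can be cheap enough for always-on, system-wide tracing, here is a Python analog of the idea. It is not the kernel API (in-kernel tracepoints are C macros such as TRACE_EVENT(), with probes attached through generated register_trace_<name>() functions), and the `Tracepoint` class, `wake_up()` and `record()` are hypothetical names. The instrumented site costs a single check when no probe is attached, and the tracer records into a bounded buffer that discards the oldest entries, loosely like a flight-recorder mode.

```python
from collections import deque

class Tracepoint:
    """Python analog of a static tracepoint: a fixed, named instrumentation site."""

    def __init__(self, name):
        self.name = name
        self.probes = []  # callbacks attached by a tracer

    def register(self, probe):
        self.probes.append(probe)

    def unregister(self, probe):
        self.probes.remove(probe)

    def __call__(self, *args):
        if not self.probes:  # fast path: nobody is tracing, so only this check runs
            return
        for probe in self.probes:
            probe(*args)

# Hypothetical instrumented site in "kernel" code.
sched_wakeup = Tracepoint("sched_wakeup")

def wake_up(pid, cpu):
    # ... the real work of waking the task would happen here ...
    sched_wakeup(pid, cpu)

# The tracer's probe writes into a bounded buffer; once it is full, appending
# discards the oldest entry, loosely like a flight-recorder ring buffer.
buffer = deque(maxlen=4)

def record(pid, cpu):
    buffer.append((sched_wakeup.name, pid, cpu))

wake_up(1, 0)                  # not traced
sched_wakeup.register(record)
for pid in range(2, 8):
    wake_up(pid, pid % 2)      # traced
sched_wakeup.unregister(record)
wake_up(99, 0)                 # not traced again
print(list(buffer))
# [('sched_wakeup', 4, 0), ('sched_wakeup', 5, 1), ('sched_wakeup', 6, 0), ('sched_wakeup', 7, 1)]
```

By contrast, dynamic instrumentation such as kprobes inserts the probe at run time instead of relying on a site compiled into the code.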

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds