LWN: Comments on "Statistics and tracepoints"

Obligatory systemtap reference

mfedyk — Mon, 20 Sep 2010 03:51:58 +0000

Personally, I don't like how systemtap dynamically generates kernel modules to do the tracing. I'd much rather have a library of operations that get called from a domain-specific language.

It's all about peace of mind when dealing with production systems.

Obligatory systemtap reference

sfink — Mon, 30 Aug 2010 20:07:31 +0000

...or the kernel devs could stop kicking systemtap in the gonads and adopt its tapset mechanism for abstracting away from the specific tracepoints as much or as little as you want. That at least provides a place where you could draw a compatibility line; I would not suggest that the current tapset library does that (or tries to).

I haven't used either perf or systemtap enough for my opinion to be relevant, but it really seems to me like the perf people are focused on a narrow audience that does not happen to include anyone who lives in userspace. Systemtap people *are* actively concerned with sysadmins, userspace developers, etc., and are working on the large and important set of user problems such as the API/ABI one described in this article. But stap's users and developers are getting scared off by the vague but generally negative attitude towards the project from the kernel developers.

Isn't it time for the perf community to come out and directly identify what they dislike about the systemtap approach, and state their plans for "the right way" to overcome the problems that systemtap is addressing?

There's obviously a fundamental difference between "log everything and analyze it afterward" vs "run analysis code online, possibly modifying what gets traced at runtime, and report only on digested results". Is that all it is? They're mutually compatible, and as a user I've had uses for both on different problems.

To be sure, the systemtap community could do a much better job of giving examples of problems that required their approach -- but why should they go to the effort of describing those if they're just going to be ignored anyway?

(My example: I needed to identify the source of a periodic 10ms latency between invocations of my realtime-scheduled thread. I wrote a systemtap script to record the end time of my thread's wakeup, subtract that from the start time of the next wakeup, and if the gap was <3ms I would throw out the various traces I had logged in between. If it was greater, I'd remember those traces plus grab some more expensive stuff (stack traces). Numbers are from memory and guaranteed to be wrong.)
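A rough model of that filtering logic, written in Python rather than as a systemtap script, might look like the sketch below. The class and method names are made up for illustration, the 3ms threshold is the from-memory figure above, and the stack-trace capture is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

THRESHOLD_NS = 3_000_000  # 3 ms; illustrative value only


@dataclass
class GapFilter:
    """Keep traces only for suspiciously long gaps between thread wakeups."""
    threshold_ns: int = THRESHOLD_NS
    last_end_ns: Optional[int] = None
    pending: List[str] = field(default_factory=list)
    kept: List[Tuple[int, List[str]]] = field(default_factory=list)

    def trace(self, event: str) -> None:
        # Cheap event logged while waiting for the next wakeup.
        self.pending.append(event)

    def wakeup_end(self, now_ns: int) -> None:
        # The thread finished its work: start measuring a new gap.
        self.last_end_ns = now_ns
        self.pending.clear()

    def wakeup_start(self, now_ns: int) -> None:
        # The thread runs again: decide whether the gap was interesting.
        if self.last_end_ns is not None:
            gap = now_ns - self.last_end_ns
            if gap >= self.threshold_ns:
                self.kept.append((gap, list(self.pending)))
        # Short gap or not, the buffered traces are no longer needed.
        self.pending.clear()
```

The point is that the decision to keep or drop data is made online, at the moment the gap is known, so only the interesting intervals are ever reported.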

Statistics and tracepoints

dmk — Thu, 26 Aug 2010 20:03:21 +0000

Well, maybe an in-kernel API for counting things which would also be able to deliver trace-events to perf when userspace decides it wants them?

Or the other way around: any statistics API should probably be traceable... :)
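To make the idea concrete, here is a toy userspace model in Python, not kernel code and not an existing API: a counter that always counts and only emits an event per update while somebody is listening. All names are hypothetical.

```python
class TracedCounter:
    """A counter that can also deliver per-update events on request."""

    def __init__(self, name):
        self.name = name
        self.value = 0
        self._listeners = []

    def subscribe(self, callback):
        # A consumer (think: a tracing tool) asks for events.
        self._listeners.append(callback)

    def unsubscribe(self, callback):
        self._listeners.remove(callback)

    def add(self, n=1):
        # The plain statistic is always maintained.
        self.value += n
        # Events are only produced while someone has subscribed.
        for callback in self._listeners:
            callback(self.name, n, self.value)
```

A reader who only wants the number looks at `value`; a reader who wants the event stream subscribes, and both views come from the same call site.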

Statistics and tracepoints

marineam — Thu, 26 Aug 2010 16:13:10 +0000

As a developer, I find tracepoints a powerful and sexy way to get detailed information on what the kernel is doing. As a sys-admin, I already have my head full of how to deal with thousands of other pieces of software; now I have to learn another crazy tool just to get simple counters? And what about gathering long-term trends? I'd much rather write a one-minute cron job that reads a file in /sys or /proc and dumps a few numbers into RRDTool than write a more complex application for listening to events from the kernel.

Both methods of gathering stats on things in the kernel are very useful and serve different needs to different people. Not everyone has the privilege of thinking like a kernel developer all the time. :-)
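For comparison, the cron-job approach can be about this small. The sketch assumes a file with one "name value" pair per line, which is the layout of /proc/vmstat, and builds the "timestamp:value:value" argument that `rrdtool update` accepts. The function names and the choice of counters are illustrative.

```python
import time


def parse_counters(text):
    """Parse 'name value' lines into a dict, skipping anything else."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def rrd_update_arg(counters, names, timestamp=None):
    """Build 'timestamp:v1:v2...'; 'U' marks a counter that is missing."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    values = [str(counters.get(name, "U")) for name in names]
    return ":".join([str(ts)] + values)


def sample(path="/proc/vmstat", names=("pgfault", "pgmajfault")):
    """Read the file once and return the update argument for these counters."""
    with open(path) as f:
        return rrd_update_arg(parse_counters(f.read()), names)
```

A cron entry would call `sample()` and pass the result to `rrdtool update` along with whatever RRD file it maintains. A counter that vanishes from the file shows up as "U" (unknown) instead of breaking the job.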

Statistics and tracepoints

rvfh — Thu, 26 Aug 2010 13:13:14 +0000

It seems that someone writing an application that uses tracepoints should prepare it for tracepoints disappearing. They probably want a config file that can evolve to follow the kernel under scrutiny.
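One way such a config could work, sketched in Python: map each thing the tool cares about to a list of candidate tracepoint names, then resolve that list against what the running kernel actually offers (for example, the "subsystem:event" lines in /sys/kernel/debug/tracing/available_events). The function name and the config layout are assumptions for illustration.

```python
def resolve_tracepoints(wanted, available):
    """Pick the first available candidate for each logical name.

    wanted:    dict of logical name -> ordered list of candidate tracepoints
    available: set of 'subsystem:event' names the running kernel provides
    Returns (resolved, missing): a dict of logical name -> chosen tracepoint,
    and a list of logical names with no usable candidate.
    """
    resolved, missing = {}, []
    for logical, candidates in wanted.items():
        match = next((c for c in candidates if c in available), None)
        if match is None:
            missing.append(logical)
        else:
            resolved[logical] = match
    return resolved, missing


# Example config; 'sched:sched_switch_v2' and 'block:made_up_event' are
# made-up names standing in for a renamed or removed tracepoint.
WANTED = {
    "context_switch": ["sched:sched_switch_v2", "sched:sched_switch"],
    "block_io": ["block:made_up_event"],
}
```

The tool then degrades gracefully: it attaches to whatever resolved, and reports the missing entries instead of failing outright when a tracepoint goes away.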

Statistics and tracepoints

thedevil — Thu, 26 Aug 2010 05:31:28 +0000

"Past discussions have included suggestions for ways to mark tracepoints which are intended to be stable, but no conclusions have resulted. So the situation remains murky. It may well be that things will stay that way until some future kernel change breaks somebody's tools. Then the kernel community will be forced to choose between restoring compatibility for the broken tracepoints or overtly changing its longstanding promise not to break the user-space ABI (too often). It might be better to figure things out before they get to that point."

This reminds me of global inaction in the face of climate change. Am I obsessed?