A SystemTap update

Posted Jan 29, 2009 4:04 UTC (Thu) by akpm (subscriber, #4826)
Parent article: A SystemTap update

um, which genius decided to make systemtap dependent upon two large
kernel patches (utrace and uprobes) which have dim-to-zero prospects
of ever being included in Linux?

A SystemTap update

Posted Jan 29, 2009 8:23 UTC (Thu) by eugeniy (subscriber, #24280)

SystemTap is not dependent on any patches; it works fine with an unpatched kernel.

A SystemTap update

Posted Jan 29, 2009 10:08 UTC (Thu) by ctg (guest, #3459)

A colleague and I were discussing just last night the thorny problem of how to work out which of many competing processes are hitting the disk and causing contention, so that everything queues up and still more disk access results. There are not really a lot of tools in Linux to do that. (If we knew the worst offenders, we could focus our effort on making them more efficient; having to put instrumentation in each process is really time consuming.)

So reading this article was timely. It looks like systemtap would enable us to quickly home in on the big disk users.

The article quite clearly states that to get the best out of systemtap you need these patches, so when Mr Morton himself makes this sort of criticism, it's a bit of a concern.

Despite all that, I'm off to look at systemtap in a bit more detail (its lack of ubiquity has put me off before), because the lack of decent tools for working out what is really going on in a complex system is pretty frustrating. (I'm still suffering from the lack of the "W" flag in the output of ps(1) to show which processes are swapped out. I understand why it doesn't show that any more, but when your system goes into swap, it's useful to see which processes are being paged out. I suspect systemtap might be able to help with this too.)

A SystemTap update

Posted Jan 29, 2009 10:42 UTC (Thu) by mjw (subscriber, #16740)

The vfs tapset example in the article works without needing any additional user space hook patches.

Also take a look at some of the examples that come with Systemtap. disktop.stp probably does what you want:
http://sourceware.org/systemtap/examples/keyword-index.ht...

A SystemTap update

Posted Jan 29, 2009 17:05 UTC (Thu) by knobunc (subscriber, #4678)

See also iotop.

A SystemTap update

Posted Jan 29, 2009 20:07 UTC (Thu) by epb205 (guest, #50182)

Why doesn't ps show which processes are swapped out anymore? Is that somehow a security hole?

A SystemTap update

Posted Feb 3, 2009 22:33 UTC (Tue) by oak (guest, #2786)

No idea, but you can also get the same information from /proc/PID/smaps.
It's reported separately for each memory mapping the process has, i.e. you
may need to write a small script to process the data.

If the process has memory marked as swapped but none still marked as
dirty, it's completely swapped out. For some reason the kernel's smaps
output no longer counts swapped pages as dirty, which loses the
distinction between shared dirty and private dirty that smaps shows for
pages still in RAM.
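
The kind of small script mentioned above might look roughly like the following. This is only a sketch: it assumes the kernel's smaps output includes a per-mapping "Swap:" field alongside "Rss:", "Shared_Dirty:" and "Private_Dirty:", each reported in kB, and the function names summarize_smaps and read_smaps are made up for illustration.

```python
import re
from collections import defaultdict

# Matches per-mapping field lines such as "Swap:                 12 kB".
# Mapping header lines (address ranges, permissions, path) do not match.
FIELD_RE = re.compile(r"^(\w+):\s+(\d+) kB$")


def summarize_smaps(text):
    """Sum every kB-valued field over all mappings of one process."""
    totals = defaultdict(int)
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            totals[match.group(1)] += int(match.group(2))
    return dict(totals)


def read_smaps(pid):
    """Return the raw smaps text for a PID (needs permission to read it)."""
    with open(f"/proc/{pid}/smaps") as handle:
        return handle.read()
```

Calling summarize_smaps(read_smaps(pid)) for each PID of interest and comparing the "Swap" total against the dirty totals would give a rough per-process picture of what has been swapped out.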

A SystemTap update

Posted Jan 30, 2009 3:15 UTC (Fri) by SEJeff (subscriber, #51588)

Actually, the block_dump feature of modern 2.6 Linux kernels will show you which processes are reading from and writing to which devices. I wrote a proof-of-concept script to show them:

http://www.digitalprognosis.com/opensource/scripts/top-di...

The output looks like this:
root@desktopmonster:~# ./top-disk-users
COMMAND    PID    NUM  ACTION  DEVICE
banshee-1  23999  8    READ    sda9
kjournald  2494   131  WRITE   sda5
kjournald  5182   5    WRITE   sda8
pdflush    228    15   WRITE   sda5
pdflush    228    1    WRITE   sda8
pdflush    228    32   WRITE   sda9
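
The general idea behind this kind of report can be sketched as follows. This is not the top-disk-users script linked above; it is a minimal Python illustration that assumes block_dump has been enabled (by writing 1 to /proc/sys/vm/block_dump) and that the kernel log then contains lines shaped like "comm(pid): READ|WRITE block N on device". That line format, and the function names count_block_io and format_report, are assumptions made for the example.

```python
import re
from collections import Counter

# Assumed message shape: "comm(pid): READ|WRITE block N on device".
# search() is used so that a leading dmesg timestamp is ignored.
BLOCK_RE = re.compile(r"(\S+)\((\d+)\): (READ|WRITE) block \d+ on (\S+)")


def count_block_io(log_lines):
    """Count block reads/writes per (command, pid, action, device)."""
    counts = Counter()
    for line in log_lines:
        match = BLOCK_RE.search(line)
        if match:
            command, pid, action, device = match.groups()
            counts[(command, int(pid), action, device)] += 1
    return counts


def format_report(counts):
    """Render the counts as a plain-text table, one row per key."""
    rows = ["COMMAND PID NUM ACTION DEVICE"]
    for (command, pid, action, device), num in sorted(counts.items()):
        rows.append(f"{command} {pid} {num} {action} {device}")
    return "\n".join(rows)
```

Feeding it the lines of the kernel log (for example, the captured output of dmesg) and printing format_report(count_block_io(lines)) would produce a table along the lines of the one shown above. Lines that do not match the assumed pattern are simply skipped.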

A SystemTap update

Posted Jan 29, 2009 12:46 UTC (Thu) by eugeniy (subscriber, #24280)

Correction: patches are not needed for probing the kernel. It looks like utrace is required for userspace probing. The uprobes source, it seems, is included in systemtap.

A SystemTap update

Posted Jan 29, 2009 10:38 UTC (Thu) by mjw (subscriber, #16740)

As the article states, there is no hard dependency; the patches are just used for deeper user space probing if wanted. And some of the utrace foundations have been going in, with the groundwork now upstream.

The last part of the article gives some idea of the ways people are working on getting this functionality upstream faster, so that it is included with more distributions by default: by splitting it up, providing other users, etc. One recent example is the utrace->ftrace engine proof of concept: http://lkml.org/lkml/2009/1/27/294

If you have any hints and tips for getting these things, or similar user space hooks that Systemtap can use, upstream faster, that would be appreciated.

A SystemTap update

Posted Jan 29, 2009 13:28 UTC (Thu) by fuhchee (guest, #40059)

> [why is] systemtap dependent upon two large
> kernel patches (utrace and uprobes)

For probing user-space, there is approximately no alternative: one needs a
kprobes-like infrastructure.

> which have dim-to-zero prospects of ever being included in Linux?

While skepticism may be warranted, we are making efforts to make this
code more palatable to the gatekeepers.

A SystemTap update

Posted Jan 29, 2009 16:15 UTC (Thu) by jejb (subscriber, #6654)

Actually, only the user space tracing aspect of systemtap is dependent on these. You can still do kernel space tracing without them.

We've spent quite a lot of effort explaining the problems with the utrace/uprobes dependency (especially the issues of having to pull the process symbol table into the kernel and of having the kernel actually execute the compiled code to do the traps). There is hope that we might be able to go with a lighter-weight infrastructure that simply vectors traps to the user space stap runtime and does all the interpreting in user space. It's just that we still haven't quite got systemtap buy-in yet.

A SystemTap update

Posted Jan 29, 2009 16:36 UTC (Thu) by fuhchee (guest, #40059)

> We've spent quite a lot of effort explaining the problems with the
> utrace/uprobes dependency

Can you provide some links to discussion about these specifics?

> (especially the issues of having to pull the
> process symbol table into the kernel

User-space symbol tables are made available to the systemtap module
only if it is required by the script - if it performs symbolic
address or backtrace type lookups.

> and of having the kernel actually
> execute the compiled code to do the traps

Like in dtrace, instrumentation is run within the kernel because
having user-space processes instrument each other is too disruptive.
We're looking for microsecond-level probe effect, not something
involving multiple context switches, indirect address space accesses,
and so on.

A SystemTap update

Posted Jan 29, 2009 16:55 UTC (Thu) by jejb (subscriber, #6654)

>> We've spent quite a lot of effort explaining the problems with the
>> utrace/uprobes dependency
>
> Can you provide some links to discussion about these specifics?

Um, just use a search ... if you search lkml for utrace you get the less polite version .. if you search the systemtap lists on the same thing, you get the more polite one.

>> (especially the issues of having to pull the
>> process symbol table into the kernel
>
> User-space symbol tables are made available to the systemtap module
> only if it is required by the script - if it performs symbolic
> address or backtrace type lookups.

Only if you buy the premise that the kernel has to be intimately involved in the trace instead of being a simple conduit for mediating it.

>> and of having the kernel actually
>> execute the compiled code to do the traps
>
> Like in dtrace, instrumentation is run within the kernel because
> having user-space processes instrument each other is too disruptive.
> We're looking for microsecond-level probe effect, not something
> involving multiple context switches, indirect address space accesses,
> and so on.

Well, this would be the classic illustration of the problems systemtap faces. Nothing on the above laundry list is impossible even if the kernel merely controls the traced process and lets userspace poke at it ... that, after all, is how gdb works. The brick wall is that kernel developers don't think this is at all a compelling argument and apparently systemtap people think it is.

A SystemTap update

Posted Jan 29, 2009 17:26 UTC (Thu) by fuhchee (guest, #40059)

> Um, just use a search

I asked because I recall no serious debate about the two specific items ("process symbol tables in the kernel" and "having kernel ... execute code ... to do the traps") you listed. Please humor fellow readers and give some links.

> > User-space symbol tables are made available to the systemtap module
> > only if it is required by the script

> Only if you buy the premise that the kernel has to be intimately involved
> in the trace instead of being a simple conduit for mediating it.

There are many possible details behind such a summary. If one wants dtrace-level introspection and manipulation, never mind going beyond it, some "intimate involvement" (kernel-side processing?) is necessary. Merely "mediating" (data copying?) is not sufficient, since the choice of data and the nature of the programmed reaction is itself variable.

> [...] that, after all, is how gdb works. [...]

The work involved in how gdb does its thing is several orders of magnitude heavier.

> The brick wall is that kernel developers don't think this is at all a
> compelling argument and apparently systemtap people think it is.

Individual kernel people don't need to buy into every argument for systemtap to bloom. We have promoted numerous "dual-use" kernel-side technologies that can stand on their own feet. For example, with utrace, if you believe that user-space instrumentation is plausible, you should support utrace and forthcoming ("froggy" or "ubs"-like) layers on top, for dispatching those events to a hypothetical user-space handler.

The details deserve more in-depth discussion.

A SystemTap update

Posted Feb 3, 2009 22:41 UTC (Tue) by oak (guest, #2786)

> Well, this would be the classic illustration of the problems systemtap
> faces. Nothing on the above laundry list is impossible even if the kernel
> merely controls the traced process and lets userspace poke at it ... that,
> after all, is how gdb works.

As to why to do it in the kernel: doing it from user space is just too slow.
Try, for example, getting backtraces for mallocs through ptrace and you will
notice how infeasible this is from user space (at least through the interface
ptrace offers). With modern desktop apps that use malloc pretty heavily, the
programs become unusably slow (in addition to their usability, their
functionality may also suffer if they use timeouts for responses etc).

A SystemTap update

Posted Jan 30, 2009 10:31 UTC (Fri) by mjw (subscriber, #16740)

> Actually, only the user space tracing aspect of systemtap is dependent on these. You can still do kernel space tracing without them.

Correct.

> We've spent quite a lot of effort explaining the problems with the utrace/uprobes dependency (especially the issues of having to pull the process symbol table into the kernel and of having the kernel actually execute the compiled code to do the traps).

Could you post the problems you see?

How a tracing tool like systemtap processes and uses the symbol table is largely orthogonal to utrace and uprobes. utrace and uprobes might make it easier to access symbol tables at runtime, but that isn't what Systemtap currently does. If you want a tracer to do these things dynamically at trace event time, or even to push the whole thing towards user space in reaction to trace events and hand it off to a user space helper, then that is certainly a design choice you can make (unlike tracers, debuggers do this, for example, since they don't mind suspending the tracee for a longer period). The article does hint at why "offloading" this to a user space helper might not be practical (see the vfs example and the explanation of what might happen if you try to offload something like that to a perl script). But those are tradeoffs you can make independently of the infrastructure you use in the kernel to handle events and trace point insertion.

> There is hope that we might be able to go with a lighter weight infrastructure that simply vectors traps to the user space stap runtime and does all the interpreting in user space.

Yes, there is nothing inherent in utrace or uprobes about how you handle trace events or how you use and insert vector traps into user space. That is the basic idea behind pushing them upstream, because they are useful apart from systemtap. They should also be useful for other tracers like connecting them to ftrace or lttng. You could even use them for a new debugger interface if you aren't interested in a no-overhead tracer. That is what the froggy project is exploring. It seems time to provide something better than the ptrace interface for debuggers.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds