Uprobes: not quite there yet

By Jonathan Corbet
January 20, 2010

Tracing support in Linux has made a great deal of progress over the course of the last year or so. One important feature still lacks support in the mainline kernel, though: seamless, combined tracing of user-space execution along with the kernel. The subsystem which is meant to support this feature - utrace - has run into a number of roadblocks on its way into the mainline. Now a higher-level feature, uprobes, has been proposed as a solution for dynamic probing of user-space programs. All told, the combination shows a lot of progress toward inclusion, but the resulting discussion suggests that there are still problems to be overcome before this code will be merged.

This version of uprobes is actually two independent modules which address the problem at different levels. The lower-level piece is called "UBP," for user-space break points; its job is to handle the actual placement of probes into user-space processes. The developers reasoned that there might be additional users of user-space probes in the future, so the facilities for the placement and removal of those probes were carved out separately.

On top of UBP is the actual uprobes code, which handles higher-level details. Uprobes arbitrates between multiple users of breakpoints, even if two users want to place a probe at the same location. It uses utrace to ensure that processes are not running in an area which is about to have a probe inserted, and deals with the case of multiple processes running the same code where some are being traced and others are not. The uprobe code is also in charge of actually calling the probe function when a probe is hit and recovering properly if that function behaves poorly.

This separation is the first point of contention; Peter Zijlstra (who has been the main reviewer of this code so far) sees uprobes as an unnecessary glue layer which could be eliminated. Peter would rather see any needed features pushed down into UBP, after which the higher-level code could be dropped. The uprobes developers disagree, though, saying that the functions implemented at that level are necessary and cannot really be eliminated. This part of the discussion kind of died out, but it doesn't look like the developers are inclined to make major changes here.

The next problem is with the implementation of the probes themselves. When a probe is placed in a user-space program, the instruction at the probed location is overwritten by a breakpoint. When the breakpoint is hit, the probe handler function is invoked; once it returns, the replaced instruction must be executed somehow. A simple implementation would put that instruction back into its original location, single-step through it, then restore the breakpoint once again. That approach fails, though, if there is a second process (or thread) running the probed code. If that second process executes through the probed area while the probe has been removed, the associated event will be lost.

So the uprobes developers took a separate approach, called "single-step out of line" or "execute out of line" (XOL). A separate region of memory is set up for the purpose of holding instructions which have been displaced by probe breakpoints. When one of those instructions is to be executed, it is run (again, in single-step mode) out of this separate area; after that, control returns after the probe location. This solution allows a probe to work with multiple processes at the same time.

The problem is this: the memory containing the XOL instructions must be in the probed process's address space. So the XOL code adds a virtual memory area (VMA) to the process, reserving a range of address space for this purpose. This works, but it strikes some observers as inelegant at best, and potentially disruptive at worst. Currently, the layout of a process's address space is almost entirely under the control of the process itself. The injection of a special kernel VMA can perturb the process's control of its address space, causing other VMAs to move or conflicting with an attempt by the process to place a VMA at a specific location. Debuggers are often known to distort application behavior (leading to "heisenbugs" which disappear when somebody attempts to observe them directly), but tracing, which is meant to work on production systems, should really minimize such distortions. Peter also dislikes the precedent of kernel code messing with a process's address space. Finally, on 32-bit systems, losing even a small amount of address space to a kernel function is likely to be unwelcome in a number of situations.

Solving this problem is not necessarily easy. Peter seems to favor emulating the displaced instruction, but that would require the implementation of a full instruction emulator in the kernel. That code would be large, architecture-specific, and error prone. There was some discussion of trying to run the instruction in kernel space, but doing that securely appears to be a challenging task. After an extended discussion, the prevailing opinion seemed to be something like that expressed by Pekka Enberg:

I guess we're looking at few megabytes of the address space for normal scenarios which doesn't seem too excessive... I don't like the idea but if the performance benefits are real (are they?), maybe it's a worthwhile trade-off.

In the end, perhaps the kernel developers will hold their noses and merge this approach, but chances are they'll need to talk about it for a while yet first.

The uprobes code comes with an ftrace plugin which provides an interface to user space for the placement and management of probes. The problem here is that the kernel developers have, for all practical purposes, decided that there will be no more ftrace plugins added to the kernel. New features are supposed to go through the perf events subsystem instead, which is seen as having a better-designed interface. So the current ftrace plugin will almost certainly have to be redone for perf events before this code can go in.

The ftrace plugin also associates user-space probes with specific process of interest. Peter argues that it makes more sense to hook probes onto executable files, then make the process association by way of the VMA structure when the file is mapped. Existing features in the kernel, perhaps supplemented with a simple hook or two, would make it easy for uprobes to find processes running code from a file and to deal with process comings and goings while the probes are in place. The uprobes developers have not said as much, as of this writing, but it seems likely that the API could be reworked in those terms.

Then, there is the nagging issue of the utrace layer, which has not yet found its way into the mainline. It has recently been added to linux-next, but there is some discomfort with that and it's not clear if it will remain there or not.

All of this may seem like a lot of obstacles to the merging of this code, but it also represents a step forward. The road into the mainline has been long for utrace; a final detour or two seems about par for the course. The existence of uprobes as an in-kernel user of utrace might help its cause, once uprobes itself passes muster. Assuming consensus on these issues can be reached, it should be possible to make a last round of changes and be quite close to getting the code merged - though it might be difficult to get this done for the 2.6.34 merge window. But, if things go well, we should have user-space probing not too much later than that.

Index entries for this article
Kernel	Tracing
Kernel	Uprobes
Kernel	Utrace

Uprobes: not quite there yet

Posted Jan 20, 2010 23:58 UTC (Wed) by chantecode (subscriber, #54535) [Link]

We are not telling people to develop plugins for perf _instead_ of ftrace. The former doesn't supersedes the latter. They are complementary. ftrace is not deprecated. What is deprecated (in most cases) is the old ftrace plugin system (struct tracer) for new tracing code, altough it's still relevant for older tracers like function, mmiotrace, etc... Instead we recommend to use the ftrace trace event API because it is much more powerful, and it also creates events directly usable by perf.

Thanks,

Frederic.

More than one process executing the same object code

Posted Jan 21, 2010 12:13 UTC (Thu) by epa (subscriber, #39769) [Link] (2 responses)

As I understand the article, a lot of the problems occur because more than one process might share the same page of object code (or 'text'), and so it's not possible to modify it without disrupting other processes that aren't being probed. But why not just require that a process being probed must have its own, unshared copy of the code it executes? On a modern system losing ten megabytes or so for a duplicate copy of your program and its shared libraries is not a big deal. Or perhaps some kind of copy-on-write system could be used for these pages, so only the one page modified to insert the probe need be unshared between processes?

I'm sure the kernel hackers have considered all this but the article doesn't mention why a simpler solution isn't possible.

More than one process executing the same object code

Posted Jan 21, 2010 13:24 UTC (Thu) by rvfh (guest, #31018) [Link]

I believe the problem also exists for multithreaded processes...

More than one process executing the same object code

Posted Feb 17, 2010 19:17 UTC (Wed) by oak (guest, #2786) [Link]

The problem with multithreaded programs is that if you disable breakpoint
to let program continue, you might miss the other threads passing that
instruction. SSOL solution leaves the breakpoint in place and
runs/emulates the instruction overwritten by the breakpoint from
elsewhere.

Uprobes: not quite there yet

Posted Jan 21, 2010 16:45 UTC (Thu) by bronson (subscriber, #4806) [Link] (1 responses)

Suddenly I'm reading the project's name as UpRobes. That might be even more descriptive. :)

Can anyone name an existing app that might break with the foreign memory area in its address space? Based on this article, the complaints about XOL sound awfully obscure.

Adding an emulator to the kernel just for this?? That would be an endless source of problems!

Uprobes: not quite there yet

Posted Feb 17, 2010 19:14 UTC (Wed) by oak (guest, #2786) [Link]

Even the SSOL approach one needs to "emulate" / "fix" those of the
instructions which use relative addressing, see Roland's old mail on the
subject:
http://sourceware.org/ml/systemtap/2007-q1/msg00571.html

And the instructions are of course architecture specific.

The good thing is that if one puts breakpoints just on function entry &
exit points, those usually use fairly small set of instructions. See e.g.
this user-space function tracing utility that uses SSOL with breakpoints:
http://repository.maemo.org/pool/fremantle/free/f/functra...

If breakpoints can be put anywhere, it's more of an issue. Good question
is how to test that emulation of all the required instructions for SSOL
works fine...

file tracing considered harmful???

Posted Jan 21, 2010 17:57 UTC (Thu) by faramir (subscriber, #2327) [Link] (1 responses)

If I understand the suggestion that executable files should have probes attached rather then processes, then I'm against it. What if I want to attach probes when a particular user (who has a use case that triggers problems) executes 'ls' or some other frequently executed program? What happens when everyone else executes 'ls'? I admit I really don't know how probes work, but it would worry me a lot if there wasn't some way to probe installed binaries without affecting all other users of that binary. Please tell me I shouldn't be worried.

meh

Posted Jan 21, 2010 19:02 UTC (Thu) by Tobu (subscriber, #24111) [Link]

The point of probes is that ls won't notice the extra few instructions. With some filtering at a privileged level, the prober won't see leaked information either.

Uprobes: not quite there yet

Posted Jan 21, 2010 19:41 UTC (Thu) by Tobu (subscriber, #24111) [Link]

Executables don't control their address space entirely, every library reserves some pages.

Could the extra address space be reserved once by a library everyone links to, the libc? Utrace would write the displaced instructions into that shared space, which the process has no control over. No VMA is introduced while the process is running.

Though that requires some coordination. It's more convenient to let the kernel do the reservation. But it can be done very early to not interfere with the process and its libraries.