Uprobes: not quite there yet
This version of uprobes is actually two independent modules which address the problem at different levels. The lower-level piece is called "UBP," for user-space break points; its job is to handle the actual placement of probes into user-space processes. The developers reasoned that there might be additional users of user-space probes in the future, so the facilities for the placement and removal of those probes were carved out separately.
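At its lowest level, placing a probe means swapping an instruction for a breakpoint and remembering what was there. The sketch below shows only that bookkeeping. It assumes x86, where the breakpoint is the single-byte `int3` instruction (0xCC), and it uses a plain byte buffer to stand in for the probed process's text. `struct ubp_bkpt`, `ubp_insert()`, and `ubp_remove()` are hypothetical names, not the UBP API. The sketch does not attempt to write into another process's memory, which the real code must do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86 single-byte breakpoint instruction (int3). */
#define BREAKPOINT_INSN 0xCC

/* Hypothetical record of one placed breakpoint. */
struct ubp_bkpt {
    size_t  offset;     /* where the breakpoint lives in the text */
    uint8_t saved_insn; /* first byte of the displaced instruction */
    int     inserted;   /* nonzero while the breakpoint is in place */
};

/* Overwrite the byte at bp->offset with int3, remembering the original
 * byte so it can be restored later. Returns 0 on success, -1 if the
 * breakpoint is already inserted. */
static int ubp_insert(uint8_t *text, struct ubp_bkpt *bp)
{
    if (bp->inserted)
        return -1;
    bp->saved_insn = text[bp->offset];
    text[bp->offset] = BREAKPOINT_INSN;
    bp->inserted = 1;
    return 0;
}

/* Put the original byte back. Returns 0 on success, -1 if the
 * breakpoint was not inserted. */
static int ubp_remove(uint8_t *text, struct ubp_bkpt *bp)
{
    if (!bp->inserted)
        return -1;
    assert(text[bp->offset] == BREAKPOINT_INSN);
    text[bp->offset] = bp->saved_insn;
    bp->inserted = 0;
    return 0;
}
```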
On top of UBP is the actual uprobes code, which handles higher-level details. Uprobes arbitrates between multiple users of breakpoints, even if two users want to place a probe at the same location. It uses utrace to ensure that processes are not running in an area which is about to have a probe inserted, and deals with the case of multiple processes running the same code where some are being traced and others are not. The uprobe code is also in charge of actually calling the probe function when a probe is hit and recovering properly if that function behaves poorly.
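A reference-counted probe point can illustrate the arbitration problem. Several users may register at the same address. Only the first registration writes the breakpoint, and the original instruction is restored only when the last user goes away. This is a minimal sketch under stated assumptions: x86 `int3`, a fixed limit of four handlers, and hypothetical names such as `probe_register()`. It is not the uprobes implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BREAKPOINT_INSN 0xCC /* x86 int3 */
#define MAX_HANDLERS 4       /* assumption: small fixed limit */

typedef void (*probe_handler_t)(void *data);

/* Hypothetical per-address probe point shared by several users. */
struct probe_point {
    size_t          offset;
    uint8_t         saved_insn;
    int             nr_handlers;
    probe_handler_t handlers[MAX_HANDLERS];
    void           *data[MAX_HANDLERS];
};

/* Register a handler; only the first registration touches the text. */
static int probe_register(uint8_t *text, struct probe_point *pp,
                          probe_handler_t h, void *data)
{
    if (pp->nr_handlers == MAX_HANDLERS)
        return -1;
    if (pp->nr_handlers == 0) {
        pp->saved_insn = text[pp->offset];
        text[pp->offset] = BREAKPOINT_INSN;
    }
    pp->handlers[pp->nr_handlers] = h;
    pp->data[pp->nr_handlers] = data;
    pp->nr_handlers++;
    return 0;
}

/* Unregister; the last user out restores the original instruction. */
static int probe_unregister(uint8_t *text, struct probe_point *pp,
                            probe_handler_t h, void *data)
{
    for (int i = 0; i < pp->nr_handlers; i++) {
        if (pp->handlers[i] == h && pp->data[i] == data) {
            /* Fill the hole with the last entry. */
            pp->nr_handlers--;
            pp->handlers[i] = pp->handlers[pp->nr_handlers];
            pp->data[i] = pp->data[pp->nr_handlers];
            if (pp->nr_handlers == 0)
                text[pp->offset] = pp->saved_insn;
            return 0;
        }
    }
    return -1;
}

/* Called when the breakpoint fires: every registered user is called. */
static void probe_hit(struct probe_point *pp)
{
    for (int i = 0; i < pp->nr_handlers; i++)
        pp->handlers[i](pp->data[i]);
}

/* Example handler: counts hits into an int. */
static void count_hit(void *data)
{
    (*(int *)data)++;
}
```

When the breakpoint fires, `probe_hit()` calls each registered handler in turn. A real implementation would also need to recover from a handler that misbehaves, which this sketch ignores.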
This separation is the first point of contention; Peter Zijlstra (who has been the main reviewer of this code so far) sees uprobes as an unnecessary glue layer which could be eliminated. Peter would rather see any needed features pushed down into UBP, after which the higher-level code could be dropped. The uprobes developers disagree, though, saying that the functions implemented at that level are necessary and cannot really be eliminated. This part of the discussion kind of died out, but it doesn't look like the developers are inclined to make major changes here.
The next problem is with the implementation of the probes themselves. When a probe is placed in a user-space program, the instruction at the probed location is overwritten by a breakpoint. When the breakpoint is hit, the probe handler function is invoked; once it returns, the replaced instruction must be executed somehow. A simple implementation would put that instruction back into its original location, single-step through it, then restore the breakpoint once again. That approach fails, though, if there is a second process (or thread) running the probed code. If that second process executes through the probed area while the probe has been removed, the associated event will be lost.
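The race in the simple restore-and-single-step scheme can be replayed deterministically. In this toy model, one shared byte represents the probed instruction, and all names are hypothetical. The calls in `replay_race()` act out one unlucky interleaving of two threads, executed in sequence. Real threads, traps, and single-stepping are not modeled. Two threads pass the probe, but only one event is recorded.

```c
#include <assert.h>
#include <stdint.h>

#define BREAKPOINT_INSN 0xCC

/* Toy model: one shared text byte at the probed address and a counter. */
struct naive_probe {
    uint8_t *insn;  /* the probed byte in shared text */
    uint8_t  saved; /* original byte */
    int      hits;  /* events recorded by the probe handler */
};

/* A thread "executes" the probed byte: if it sees the breakpoint, the
 * handler runs and records the event; otherwise nothing is recorded. */
static void execute_probed_byte(struct naive_probe *p)
{
    if (*p->insn == BREAKPOINT_INSN)
        p->hits++;
}

/* Naive step 1: put the original instruction back so the hitting
 * thread can single-step it in place. */
static void naive_restore(struct naive_probe *p) { *p->insn = p->saved; }

/* Naive step 2: after the single-step, re-arm the breakpoint. */
static void naive_rearm(struct naive_probe *p) { *p->insn = BREAKPOINT_INSN; }

/* Replay one interleaving of threads A and B; returns events recorded. */
static int replay_race(void)
{
    uint8_t text = BREAKPOINT_INSN;
    struct naive_probe p = { .insn = &text, .saved = 0x55, .hits = 0 };

    execute_probed_byte(&p); /* A hits the breakpoint */
    naive_restore(&p);       /* A removes it to single-step the original */
    execute_probed_byte(&p); /* B runs through while the probe is gone */
    naive_rearm(&p);         /* A finishes its single-step and re-arms */
    assert(text == BREAKPOINT_INSN);
    return p.hits;
}
```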
So the uprobes developers took a different approach, called "single-step out of line" or "execute out of line" (XOL). A separate region of memory is set up to hold instructions which have been displaced by probe breakpoints. When one of those instructions is to be executed, it is run (again, in single-step mode) out of this separate area; after that, control returns to the instruction following the probe location. Since the breakpoint itself is never removed, this solution allows a probe to work with multiple processes at the same time.
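A toy interpreter can illustrate the out-of-line approach. Here the displaced instruction is copied into an XOL slot once, and the breakpoint stays in the text permanently. Every thread that hits it runs the copy from the slot, then resumes at the following instruction. The sketch rests on several assumptions. It uses a made-up one-byte instruction set, so "the following instruction" is simply `addr + 1`, whereas real x86 code needs an instruction decoder. Single-stepping is not modeled, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy one-byte instruction set standing in for real machine code. */
enum { OP_HALT = 0x00, OP_INC = 0x01, OP_DBL = 0x02, OP_BRK = 0xCC };

/* Hypothetical probe with a single XOL slot holding the displaced insn. */
struct xol_probe {
    size_t  addr;     /* probed address in the text */
    uint8_t xol_slot; /* copy of the displaced instruction */
    int     hits;
};

/* Execute one toy instruction on the accumulator. */
static void exec_insn(uint8_t op, long *acc)
{
    switch (op) {
    case OP_INC: (*acc)++;    break;
    case OP_DBL: (*acc) *= 2; break;
    default:                  break;
    }
}

/* Install the probe: copy the original instruction to the XOL slot,
 * then overwrite it with a breakpoint that is never taken out again. */
static void xol_install(uint8_t *text, struct xol_probe *p)
{
    p->xol_slot = text[p->addr];
    text[p->addr] = OP_BRK;
}

/* Run one "thread" from pc 0 until HALT and return its accumulator. */
static long run_thread(const uint8_t *text, struct xol_probe *p)
{
    long acc = 0;
    size_t pc = 0;

    for (;;) {
        uint8_t op = text[pc];
        if (op == OP_HALT)
            return acc;
        if (op == OP_BRK && pc == p->addr) {
            p->hits++;                    /* probe handler */
            exec_insn(p->xol_slot, &acc); /* run displaced insn out of line */
            pc = p->addr + 1;             /* resume after the probed insn */
            continue;
        }
        exec_insn(op, &acc);
        pc++;
    }
}
```

Because nothing is ever removed from the text, each run records its hit and still computes the same result as the unprobed program.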
The problem is this: the memory containing the XOL instructions must be in the probed process's address space. So the XOL code adds a virtual memory area (VMA) to the process, reserving a range of address space for this purpose. This works, but it strikes some observers as inelegant at best, and potentially disruptive at worst. Currently, the layout of a process's address space is almost entirely under the control of the process itself. The injection of a special kernel VMA can perturb the process's control of its address space, causing other VMAs to move or conflicting with an attempt by the process to place a VMA at a specific location. Debuggers are known to distort application behavior (leading to "heisenbugs" which disappear when somebody attempts to observe them directly), but tracing, which is meant to work on production systems, should really minimize such distortions. Peter also dislikes the precedent of kernel code messing with a process's address space. Finally, on 32-bit systems, losing even a small amount of address space to a kernel function is likely to be unwelcome in a number of situations.
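To make the address-space cost concrete, here is a sketch of an XOL area built from one anonymous page. The page is carved into fixed-size slots, which are handed out with a bitmap. The sketch runs as ordinary user-space code using `mmap()`, whereas the real XOL code creates the mapping from inside the kernel. The 128-byte slot size, the 64-slot cap, and all names are assumptions made for illustration. Even this single page is a mapping the process never asked for, which is the heart of the objection.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define XOL_SLOT_SIZE 128 /* assumption: room for one insn plus padding */
#define XOL_MAX_SLOTS 64  /* the bitmap below is a single uint64_t */

/* Hypothetical XOL area: one mapping split into equal slots. */
struct xol_area {
    uint8_t *base;     /* start of the reserved mapping */
    size_t   size;     /* mapping length in bytes */
    size_t   nr_slots;
    uint64_t used;     /* bit i set => slot i is in use */
};

static int xol_area_init(struct xol_area *a)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->size = (size_t)page;
    a->nr_slots = a->size / XOL_SLOT_SIZE;
    if (a->nr_slots > XOL_MAX_SLOTS)
        a->nr_slots = XOL_MAX_SLOTS;
    a->used = 0;
    return 0;
}

/* Copy an instruction into the lowest free slot; returns it or NULL. */
static uint8_t *xol_get_slot(struct xol_area *a, const uint8_t *insn,
                             size_t len)
{
    if (len > XOL_SLOT_SIZE)
        return NULL;
    for (size_t i = 0; i < a->nr_slots; i++) {
        uint64_t bit = UINT64_C(1) << i;
        if (!(a->used & bit)) {
            a->used |= bit;
            uint8_t *slot = a->base + i * XOL_SLOT_SIZE;
            memcpy(slot, insn, len);
            return slot;
        }
    }
    return NULL;
}

/* Return a slot to the free pool. */
static void xol_put_slot(struct xol_area *a, uint8_t *slot)
{
    size_t i = (size_t)(slot - a->base) / XOL_SLOT_SIZE;
    assert(i < a->nr_slots);
    a->used &= ~(UINT64_C(1) << i);
}

static void xol_area_destroy(struct xol_area *a)
{
    munmap(a->base, a->size);
}
```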
Solving this problem is not necessarily easy. Peter seems to favor emulating the displaced instruction, but that would require the implementation of a full instruction emulator in the kernel. That code would be large, architecture-specific, and error prone. There was some discussion of trying to run the instruction in kernel space, but doing that securely appears to be a challenging task. After an extended discussion, the prevailing opinion seemed to be something like that expressed by Pekka Enberg:
In the end, perhaps the kernel developers will hold their noses and merge this approach, but chances are they'll need to talk about it for a while yet first.
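Peter favors emulating the displaced instruction instead, which means reproducing each instruction's effect on registers and memory in software. The sketch below handles just two common x86-64 function-prologue encodings, `push %rbp` (0x55) and `mov %rsp,%rbp` (48 89 e5), against a simulated register file and stack. It rejects everything else. The names and the memory model are hypothetical. The point is how little this covers: every other instruction a probe might displace would need similar code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified register file: only what the two instructions touch. */
struct regs {
    uint64_t rip, rsp, rbp;
};

/* Simulated stack: address STACK_BASE corresponds to stack[0]. */
#define STACK_BASE 0x1000u
#define STACK_SIZE 256u

/* Emulate the instruction at insn (len bytes available), updating r and
 * the simulated stack. Returns the instruction length, or -1 for any
 * encoding this sketch does not know, which is nearly all of them. */
static int emulate_insn(const uint8_t *insn, size_t len,
                        struct regs *r, uint8_t *stack)
{
    if (len >= 1 && insn[0] == 0x55) {          /* push %rbp */
        r->rsp -= 8;
        assert(r->rsp >= STACK_BASE && r->rsp + 8 <= STACK_BASE + STACK_SIZE);
        memcpy(stack + (r->rsp - STACK_BASE), &r->rbp, sizeof r->rbp);
        r->rip += 1;
        return 1;
    }
    if (len >= 3 && insn[0] == 0x48 && insn[1] == 0x89 && insn[2] == 0xe5) {
        r->rbp = r->rsp;                         /* mov %rsp,%rbp */
        r->rip += 3;
        return 3;
    }
    return -1;
}
```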
The uprobes code comes with an ftrace plugin which provides an interface to user space for the placement and management of probes. The problem here is that the kernel developers have, for all practical purposes, decided that there will be no more ftrace plugins added to the kernel. New features are supposed to go through the perf events subsystem instead, which is seen as having a better-designed interface. So the current ftrace plugin will almost certainly have to be redone for perf events before this code can go in.
The ftrace plugin also associates user-space probes with specific processes of interest. Peter argues that it makes more sense to hook probes onto executable files, then make the process association by way of the VMA structure when the file is mapped. Existing features in the kernel, perhaps supplemented with a simple hook or two, would make it easy for uprobes to find processes running code from a file and to deal with process comings and goings while the probes are in place. The uprobes developers have not said as much, as of this writing, but it seems likely that the API could be reworked in those terms.
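Hooking probes onto files comes down to a small address calculation. A probe is described by a file and an offset within it. For each mapping of that file, the breakpoint address is the mapping's start plus the probe's distance from the mapping's file offset. The sketch below uses a simplified stand-in for the kernel's VMA that keeps only `vm_start`, `vm_end`, and `vm_pgoff` (in pages), plus a hypothetical inode number. It assumes 4 KiB pages, and `struct file_probe` and `probe_vaddr()` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Simplified stand-in for a VMA: just the fields needed here. */
struct vma {
    uint64_t      vm_start, vm_end; /* virtual range [start, end) */
    uint64_t      vm_pgoff;         /* file offset of vm_start, in pages */
    unsigned long ino;              /* hypothetical identity of the file */
};

/* Hypothetical file-based probe: which file, and where in it. */
struct file_probe {
    unsigned long ino;
    uint64_t      offset; /* byte offset of the probed insn in the file */
};

/* If this mapping covers the probed location, store the virtual address
 * where the breakpoint must go and return 1; otherwise return 0. */
static int probe_vaddr(const struct file_probe *fp, const struct vma *vma,
                       uint64_t *vaddr)
{
    uint64_t map_off = vma->vm_pgoff << PAGE_SHIFT;

    if (fp->ino != vma->ino || fp->offset < map_off)
        return 0;
    uint64_t addr = vma->vm_start + (fp->offset - map_off);
    if (addr >= vma->vm_end)
        return 0;
    *vaddr = addr;
    return 1;
}
```

Two processes that map the same file at different addresses get different breakpoint addresses from the same `struct file_probe`. This is what makes keying probes on the executable attractive.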
Then, there is the nagging issue of the utrace layer, which has not yet found its way into the mainline. It has recently been added to linux-next, but there is some discomfort with that and it's not clear if it will remain there or not.
All of this may seem like a lot of obstacles to the merging of this code, but it also represents a step forward. The road into the mainline has been long for utrace; a final detour or two seems about par for the course. The existence of uprobes as an in-kernel user of utrace might help its cause, once uprobes itself passes muster.
Assuming consensus on these issues can be reached, it should be possible to make a last round of changes and be quite close to getting the code merged, though it might be difficult to get this done for the 2.6.34 merge window. But, if things go well, we should have user-space probing not too much later than that.
Posted Jan 20, 2010 23:58 UTC (Wed)
by chantecode (subscriber, #54535)
Thanks,
Frederic.
Posted Jan 21, 2010 12:13 UTC (Thu)
by epa (subscriber, #39769)
I'm sure the kernel hackers have considered all this but the article doesn't mention why a simpler solution isn't possible.
Posted Jan 21, 2010 13:24 UTC (Thu)
by rvfh (guest, #31018)
Posted Feb 17, 2010 19:17 UTC (Wed)
by oak (guest, #2786)
Posted Jan 21, 2010 16:45 UTC (Thu)
by bronson (subscriber, #4806)
Can anyone name an existing app that might break with the foreign memory area in its address space? Based on this article, the complaints about XOL sound awfully obscure.
Adding an emulator to the kernel just for this?? That would be an endless source of problems!
Posted Feb 17, 2010 19:14 UTC (Wed)
by oak (guest, #2786)
And the instructions are of course architecture specific.
The good thing is that if one puts breakpoints just on function entry &
If breakpoints can be put anywhere, it's more of an issue. Good question
Posted Jan 21, 2010 17:57 UTC (Thu)
by faramir (subscriber, #2327)
Posted Jan 21, 2010 19:02 UTC (Thu)
by Tobu (subscriber, #24111)
Posted Jan 21, 2010 19:41 UTC (Thu)
by Tobu (subscriber, #24111)
Executables don't control their address space entirely; every library reserves some pages. Could the extra address space be reserved once by a library everyone links to, such as libc? Utrace would write the displaced instructions into that shared space, which the process has no control over, and no VMA would be introduced while the process is running. That requires some coordination, though, so it's more convenient to let the kernel do the reservation. But it can be done very early, so as not to interfere with the process and its libraries.
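The reservation suggested in this comment could look roughly like the following. A constructor in a widely linked library claims a page of address space with `PROT_NONE` before `main()` runs, so later probe activity would not need to introduce a new mapping. This is a user-space sketch of the idea only. `__attribute__((constructor))` is a GCC/Clang extension, the one-page size is an assumption, and the sketch does not model the hand-off to the kernel that the comment envisions.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical reservation made once, early, by a shared library. */
static void  *reserved_base;
static size_t reserved_len;

/* Runs before main(), i.e. before the program sets up mappings of its
 * own. PROT_NONE claims the address range without making it usable. */
__attribute__((constructor))
static void reserve_probe_area(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return;
    void *p = mmap(NULL, (size_t)page, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p != MAP_FAILED) {
        reserved_base = p;
        reserved_len = (size_t)page;
    }
}
```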
More than one process executing the same object code
to let the program continue, you might miss the other threads passing that instruction. The SSOL solution leaves the breakpoint in place and runs (or emulates) the instruction overwritten by the breakpoint from elsewhere.
instructions which use relative addressing, see Roland's old mail on the
subject:
http://sourceware.org/ml/systemtap/2007-q1/msg00571.html
exit points, those usually use a fairly small set of instructions. See e.g.
this user-space function tracing utility that uses SSOL with breakpoints:
http://repository.maemo.org/pool/fremantle/free/f/functra...
is how to test that emulation of all the required instructions for SSOL
works fine...
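The relative-addressing problem raised above arises because an instruction copied to an XOL slot runs at a different address. For an x86-64 RIP-relative operand, the target is the address of the next instruction plus a signed 32-bit displacement. The copy at `xol_addr` therefore keeps the same target only if the displacement is increased by `orig_addr - xol_addr`, and the fixup fails if the result no longer fits in 32 bits. In this sketch, `fixup_rip_relative()` is a hypothetical helper. The sketch assumes an instruction decoder, which is not shown, supplies the displacement's position within the instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rewrite the 32-bit displacement of a RIP-relative instruction copied
 * from orig_addr to xol_addr so it still reaches the same target.
 * disp_off is the byte offset of the displacement within the insn (from
 * a decoder, assumed). Assumes a little-endian host, as on x86.
 * Returns 0 on success, -1 if the new displacement does not fit. */
static int fixup_rip_relative(uint8_t *insn, size_t disp_off,
                              uint64_t orig_addr, uint64_t xol_addr)
{
    int32_t disp;
    memcpy(&disp, insn + disp_off, sizeof disp);

    /* orig_addr + len + disp == xol_addr + len + new_disp */
    int64_t new_disp = (int64_t)disp + ((int64_t)orig_addr - (int64_t)xol_addr);
    if (new_disp < INT32_MIN || new_disp > INT32_MAX)
        return -1;

    int32_t d32 = (int32_t)new_disp;
    memcpy(insn + disp_off, &d32, sizeof d32);
    return 0;
}
```

For example, `mov 0x10(%rip),%eax` (8b 05 10 00 00 00) moved from 0x400000 to 0x401000 needs its displacement changed from 0x10 to -0xff0, so both copies still load from 0x400016.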
file tracing considered harmful???
The point of probes is that ls won't notice the extra few instructions. With some filtering at a privileged level, the prober won't see leaked information either.
meh