
Dynamic function tracing events

By Jonathan Corbet
February 15, 2018
For as long as the kernel has included tracepoints, developers have argued over whether those tracepoints are part of the kernel's ABI. Tracepoint changes have had to be reverted in the past because they broke existing user-space programs that had come to depend on them; meanwhile, fears of setting internal code in stone have made it difficult to add tracepoints to a number of kernel subsystems. Now, a new tracing mechanism is being proposed as a way to circumvent all of those problems.

Whether tracepoints are part of the kernel ABI is not an insignificant issue. The kernel's ABI promise states that working programs will not be broken by updated kernels. It has become clear in the past that this promise extends to tracepoints, most notably in 2011 when a tracepoint change broke powertop and had to be reverted. Some kernel maintainers prohibit or severely restrict the addition of tracepoints to their subsystems out of fear that a similar thing could happen to them. As a result, the kernel lacks tracepoints that users would find useful.

This topic has found its way onto the agenda at a number of meetings, including the 2017 Maintainers Summit. At that time, a clever idea had been raised: rather than place tracepoints in sensitive locations, developers could just put markers that would have to be explicitly connected to and converted to tracepoints at run time. By adding some hoops to be jumped through, it was hoped, this new mechanism would not create any new ABI guarantees. Then things went quiet for a couple of months.

Recently, though, tracing maintainer Steve Rostedt surfaced with a variation on that proposal that he is calling "dynamically created function-based events". The details have changed, but the basic nature of the ABI dodge remains the same. The key detail that is different comes from the observation that the kernel already has a form of marker in place that the tracing code can make use of.

Kernel code is usually compiled with options that are normally used for code profiling. As a result, each function begins with a call to a function called mcount() (or __fentry__() when a newer compiler is in use). When a user-space program is being profiled, mcount() tracks calls to each function and the time spent there. The kernel, though, replaces mcount() with its own version that supports features like function tracing. Most of the time, the mcount() calls are patched out entirely, but they can be enabled at run time when there is a need to trace calls into a specific function.
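Enabling those hooks for a single function at run time is what the existing ftrace function tracer already does. The following is a minimal sketch, assuming root privileges and tracefs mounted at /sys/kernel/tracing; the TRACEFS variable and the two helper function names are hypothetical, introduced here only for illustration, while available_filter_functions, set_ftrace_filter, current_tracer, and tracing_on are the standard ftrace control files.

```shell
#!/bin/sh
# Sketch: trace calls to one kernel function with the ordinary ftrace
# function tracer, which works by patching in the mcount()/__fentry__()
# call at that function's entry. Assumes root and a tracefs mount.
# TRACEFS and the helper names are hypothetical, for illustration only.
TRACEFS=${TRACEFS:-/sys/kernel/tracing}

trace_one_function() {
    func=$1
    # Only functions listed here have a patchable entry hook; built-in
    # functions appear as bare names (module functions carry a suffix).
    if ! grep -qx "$func" "$TRACEFS/available_filter_functions"; then
        echo "no ftrace hook for $func" >&2
        return 1
    fi
    printf '%s\n' "$func" > "$TRACEFS/set_ftrace_filter"  # restrict to this function
    echo function > "$TRACEFS/current_tracer"             # hook gets patched in
    echo 1 > "$TRACEFS/tracing_on"
}

stop_tracing() {
    echo 0 > "$TRACEFS/tracing_on"
    echo nop > "$TRACEFS/current_tracer"   # hooks are patched back out
    : > "$TRACEFS/set_ftrace_filter"       # truncating the file clears the filter
}
```

After `trace_one_function SyS_openat`, the recorded calls can be read from the trace file in the same directory. Note that this only records that the function was called; it does not capture its arguments, which is the gap the new mechanism fills.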

There are other possible uses for this function-entry hook. Rostedt's patch uses it to enable the creation of a tracepoint at the beginning of any kernel function at run time. With the tracefs control filesystem mounted, a new tracepoint can be created with a command like:

    echo 'SyS_openat(int dfd, string path, x32 flags, x16 mode)' \
    	 > /sys/kernel/tracing/function_events

This command requests the creation of a tracepoint at the entry to SyS_openat(), the kernel's implementation of the openat() system call. Four values will be reported from the tracepoint: the directory file descriptor (dfd), the given pathname (path), and the flags and mode arguments. This tracepoint will show up under events/functions and will look like any other tracepoint in the kernel. It can be queried, enabled, and disabled in the usual ways. Interestingly, path in this case points into user space, but the tracing system properly fetches and prints the data anyway.
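To make the workflow concrete, the following is a small sketch of helpers around the proposed interface. The function_events file and the definition syntax come from the patch as described above. The events/functions/NAME/enable path is an assumption based on the statement that these events behave like any other tracepoint, whose per-event enable files follow that layout. The helper names and the TRACEFS variable are hypothetical. Since the patch set was still under discussion, none of this should be taken as a stable or merged interface.

```shell
#!/bin/sh
# Sketch of driving the *proposed* function_events interface.
# Assumptions (not confirmed by the patch): the event appears as
# events/functions/<name>/ with the usual per-event "enable" file.
# TRACEFS and the helper names are hypothetical. Requires root.
TRACEFS=${TRACEFS:-/sys/kernel/tracing}

# Write one definition, e.g. 'SyS_openat(int dfd, string path)'.
# printf is used instead of echo so the text is written verbatim.
add_function_event() {
    printf '%s\n' "$1" > "$TRACEFS/function_events"
}

# Enable (1) or disable (0) a function event by name.
set_function_event() {
    case $2 in
        0|1) printf '%s\n' "$2" > "$TRACEFS/events/functions/$1/enable" ;;
        *)   echo "usage: set_function_event NAME 0|1" >&2; return 1 ;;
    esac
}
```

Usage would then be `add_function_event 'SyS_openat(int dfd, string path, x32 flags, x16 mode)'` followed by `set_function_event SyS_openat 1`, after which the events can be read from trace or trace_pipe like any others.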

There is evidently some work yet to be done: "I need to rewrite the function graph tracer, and be able to add dynamic events on function return." But the core is seemingly in place and working. That leaves an important question, though: will it be enough to avoid creating a new set of ABI-guaranteed interfaces to the kernel? Mathieu Desnoyers worried that it might not:

Having those tools hook on function names/arguments will not make this magically go away. As soon as kernel code changes, widely used trace analysis tools will start breaking left and right, and we will be back to square one. Only this time, it's the internal function signature which will have become an ABI.

Linus Torvalds disagreed with this worry, though. The extra step required to hook into the kernel implies a different view of the status of that hook:

Everybody *understands* that this is like a debugger: if you have a gdb script that shows some information, and then you go around and change the source code, then *obviously* you'll have to change your debugger script too. You don't keep the source code static just to make your gdb script happy. That would be silly.

In contrast, the explicit tracepoints really made people believe that they have some long-term meaning.

If reality matches this view, then the new dynamic tracepoint mechanism could go a long way toward defusing the ABI issues. The number of new tracepoints being added to the kernel would be likely to drop, as developers could simply use the dynamic variety instead. When tracepoints are added in the future, it is relatively likely that they will be designed to support some sort of system-management tool and, thus, be viewed as a part of the ABI from the outset.

That assumes that this patch series is eventually merged, of course. There was some dissent from Alexei Starovoitov, who complained that the new interface adds little to what can already be had with kprobes. He also disliked the text-oriented interface, suggesting (unsurprisingly) that BPF should be used instead to extract specific bits of data from the kernel. Rostedt noted, though, that many developers are put off by the complexity of getting started with BPF and would prefer something simpler.
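For comparison, a roughly equivalent probe can be written with the existing kprobe_events interface, which also already supports return probes. This sketch rests on several loudly stated assumptions: an x86-64 kernel, where the first four integer arguments of an ordinary C function arrive in %di, %si, %dx, and %cx; a kernel (like those current in early 2018) in which SyS_openat is such an ordinary function, since later kernels wrap system calls differently; and the default "kprobes" event group. Newer kernels also provide a ustring type for user-space strings. The probe and helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: kprobe-based counterpart of the SyS_openat function event.
# Assumes x86-64 argument registers and an early-2018-style SyS_openat;
# probe/helper names and TRACEFS are hypothetical. Requires root.
TRACEFS=${TRACEFS:-/sys/kernel/tracing}

add_openat_kprobe() {
    # p:NAME SYMBOL FETCHARGS...  ('>>' appends without clearing other probes)
    printf '%s\n' 'p:openat_probe SyS_openat dfd=%di:s32 path=+0(%si):string flags=%dx:x32 mode=%cx:x16' \
        >> "$TRACEFS/kprobe_events"
}

add_openat_kretprobe() {
    # r: defines a return probe; $retval fetches the function's return value
    printf '%s\n' 'r:openat_ret SyS_openat ret=$retval' >> "$TRACEFS/kprobe_events"
}

enable_kprobe_event() {
    # Events defined without a group land under events/kprobes/
    echo 1 > "$TRACEFS/events/kprobes/$1/enable"
}
```

The contrast illustrates both sides of the argument: kprobes can already do this, but only if the user knows the architecture's register conventions, whereas the proposed syntax names arguments by their C types.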

Rostedt said that he thought the interface would be useful, but that he would not continue its development if others did not agree: "If others think this would be helpful, I would ask them to speak up now". Thus far, few people have spoken. If the dynamic function tracing mechanism is indeed something that other developers would like to have available, they might want to make their feelings known.




Dynamic function tracing events

Posted Feb 16, 2018 12:21 UTC (Fri) by aggelos (subscriber, #41752)

I don't know... The narrative in Linus' mail seems somewhat make-believe to me.

Because the kind of person thinking "Ooh, this is a stable ABI" won't be doing interesting work anyway.

I mean, developers do not giddily anticipate the chance to limit the future flexibility of kernel development; they just want to build a useful tool. They will probably have their own conceptions of what is likely to change and break their script in the next few years. In the absence of a stable ABI that provides the functionality they need, what choice do they have other than to use the most stable-looking dynamic function tracepoint they can find and hope for the best?

Even better, the users of a successful tool which depends on dynamic tracepoints will not know or care about all this at all. All they'll see is that upgrading to kernel x.y.z breaks the tool they've been reliably working with for the past N years.

I mean, if this line of thought applies to dynamic function tracepoints, then why not have "experimental" syscalls in released kernels? After all, by the same logic, wouldn't any reasonable developer realize that SYS_unstable_experimental_biohazard() can break at any time and is only there for evaluation purposes? Not sure what the narrative for the users of this software is. "It's not a regression, it's a necessary part of the development process" I guess?

Will be interesting to see how this plays out over the next few years. Perhaps experimental system calls will turn out to be a workable idea after all.

Dynamic function tracing events

Posted Feb 16, 2018 17:34 UTC (Fri) by zblaxell (subscriber, #26385)

Everything in Linux gets deprecated eventually. Even system calls.

The main difference is that system calls are designed for multi-decade lifetimes and it's usually possible to emulate them with new API, while tracepoints are sprinkled over whatever implementation details are causing someone pain this week, and those implementation details can disappear entirely one day when a better implementation comes along.

Certainly it's possible to make tools around everything you find in a Linux kernel, using everything from system calls to patching the running kernel binary from a privilege escalation exploit; however, if those tools aren't using the parts of Linux designed to last for decades, it seems insane to expect those tools to survive over time scales of decades.

Dynamic function tracing events

Posted Feb 19, 2018 2:48 UTC (Mon) by quotemstr (subscriber, #45331)

So what exactly happens when powertop depends on a kernel internal function and some kernel deletes that function or changes its signature? This outcome is inevitable. The trace point API stability issue is a social problem, not a technical one. IMHO, it's best to just declare trace points as non-stable API.

Dynamic function tracing events

Posted Feb 19, 2018 4:17 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

I personally would go even further and break them on purpose now and then. Just to make sure no users depend on them.

Dynamic function tracing events

Posted Mar 3, 2018 18:09 UTC (Sat) by nix (subscriber, #2304)

It seems to me that 'no users depend on them' is another way to say 'no users use them'.

Dynamic function tracing events

Posted Feb 20, 2018 12:21 UTC (Tue) by ballombe (subscriber, #9523)

As soon as there is no other way to implement a feature than to use an internal API, developers will use the internal API, whatever advice to the contrary you give them.
powertop included code to deal with potential API changes; however, that code was not well tested and did not work in this instance.
powertop was a major boon for power management in linux. Not having it would have been worse overall.

Dynamic function tracing events

Posted Feb 22, 2018 0:34 UTC (Thu) by mkatiyar (guest, #75286)

So, with tracing using mcount() as a hook, we can only trace the input/starting values of a function, right? Not arbitrary values from the middle of a function. Am I correct?

Dynamic function tracing events

Posted Feb 25, 2018 17:58 UTC (Sun) by bernat (subscriber, #51658)

Unfortunately, this doesn't replace well-thought-out static tracepoints. The latter can expose more useful information than function arguments, notably when they are dealing with structs. For example, I recently used the following tracepoint to debug a problem on a production server: https://elixir.bootlin.com/linux/v4.15.6/source/net/ipv4/...

I wish there were more of them scattered around that I could use to observe the kernel without relying on heavier tools and bringing in a compiler.


Copyright © 2018, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds