LTTng rejection, next generation
LTTng descends from some of the earliest dynamic tracing work done for Linux. Its distinguishing characteristics include integrated kernel- and user-space tracing, performance sufficient to deal with high-bandwidth event streams, and a well-developed set of capture and analysis tools. LTTng has always been maintained out of the mainline kernel tree, but it is packaged by a number of distributors and has base of dedicated users, some of whom have been happy to fund ongoing LTTng development work.
Had LTTng been merged years ago, the story may have been much simpler, but, for a number of reasons (including the simple fact that, for years, any sort of tracing capability was hard to sell to the kernel development community) that did not happen. So we have ended up with a number of projects in this area, including SystemTap (which also remains out-of-tree), and the in-tree ftrace and perf subsystems. Naturally, none of these solutions has proved entirely satisfactory so, while there has been a fair amount of pressure to consolidate the various tracing projects, that has tended not to happen.
That is not to say that there has been no progress at all. Some agreement has been reached on the format of tracepoints themselves; much of the work in that area was done by primary LTTng maintainer Mathieu Desnoyers. As a result, the number of tracepoints in the kernel has been growing rapidly, making kernel operations more visible to users in a number of ways. A lot of talk about merging more infrastructure has been heard over the years - said talk was often audible from a great distance at various conferences - but progress has been minimal. It seems to be easy for developers in this area to get bogged down on the details of ring buffers, event formats, and so on at the expense of producing an actual, usable solution.
To Mathieu, merging into the staging tree must have looked like an attractive way to push things forward. The relaxed rules for that tree would allow the code into the mainline where its visibility would increase, any remaining issues could be fixed up, and more users could be found. It all seemed to be working - some cleanup patches from new developers were posted - until Mathieu tried to add exports for some core kernel symbols so LTTng could access them. That attracted the attention of the core kernel developers who, to put it gently, were not impressed with what they saw.
In the end, Ingo Molnar vetoed the whole patch series and asked Greg Kroah-Hartman to remove the LTTng code from staging. Greg complied with that request, with the result that LTTng is, once again, no closer to merging into the mainline than it was before. This particular story line, it seems, has at least one more season to run yet.
What is it about LTTng that makes it unsuitable for merging into the mainline? It starts with a lot of duplicated infrastructure. Inevitably, LTTng brings in its own ring buffer to communicate events to user space, despite the fact that the two ring buffers used by perf and ftrace are already seen as being too many. There is a new instrumentation mechanism for system calls - something that the kernel already has. And, of course, there is a new user-space ABI to control all of this - again an unwelcome addition when there is strong pressure from some directions to merge the existing in-kernel tracing ABIs.
Duplicated infrastructure always tends to be hard to merge into the mainline; duplicated user-space ABIs, which must be supported forever, are even more so. It is thus not surprising that there is pushback against these patches, even without considering the highly contentious nature of the discussion around tracing work in general. Ingo claims to be receptive to merging the parts of LTTng that are better than what the kernel has now - after it has been unified with the existing infrastructure, of course - but, he says, Mathieu has been more interested in maintaining LTTng as a separate "brand" and has been unwilling to merge things in this way.
Mathieu's response has not done much to address those concerns. Duplicate infrastructure, he said, is fine as long as there is no agreement on how that infrastructure should work. Thus, he said, it is better to get his ring buffer into the mainline and to try to work out the differences there. He takes a similar approach to the ABI:
The points that are missed here are that (1) filesystems do, in fact, share the same ABI, and (2) there is indeed a cost to multiple ABIs for tracing. Those ABIs have to be maintained indefinitely and they fragment the efforts of tool developers who find themselves forced to choose one or the other. Unless he can produce a convincing proof that the existing kernel interfaces cannot possibly be extended to meet LTTng's needs, Mathieu will almost certainly not succeed in getting a new tracing ABI into the mainline.
Two notable conclusions were reached at the 2011 Kernel Summit. One was that
maintainers should say "no" more often and accept fewer new features into
the mainline; that would argue that Ingo and others are right to block the
addition of LTTng in its current form. But the other conclusion was that
code that has been shipped for years and that has real users should be
strongly considered for merging even if it has known technical
shortcomings. That, of course, would argue for merging LTTng, which
certainly meets those conditions. Given the players involved, that
conflict seems almost certain to be resolved with LTTng remaining an active
project out of
the mainline. Tune in next year for another episode of "As the tracing
world turns."
| Index entries for this article | |
|---|---|
| Kernel | Development tools/Kernel tracing |
| Kernel | Tracing |
