
How to unbreak LTTng

By Jonathan Corbet
April 20, 2020
Back in February, the kernel community discussed the removal of a couple of functions that could be used by loadable modules to gain access to symbols (functions and data structures) that were not meant to be available to them. That change was merged during the 5.7 merge window. This change will break a number of external modules that depended on the removed functions; since many of those modules are proprietary, this fact does not cause a great deal of anguish in the kernel community. But there are a few out-of-tree modules with GPL-compatible licenses that are also affected by this change; one of those is LTTng. Fixing LTTng may not be entirely straightforward.

LTTng is a tracing subsystem; to carry out that sort of task, it must be able to hook into the kernel in a number of fairly deep places. It is unsurprising that LTTng was accessing parts of the kernel that are not deemed suitable for export to modules in general. Losing access to kallsyms_on_each_symbol() deprived LTTng of the ability to find the addresses of those unexported symbols, thus breaking much of its functionality. That is not welcome news to those who work on, or use, LTTng.
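
To make the mechanism concrete: the kallsyms functions map symbol names to addresses, which is what let a module reach symbols that were never exported. Kernel code cannot be run outside a kernel build, so the following is only a user-space analogue, assuming the "address type name" line layout of /proc/kallsyms. The helper name lookup_symbol() and the sample addresses are made up for illustration and are not taken from LTTng or the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel or LTTng function): scan text laid out
 * like /proc/kallsyms, one "<hex address> <type> <name>" entry per line,
 * and return the address recorded for `name`, or 0 if it is not listed.
 */
static unsigned long long lookup_symbol(const char *table, const char *name)
{
    const char *line = table;

    assert(table != NULL && name != NULL);

    while (*line != '\0') {
        unsigned long long addr;
        char type;
        char sym[128];

        /* A trailing "[module]" column, if present, is simply ignored. */
        if (sscanf(line, "%llx %c %127s", &addr, &type, sym) == 3 &&
            strcmp(sym, name) == 0)
            return addr;

        line = strchr(line, '\n');   /* move to the start of the next line */
        if (line == NULL)
            break;
        line++;
    }
    return 0;
}

/* Sample table with made-up addresses, for illustration only. */
static const char sample_table[] =
    "ffffffff810a1b20 T task_prio\n"
    "ffffffff814c2f00 T disk_name\n"
    "ffffffffc0123000 t helper_fn\t[some_module]\n";
```

Calling lookup_symbol(sample_table, "disk_name") returns the address from the second line, and an unknown name returns 0. On a real system the table would come from reading /proc/kallsyms, which typically shows zeroed addresses to unprivileged readers.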

LTTng developer Mathieu Desnoyers has responded to this change with a patch series exporting a number of new symbols; with those available, LTTng can do what it needs to do without using the rather more general kallsyms_on_each_symbol() function. For example, LTTng needs access to stack_trace_save_user() to be able to save user-space stack traces. It also needs access to functions like task_prio(), disk_name(), and get_pfnblock_flags_mask(). LTTng obtains kernel information from tracepoints as well, of course, and that usage will increase as tracepoints replace some of the direct internal accesses that were used before. The patch set raises the number of arguments that can be passed to a BPF program from a tracepoint to an eye-opening 13 (to allow more information to be passed out via a specific tracepoint), but that change may prove to be unnecessary in the end.

Anybody who has watched the kernel community for any period of time can probably guess what sort of reception this patch series received. Christoph Hellwig was characteristically blunt: "Which part of every added export needs an in-tree user did you not get?" The kernel community as a whole is strongly resistant to the idea of adding any sort of support for code that is outside of the kernel repository. Much of that resistance comes from a dislike for proprietary kernel modules in general, but there is a bit more to it than that.

LTTng, being free software, should not be affected by any antipathy for proprietary kernel code. But, as Greg Kroah-Hartman explained, there are still reasons to avoid adding support for free, out-of-tree modules. Once those modules are supported in some way, they add constraints to what kernel developers can do. Internal kernel interfaces can be changed as needed; since all of the users of those interfaces are present in the same code base, they can be changed at the same time. If external modules have to be supported, though, it becomes harder to make such changes, since the users cannot be changed to match. Indeed, it becomes difficult to even know when a change might cause problems elsewhere.

Thus, Kroah-Hartman said:

We can't do anything for out-of-tree modules as they suddenly become "higher priority" than in-tree code if you have to not do specific changes or extra work for them. Which is not fair at all to the in-tree code developers at all.

This all suggests that there is not much of a path forward for LTTng. It is unable to function without access to kernel internals, and that access is being expressly denied.

There is, of course, one other option that was first raised by Steve Rostedt: "I guess we should be open to allowing LTTng modules in the kernel as well". If LTTng were actually a part of the mainline kernel, there would no longer be problems with giving access to the resources that it needs.

This is not a new idea. Numerous attempts have been made to get the LTTng code into the mainline kernel, without success. In the early days, before the kernel had any sort of tracing capability at all, adding that feature was a hard sell. Kernel developers now are heavily dependent on tracing for their own work and would strongly resist any attempt to take that capability away, but it was not that long ago that many of the same developers were unconvinced that tracing was needed at all. During that time, getting any tracing features into the kernel was not easy.

Over time, some low-level LTTng code found its way in, but LTTng as a whole has not followed. More recently, in 2011, LTTng was brought into the staging tree by Kroah-Hartman as a first step toward merging it. That move brought about a great deal of hostility, some of which seems familiar; a rather lengthy thread was set off by an attempt to export task_prio(), for example. In the end, LTTng was pushed back out of the staging tree — as it was before and has been ever since.

So LTTng would appear to be in a difficult position: unable to function outside of the kernel, and unable to be merged. Leaving LTTng broken would cause serious harm to a lot of users, though, and seems unlikely to advance the cause of Linux or free software in general. So perhaps the time has come for something to give. If a handful of symbols truly cannot be exported for this subsystem, perhaps some space could be found in the mainline for a widely used tracing subsystem, even if it somehow duplicates some of the functionality that is already there.

Index entries for this article
Kernel: Development tools/Kernel tracing
Kernel: Modules/Exported symbols
Kernel: Tracing



How to unbreak LTTng

Posted Apr 20, 2020 23:05 UTC (Mon) by cesarb (subscriber, #6266) [Link] (12 responses)

> unable to function outside of the kernel, and unable to be merged.

Back in the day, things like that would be distributed as a patch set against the upstream kernel, and users would be expected to recompile their kernel with it. Is that no longer an option?

How to unbreak LTTng

Posted Apr 21, 2020 1:05 UTC (Tue) by neilbrown (subscriber, #359) [Link] (11 responses)

> things like that would be distributed as a patch set against the upstream kernel, and users would be expected to recompile their kernel with it.

I suspect that today a more realistic approach is to ask the various distributors to apply that patch to their kernels.

I think it is worth reflecting for a moment on the motivation behind these changes. They seem to be coming from Android. The Android kernel has good reason to lock down the exported interfaces so that phone vendors cannot 'abuse' them. I fully support that work, but don't think that it should necessarily impose on what I do on my device or what a distro does with their supported kernel.
So if distros want to patch out these restrictions, that might make perfect sense.
Of course it could work the other way around - Android could patch in these restrictions. But we have a long history of trying to bring Android back to mainline - and requiring them to patch in the restrictions would hurt the momentum we have.

My preference would be that kallsyms_lookup_name() could be exported or not depending on a CONFIG option. Android could set that to hide the function and LTTng wouldn't work on Android. Probably no loss there.
Other distros could set the config option the other way and LTTng would work fine on them.
When I build my own kernels, I can (of course) do whatever I want.
Problem solved.
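
[Illustration] A config-gated export along these lines might be sketched as follows. This is a user-space model, not kernel code: the option name CONFIG_KALLSYMS_LOOKUP_EXPORT is hypothetical, and the table and the module_can_link() helper only stand in for the kernel's export table and module loader. In the kernel itself the idea would presumably amount to an #ifdef around the EXPORT_SYMBOL_GPL() line for the function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * CONFIG_KALLSYMS_LOOKUP_EXPORT is a hypothetical option name. Build with
 * -DCONFIG_KALLSYMS_LOOKUP_EXPORT to model a distribution that opts in;
 * leave it undefined to model a locked-down kernel.
 */
static const char *const exported_symbols[] = {
    "printk",                      /* always available to modules */
#ifdef CONFIG_KALLSYMS_LOOKUP_EXPORT
    "kallsyms_lookup_name",        /* only present when the option is set */
#endif
};

/* Model of the module loader: a symbol links only if it is in the table. */
static int module_can_link(const char *name)
{
    size_t i;

    assert(name != NULL);

    for (i = 0; i < sizeof(exported_symbols) / sizeof(exported_symbols[0]); i++)
        if (strcmp(exported_symbols[i], name) == 0)
            return 1;
    return 0;
}
```

With the option left undefined, module_can_link("kallsyms_lookup_name") returns 0 while ordinary exports still resolve; defining it at build time flips that result without touching any other code.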

How to unbreak LTTng

Posted Apr 21, 2020 11:50 UTC (Tue) by dyfrgi (guest, #122539) [Link] (10 responses)

> My preference would be that kallsyms_lookup_name() could be exported or not depending on a CONFIG option. Android could set that to hide the function and LTTng wouldn't work on Android.

Is that really adequate? Wouldn't phone vendors just set the config flag? Though tbh I also don't understand why they wouldn't just patch the kernel. My understanding is that they all do it for hardware support anyway.

How to unbreak LTTng

Posted Apr 21, 2020 12:11 UTC (Tue) by smurf (subscriber, #17840) [Link] (1 responses)

The "all Android vendors hack their kernel" idea is on the way out. The idea is that vendor specific kernel modules (required to access vendor specific hardware) live in a /vendor partition. The device should otherwise use a generic Android kernel.

How to unbreak LTTng

Posted Apr 22, 2020 0:25 UTC (Wed) by neilbrown (subscriber, #359) [Link]

Quoting from https://lwn.net/Articles/813350/

> But that only holds if modules are restricted to the exported-symbol interface; if they start to reach into arbitrary parts of the kernel, all bets are off. Deacon doesn't say so, but it seems clear that some vendors are, at a minimum, thinking about doing exactly that.

So while I agree with what you say, I don't think it is relevant to my comment.
Thanks.

How to unbreak LTTng

Posted Apr 22, 2020 11:52 UTC (Wed) by pbonzini (subscriber, #60935) [Link]

If they can set the flag they can also patch the kernel to export whatever symbol they need.

How to unbreak LTTng

Posted Apr 22, 2020 14:38 UTC (Wed) by Wol (subscriber, #4433) [Link] (6 responses)

> Is that really adequate? Wouldn't phone vendors just set the config flag?

Google can easily stop that. There's nothing to prevent vendors doing it, but Google can just say "if you do that, you can't call it Android".

Cheers,
Wol

How to unbreak LTTng

Posted Apr 22, 2020 16:33 UTC (Wed) by pizza (subscriber, #46) [Link] (4 responses)

> Google can easily stop that. There's nothing to prevent vendors doing it, but Google can just say "if you do that, you can't call it Android".

...And then those companies start complaining to their various national anti-trust bodies, and Google gets threatened with billion-dollar fines.

How to unbreak LTTng

Posted Apr 22, 2020 18:41 UTC (Wed) by Wol (subscriber, #4433) [Link]

Well, Google is just using Trademark Law in exactly the manner it was meant to be used. All those companies complaining won't get very far in any sane jurisdiction (yes, I know, since when can any country claim to have a sane jurisdiction ...)

Cheers,
Wol

How to unbreak LTTng

Posted Apr 22, 2020 20:26 UTC (Wed) by excors (subscriber, #95769) [Link] (2 responses)

Google already has a big list of "if you do that, you can't call it Android" requirements, including requirements on specific Linux kernel features: https://source.android.com/compatibility/android-cdd

(You can still freely use AOSP and ignore those requirements as long as you don't call it Android, and some large companies that compete with Google already do that, which makes it harder to argue that Google is being anti-competitive here.)

How to unbreak LTTng

Posted Apr 22, 2020 21:15 UTC (Wed) by pizza (subscriber, #46) [Link]

> which makes it harder to argue that Google is being anti-competitive here.

You forget that Google is obviously the only reason why $DomesticBusinessSector is not making lots of money, so any restrictions they impose upon folks using their stuff is clearly anticompetitive behavior that must be harshly punished.

(I'm not saying I agree with this, but many others do)

How to unbreak LTTng

Posted Apr 24, 2020 16:00 UTC (Fri) by ballombe (subscriber, #9523) [Link]

> You can still freely use AOSP and ignore those requirements as long as you don't call it Android, and some large companies that compete with Google already do that, which makes it harder to argue that Google is being anti-competitive here.

True, however if you do not call it Android, you cannot ship the Google apps with it. So there are some real consequences.

How to unbreak LTTng

Posted Apr 22, 2020 16:38 UTC (Wed) by corbet (editor, #1) [Link]

That is already a part of the plan as I understand it. The kernel will become part of the generic system image, so it's provided by Google, not the device vendors, who are limited to providing kernel modules. They will indeed not be able to set that config flag.

How to unbreak LTTng

Posted Apr 21, 2020 12:08 UTC (Tue) by ncultra (✭ supporter ✭, #121511) [Link] (6 responses)

All of this sounds reasonable at first glance, but there are a couple points that are troublesome to me. Hellwig's objection declares the requirement for any exported symbol to have an in-tree user. Yet, in-tree status is arbitrary and exclusive. LTTng is a good example: there is already a tracing subsystem in-tree, so let's exclude any duplicate functionality. Any one of a clique of long-time maintainers can essentially veto a patch series submitted for upstream consideration. In the end, the decision (or lack of decision) is up to a single individual. There is a universe of free, out-of-tree modules that are useful but disadvantaged by this reality. In-tree status is unrealistic for many free kernel modules. Secondly, this is about in-tree symbols that were removed, not simply rejected. It is therefore punitive in addition to exclusionary. This is the latest in a series of technically dubious policies driven by certain maintainers, notably Kroah-Hartman, to punish out-of-tree developers. Maintaining a driver or module for Linux is a unique experience, in a bad way, with the lack of a stable kernel API and, not for the first time, removal of existing symbols.

It gets worse when considering the purpose of a symbol such as kallsyms_on_each_symbol() which is to be a support to developers, and is critical for maintaining a driver or module that can be compiled and linked on multiple kernel versions. This is an accepted practice that is now made much more difficult. In my opinion, this philosophy that punishes out-of-tree developers (regardless of license choice) is counter-productive to the continued existence of Linux as we know it.

How to unbreak LTTng

Posted Apr 21, 2020 13:28 UTC (Tue) by mfuzzey (subscriber, #57966) [Link] (5 responses)

>LTTng is a good example: there is already a tracing subsystem in-tree, so let's exclude any duplicate functionality.

Generally excluding duplicate functionality seems to be a good idea for long term maintenance and code size surely.

When competing solutions exist the best bits of each can be, and often are it seems to me, merged to make a better in tree solution.

And in this particular case Steve Rostedt was quoted in the article as being open to including LTTng

> In-tree status is unrealistic for many free kernel modules

Why? For non-free ones sure but you explicitly said free modules.

> to punish out-of-tree developers.

Why? Sure some decisions may make their life harder but "punish" implies an "intention to hurt". What makes you think this is the case rather than just trying to do what is best for the kernel as a whole (even if that makes things harder for out of tree modules)?

> with the lack of a stable kernel API

The reasons for this are well documented and make sense to many. This is really part of the Linux philosophy today I'd say. No one forces anyone to use Linux.


How to unbreak LTTng

Posted Apr 21, 2020 14:09 UTC (Tue) by Paf (subscriber, #91811) [Link] (2 responses)

“Why? For non-free ones sure but you explicitly said free modules.”

It’s (in many cases) a huge amount of ongoing work, requiring not only commitment and effort but persuasion of others that you have commitment and effort.

The idea is that not everything that exists is up to the standards for mainline inclusion, and not everything that exists is going to avoid duplication to the level that’s appropriate for the mainline.

This is “let a thousand flowers bloom” stuff - it’s about being open to a world of stuff beyond the domain of what’s good enough to be a permanent part of mainline. Hobby projects (that still have outside users), small stuff, differing approaches to existing functionality that are not clearly better in general (see LTTng for an example of that) but are preferred by some users, etc.

I recognize this is a sticky issue (well, sort of), but this is still a choice to be more closed.

How to unbreak LTTng

Posted Apr 28, 2020 21:35 UTC (Tue) by ecree (guest, #95790) [Link] (1 responses)

> It’s (in many cases) a huge amount of ongoing work

Maintaining an in-tree module is *way* less work than maintaining an out-of-tree one. Heck, most of the time you just have to say "Ack" on the patches that the person changing some kernel infrastructure _writes for you_ to keep your module working. Whereas out-of-tree you end up with elaborate compatibility scripts, that you have to keep updated without any outside help, unless you're happy for your module to either only build on the latest kernel, or be tied to 2.6.32 forever 'cos that's what was ubiquitous when you started the project.

If it's tight enough with the kernel to be a module (rather than a userspace executable, using the APIs that are actually designed to be stable), then it belongs in the upstream tree; and if it's not good enough quality to be upstream (even in staging or behind a 'default n' kconfig) then it doesn't belong anywhere.
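
[Illustration] The compatibility burden described above usually shows up as version checks scattered through an out-of-tree module. The sketch below runs in user space under stated assumptions: the KERNEL_VERSION() macro mirrors the encoding used by the kernel's version header in the 5.x era, while the enum and the pick_strategy() helper are hypothetical names invented for this example. A real module would make the same decision at compile time with an #if on LINUX_VERSION_CODE.

```c
#include <assert.h>

/*
 * Same encoding as the kernel's KERNEL_VERSION() macro in the 5.x era:
 * major, minor and patch level packed into one comparable integer.
 */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical names: how a module might find the internals it needs. */
enum lookup_strategy {
    USE_KALLSYMS_LOOKUP,        /* kallsyms functions still exported */
    USE_EXPORTED_SYMBOLS_ONLY   /* 5.7 and later: exports removed */
};

/* Pick a strategy from a version code built with KERNEL_VERSION(). */
static enum lookup_strategy pick_strategy(int version_code)
{
    assert(version_code > 0);

    /* The kallsyms exports were removed during the 5.7 merge window. */
    if (version_code < KERNEL_VERSION(5, 7, 0))
        return USE_KALLSYMS_LOOKUP;
    return USE_EXPORTED_SYMBOLS_ONLY;
}
```

Every interface change that matters to the module adds another such branch, and each branch has to be kept working by the module's own maintainers.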

How to unbreak LTTng

Posted Apr 28, 2020 22:06 UTC (Tue) by mpr22 (subscriber, #60784) [Link]

Modules include things like "device driver to operate an FPGA payload that has less than one monkeysphere of users, most of whom know the driver author personally".

How to unbreak LTTng

Posted Apr 21, 2020 16:10 UTC (Tue) by nilsmeyer (guest, #122604) [Link]

> Generally excluding duplicate functionality seems to be a good idea for long term maintenance and code size surely.

What about SELinux and AppArmor (and Smack and TOMOYO)? There are probably a lot of other examples. It often feels arbitrary to the outside observer.

How to unbreak LTTng

Posted Apr 22, 2020 14:42 UTC (Wed) by Wol (subscriber, #4433) [Link]

> Why? Sure some decisions may make their life harder but "punish" implies an "intention to hurt". What makes you think this is the case rather than just trying to do what is best for the kernel as a whole (even if that makes things harder for out of tree modules)?

Because there's evidence that some kernel maintainers, at least, do not make these decisions based on technical evidence, but on how much grief it's going to cause people that these maintainers perceive as "bad actors".

The fact that it causes a lot of grief to innocent bystanders does not get taken into account.

Cheers,
Wol


Copyright © 2020, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds