White paper: Vendor Kernels, Bugs and Stability
Posted May 17, 2024 13:37 UTC (Fri) by bluca (subscriber, #118303)
Parent article: White paper: Vendor Kernels, Bugs and Stability
https://fosstodon.org/@bluca/112455500079789967
Any OS booting with any released version of systemd and expecting it to mount a btrfs filesystem is now broken with kernel 6.8+.
But why, oh why do people use distro kernels and do not want to upgrade? Guess we'll never know.
Posted May 17, 2024 13:59 UTC (Fri)
by snajpa (subscriber, #73467)
Posted May 17, 2024 14:16 UTC (Fri)
by Bigos (subscriber, #96807)
Has the information about the deprecation not been advertised enough? It has been 2 years apparently.
Posted May 17, 2024 14:22 UTC (Fri)
by bluca (subscriber, #118303)
If it doesn't happen, and userspace compatibility is broken without regard, then it's really not surprising that people do not want to upgrade their kernels. And the "we do not break userspace" mantra that we always hear some kernel maintainers repeat is, at the very least, inaccurate.
Posted May 17, 2024 16:57 UTC (Fri)
by Tobu (subscriber, #24111)
Looking at the commit, the removed options had been showing kernel warnings for the deprecation period.
In the rest of the file, usebackuproot and nologreplay are on the way out as well. It looks like these were intended as rescue options, whereas what systemd wants is something that will make sure images and block devices stay immutable. Wasn't there work to enforce that at the block layer recently?
Posted May 17, 2024 17:00 UTC (Fri)
by mezcalero (subscriber, #45103)
Also, why in heaven even deprecate this at all? The functionality is still available and supported, after all; they just renamed the option. And what's worse, the other big relevant file systems have an option of the same name, doing the same stuff. btrfs is now the sole outlier. Wtf.
Sorry, but you don't get to change kernel interfaces like this and then claim to be a compatibility extremist, as the kernel folks' "we don't break userspace" mantra suggests.
Posted May 17, 2024 18:07 UTC (Fri)
by hmh (subscriber, #3838)
Agreed, and the patch authors agreed too, apparently. The kernel has been [trying to?] warn everyone about it since 5.11. The patch that removed the functionality added a call to btrfs_warn() in its place.
So, what was removed in 6.8(?) was the btrfs_warn() and the option handling. That is obviously a bug, since it caused a userspace regression, and I fully expect it to be reverted everywhere now that it has been reported as such...
But "announced only through LKML" (which I extend to the "source-code-change-flow", i.e. other MLs as well as the git commit log) is not correct. The code clearly attempted to get a warning out to the kernel log every time anyone tried to mount a btrfs partition with said flag, and did so from 5.11 until the change that caused the current issue.
I wonder if btrfs_warn() is being "silenced" by default somewhere? But to an outsider, it looks like, whatever the reason, a better way of doing this is needed.
Documentation/ABI/ comes to mind; it is used a lot for sysfs and configfs, and maybe it could be used for mount options and filesystem functionality, too...
Posted May 17, 2024 18:48 UTC (Fri)
by bluca (subscriber, #118303)
And besides, the main point is a different one: if there are users of a userspace API, do not remove it. Or do, but then stop claiming "we do not break userspace", and don't publish papers wondering why users don't trust new upstream kernel releases and just stay on enterprise stable kernels.
Posted May 19, 2024 3:59 UTC (Sun)
by wtarreau (subscriber, #51152)
So that may be what needs fixing in the first place. If new kernel warnings are not detected by the CI, unnoticed breakage upon upgrades is all but guaranteed.
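A minimal sketch of such a CI check in Python, assuming the test harness can capture the kernel log (for example the output of `dmesg`) from both a baseline kernel and a candidate kernel. The regular expression and the sample log lines are made up for illustration; real warning wording varies by subsystem and release, which is exactly why matching on free text like this is brittle.

```python
import re

# Assumption: plain kernel log text with optional "[  12.345678] " timestamps.
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# Crude, hypothetical heuristic: any line mentioning "deprecat..." counts.
DEPRECATION_RE = re.compile(r"deprecat", re.IGNORECASE)


def deprecation_lines(log_text):
    """Return the set of log lines mentioning deprecation, timestamps stripped."""
    found = set()
    for line in log_text.splitlines():
        line = TIMESTAMP_RE.sub("", line)
        if DEPRECATION_RE.search(line):
            found.add(line)
    return found


def new_deprecations(baseline_log, candidate_log):
    """Deprecation warnings that appear only with the candidate kernel."""
    return sorted(deprecation_lines(candidate_log) - deprecation_lines(baseline_log))


if __name__ == "__main__":
    baseline = "[    1.000000] examplefs: mounted\n"
    candidate = (
        "[    1.000000] examplefs: mounted\n"
        "[    1.200000] examplefs: option 'oldopt' is deprecated\n"
    )
    for warning in new_deprecations(baseline, candidate):
        print("NEW:", warning)  # a CI job could fail the run here
```

Stripping timestamps matters because the same warning would otherwise look "new" on every boot.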
Posted May 19, 2024 10:56 UTC (Sun)
by bluca (subscriber, #118303)
Posted May 19, 2024 11:17 UTC (Sun)
by wtarreau (subscriber, #51152)
Posted May 19, 2024 13:18 UTC (Sun)
by bluca (subscriber, #118303)
Posted May 19, 2024 13:59 UTC (Sun)
by mb (subscriber, #50428)
Normal interfaces are changed extremely rarely and these obviously are the ones meant by the "do not break userspace" rule.
Yes, it is annoying, if systemd/udev are affected by an interface change. Especially, if this interface change could have been avoided. But it's not the end of the world.
Posted May 19, 2024 14:11 UTC (Sun)
by bluca (subscriber, #118303)
Which by the way, neatly explains why vendor kernels are needed and are in fact the only sane choice, despite what the paper cited in this article says. Nobody should run production payloads on upstream kernels at this point, given basic stuff like mount options just breaks left and right.
Posted May 19, 2024 15:20 UTC (Sun)
by mb (subscriber, #50428)
I didn't say that it is. I did not talk about this specific thing, because I don't know anything about it. I was talking about "do not break userspace" in the general form, not in this specific case.
Whether this mount change is a sane change is up to somebody else to judge.
> they don't say "we do not break userspace, apart from
Well, they pretty much do exactly that.
If you want 100% full "don't break userspace" without exceptions, we must basically stop all kernel development now.
Having a "don't break userspace without exceptions" is impossible.
> This is something that you have just made up
No. See my example of trace points.
> this is nonsense
I think it would be good to calm down before continuing the discussion.
Posted May 19, 2024 17:32 UTC (Sun)
by bluca (subscriber, #118303)
This is very much about "do not break userspace" in the general form. It's the perfect example of why that mantra needs to be put to bed, once and for all, as it's completely disconnected from reality.
> Well, they pretty much do exactly that.
No, they very much do not. Look at all the enthusiastic comments from kernel people pointing to the paper in the article and saying "See? Vendor kernels are BAD, just upgrade to upstream kernels, it's fine really", and when told that new kernel versions break applications and that's the real reason why vendor kernels are used, they shrug it away as "impossible, we do not break userspace".
> No. See my example of trace points.
Yes, it is exactly what you did, and there was no mention anywhere of trace points:
> Yes, it is annoying, if systemd/udev are affected by an interface change. Especially, if this interface change could have been avoided. But it's not the end of the world.
You have made up a new rule according to which it's fine to break systemd or udev (if it's not made up, then just point to where on https://kernel.org/doc/ it is defined), but *unspecified other applications* must continue to work. That is very convenient of course, it's always unspecified other applications that are supported, and the ones that break are never actually supported. That's a very easy way of guaranteeing compatibility - every time something goes wrong just say that case was never actually supposed to continue working and move on.
Posted May 19, 2024 18:10 UTC (Sun)
by mb (subscriber, #50428)
There have always been exceptions and I didn't make that up. That's just silly. I even gave you an example (tracepoints).
I respect you for what you do for Linux, Systemd and so on. But you're acting like a child right now.
>it's always unspecified other applications that are supported
Yes. That is exactly like it is.
I understand that you are upset that the kernel apparently frequently breaks systemd/udev. But keep in mind that these applications are tightly coupled to the kernel. It's natural that these see more breakage than other average applications.
>every time something goes wrong just say that case was never actually supposed to continue working and move on.
That's not how things are done, though.
Now you will reply: You have made up yet another rule!
Posted May 19, 2024 18:32 UTC (Sun)
by bluca (subscriber, #118303)
Literally nobody has mentioned tracepoints. I mean, I'm not even sure that really qualifies as a userspace interface - maybe it does, it would seem strange, but I am not a tracing expert. But it is completely unrelated to mount options being removed.
> I understand that you are upset that the kernel apparently frequently breaks systemd/udev. But keep in mind that these applications are tightly coupled to the kernel. It's natural that these see more breakage than other average applications.
Says who? That is very much not true. Every interface that I can think of is used by multiple unrelated applications. I have no idea where you get this from. Cgroups and namespaces? Throw a rock in the general direction of a container runtime and you'll hit either or both. Netlink? There are as many network and interface managers as there are Linux vendors. Process management? That's been around since literally forever, and see the point about container management again. Mounting filesystems? fstab is older than me, I am quite sure.
'We do not break userspace, as long as userspace is a statically linked printf("hello world\n") /sbin/init' doesn't sound as catchy, does it now?
> It's done on a case by case basis.
I am well aware. And the triaging of that case by case goes like this: did it affect the machine that Linus happened to boot on that week? If so, it gets reverted and unpleasant emails are shot left and right. Else, nothing to see, move along.
Posted May 19, 2024 18:48 UTC (Sun)
by mb (subscriber, #50428)
What the? I did. I mentioned them as an example for a non-stable interface. After you have asked.
> I mean I'm not even sure that really qualifies as a userspace interface
Oh. I get it. *You* want to define what a userspace interface is and what not.
That is silly.
> Says who?
Me. But I'm not sure why that matters.
> We do not break userspace, as long as userspace is a statically linked printf("hello world\n") /sbin/init
Well. I have never experienced a breakage due to a kernel interface change.
That is my experience.
Posted May 19, 2024 19:31 UTC (Sun)
by bluca (subscriber, #118303)
Again, I do not know the first thing about tracepoints and have zero interest in that. Maybe it's a supported interface, maybe it's not, I really cannot say, nor care, and can't see what it has to do with mount options.
> *You* want to define what a userspace interface is and what not.
No, userspace defines what is a userspace interface, as per Hyrum's Law.
> Me. But I'm not sure why that matters.
Because it's just wrong, as explained, there are no "special custom interfaces" being used anywhere, just bog standard stuff used by most components of an operating system.
> I run a two decades old binary and it still works fine.
'We do not break userspace, as long as userspace is mb's statically linked printf("hello world\n") /sbin/init' still not quite as catchy I'm afraid
Posted May 19, 2024 20:23 UTC (Sun)
by mb (subscriber, #50428)
Posted May 20, 2024 9:30 UTC (Mon)
by LtWorf (subscriber, #124958)
I've had to fix software because of a kernel update, because some files in /sys were moved. But for some reason that doesn't count.
Posted May 20, 2024 9:41 UTC (Mon)
by mb (subscriber, #50428)
That is exactly what I was saying. Yet, I'm apparently wrong.
Posted May 20, 2024 9:45 UTC (Mon)
by bluca (subscriber, #118303)
Where is that subset defined?
Posted May 20, 2024 11:09 UTC (Mon)
by wtarreau (subscriber, #51152)
Welcome to discussions with bluca. Aggressiveness, half-reading of arguments, and accusations often arrive in the second or third message when he disagrees with you. There are such people who constantly criticize Linux and who would probably do the community good by switching to another OS of their choice :-/
Posted May 20, 2024 11:26 UTC (Mon)
by bluca (subscriber, #118303)
Posted May 20, 2024 11:54 UTC (Mon)
by mb (subscriber, #50428)
Wow. This is a new level.
Posted May 20, 2024 12:57 UTC (Mon)
by corbet (editor, #1)
Posted May 23, 2024 15:48 UTC (Thu)
by anton (subscriber, #25547)
Whether that means that vendor kernels are needed, or that one can use upstream kernels if one is selective about them is up to the vendors and their customers to decide.
Posted May 20, 2024 9:28 UTC (Mon)
by LtWorf (subscriber, #124958)
Posted May 23, 2024 10:12 UTC (Thu)
by tlamp (subscriber, #108540)
A major improvement here could consist of adding a common infrastructure in the kernel to track deprecation.
This data should be assembled at kernel build time and possibly even made available at runtime in one of the virtual filesystems. That would allow distros and projects with a lot of kernel interaction, like systemd, to actually track deprecations and reliably notice them, whereas scanning for arbitrary warnings whose wording can change every point release is an ugly mess with lots of false positives/negatives waiting to happen.
If it were available at runtime, checks could be added to the pre-/post-installation scripts/hooks of the distros' kernel packages, so that users get a much more noticeable warning on upgrade if their system is affected by such an option removal.
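A minimal Python sketch of the upgrade-hook side of this proposal, assuming the kernel ships its deprecation list as a JSON array whose entries carry the fields suggested in this proposal: module, option name, the release it was deprecated in, and the release where removal is planned. The key names, the `examplefs` module and the version numbers are all hypothetical; no such list exists in the kernel today.

```python
import json


def parse_version(release):
    """'6.8' -> (6, 8). Assumes plain numeric releases (no '-rc1' suffixes)."""
    return tuple(int(part) for part in release.split("."))


def upgrade_warnings(deprecations_json, used, target_release):
    """Warn about options in use that the target kernel release removes.

    deprecations_json: JSON array of objects with the hypothetical keys
      module, option, deprecated_in, removal_planned.
    used: set of (module, option) pairs the running system relies on.
    """
    target = parse_version(target_release)
    warnings = []
    for entry in json.loads(deprecations_json):
        if (entry["module"], entry["option"]) not in used:
            continue  # deprecated, but this system does not use it
        if parse_version(entry["removal_planned"]) <= target:
            warnings.append(
                f"{entry['module']}: option '{entry['option']}' (deprecated in "
                f"{entry['deprecated_in']}) is removed in {entry['removal_planned']}"
            )
    return warnings


if __name__ == "__main__":
    shipped_list = json.dumps([
        {"module": "examplefs", "option": "oldopt",
         "deprecated_in": "6.1", "removal_planned": "6.8"},
    ])
    # In a real hook, this set would be derived from fstab, module options, etc.
    in_use = {("examplefs", "oldopt")}
    for message in upgrade_warnings(shipped_list, in_use, "6.8"):
        print("WARNING:", message)
```

Versions are compared as integer tuples so that 6.12 correctly sorts after 6.8, which plain string comparison would get wrong.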
Posted May 23, 2024 12:35 UTC (Thu)
by mb (subscriber, #50428)
We had such a deprecation list under Documentation, but I think it got removed a couple of years ago.
Posted May 23, 2024 13:20 UTC (Thu)
by Wol (subscriber, #4433)
Far better to do it in the kernel itself. Probably not easy, but move all deprecated stuff into one (or several) modules behind an option like "deprecated-6.8" or whatever. Bleeding edge sets all these to "no", and either someone steps up and supports it (removing the deprecated option), or it bitrots until someone says "oh, this broke ages ago, let's delete it".
And then, if there's stuff you really want to get rid of but people need, every year or so it gets bumped to "deprecated latest kernel", so hopefully people stop using it and it finally drops out of sight ...
Cheers,
Posted May 23, 2024 14:45 UTC (Thu)
by mb (subscriber, #50428)
It would change nothing.
>let's delete it
And that is exactly when people will first start to notice.
Posted May 23, 2024 20:56 UTC (Thu)
by Wol (subscriber, #4433)
Except it changes everything
> Every distribution and everyone building their kernel will just enable this option, because stuff will break without enabling it.
You just said it!
The distributions are enabling something that is disabled by default? They're accepting responsibility for keeping it working.
Developers are enabling something that is disabled by default? They're accepting the associated risks.
People are enabling something that is marked "deprecated"? They're being placed on notice that it's being left to bit-rot.
The fact that people have to actively enable something that developers clearly don't want activated means that anybody using it will have three choices - migrate their code away, take over maintenance, or do an ostrich and bury their heads in the sand. Users will still be able to complain "I didn't know", but their upstream won't have that excuse.
Cheers,
Posted May 23, 2024 21:01 UTC (Thu)
by mb (subscriber, #50428)
Do you realize, that most kernel options are disabled by default?
>The fact that people have to actively enable something that developers clearly don't want activated means
It means that developers don't have a clue what people (users!) actually want and need.
Closing your eyes won't make the demand go away, unless you are less than three years old.
Posted May 23, 2024 22:41 UTC (Thu)
by Wol (subscriber, #4433)
And how many of those options have "deprecated" in their name? Surely that's a massive red flag.
> It means that developers don't have a clue what people (users!) actually want and need.
And how many developers are employed by (therefore are) users? I believe Alphabet employs loads. Meta employs loads. Most of the kernel developers I have contact with are employed by large end users. It's a little difficult to be oblivious of your own needs. (Some people manage, I'm sure ...)
How difficult is it to set a "not enabled" flag that cannot be accessed without some sort of warning that this flag will enable deprecated functionality? Surely it's not beyond the wit of your typical kernel developer? That's ALL that's required.
Cheers,
Posted May 24, 2024 8:15 UTC (Fri)
by tlamp (subscriber, #108540)
I don't think so, mostly because my spitballed proposal was not aimed at solving the "distros never get hurt by deprecation" problem, as IMO that cannot be solved short of not doing any deprecation at all anymore, which is hardly a good solution. Rather, I wanted to target the "how things get communicated and noticed" part, and an extra compile option with something like "deprecated-6.8-removal-6.12" in its name could actually be quite good for that. Build configs are often tracked and even diffed, and as a simple single file they can easily be grepped for _DEPRECATION_ and then diffed for options that will trigger soon or for new ones, probably even by a CI like the one systemd uses.
I.e., the status quo is deprecation warnings, which are brittle and not easy to digest/parse. Having that info in a more digestible form would help a lot, as tools/distros that depend on such options could easily find out when one they use will vanish soon(ish); then they would also have no excuse for being unprepared.
> And that is exactly when people will first start to notice.
If their distro or tooling did not do its work, then yes, but that wouldn't be the fault of the kernel having a messy deprecation process. IME most bigger distros and big projects like systemd want to avoid that, so if they had the definitive information required to do so in an even somewhat digestible form, then I really think most would actually act on it.
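The grep-and-diff workflow could look roughly like this Python sketch. It assumes a hypothetical naming convention in which every deprecation symbol contains `_DEPRECATION_` (for instance `CONFIG_EXAMPLEFS_OLDOPT_DEPRECATION_6_8_REMOVAL_6_12`); no such convention exists in Kconfig today. It reports which deprecation symbols are newly enabled in a config and which have disappeared from it.

```python
import re
import sys

# Hypothetical convention: deprecation symbols contain "_DEPRECATION_".
# Enabled symbols appear in .config as CONFIG_FOO=y or CONFIG_FOO=m;
# disabled ones appear as "# CONFIG_FOO is not set" and are ignored here.
SYMBOL_RE = re.compile(r"^(CONFIG_\w*_DEPRECATION_\w*)=[ym]$")


def deprecation_symbols(config_text):
    """Return the enabled deprecation symbols in a kernel .config text."""
    symbols = set()
    for line in config_text.splitlines():
        match = SYMBOL_RE.match(line.strip())
        if match:
            symbols.add(match.group(1))
    return symbols


def config_diff(old_config, new_config):
    """Compare two configs: which deprecation symbols appeared or vanished."""
    old, new = deprecation_symbols(old_config), deprecation_symbols(new_config)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        diff = config_diff(f_old.read(), f_new.read())
    for symbol in diff["added"]:
        print("newly enabled:", symbol)
    for symbol in diff["removed"]:
        print("no longer enabled (removed or switched off):", symbol)
```

Run it with the old and new `.config` paths as its two arguments; a CI job could fail whenever `removed` lists a symbol its own code still depends on.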
Posted May 23, 2024 14:57 UTC (Thu)
by smurf (subscriber, #17840)
I need a list of those deprecated calls/options/whatever that the current system is actually using (or rather has been using since booting).
A data structure that gets added to a list which you can check via /proc/deprecated would be quite sufficient for this.
No JSON fanciness required; a textual table identifying the subsystem or module, source file, first and last use timestamps, and its identifier in linux/Documentation/deprecations.yml [yes I know that file doesn't exist yet] would be quite sufficient.
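Since neither /proc/deprecated nor Documentation/deprecations.yml exists, any parser is speculative. The Python sketch below assumes one whitespace-separated row per deprecated feature, with the columns listed above in order (subsystem or module, source file, first use, last use, identifier), timestamps in seconds since boot, and `#` comment lines. All of those format details are assumptions.

```python
from dataclasses import dataclass


@dataclass
class DeprecatedUse:
    subsystem: str     # subsystem or module name
    source_file: str   # kernel source file that registered the entry
    first_use: float   # assumed: seconds since boot
    last_use: float    # assumed: seconds since boot
    identifier: str    # key into the hypothetical deprecations.yml


def parse_proc_deprecated(text):
    """Parse the hypothetical /proc/deprecated table into records."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        subsystem, source_file, first, last, identifier = line.split()
        records.append(
            DeprecatedUse(subsystem, source_file, float(first), float(last), identifier)
        )
    return records


if __name__ == "__main__":
    sample = (
        "# subsystem file first_use last_use id\n"
        "examplefs fs/examplefs/super.c 3.2 120.5 examplefs-oldopt\n"
    )
    # Most recently used deprecated features first.
    for use in sorted(parse_proc_deprecated(sample), key=lambda u: u.last_use, reverse=True):
        print(f"{use.subsystem}: {use.identifier} (last used at {use.last_use}s)")
```

A plain table like this stays greppable by hand while still being trivial to parse in a fleet-wide collection script.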
Posted May 24, 2024 8:01 UTC (Fri)
by tlamp (subscriber, #108540)
As said, I'd assemble it at build time, so options that are not relevant for a given kernel build config would not be in there (or, if still wanted, they could be tracked differently, e.g. with an extra flag or a separate list).
> I need a list of those deprecated calls/options/whatever that the current system is actually using (or rather has been using since booting).
With a declarative list this is trivial to create, as a tool can just scan all module, mount, ... options and check whether anything explicitly set is in the static list. So if you want this, a static list is IMO really the best way to achieve it: one first needs the definitive list of information before being able to do anything with it, and keeping it dumb on the kernel side while including as much as possible allows (userspace or build) tooling to do the smart checks.
> A data structure that gets added to a list which you can check via /proc/deprecated would be quite sufficient for this.
Not sure how this, minus bikeshedding, is any different from what I proposed, but I like that we agree in general.
> No JSON fanciness required; a textual table identifying the subsystem or module, source file, first and last use timestamps, and its identifier in linux/Documentation/deprecations.yml [yes I know that file doesn't exist yet] would be quite sufficient.
I named JSON simply as an option and explicitly also stated that a simple list could do. But I named JSON because 1. generating it is trivial (compared to parsing, which isn't hard either, but not trivial anymore); 2. it allows more flexible extension for whatever data or use case becomes relevant in the future without having to add a /proc/deprecation2; 3. in my projects I try to avoid adding yet another not-invented-here format with its subtleties. But sure, if it's a simple CSV list that gets generated by the common infra (i.e., not under the control of each kernel dev with their own opinions of the day), then fine by me (not that my acknowledgment would matter at all :)
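The comparison described above (scan the options a system explicitly sets and intersect them with the static list) is simple once such a list exists. Here is a Python sketch for the mount-option case, assuming fstab-style input (the first four whitespace-separated fields are device, mount point, filesystem type and comma-separated options) and a deprecated list already loaded as `(fstype, option)` pairs; the `examplefs`/`oldopt` entries are hypothetical.

```python
def mount_options_in_use(fstab_text):
    """Collect (fstype, option_name) pairs from fstab-format text."""
    used = set()
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed line, or options field omitted
        fstype, options = fields[2], fields[3]
        for option in options.split(","):
            # "subvol=root" -> "subvol": compare option names, not values
            used.add((fstype, option.split("=", 1)[0]))
    return used


def deprecated_in_use(fstab_text, deprecated):
    """Return the deprecated (fstype, option) pairs this fstab relies on."""
    return sorted(mount_options_in_use(fstab_text) & deprecated)


if __name__ == "__main__":
    fstab = (
        "# <device> <mountpoint> <type> <options> <dump> <pass>\n"
        "/dev/sda1 / examplefs rw,oldopt,subvol=root 0 0\n"
        "/dev/sdb1 /data ext4 defaults 0 2\n"
    )
    deprecated = {("examplefs", "oldopt")}  # would come from the kernel's list
    for fstype, option in deprecated_in_use(fstab, deprecated):
        print(f"fstab uses deprecated {fstype} option '{option}'")
```

Scanning what is configured, rather than what the kernel currently reports, catches an option before the kernel that rejects it is ever booted.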
Posted May 24, 2024 13:45 UTC (Fri)
by donald.buczek (subscriber, #112892)
This would be perfect! We would see, what we need to address in our fleet (we are not using a distribution). But distributions would have something to build on, too. They might create a feedback path of this information from the systems of their users to the distribution. The basis for everything is that the information "you are using a mechanism, which will go away" is made available in a structured way.
It is important that the information is not only findable by digging through masses of unstructured text in mailing lists, documentation, NEWS files, dmesg or other sources and then having to analyze in each individual case whether it is relevant to you at all.
Posted May 23, 2024 14:01 UTC (Thu)
by eru (subscriber, #2753)
These days just about every distribution "helpfully" hides boot messages behind a splash screen, so few users will ever see such warnings. Might as well not be there.
It is a mystery to me why this is done (and after I found out how, I disabled it). The stream of messages is actually a nice progress indicator, and if they get stuck at some point, one gets some idea about what might be wrong.
Posted May 23, 2024 14:49 UTC (Thu)
by mb (subscriber, #50428)
Because nobody can read over 9000 messages in 0.5 seconds.
>The stream of messages is actually a nice progress indicator
No, it isn't. Today's boot process is so fast (unless you're stuck with a legacy init system), that console output is pretty much useless and just slows things down, at best.
Posted May 23, 2024 15:19 UTC (Thu)
by eru (subscriber, #2753)
Posted May 23, 2024 15:26 UTC (Thu)
by mb (subscriber, #50428)
I'm not saying that messages are not useful for debugging. And for that exact reason it's always possible to enable a verbose boot.
A kernel deprecation message would certainly go by unnoticed during the burst of messages.
Posted May 23, 2024 15:14 UTC (Thu)
by zdzichu (subscriber, #17118)
Developers, on the other hand, are the target of such messages. I cannot imagine developing a low-level component – like systemd – and not running `dmesg` from time to time. If you use kernel features more than anyone else, you should pay more attention than anyone else.
Posted May 23, 2024 21:14 UTC (Thu)
by mezcalero (subscriber, #45103)
But even beyond that: there's *so* *much* *stuff* in dmesg right now. It's a wall, a deluge of text. Don't expect me to read all that. I only look there when I am looking for something, and then I usually do "journalctl -ke", and it never showed up there.
Posted May 23, 2024 21:48 UTC (Thu)
by pizza (subscriber, #46)
Fedora just flipped the default, starting with their 6.8 kernels IIRC.
Reverting it is just a matter of:
sysctl kernel.dmesg_restrict=0
> It's a wall, a deluge of text. Don't expect me to read all that.
Fortunately, it's rare that anything beyond the final dozen or so lines matters (e.g. the messages that show up when you plug something in, or if something goes wrong...).
Posted May 17, 2024 14:28 UTC (Fri)
by willy (subscriber, #9762)
Posted May 17, 2024 14:55 UTC (Fri)
by bluca (subscriber, #118303)
I mean we change public interfaces like this in systemd too every now and then - but we just keep the old configuration option around too, undocumented, and either map it to the new one if any, or make it a no-op.
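The approach described here (keep the old spelling as an undocumented alias that maps to the new option, or accept it as a no-op) can be sketched in a few lines of Python. This illustrates the idea only; it is not systemd's or the kernel's actual parsing code, and the option names in the alias table are hypothetical. Whether to warn is a design choice: this sketch emits a warning so that the rename stays visible instead of being silently absorbed.

```python
import warnings

# Hypothetical alias table: legacy option -> replacement, or None for "accept and ignore".
LEGACY_OPTIONS = {
    "oldname": "newname",
    "obsolete": None,
}


def normalize_options(options):
    """Rewrite legacy option names so old configurations keep working."""
    result = []
    for option in options:
        name, sep, value = option.partition("=")
        if name in LEGACY_OPTIONS:
            replacement = LEGACY_OPTIONS[name]
            if replacement is None:
                warnings.warn(f"option '{name}' is obsolete and has no effect")
                continue  # accepted as a no-op
            warnings.warn(f"option '{name}' is deprecated, use '{replacement}'")
            name = replacement
        result.append(name + sep + value)  # keeps any "=value" part intact
    return result


if __name__ == "__main__":
    print(normalize_options(["ro", "oldname=1", "obsolete"]))  # prints ['ro', 'newname=1']
```

The cost of this approach is a small, permanent alias table; the benefit is that existing configurations never stop working.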
Posted May 17, 2024 15:06 UTC (Fri)
by mb (subscriber, #50428)
Posted May 17, 2024 15:17 UTC (Fri)
by bluca (subscriber, #118303)
Posted May 17, 2024 16:14 UTC (Fri)
by mb (subscriber, #50428)
No. It depends on the change itself whether a silent ignore is better or not. It might be worse to silently ignore something the user has requested.
Posted May 17, 2024 16:22 UTC (Fri)
by bluca (subscriber, #118303)
Posted May 17, 2024 16:20 UTC (Fri)
by joib (subscriber, #8541)
Posted May 17, 2024 16:53 UTC (Fri)
by adobriyan (subscriber, #30858)
Posted May 18, 2024 19:46 UTC (Sat)
by NYKevin (subscriber, #129325)
0. /etc/fstab is a configuration file. It's not the kernel's place to modify config files (software like Puppet or Ansible will change it back).
(No, paging the sysadmin at 3 AM is not a reasonable response to this situation. Nothing is actually broken!)
Posted May 18, 2024 20:29 UTC (Sat)
by adobriyan (subscriber, #30858)
I know, relax :-) it was a joke.
Posted May 19, 2024 4:01 UTC (Sun)
by wtarreau (subscriber, #51152)
Posted May 19, 2024 9:07 UTC (Sun)
by smurf (subscriber, #17840)
Posted May 20, 2024 17:35 UTC (Mon)
by NYKevin (subscriber, #129325)
Which is also, sort of, the point I'm trying to make here. It is no longer accurate to divide userspace into "applications" and "stuff the sysadmin manually fiddles with." Sysadmins are not manually fiddling with mount options etc. these days. That's all managed by some other piece of software that isn't "the application," but can still break and cause problems all the same. E.g. k8s, Puppet, Docker, etc.
Posted May 18, 2024 17:11 UTC (Sat)
by shemminger (subscriber, #5739)
Posted May 19, 2024 4:00 UTC (Sun)
by wtarreau (subscriber, #51152)
Posted May 18, 2024 8:05 UTC (Sat)
by zdzichu (subscriber, #17118)
This is not true. I have a number of systems running Fedora 39, 40 and rawhide, with btrfs rootfs and other mountpoints. They all run 6.8.x kernels released by Fedora and none has problems mounting filesystems. And as pointed out, the patch you mention removes 3 deprecation warnings that were printed to dmesg for the past couple of years. If one doesn't pay attention to dmesg, kernel developers have no other way of contacting them.
Posted May 18, 2024 9:33 UTC (Sat)
by niner (subscriber, #26151)
Posted May 20, 2024 12:03 UTC (Mon)
by georgyo (subscriber, #121727)
I see a lot of people participating in this discussion, but not a lot of people actually affected.
Posted May 21, 2024 13:29 UTC (Tue)
by corbet (editor, #1)
Posted May 22, 2024 0:03 UTC (Wed)
by bluca (subscriber, #118303)
Posted May 22, 2024 5:16 UTC (Wed)
by smurf (subscriber, #17840)
Posted May 22, 2024 11:52 UTC (Wed)
by bluca (subscriber, #118303)
There are interfaces used by normal programs and there are special interfaces used by special programs like systemd and udev.
Every other decades old application will continue to work. That is what counts.
> command, seems hardly any "special"
Sometimes they actually spell out what "apart from" means.
For example trace points are an exception. There are more exceptions.
Every change is user visible eventually. Even simple changes like adding a new syscall can break programs, if the program was using the new syscall number and depended on it returning ENOSYS.
> because fuck those people
> Every other decades old application will continue to work. That is what counts.
It's up to you to ignore that. But please stop saying that I made it up.
Yes, that is unfortunate and could certainly be improved.
But please don't generalize to other applications.
There have been reverts of ABI changes due to application breakages in the past.
It's done on a case by case basis.
> It's up to you to ignore that. But please stop saying that I made it up.
And everybody who disagrees is "making it up" or talking "nonsense".
I run a two decades old binary and it still works fine.
I'll stop here.
> I'll stop here.
This clearly is not going anywhere useful, can we all let it go at this point, please?
Stop here please
> This is very much about "do not break userspace" in the general form. It's the perfect example of why that mantra needs to be put to bed, once and for all, as it's completely disconnected from reality.
Is it? When the breakage of existing code is reported as a bug, do the kernel developers declare the bug report invalid, or do they fix the bug? If it's the latter, they live up to the principle. Sure, one might wish that such bugs never happened, but apparently they feel that going for that would be too constricting for kernel development.
It should allow the kernel build system to generate a declarative list (or something more structured like JSON) that includes info like "driver/module", "option" name, "kernel release it got deprecated in", and "kernel release where removal is planned".
It was not very useful and suffered from major bitrot.
> It was not very useful and suffered from major bitrot.
Wol
Every distribution and everyone building their kernel will just enable this option, because stuff will break without enabling it.
Just like everybody enabled the - how was it called? - EXPERIMENTAL option.
Such options are useless.
And there is not much anybody can do about that *except* to not break/deprecate stuff.
Wol
Wol
> Every distribution and everyone building their kernel will just enable this option, because stuff will break without enabling it.
> Just like everybody enabled the - how was it called? - EXPERIMENTAL option.
> Such options are useless.
>
> I need a list of those deprecated calls/options/whatever that the current system is actually using (or rather has been using since booting).
>
> A data structure that gets added to a list which you can check via /proc/deprecated would be quite sufficient for this.
>
> No JSON fanciness required; a textual table identifying the subsystem or module, source file, first and last use timestamps, and its identifier in linux/Documentation/deprecations.yml [yes I know that file doesn't exist yet] would be quite sufficient.
> The kernel has been [trying to?] warn everyone about it since 5.11.
But in the vast majority of cases nothing is "stuck" (didn't happen in a decade for me) and there would just be a blast of messages during a couple of seconds boot time that nobody reads.
Therefore, the default should be no kernel messages and also no systemd messages.
1. Assuming you meant /etc/mtab, that's (usually) managed in userspace. It's also none of the kernel's business.
2. Assuming you meant /proc/mounts, there might be some userspace software that parses it and compares it against what it "should" look like, and misbehaves if a random mount option is missing (e.g. gets stuck in a hot loop of repeatedly remounting the device with the "correct" mount options, fires an alert and tells the sysadmin to come running, etc.).
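To make item 2 concrete, here is a Python sketch of the kind of naive drift check meant there, assuming /proc/mounts-format input (device, mount point, type, comma-separated options, ...). The mount point and option names are hypothetical. Because it treats any expected option missing from /proc/mounts as drift, a kernel that stops accepting or reporting an option would make it flag the mount forever, which is how a remount loop or a spurious alert could arise.

```python
def missing_options(proc_mounts_text, mountpoint, expected):
    """Return expected mount options absent from mountpoint's current entry."""
    present = None
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            # Later lines shadow earlier ones for stacked mounts.
            present = set(fields[3].split(","))
    if present is None:
        return sorted(expected)  # not mounted at all
    return sorted(set(expected) - present)


if __name__ == "__main__":
    sample = "/dev/sda1 /srv examplefs rw,relatime 0 0\n"
    drift = missing_options(sample, "/srv", {"rw", "oldopt"})
    if drift:
        # A naive tool might remount or page someone here, over and over.
        print("drift detected, missing:", drift)
```

Nothing in this check distinguishes "the mount is misconfigured" from "the kernel no longer knows this option", which is the crux of the point above.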
Easier to just silently sweep legacy under the rug, and get on with fixing real issues.
btrfs in 6.8.x
Just for anybody who hasn't long since tuned out this thread...today the Btrfs regression was reported to the Btrfs developers. Less than one hour later, a Btrfs developer acknowledged the problem and agreed to add the norecovery option back.
Btrfs regression
So looks like it was a regression after all ;-) All's well that ends well