
Compatibility

Posted May 17, 2024 22:48 UTC (Fri) by jra (subscriber, #55261)
In reply to: Compatibility by corbet
Parent article: White paper: Vendor Kernels, Bugs and Stability

Testing testing testing :-). Every time we do this in Samba we add a new regression test to make sure we at least don't break that specific network feature ever again.
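To make that concrete, a minimal sketch of the idea (the helper and the bug it pins down are invented for illustration, not Samba's actual test framework):

    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Stand-in for a code path that once broke a network feature: the fix
     * made an empty client offer fall back to a default instead of
     * returning NULL as the buggy version did. */
    static const char *negotiate_dialect(const char *client_offer)
    {
        if (client_offer == NULL || client_offer[0] == '\0')
            return "SMB3_11";
        return client_offer;
    }

    int main(void)
    {
        /* The exact case from the (hypothetical) bug report, kept forever
         * so the behaviour cannot silently regress. */
        assert(negotiate_dialect("") != NULL);
        assert(strcmp(negotiate_dialect(""), "SMB3_11") == 0);
        /* And the normal path still behaves as before. */
        assert(strcmp(negotiate_dialect("SMB2_10"), "SMB2_10") == 0);
        puts("regression test passed");
        return 0;
    }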

I think by the time upstream changes make it into Greg's stable kernel trees, any userspace breakage has already been found and fixed. I'm not aware of any userspace breakages there (although please correct me if I'm wrong, I'm still pretty new to this kernel stuff).

"You can never be too thin, too rich, have too much disk space or too many regression tests" :-).



Compatibility

Posted May 17, 2024 22:52 UTC (Fri) by bluca (subscriber, #118303)

See the first comment on this article for another, even more recent example.

Compatibility

Posted May 18, 2024 6:47 UTC (Sat) by marcH (subscriber, #57642)

Promises without tests are just empty words; if it's not tested then it does not work. Every piece of software is only as good as the tests it's passing.

Etc.

Compatibility

Posted May 19, 2024 4:40 UTC (Sun) by wtarreau (subscriber, #51152)

One difficulty, present in all software, makes this a reality: some features happen by accident. They were never intended by the developers but turn out to be convenient side effects, and they have the interesting property of never being tested, since nobody knows they exist. When users start to rely on them and the code changes, things break.

This is commonly visible in setup scripts that rely on the contents of /proc or /sys, the output of lsmod, and so on. In other software, a major upgrade will sometimes add support for new options to a keyword and reveal that the previous version silently ignored such options, so that a typo which used to pass unnoticed now raises an error.
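To illustrate the /proc case, here is the kind of fragile dependency that works only by accident (a made-up example, not any real tool):

    /* Illustrative only: a setup tool that "works" by accident because it
     * assumes MemAvailable is the third line of /proc/meminfo.  Nothing
     * documents that ordering; a kernel that adds or reorders fields breaks
     * the tool even though no documented interface changed. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/meminfo", "r");
        char line[256] = "";
        long kb = -1;

        if (!f) {
            perror("/proc/meminfo");
            return 1;
        }
        /* Fragile: skip two lines, keep the third. */
        for (int i = 0; i < 3 && fgets(line, sizeof(line), f); i++)
            ;
        fclose(f);
        if (sscanf(line, "MemAvailable: %ld kB", &kb) != 1) {
            fprintf(stderr, "unexpected /proc/meminfo layout\n");
            return 1;
        }
        printf("available: %ld kB\n", kb);
        return 0;
    }

A robust version would search every line for the "MemAvailable:" prefix and treat its absence as an error, instead of depending on an ordering nobody promised.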

There's never a perfect solution to this. If users only used perfectly documented and intended features, systems and programs would be very poor and boring, and nothing would evolve. They would rightfully complain even more about the occasionally needed breakage due to architectural changes. If they use every single possibility, they spur the addition of new features but face unintended changes more often, and developers have a harder time making their code evolve.

A sweet spot always has to be found, where developers document the intended use and users poke around a bit beyond what is officially supported, keeping the intended use in mind so as to limit surprises when things change.

Overall I find that Linux is not bad at all on this front. You can often run very old software on a recent kernel and it still works fine. At the same time, I'm one of those who are extremely careful about major upgrades, because if I don't see the breakage right away, I suspect it will happen behind my back at the moment I would have preferred not to face it. Nobody wants to discover that their microphone no longer works when joining a video conference, or that their 4G adapter fails to initialize while waiting for a train or a plane. On a laptop I've very rarely experienced issues, except with broken or poorly supported hardware whose behavior would change in various ways across versions. On servers I'm much more cautious, because of subtle changes experienced over a long time in bonding/bridging/routing setups, and iptables occasionally gaining configuration granularity that reveals, in the field, that you're missing some new config option. And this is a perfect reason for sticking to a stable kernel when you have other things to do than retest everything.

Compatibility

Posted May 19, 2024 17:00 UTC (Sun) by Wol (subscriber, #4433)

> There's never a perfect solution to this. If users only used perfectly documented and intended features, systems and programs would be very poor and boring, and nothing would evolve. They would rightfully complain even more about the occasionally needed breakage due to architectural changes. If they use every single possibility, they spur the addition of new features but face unintended changes more often, and developers have a harder time making their code evolve.

I don't think you're right here. We'll never find out, of course, because developers (and open source collaborative projects are amongst the worst) are very bad at writing documentation.

> A sweet spot always has to be found, where developers document the intended use and users poke around a bit beyond what is officially supported, keeping the intended use in mind so as to limit surprises when things change.

That sweet spot, as an absolute minimum, needs to include DESIGN documentation. This is where Linus's management style really fails (although, given that he's the cat-herder-in-chief, I don't know of any style that would succeed), because so much of Linux is written by people scratching itches and fixing bugs, everyone fixing their immediate problem.

Nobody steps back and asks "what is the purpose of the module I'm working on? What is sensible or stupid? How can I design this to be as *small*, *complete*, and *self-contained* as possible?".

Cheers,
Wol

Compatibility

Posted May 20, 2024 11:15 UTC (Mon) by wtarreau (subscriber, #51152)

> needs to include DESIGN documentation

You're absolutely right on these points. I have asked myself in the past why this situation is so common in software documentation, and came to the conclusion that, aside from a few exceptions, there is mostly no design, only improvements on top of something that already exists. The lack of a design phase implies a lack of design documentation. Sometimes someone spends so much time reverse-engineering the thing they work on that they take a lot of notes and end up producing a document of the existing design. It does not necessarily justify the choices made, but it can be a good starting point to encourage new participants to keep the doc updated. But that's not *that* common.

I'm not seeing any good solution to this, though :-/

Compatibility

Posted May 23, 2024 16:49 UTC (Thu) by anton (subscriber, #25547)

> A sweet spot always has to be found, where developers document the intended use and users poke around a bit beyond what is officially supported, keeping the intended use in mind so as to limit surprises when things change.

That's what C compiler advocates use to defend miscompilation: "works as documented". And while this bad principle is used for declaring gcc bug reports invalid, the actual practice of gcc development seems to be better: They use a lot of tests to avoid breaking "relevant" code (including for cases outside the documented behaviour), and the irrelevant code rides in the slipstream of relevant code.
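The classic poster child here (my illustration, not necessarily the cases anton has in mind) is an overflow check that relies on undefined signed wrap-around. It sits outside the documented behaviour, yet plenty of "relevant" code used to look like this:

    /* Signed overflow is undefined in C, so the compiler may assume that
     * x + 1 > x always holds and delete this check entirely at -O2.
     * The code "worked" for years on compilers that did not exploit
     * that latitude. */
    #include <limits.h>
    #include <stdio.h>

    static int add_one_checked(int x)
    {
        if (x + 1 < x)      /* overflow check built on undefined behaviour */
            return -1;      /* "never happens", as far as the optimizer is concerned */
        return x + 1;
    }

    int main(void)
    {
        /* With optimization this may print INT_MIN instead of -1. */
        printf("%d\n", add_one_checked(INT_MAX));
        return 0;
    }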

Fortunately, the kernel aspires to a better principle, the often-cited "we don't break user space". And given that we have no way to automatically check whether a program conforms to the documentation, that principle is much more practical (and the practice of gcc is also in that direction, but only for "relevant" code).

Concerning accidental features, careful interface design avoids them: one can catch unintended combinations of inputs and report them as errors, or define and implement useful behaviour for such cases. Exposing some arbitrary accident of the implementation instead leads to the difficulties you point out. Given that Linux's interface to user space is also a security boundary, carefully defining interfaces and avoiding the exposure of implementation artifacts is a good idea anyway.
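A sketch of that "reject what you did not define" pattern (a hypothetical API, not actual kernel code):

    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>

    #define MYAPI_FLAG_NONBLOCK  (1u << 0)
    #define MYAPI_FLAG_CLOEXEC   (1u << 1)
    #define MYAPI_VALID_FLAGS    (MYAPI_FLAG_NONBLOCK | MYAPI_FLAG_CLOEXEC)

    /* Unknown flag bits are refused up front, so no caller can come to
     * depend on them being silently ignored, and they stay free to be
     * given a meaning later without breaking anyone. */
    static int myapi_open(const char *name, uint32_t flags)
    {
        if (flags & ~MYAPI_VALID_FLAGS)
            return -EINVAL;     /* undefined combination: refuse, don't guess */
        printf("opening %s with flags 0x%x\n", name, (unsigned)flags);
        return 0;
    }

    int main(void)
    {
        /* A caller passing a not-yet-defined bit gets an error today,
         * instead of an accidental behaviour it might start relying on. */
        if (myapi_open("example", 1u << 5) == -EINVAL)
            puts("unknown flag rejected as intended");
        return 0;
    }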

