
Ext4 data corruption in stable kernels

Ext4 data corruption in stable kernels

Posted Dec 10, 2023 4:52 UTC (Sun) by rolexhamster (guest, #158445)
Parent article: Ext4 data corruption in stable kernels

This is a symptom of a larger problem: it seems that patches/backports are sent to "stable" kernels in an almost willy-nilly fashion, and there is inadequate checking of what actually gets in. This is certainly not the first time that something like this has happened.

Why is critical functionality in stable kernels (a widely used file system in this case) being messed around with in the first place? There is a time-honored adage that is very applicable here:

    If it ain't broke, don't fix it.

The current ext4 corruption is a failure in discipline (on the part of kernel developers) and/or policy (by stable kernel maintainers) in not adhering to the above adage.

It would be useful to stop sending stuff to stable kernels unless it demonstrably fixes something that is reproducible for that particular kernel version, and is explicitly tested on that kernel version. Stuff like "it's good to have it just in case" certainly doesn't apply.



Ext4 data corruption in stable kernels

Posted Dec 10, 2023 6:20 UTC (Sun) by ewen (subscriber, #4772) [Link] (40 responses)

Unfortunately the "stable" Linux kernels have (at least historically) had a policy of cherry-picking commits from later kernels into the "stable" branches, even if they weren't explicitly sent for inclusion into stable :-/ Which means the "stable" kernels aren't especially safe to use. But unfortunately there's nothing better (even distro stable kernels, like Debian's and Ubuntu's, are usually tracking changes from the "stable" Linux upstream).

In this case it appears the commit was picked from 6.5-rc1 (i.e., the first merge set from preparing 6.5), and then the "stable" kernel was shipped without waiting for the commit that came along later in the 6.5 development cycle to fix the issue. It took two stable releases (6.1.64, 6.1.65) before enough time had passed for people to notice the issue was in the backports too and undo/fix it (in 6.1.66), because the "stable" Linux kernel releases come out extremely frequently with non-trivial sets of patches in each (making them hard to review in detail).

I've got some sympathy for backporting "important" fixes from *released* Linux kernels to older "stable" kernel branches. But those fixes should be carefully chosen, not "automatically cherry-picked", and taken only from release kernels, not random release candidates (especially not at random from early release candidates).

And yes, you're right, this isn't the first time something like this has happened. It just happens to be one of the worst cases, because it could cause file system corruption on the most commonly used file system :-(

(In my case I narrowly avoided upgrading to a Debian release with the problematic kernel; I've got some systems I'd planned to upgrade tomorrow, but now won't until things are more stabilised again.)

Ewen

Ext4 data corruption in stable kernels

Posted Dec 10, 2023 7:56 UTC (Sun) by rolexhamster (guest, #158445) [Link] (39 responses)

    ... it appears the commit was picked from 6.5-rc1 ..

Indeed, why was it picked up from 6.5-rc1 (a test kernel!) and backported to stable kernels? Why would any commit be picked up from rc1 through to rc8 etc? It's basic risk management not to do that.

At the very least wait until 6.5.1 or 6.5.2. Even then, only if it fixes something actually demonstrable and testable.

In the current situation, one would be right to think that stable kernels are just an outpost in the wild west.

Ext4 data corruption in stable kernels

Posted Dec 10, 2023 8:07 UTC (Sun) by mb (subscriber, #50428) [Link] (37 responses)

>Indeed, why was it picked up from 6.5-rc1 (a test kernel!) and backported to stable kernels?

Because it is marked
CC: stable@vger.kernel.org

That is the developer saying: It should be backported to stable, as soon as it hits mainline.
And that is exactly what happened.
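
For reference, that marking is the trailer convention from Documentation/process/stable-kernel-rules.rst: the patch author adds a line to the commit message, and once the commit lands in mainline the stable scripts pick it up for every applicable branch. A minimal sketch (the subject line and the version annotation are illustrative, not the actual patch):

    ext4: fix <some bug>

    <description of the bug and of the fix>

    Cc: stable@vger.kernel.org # v5.10+
    Signed-off-by: A Developer <dev@example.org>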

>Why would any commit be picked up from rc1 through to rc8 etc?

To fix things?

>It's basic risk management not to do that.

No. It would just delay a lot of fixes.

> At the very least wait until 6.5.1 or 6.5.2. Even then, only if it fixes something actually demonstrable and testable.

The patch was marked for stable inclusion. Which means the author demonstrated and tested the problem and then decided it needed to be backported.

Mistakes happen.

Ext4 data corruption in stable kernels

Posted Dec 10, 2023 12:08 UTC (Sun) by wtarreau (subscriber, #51152) [Link] (19 responses)

> The patch was marked for stable inclusion. Which means the author has demonstrated and tested the problem and has then thought it would be needed to backport it.
>
> Mistakes happen.

Definitely, there are still humans in the delivery chain. Everything went well this time and only two versions were affected in the end. I think we're just facing another grumpy user who wants a 100% guarantee of zero bugs. The same type of people who complain about unanticipated storms and then complain about a mistaken weather forecast when it announces rain that doesn't come. There's a solution to this: not using a computer, nor anything made using a computer, nor anything made by something made using a computer. Living in the woods making fire by hitting stones can have its fun but will not necessarily be safer.

For my stable kernel usages, I *tend* to pick one or two versions older than the last one if I see that the recent fixes are not important for me (i.e. I won't miss them). This helps to avoid such cases. But that's not rocket science, and for this one I would likely have updated to that version precisely because it included an ext4 fix!

Ext4 data corruption in stable kernels

Posted Dec 10, 2023 23:27 UTC (Sun) by bgilbert (subscriber, #4738) [Link] (18 responses)

> The same type of people who complain about unanticipated storms and who then complain about mistaken weather forecast when it announces rain that doesn't come.

"Stable" is not a prediction of forces beyond developer control. It's an assertion of a quality bar, which needs to be backed by appropriate tools, testing, and developer time.

> For my stable kernel usages, I *tend* to pick one or two versions older than the last one if I see that the recent fixes are not important for me (i.e. I won't miss them).

As I understand Greg KH's position, anyone applying such a policy is irresponsible for not immediately installing the newest batch of patches.

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 13:47 UTC (Mon) by wtarreau (subscriber, #51152) [Link] (16 responses)

> > The same type of people who complain about unanticipated storms and who then complain about mistaken weather forecast when it announces rain that doesn't come.

> "Stable" is not a prediction of forces beyond developer control. It's an assertion of a quality bar, which needs to be backed by appropriate tools, testing, and developer time.

Which is exactly the case. Look at the latest 6.6.5-rc1 thread for example:
https://lore.kernel.org/all/20231205031535.163661217@linu...

I've counted 17 people responding to that thread with test reports, some of which indicate boot failures, others successes, on a total of around 910 systems covering lots of architectures, configs and setups. I think this definitely qualifies as "appropriate tools", "testing" and "developer time", and I doubt many other projects devote that amount of effort to weekly releases.

> > For my stable kernel usages, I *tend* to pick one or two versions older than the last one if I see that the recent fixes are not important for me (i.e. I won't miss them).
>
> As I understand Greg KH's position, anyone applying such a policy is irresponsible for not immediately installing the newest batch of patches.

No; having already discussed this topic with him, I'm pretty sure he never said this. I even remember that he once explained that he doesn't want to advertise severity levels in his releases, so that users upgrade when they feel confident and not necessarily immediately, nor only when it's written that now's a really important one. Use cases differ so much between users that some might absolutely need to upgrade to fix a driver that's going to ruin their data while others might prefer not to, as a later fix could cause serious availability issues.

Periodically applying updates is a healthy approach; what matters is that severe bugs do not live long in the wild and that releases are frequent enough to help narrow down an occasional regression based on the various reports. I personally rebuild every time I reboot my laptop (which is quite rare thanks to suspend), and phone vendors tend to update only once every few months, and that's already OK.

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 15:44 UTC (Mon) by bgilbert (subscriber, #4738) [Link] (9 responses)

> I've counted 17 people responding to that thread with test reports, some of which indicate boot failures, others successes, on a total of around 910 systems covering lots of architectures, configs and setups. I think this definitely qualifies as "appropriate tools", "testing" and "developer time", and I doubt many other projects devote that amount of effort to weekly releases.

Many other projects have CI tests that are required to pass before a new release can ship. If that had been the case for LTP, this regression would have been avoided. What's more, the problem was reported to affect 6.1.64 during its -rc period, but no action was taken to fix that release. 6.1.64 was released with the problem four days later.

Mistakes happen! But this is an opportunity to improve processes to prevent a recurrence, rather than accepting the status quo.

> No; having already discussed this topic with him, I'm pretty sure he never said this. I even remember that he once explained that he doesn't want to advertise severity levels in his releases, so that users upgrade when they feel confident and not necessarily immediately, nor only when it's written that now's a really important one. Use cases differ so much between users that some might absolutely need to upgrade to fix a driver that's going to ruin their data while others might prefer not to, as a later fix could cause serious availability issues.

I have personally been complained at by Greg for fixing a stable kernel regression via cherry-pick, rather than shipping the latest release directly to distro users. I've seen similarly aggressive messaging in other venues. In fact, the standard release announcement says:

    All users of the x.y kernel series must upgrade.

If downstream users are intended to take a more cautious approach, the messaging should be clarified to reflect that.

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 4:52 UTC (Tue) by wtarreau (subscriber, #51152) [Link] (8 responses)

> but no action was taken to fix that release. 6.1.64 was released with the problem four days later.

You should really see that as a pipeline. Even if the issue was reported, you don't know whether it was noticed before 6.1.64 went out. What matters is that the issue was quickly fixed. Sure, we're still missing a way to tag certain versions as broken, as happened for 2.4.11, which was marked "dontuse" in the download repository. But it's important to understand that with a constant flow of fixes, it's not easy to cancel a release at the last moment.

I would not be shocked to see three consecutive kernels released and tagged as "ext4 broken" there for the time it takes to learn of the breakage and fix it.

> I have personally been complained at by Greg for fixing a stable kernel regression via cherry-pick, rather than shipping the latest release directly to distro users.

Here you're speaking about cherry-picking fixes. That's something extremely dangerous that nobody should ever do, and that some distros have been doing for a while, sometimes shipping kernels that remain vulnerable for months or years due to this bad practice. The reason for recommending against cherry-picking is very simple (and was explained at length at multiple conferences): the ONLY combinations of kernel patches that are both tested and supported by the subsystem maintainers are the mainline and stable ones. If you perform any other assembly of patches, nobody knows if they work well together or if another important patch is missing (as happened above). Here the process worked fine because developers reported the missing patches. Imagine if you had taken that single patch yourself: nobody would have known, and you could have corrupted a lot of your users' filesystems.

So please, for your users, never ever cherry-pick random patches from stable. Take the whole stable release, possibly a slightly older one if you don't feel comfortable with the latest changes, add your distro-specific patches on top of it, but do not pick what seems relevant to you; that will eventually result in a disaster and nobody will support you for having done this.

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 9:58 UTC (Tue) by bgilbert (subscriber, #4738) [Link] (3 responses)

> Even if the issue was reported, you don't know whether it was noticed before 6.1.64 went out. What matters is that the issue was quickly fixed.

The message I linked above is dated November 24 and reported a regression in v6.1.64-rc1. The testing deadline for 6.1.64 was November 26, and it was released on November 28. That report was sufficient to cause a revert in 5.10.y and 5.15.y, so I don't think there can be an argument that not enough information was available.

The users who had data corruption, or who had to roll out an emergency fix to avoid data corruption, don't care that the issue was quickly fixed. They can always roll back to an older kernel if they need to. They care that the problem happened in the first place.

> The reason for recommending against cherry-picking is very simple (and was explained at length at multiple conferences): the ONLY combinations of kernel patches that are both tested and supported by the subsystem maintainers are the mainline and stable ones. [...] Take the whole stable release, possibly a slightly older one if you don't feel comfortable with the latest changes, add your distro-specific patches on top of it, but do not pick what seems relevant to you; that will eventually result in a disaster and nobody will support you for having done this.

What are you talking about? If I ship a modified kernel and it breaks, of course no one will support me for having done so. If I ship an unmodified stable kernel and it breaks, no one will support me then either! The subsystem maintainers aren't going to help with my outage notifications, my users, or my emergency rollout. As with any downstream, I'm ultimately responsible for what I ship.

In the case mentioned upthread, my choices were: a) cherry-pick a one-line fix for the userspace ABI regression, or b) take the entire diff from 4.14.96 to 4.14.97: 69 patches touching 92 files, +1072/-327 lines. Option b) is simply not defensible release engineering. If I can't hotfix a regression without letting in a bunch of unrelated code, I'll never converge to a kernel that's safe to ship. That would arguably be true even if stable kernels didn't have a history of user-facing regressions, which they certainly did.
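
For concreteness, option a) in git terms looks roughly like this (the branch name and the commit reference are placeholders, not the actual commits involved):

    # start from the stable release already shipped to users
    git checkout -b hotfix-4.14.96 v4.14.96
    # bring in only the one-line fix from 4.14.97, recording its origin with -x
    git cherry-pick -x <sha-of-the-fix-as-it-appears-in-4.14.97>
    # rebuild, run the downstream test suite, ship as a hotfix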

This discussion is a great example of the problem I'm trying to describe. Stable kernels are aggressively advertised as the only safe kernels to run, but there's plenty of evidence that they aren't safe, and the stable maintainers tend to denigrate and dismiss users' attempts to point out the structural problems — or even to work around them! These problems can be addressed, as I said, with tools, testing, and developer time. There is always, always, always room for improvement. But that will only happen if the stable team decides to make improvement a priority.

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 17:55 UTC (Tue) by wtarreau (subscriber, #51152) [Link] (2 responses)

> The message I linked above is dated November 24 and reported a regression in v6.1.64-rc1. The testing deadline for 6.1.64 was November 26, and it was released on November 28. That report was sufficient to cause a revert in 5.10.y and 5.15.y, so I don't think there can be an argument that not enough information was available.

Yes, but if you read Greg's response, it's obvious there was a misunderstanding, and no one else jumped on that thread to ask for the other kernels. Sh*t happens:

> > and on the following RC's:
> > * v5.10.202-rc1
> > * v5.15.140-rc1
> > * v6.1.64-rc1
> >
> > (Note that the list might not be complete, because some branches failed to execute completely due to build issues reported elsewhere.)
> >
> > Bisection in linux-5.15.y pointed to:
> >
> > commit db85c7fff122c14bc5755e47b51fbfafae660235
> > Author: Jan Kara <jack@suse.cz>
> > Date: Fri Oct 13 14:13:50 2023 +0200
> >
> > ext4: properly sync file size update after O_SYNC direct IO
> > commit 91562895f8030cb9a0470b1db49de79346a69f91 upstream.
> >
> >
> > Reverting that commit made the test pass.
>
> Odd. I'll go drop that from 5.10.y and 5.15.y now, thanks.

I mean, it's always the same every time there is a regression: users jump the gun and explain what OUGHT to have been done, except that, unsurprisingly, they were not there to do it at the time either. I don't know when everyone will understand that maintaining a working kernel is a collective effort, and that when there's a failure it's a collective failure.

> If I can't hotfix a regression without letting in a bunch of unrelated code, I'll never converge to a kernel that's safe to ship.

There are two safe possibilities for this:
- either you take the identified bad commit, ask its author what he thinks about removing it, and do that;
- or you roll back to the latest known good kernel. Upgrades are frequent enough to allow rollbacks. Seriously...

And in both cases it's important to insist on having a fixed version so that the involved people have their say on the topic (including "take this fix instead, it's ugly but safer for now"). What matters in the end is end-users' safety, so picking a bunch of fixes that have not yet been subject to all these tests is not a good solution at all. And by the way, the problem was found during the test period, which proves that testing is useful and effective at finding some regressions. It's "just" that the rest of the process messed up there.

> Stable kernels are aggressively advertised as the only safe kernels to run, but there's plenty of evidence that they aren't safe, and the stable maintainers tend to denigrate and dismiss users' attempts to point out the structural problems

No, not at all. There's no such thing as "they are safe" or "they aren't safe". Safety is not a boolean, it's a metric. And maintainers do not dismiss users' attempts out of hand; on the contrary, those attempts are welcome and adopted when they prove to be useful, such as all the tests that are run for each and every release. It's just that there's a huge difference between proposing solutions and whining. Saying "you should have done that" or "if I were doing your job I would certainly not do it this way" is just whining. Saying "give me one extra day to run some more advanced tests myself" can definitely be part of a solution to improve the situation (and then you will be among those criticized for messing up from time to time).

Ext4 data corruption in stable kernels

Posted Dec 13, 2023 4:59 UTC (Wed) by bgilbert (subscriber, #4738) [Link] (1 responses)

> Yes but if you read Greg's response, it's obvious there has been a misunderstanding, and noone else jumped on that thread to ask for the other kernels. Sh*t happens:

Yup, agreed. Process failures happen; they should lead to process improvements. Asking for more testers isn't going to solve this one.

> And in both cases it's important to insist on having a fixed version so that the involved people have their say on the topic (including "take this fix instead, it's ugly but safer for now").

I think we're talking past each other here. The fix for 4.14.96 had already landed in 4.14.97. I backported one patch from it, rather than taking the entire release.

> And maintainers do not dismiss whatever users' attempts, on the opposite, these attempts are welcome and adopted when they prove to be useful, such as all the tests that are run for each and every release. It's just that there's a huge difference between proposing solutions and whining. Saying "you should have done that" or "if I were doing your job I would certainly not do it this way" is just whining. Saying "let me one extra day to run some more advanced tests myself" can definitely be part of a solution to improve the situation (and then you will be among those criticized for messing up from time to time).

Every open-source maintainer gets complaints that the software is not meeting users' needs. Those users often aren't in a position to fix the software themselves, they may have suggestions which don't account for the full complexity of the problem, and they may not even fully understand their own needs. Even when a maintainer needs to reject a suggestion (and they should, often!) the feedback is still a great source of information about where improvements might be useful. And sometimes a suggestion contains the seed of a good idea. Even if the people in this comment section are wrong about a lot of the details, I'm sure there's at least one idea here that's worth exploring.

As you said in another subthread, the existing stable kernel process has worked remarkably well for its scale. But processes don't scale forever, and processes can't be improved without the participation (and probably the active commitment) of the actual maintainers. BitKeeper and then Git allowed kernel development to scale to today's levels, but those tools could never have succeeded if key maintainers hadn't actively embraced them and encouraged their use. At the end of the day, while a lot of the day-to-day work can be handled by any skilled contributor, the direction of a project must be set by its maintainers.

Ext4 data corruption in stable kernels

Posted Dec 13, 2023 5:44 UTC (Wed) by wtarreau (subscriber, #51152) [Link]

> I think we're talking past each other here. The fix for 4.14.96 had already landed in 4.14.97. I backported one patch from it, rather than taking the entire release.

OK got it, and yes for such rare cases where the fix is already accepted by maintainers and validated, I agree that it remains a reasonable approach.

> But processes don't scale forever, and processes can't be improved without the participation (and probably the active commitment) of the actual maintainers.

That's totally true, but it's also important to keep in mind that maintainers are scarce and already overloaded, and that asking them to experiment with random changes is the best way to waste their time or make them feel their work is useless. Coming with a PoC saying "don't you think something like this could improve your work?" is a lot different from "you should just do this or that". Maintainers are not short of suggestions; they come from everywhere all the time. Remember how many times it was suggested that Linus switch to SVN before Git appeared? If all those who suggested it had actually tried it before speaking, they would have had their answer and avoided looking like fools.

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 10:19 UTC (Tue) by geert (subscriber, #98403) [Link] (3 responses)

Playing the devil's advocate (which can be considered appropriate for v6.6.6 ;-))

> Here you're speaking about cherry-picking fixes. That's something extremely dangerous that nobody must ever do [...]

But stable is also cherry-picking some changes, but not others?!?!? Nobody knows if they work well together or if another important patch is missing...

The only solution is to follow mainline ;-)

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 18:05 UTC (Tue) by wtarreau (subscriber, #51152) [Link] (2 responses)

> But stable is also cherry-picking some changes, but not others?!?!? Nobody knows if they work well together or if another important patch is missing...

That has always been the case. For a long time I used to say myself that the kernels I was releasing were certainly full of bugs (otherwise there would be no need to issue future releases), but the difference from the ones people build in their garage is that the official stable ones are the result of:
- reviews from all the patch authors
- tests from various teams and individuals.

I.e. they are much better known than other combinations.

One would say that patch authors do not send a lot of feedback, but there are regularly one or two responses in a series, either asking for another patch if one is picked, or suggesting not to pick one, so that works as well. And the tests are invaluable. When I picked up 2.6.32 and Greg insisted that now I had to follow the whole review process, I was really annoyed because it doubled my work. But seeing suggestions to fix around 10 patches per series based on review and testing showed me the garbage I used to provide before this process. That's why I'm saying: people complain, but the process works remarkably well given the number of patches and the number of regressions. I remember the era of early 2.6 where you would have been foolish to run a stable version before .10 or so. I've had on one of my machines a 4.9.2 that I never updated for 5 years for some reason, and it never messed up on me. I'm not advocating for not updating, but I mean that mainline is much stabler than it used to be and stable sees very few regressions.

> The only solution is to follow mainline ;-)

That's what Linus sometimes says as well. That's where you get the latest fixes and the latest bugs as well. It doesn't mean the balance is necessarily bad, but it's more adventurous :-)

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 18:49 UTC (Tue) by farnz (subscriber, #17727) [Link] (1 responses)

As an aside, I've noted more than once in my career that there's a deep tradeoff in dependency handling here:

  • I can stick with an old version, and keep trying to patch it to have fewer bugs but no new features. This is less work week-by-week, but when I do hit a significant bug where I can't find a fix myself, upstream is unlikely to be much help (because I'm based on an "ancient" codebase from their point of view).
  • I can keep up to date with the latest version, with new features coming in all the time, doing more work week-by-week, but not having the "big leaps" to make, and having upstream much more able to help me fix any bugs I find, because I'm basing my use on a codebase that they work on every day.

For example, keeping up with latest Fedora releases is harder week-by-week than keeping up with RHEL major releases; but getting support from upstreams for the versions of packages in Fedora is generally easier than getting support for something in the last RHEL major release, because it's so much closer to their current code; further, it's generally easier to go from a "latest Fedora" version of something to "latest upstream development branch" than to go from "latest RHEL release" to "latest upstream development branch" and find patches yourself that way.

Ext4 data corruption in stable kernels

Posted Dec 13, 2023 5:54 UTC (Wed) by wtarreau (subscriber, #51152) [Link]

For me that's a tooling problem above all. For users' convenience, most of the time you can just upgrade to the latest version since it's supposedly better. The fact is that it's *often* better but not always, and for some users the few cases where it's not better are so much worse that they'd prefer not to *take the risk* of updating. This is exactly the root cause of the problem.

What I'm suggesting is to update in small jumps, but not to the latest version, which is still lacking feedback. If you see, say, 6.1.66 being released, and you consider that the one-month-old 6.1.62 looks correct because nobody complained about it, and nobody recently spoke about a critical urgent update that requires everyone to absolutely update to yesterday's patch, then you could just update to 6.1.62 (or any surrounding one that was reported to work pretty well). This leaves one month of feedback on these kernels for you to choose from, doesn't require too-frequent updates and doesn't require living on the bleeding edge (i.e. fewer risks of regressions).

That's obviously not rocket science and will not always work, but this approach allows you to skip big regressions with immediate impact, and generally saves you from having to update twice in a row.

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 0:16 UTC (Tue) by roc (subscriber, #30627) [Link] (5 responses)

> I've counted 17 people responding to that thread with test reports, some of which indicate boot failures, others successes, on a total of around 910 systems covering lots of architectures, configs and setups.

Relying on volunteers to manually build and boot RC kernels is both inefficient and inadequate. There should be dedicated machines that automatically build and boot those kernels AND run as many automated tests as can be afforded given the money and time available. With some big machines and 48 hours you can run a lot of tests.

This isn't asking for much. This is what other mature projects have been doing for years.
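
The build-and-boot part, at least, is cheap to automate. A minimal sketch of one iteration (the stable-rc branch name and the initramfs are assumptions on my part; a real harness would loop over many configs and architectures and then run test suites inside the guest):

    git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
    cd linux-stable-rc && git checkout linux-6.1.y
    make defconfig && make -j"$(nproc)"
    qemu-system-x86_64 -m 1G -nographic \
        -kernel arch/x86/boot/bzImage \
        -append "console=ttyS0 panic=-1" \
        -initrd test-initramfs.cpio.gz   # assumed: an initramfs carrying the tests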

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 4:55 UTC (Tue) by wtarreau (subscriber, #51152) [Link] (4 responses)

> There should be dedicated machines that automatically build and boot those kernels AND run as many automated tests as can be afforded given the money and time available. With some big machines and 48 hours you can run a lot of tests.
>
> This isn't asking for much. This is what other mature projects have been doing for years.

Well, if you and/or your employer can provide this (hardware and the manpower to operate it), I'm sure everyone will be extremely happy. Greg is constantly asking for more testers. You're speaking as if some proposal for help was rejected; resources like this don't fall from the sky. Also, you seem to know what tests to run on them, so please do! All the testers I mentioned run their own tests from different (and sometimes overlapping) sets and that's extremely useful.

But saying "This or that should be done", the question remains "by whom if it's not by the one suggesting it?".

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 7:16 UTC (Tue) by roc (subscriber, #30627) [Link] (1 responses)

Regarding which tests to run: as bgilbert said: "Many other projects have CI tests that are required to pass before a new release can ship. If that had been the case for LTP, this regression would have been avoided." Of course the LTP test *did* run; it's not just about having the tests and running the tests, but also gating the release on positive test results.

As it happens my rr co-maintainer Kyle Huey does regularly test RC kernels against rr's regression test suite, and has found (and reported) a few interesting bugs that way. But really the Linux Foundation or some similar organization should be responsible for massive-scale automated testing of upstream kernels. Lots of companies stand to benefit financially from more reliable Linux releases, and as I understand it, the LF exists to channel those common interests into funding.

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 18:11 UTC (Tue) by wtarreau (subscriber, #51152) [Link]

> But really the Linux Foundation or some similar organization should be responsible for massive-scale automated testing of upstream kernels

But why is it that every time something happens, lots of people consider that there is surely an entity somewhere whose job should be to fix it? Why?

You seem to have an idea of the problem and its solution, so why are you not offering your help? Because you don't have the time for this? And what makes you think the problem is too large for you but very small for someone else? What makes you think that there are people idling all day waiting for this task to be assigned to them so they can start working on it? And what if, instead, you haven't analyzed it completely in its environment, with all of its dependencies and impacts, and it is much harder to put in place?

It's easy to always complain, really easy. If all the energy spent complaining about the current state every time there's a problem had been devoted to fixing it, maybe we wouldn't be speaking about this issue in the first place.

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 10:11 UTC (Tue) by bgilbert (subscriber, #4738) [Link] (1 responses)

Suppose the stable team announced their intention to gate stable releases on automated testing, and put out a call for suitable test suites. Test suites could be required to meet a defined quality bar (low false positive rate, completion within the 48-hour review period, automatic bisection), and any suite that repeatedly failed to meet the bar could be removed from the test program. If no one at all stepped up to offer their tests, I would be shocked.

The stable team wouldn't need to own the test runners, just the reporting API, and the API could be quite simple. I agree with roc that the Linux Foundation should take some financial responsibility here, but I suspect some organizations would run tests and contribute results even if no funding were available.

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 18:13 UTC (Tue) by wtarreau (subscriber, #51152) [Link]

> Suppose the stable team announced their intention to gate stable releases on automated testing, and put out a call for suitable test suites. Test suites could be required to meet a defined quality bar (low false positive rate, completion within the 48-hour review period, automatic bisection), and any suite that repeatedly failed to meet the bar could be removed from the test program.

OK so something someone has to write and operate.

> If no one at all stepped up to offer their tests, I would be shocked.

Apparently it's just been proposed, by you. When will we benefit from your next improvements to the process?

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 14:02 UTC (Mon) by farnz (subscriber, #17727) [Link]

Greg's position is a lot less concrete than that - it's "I make no assertions about whether or not any given batch of patches fixes bugs you care about; if you want all the fixes I think you should care about, then you must take the latest batch". Whether you want all the fixes that Greg thinks you should is your decision - but he makes no statement about what subset of stable patches you should pick in that case.

Ext4 data corruption in stable kernels

Posted Dec 10, 2023 12:10 UTC (Sun) by Wol (subscriber, #4433) [Link] (3 responses)

> Because it is marked
> CC: stable@vger.kernel.org

> Mistakes happen.

It's not always a mistake. As usual, we are using technology to try and fix a social problem. Some upstreams, I believe, have a habit of cc'ing *everything* to stable. If they've triaged it, and it's important enough, then fine. Too many of them don't.

The stable maintainers don't have time to triage everything. Upstream sometimes cannot be bothered to triage everything (or expect someone else to do it for them). What do you expect?

Cheers,
Wol

Ext4 data corruption in stable kernels

Posted Dec 10, 2023 19:42 UTC (Sun) by saffroy (guest, #43999) [Link] (2 responses)

> The stable maintainers don't have time to triage everything. Upstream sometimes cannot be bothered to triage everything (or expect someone else to do it for them). What do you expect?

What I did expect until today was that existing well-known test suites like LTP (which revealed the bug) would be on the critical path for a stable release.

I am very curious why they are not.
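
For reference, putting LTP into such a loop is not a heavy lift; a rough sketch of building it and running a group of tests (the choice of runtest file is an assumption on my part; the specific case that caught this regression is in the report linked elsewhere in the thread):

    git clone https://github.com/linux-test-project/ltp.git
    cd ltp
    make autotools && ./configure && make -j"$(nproc)" && sudo make install
    cd /opt/ltp
    sudo ./runltp -f syscalls    # or a narrower runtest file such as dio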

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 14:31 UTC (Mon) by intgr (subscriber, #39733) [Link] (1 responses)

> existing well-known test suites like LTP (which revealed the bug)

Interesting fact. Do you have a link to it? And any discussions that followed?

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 15:39 UTC (Mon) by Kamiccolo (subscriber, #95159) [Link]

It was posted in this thread some time ago:
https://lore.kernel.org/stable/81a11ebe-ea47-4e21-b5eb-53...

I'd say LTP deserves at least a little bit more love ;)

Ext4 data corruption in stable kernels

Posted Dec 10, 2023 12:41 UTC (Sun) by rolexhamster (guest, #158445) [Link] (12 responses)

    No. It would just delay a lot of fixes.

Perhaps that would be a good thing, especially when it comes to critical subsystems. Filesystems should not eat data. Was that commit really necessary for backporting to stable?

There's a big difference between a few corrupted pixels (say due to a bug in DRM or GPU driver), and a few corrupted files. The former is a nuisance, while the latter is a critical failure. Maybe it would be useful to classify stuff sent to stable@kernel as high/med/low risk, based on what subsystem it touches.

Ext4 data corruption in stable kernels

Posted Dec 10, 2023 13:04 UTC (Sun) by mb (subscriber, #50428) [Link]

>Filesystems should not eat data. Was that commit really necessary for backporting to stable?

By your own reasoning that filesystems should not eat data, it was necessary.

From the commit message:

>on ext4 O_SYNC direct IO does not properly
>sync file size update and thus if we crash at unfortunate moment, the
>file can have smaller size although O_SYNC IO has reported successful
>completion.

It was supposed to prevent data corruption.
-> "Filesystems should not eat data."
-> We must apply it to stable a.s.a.p.
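
To make the failure mode concrete, here is a minimal user-space sketch of the pattern the quoted commit message describes (this is not the kernel fix itself; the file name and sizes are arbitrary): an extending O_SYNC direct-I/O write, where a crash right after pwrite() returns could leave the on-disk inode with the old, smaller size.

    /* O_SYNC promises that both the data and the file-size update are durable
     * once pwrite() returns; the bug broke the size part for direct I/O. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("testfile", O_CREAT | O_WRONLY | O_DIRECT | O_SYNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, 4096, 4096)) return 1;  /* O_DIRECT needs aligned buffers */
        memset(buf, 'x', 4096);

        /* Extending write: the file grows from 0 to 4096 bytes. */
        if (pwrite(fd, buf, 4096, 0) != 4096) { perror("pwrite"); return 1; }

        free(buf);
        return close(fd);
    }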

>Maybe it would be useful to classify stuff sent to stable@kernel as high/med/low risk,
>based on what subsystem it touches.

And that would have prevented this fix from being applied to stable?
I doubt it. It was supposed to avoid corruption.

I am not saying that things went well here and I am not saying the stable process is perfect.
But in reality such problems happen rarely in stable.
Stable is not supposed to be an enterprise kernel. It is supposed to collect fixes with the least amount of manual work possible. That is guaranteed to introduce bugs sooner or later. But I think it's the best we can do at this level.

I don't think any kind of risk classification can help here.
It's basically the same problem as with security fixes. People will just start to argue whether a fix actually is really needed or not. And wrong decisions will then be made. Which will also lead to buggy stable kernels.

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 9:06 UTC (Mon) by cloehle (subscriber, #128160) [Link] (9 responses)

The main issue with delaying these fixes further is that a good chunk of them are, or can be, security-related, and thus it's obvious they should be deployed ASAP.

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 9:53 UTC (Mon) by rolexhamster (guest, #158445) [Link] (8 responses)

Then perhaps the security fixes should always be clearly labeled as such in the list of changes, preferably with an associated CVE number (or !CVE if that's more amenable).

Otherwise we're in the security by obscurity weeds, where the "all-users-must-upgrade" upgrades are being applied in an uninformed manner, pulling in both the wheat and the chaff.

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 14:20 UTC (Mon) by cloehle (subscriber, #128160) [Link] (6 responses)

Absolutely not: the kernel community rejects the CVE system, and for very good reasons.

It is a completely unreasonable amount of effort to categorize bugs into "very likely not security-related" and "security-related"; in fact everyone who attempts this (most vendors) messes up regularly, which is a huge weakness of the CVE (and !CVE) systems, for that matter.

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 15:19 UTC (Mon) by rolexhamster (guest, #158445) [Link] (5 responses)

Using that logic ("all-users-must-upgrade"), all patches in a given stable release are both security fixes and not security fixes at the same time. (In other words, heisen-patches, fashioned after heisenbugs). That's a cop out.

If a patch (or collection of patches) fixes an existing CVE/!CVE, why not simply state that in the changelog? This is distinct and separate from asking to categorize each bug/patch as "Very likely not security-related" and "Security-related".

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 15:50 UTC (Mon) by cloehle (subscriber, #128160) [Link]

>Using that logic ("all-users-must-upgrade"), all patches in a given stable release are both security fixes and not security fixes at the same time.

And that is kind of the current situation, although strangely worded.
The kernel doesn't make the distinction; don't run a kernel that is missing known fixes.

To get a CVE, many vendors require you to actually prove an exploit, and that is often orders of magnitude more effort for both the reporter and the CNA to verify; but for now the kernel community would rather spend the potential days to months fixing stuff instead of wondering "how could this bug be exploited somehow?".

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 16:50 UTC (Mon) by farnz (subscriber, #17727) [Link]

Without first determining whether or not a given patch is, or is not, a fix for a CVE/!CVE, how do I state the CVE number in the changelog? Bear in mind that at the point I write the patch, I may just be fixing a bug I've seen, without realising that it's security relevant, or indeed that someone has applied for a CVE number for it.

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 18:15 UTC (Tue) by wtarreau (subscriber, #51152) [Link]

So are you going to step up to review all these patches and categorize them yourself? Because most of the time their authors themselves have no idea that the bug they're fixing can have a security impact. That's the first part of the problem with the CVE circus.

Ext4 data corruption in stable kernels

Posted Dec 18, 2023 1:43 UTC (Mon) by jschrod (subscriber, #1646) [Link] (1 responses)

Well, *all* bug fixes are security fixes.

You cannot know in advance which bugs might be exploited, be it on a technical or a social level. This has been demonstrated so many times, I cannot believe it has to be spelled out.

*Every* bug is a security bug. If you think it isn't, the future will tell you different.

Ext4 data corruption in stable kernels

Posted Dec 18, 2023 11:38 UTC (Mon) by farnz (subscriber, #17727) [Link]

I would marginally disagree; it is possible for a bug to not be a security bug. The difficulty is distinguishing bugs with no security relevance from those with security relevance, given that the kernel's overall threat model is very broad.

For example, a bug where the kernel sometimes clears the LSB of the blue channel of 16 bpc RGB colour on a DisplayPort link is almost certainly completely irrelevant; at the sorts of brightnesses monitors can do today, the difference between 16 bits each R and G and 15 bits of B, and 16 bits each R, G, B is below human perception.

But the challenge is that from my perspective, a bug in the kernel driver for a 100G Ethernet chip that connects via PCIe is completely irrelevant - I have no systems with that sort of hardware, nor is there a way for an attacker to add that hardware without my knowledge. Similarly, a bug in iSCSI that can only be tickled once iSCSI is in use is not security-relevant to me, since I have no iSCSI set up, so to tickle the bug, the attacker needs remote code execution already. From the perspective of a company running big servers that access bulk storage over iSCSI using 100G Ethernet, however, both of those bugs can be security bugs.

Should those bugs be "security" bugs, since if you happen to have the problematic setup, they're relevant? Or should they not be security bugs since most people don't have either 100G Ethernet or iSCSI setups?

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 15:13 UTC (Mon) by farnz (subscriber, #17727) [Link]

That requires someone (who's willing to stand behind their effort) to look over all of the changes, and tell you which ones have no security relevance. And thinking this way reveals the problem with "I only apply the security-relevant bugfixes"; to do that, you first need to know which bugfixes are security relevant, which in turn implies that you know which bugfixes are not security relevant.

If you merely take all bugfixes that are known to be security relevant, then you're engaging in theatre; there will always be security relevant bugfixes that aren't known to be security relevant, either because no-one in the chain from the bug finder to Greg recognised that this bug had security relevance, or because people who recognised that it was security relevant chose to hide that fact for reasons of their own (e.g. because they work for the NSA, want future kernels to be fixed, but benefit from people not rushing to backport the fix to a vendor's 3.3 kernel).

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 13:52 UTC (Mon) by wtarreau (subscriber, #51152) [Link]

> No. It would just delay a lot of fixes.
>
> Perhaps that would be a good thing, especially when it comes to critical subsystems.

No, it would just leave users exposed to the bugs for longer, and would make any problematic fix land together with many more related fixes, making it even harder to spot the culprit. The problem is that some users absolutely want to shift the responsibility onto someone else:
- a fix is missing, what are you doing maintainers, couldn't you pick it for stable ?
- a fix broke my system, what are you doing maintainers, couldn't you postpone it ?

It will never change anyway, but it will continue to add lines here on lwn :-)

Ext4 data corruption in stable kernels

Posted Dec 10, 2023 12:12 UTC (Sun) by butlerm (subscriber, #13312) [Link]

> Why would any commit be picked up from rc1 through to rc8 etc? It's basic risk management not to do that.

Pretty sure the answer is that developers do not make a new version of the same commit for every version of the kernel. The commit on the master branch is the authoritative source for the fix in question; it is not repeated for each release candidate. Releases are cumulative of all the commits made on the branch they are derived from.

The issue here seems to be that this fix turned out to require two different commits, and in that situation the second commit should be accompanied by a red-flag warning not to cherry-pick or backport the original commit without the follow-on commit as well, and some sort of system for making that apparent.
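
For what it's worth, the kernel already has a convention meant to record exactly that relationship: the follow-on commit carries a "Fixes:" trailer naming the commit it repairs, and as I understand it the stable scripts do search for such trailers pointing at commits they have backported. A sketch of what that looks like (the hash and subjects are placeholders, not the actual commits in this incident):

    ext4: fix <the regression introduced by the earlier change>

    Fixes: 123456789abc ("ext4: <subject of the original commit>")
    Cc: stable@vger.kernel.org
    Signed-off-by: A Developer <dev@example.org>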

Ext4 data corruption in stable kernels: it's not a larger problem

Posted Dec 10, 2023 22:06 UTC (Sun) by geofft (subscriber, #59789) [Link]

Do problems like this happen frequently? I can't remember the last time that something like this happened at all in Linux. There certainly have been bugs introduced by incorrect backports and complaints about it, e.g. https://lore.kernel.org/stable/Y1DTFiP12ws04eOM@sol.local... (which I thought LWN covered but I can't find the article about it), but I can't remember an incorrect backport causing data corruption or some other problem so widespread as to retroactively tell people not to upgrade.

Actually, that particular thread is interesting in its own way, because it's complaining about an AI called "autosel" that picks patches to backport even if they're not tagged as Cc: stable. But the problematic patch in this case https://lore.kernel.org/r/20231013121350.26872-1-jack@sus... wasn't picked up by autosel; it was explicitly tagged Cc: stable and was explicitly claimed to be fixing a data loss bug of its own - which also means this isn't an example of "it's good to have it just in case." It was believed to be broken. That's why they fixed it.

On the other hand, I think there are a huge number of real fixes that have come from stable kernels actually being vigorously maintained.

A system as complex as Linux kernel development is never going to have zero errors of commission and zero errors of omission at the same time. One error of commission in a really long time seems like a fine tradeoff to me. We don't really have a sense of how many problems in stable kernels are left unfixed because nobody thinks to backport the change. (Anecdotally, I think 1-2 times a year at my day job where we run upstream stable, we find something broken, spend the effort to track it down, and discover that it has indeed been fixed in a newer kernel and never backported.)

In fact, LWN had an article recently about how not enough patches are being backported to stable kernels for ext4 in particular, and someone (ideally an enterprise customer) needs to step up and do so: https://lwn.net/Articles/934941/. That article also compared individual kernel branches, some of which had enterprise-backed backports and some of which didn't, and the general sense was that the one with fewer backports was the less desirable one. I think it will require way more than a single mistake to argue convincingly that the current policy is wrong.

Ext4 data corruption in stable kernels

Posted Dec 10, 2023 22:35 UTC (Sun) by ballombe (subscriber, #9523) [Link] (2 responses)

The problem is that the time needed to run any meaningful QA on the kernel is far longer than the time between two stable kernel releases.

Ext4 data corruption in stable kernels: automated QA

Posted Dec 11, 2023 0:06 UTC (Mon) by geofft (subscriber, #59789) [Link]

Is there an element of inherent wall-clock time here, or is it parallelizable to the point where it can run quickly enough if someone were to throw a couple racks of physical machines or a large public cloud budget at the problem?

(Someone like the big companies that the stable kernels are in large part made for and by?)

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 6:52 UTC (Mon) by vegard (subscriber, #52330) [Link]

In this case the problem was actually picked up by QA on November 24 and reported about 12 hours after the patches were posted on the mailing list, and that is also why it got dropped from 5.15 and older kernels: https://lore.kernel.org/stable/81a11ebe-ea47-4e21-b5eb-53...

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 13:33 UTC (Mon) by birdie (guest, #114905) [Link] (4 responses)

> This is a symptom of a larger problem: seems that patches/backports are sent to "stable" kernels in an almost willy-nilly fashion, and there is inadequate checking of what actually gets in.

The real larger problem Linux fans never talk about: very poor/inadequate/missing QA/QC in the Linux kernel.

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 13:57 UTC (Mon) by wtarreau (subscriber, #51152) [Link] (3 responses)

> The real larger problem Linux fans never talk about: very poor/inadequate/missing QA/QC in the Linux kernel.

Compared to what, and as per which metric and unit?

The latest kernel was run on ~910 systems by 17 people, who found issues that were fixed before the release:

https://lore.kernel.org/all/20231205031535.163661217@linu...

If you have good plans to propose something better that doesn't bring the process to a halt, I'm sure everyone would be interested to hear about it. The stable team is always seeking more testers; feel free to join.

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 19:54 UTC (Mon) by mat2 (guest, #100235) [Link] (2 responses)

From the mail you quoted:

> Subject: [PATCH 6.6 000/134] 6.6.5-rc1 review
> Date: Tue, 5 Dec 2023 12:14:32 +0900

[snip]

> Responses should be made by Thu, 07 Dec 2023 03:14:57 +0000.
> Anything received after that time might be too late.

I think that the time available for testing stable release candidates is too short (~48 hours). Some bugs (such as this one) are visible only after some usage period.

Longer times also mean more testers. For example, Ubuntu's mainline PPA ( https://kernel.ubuntu.com/mainline/ ) might run such stable RCs similar to how it compiles normal kernels.

So perhaps the stable testing period should be made longer, like 4-5 days.

I'll try to test stable RCs myself. Is there some mailing list available that one may subscribe to in order to get notifications about these releases (just notifications, without all the patches)?

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 19:55 UTC (Mon) by pizza (subscriber, #46) [Link]

> So perhaps the stable testing period should be make longer, like 4-5 days.

No matter what period is chosen, it will simultaneously be too short for some, and too long for others.

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 5:08 UTC (Tue) by wtarreau (subscriber, #51152) [Link]

That's a valid point, though the time allowed has adapted over history to the period it takes for active participants to send their reports. If you wait too long, testers start testing only at the end of the period, and during all that time users stay needlessly exposed to unfixed bugs (including the one this patch was meant to fix). And I agree that if the period is too short, you get fewer opportunities to test.

If you think you could regularly participate in tests with a bit more time, you should suggest this publicly. I'm sure Greg is open to adjusting his cycle a little bit to permit more testing, but it needs to be done for something real, not just suppositions. Keep in mind that he's the person who has released the largest number of kernels and has accumulated a lot of experience about what happens before and after, and by now he definitely knows how people react at various points in the cycle.

When I was maintaining extended LTS kernels, I also got used to how my users would react. I knew that one distro would test during the week after the -rc, so I would leave one week for testing; then I knew that nobody would test it for the month following the release. That was specific to those use cases, where users don't upgrade often and prefer to wait for the right moment. So in my head a release was not confirmed until about one month after it went out, which often required quickly issuing another one to fix some issues.

And nowadays I'm pretty sure that the feedback and behavior on 6.6 is not the same at all as with 5.4 or 4.14!

In haproxy we have far fewer changes per stable release and we announce our own level of trust in the version. That's possible because the maintainers doing the backports have already been involved in a lot of these fixes and have hesitated over some backports. So we just indicate whether we're really confident in the release or whether it should be taken with special care. Users appreciate it a lot and help us in return by reporting suspected issues. I don't think it would work well for the kernel, because stable maintainers receive an avalanche of fixes from various sources and it's very hard to have an idea of the impact of these patches. Subsystem maintainers are the ones who know best, immediately followed by testers, so it's quite hard to give an appreciation of how much a version can be trusted. In an ideal world, some subsystem maintainers could indicate "be careful" and that would raise a warning. But here it wouldn't have worked, since that was already a fix for a serious problem.

Fixes that break stuff are the worst ones to deal with because they create a lot of confusion. And BTW security fixes are very often in this category, which is why we insist a lot on having them discussed publicly as much as possible.

