
Ext4 data corruption in stable kernels


Posted Dec 12, 2023 4:52 UTC (Tue) by wtarreau (subscriber, #51152)
In reply to: Ext4 data corruption in stable kernels by bgilbert
Parent article: Ext4 data corruption in stable kernels

> but no action was taken to fix that release. 6.1.64 was released with the problem four days later.

You should really see that as a pipeline. Even if the issue was reported, you don't know whether it was noticed before 6.1.64 was emitted. What matters is that the issue was quickly fixed. Sure, we're still missing a way to tag certain versions as broken, as happened for 2.4.11, which was marked "dontuse" in the download repository. But it's important to understand that the constant flow of fixes makes it hard to cancel a release instantly.

I would not be shocked to see three consecutive kernels emitted and then tagged "ext4 broken" there, for the time it takes to learn of the breakage and fix it.

> I have personally been complained at by Greg for fixing a stable kernel regression via cherry-pick, rather than shipping the latest release directly to distro users.

Here you're speaking about cherry-picking fixes. That's something extremely dangerous that nobody should ever do, yet some distros have been doing it for a while, sometimes shipping kernels that remained vulnerable for months or years due to this bad practice. The reason for recommending against cherry-picking is very simple (and was explained at length at multiple conferences): the ONLY combinations of kernel patches that are both tested and supported by the subsystem maintainers are the mainline and stable ones. If you perform any other assembly of patches, nobody knows whether they work well together or whether another important patch is missing (as happened above). Here the process worked fine because developers reported the missing patches. Imagine if you had taken that single patch yourself: nobody would have known, and you could have corrupted a lot of your users' filesystems.

So please, for your users, never ever cherry-pick random patches from stable. Take the whole stable release, possibly a slightly older one if you don't feel comfortable with the latest changes, and add your distro-specific patches on top of it, but do not pick whatever seems relevant to you; that will eventually result in a disaster, and nobody will support you for having done it.



Ext4 data corruption in stable kernels

Posted Dec 12, 2023 9:58 UTC (Tue) by bgilbert (subscriber, #4738) (3 responses)

> Even if the issue was reported you don't know if it was noticed before 6.1.64 was emitted. What matters is that the issue was quickly fixed.

The message I linked above is dated November 24 and reported a regression in v6.1.64-rc1. The testing deadline for 6.1.64 was November 26, and it was released on November 28. That report was sufficient to cause a revert in 5.10.y and 5.15.y, so I don't think there can be an argument that not enough information was available.

The users who had data corruption, or who had to roll out an emergency fix to avoid data corruption, don't care that the issue was quickly fixed. They can always roll back to an older kernel if they need to. They care that the problem happened in the first place.

> The reason for recommending against cherry-picking is very simple (and was explained in lengths at multiple conferences): the ONLY combinations of kernel patches that are both tested and supported by the subsystem maintainers are the mainline and stable ones. [...] Take the whole stable, possibly a slightly older one if you don't feel easy with latest changes, add your distro-specific patches on top of it, but do not pick what seems relevant to you, that will eventually result in a disaster and nobody will support you for having done this.

What are you talking about? If I ship a modified kernel and it breaks, of course no one will support me for having done so. If I ship an unmodified stable kernel and it breaks, no one will support me then either! The subsystem maintainers aren't going to help with my outage notifications, my users, or my emergency rollout. As with any downstream, I'm ultimately responsible for what I ship.

In the case mentioned upthread, my choices were: a) cherry-pick a one-line fix for the userspace ABI regression, or b) take the entire diff from 4.14.96 to 4.14.97: 69 patches touching 92 files, +1072/-327 lines. Option b) is simply not defensible release engineering. If I can't hotfix a regression without letting in a bunch of unrelated code, I'll never converge to a kernel that's safe to ship. That would arguably be true even if stable kernels didn't have a history of user-facing regressions, which they certainly did.
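For concreteness, the one-commit hotfix can be sketched on a toy repository. Everything below (branch names, files, commit messages) is invented for illustration; this is not the actual 4.14 tree, just the mechanics of option a):

```python
# Toy illustration of option (a): cherry-picking a single fix from a later
# release onto the shipped branch, leaving the unrelated churn behind.
# All names and contents here are hypothetical.
import pathlib
import subprocess
import tempfile

def git(*args, cwd):
    return subprocess.run(["git", *args], cwd=cwd, check=True,
                          capture_output=True, text=True)

repo = pathlib.Path(tempfile.mkdtemp())
git("init", "-q", cwd=repo)
git("config", "user.email", "demo@example.invalid", cwd=repo)
git("config", "user.name", "Demo", cwd=repo)

# The release the distro currently ships (stands in for 4.14.96).
(repo / "abi.txt").write_text("broken ABI\n")
git("add", "abi.txt", cwd=repo)
git("commit", "-q", "-m", "release: 4.14.96", cwd=repo)
git("branch", "distro", cwd=repo)  # the distro ships from this branch

# The next release (stands in for 4.14.97): one wanted fix plus churn.
(repo / "abi.txt").write_text("fixed ABI\n")
git("commit", "-qam", "fix userspace ABI regression", cwd=repo)
fix = git("rev-parse", "HEAD", cwd=repo).stdout.strip()
(repo / "other.txt").write_text("68 unrelated patches\n")
git("add", "other.txt", cwd=repo)
git("commit", "-q", "-m", "everything else in 4.14.97", cwd=repo)

# Option (a): pull only the identified fix onto the shipped branch.
git("checkout", "-q", "distro", cwd=repo)
git("cherry-pick", fix, cwd=repo)
print((repo / "abi.txt").read_text().strip())   # the fix is in
print((repo / "other.txt").exists())            # the churn is not
```

The tradeoff debated in this thread is exactly whether that last `cherry-pick` is safe without the other 68 patches that shipped alongside the fix.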

This discussion is a great example of the problem I'm trying to describe. Stable kernels are aggressively advertised as the only safe kernels to run, but there's plenty of evidence that they aren't safe, and the stable maintainers tend to denigrate and dismiss users' attempts to point out the structural problems — or even to work around them! These problems can be addressed, as I said, with tools, testing, and developer time. There is always, always, always room for improvement. But that will only happen if the stable team decides to make improvement a priority.

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 17:55 UTC (Tue) by wtarreau (subscriber, #51152) (2 responses)

> The message I linked above is dated November 24 and reported a regression in v6.1.64-rc1. The testing deadline for 6.1.64 was November 26, and it was released on November 28. That report was sufficient to cause a revert in 5.10.y and 5.15.y, so I don't think there can be an argument that not enough information was available.

Yes, but if you read Greg's response, it's obvious there was a misunderstanding, and no one else jumped on that thread to ask about the other kernels. Sh*t happens:

> > and on the following RC's:
> > * v5.10.202-rc1
> > * v5.15.140-rc1
> > * v6.1.64-rc1
> >
> > (Note that the list might not be complete, because some branches failed to execute completely due to build issues reported elsewhere.)
> >
> > Bisection in linux-5.15.y pointed to:
> >
> > commit db85c7fff122c14bc5755e47b51fbfafae660235
> > Author: Jan Kara <jack@suse.cz>
> > Date: Fri Oct 13 14:13:50 2023 +0200
> >
> > ext4: properly sync file size update after O_SYNC direct IO
> > commit 91562895f8030cb9a0470b1db49de79346a69f91 upstream.
> >
> >
> > Reverting that commit made the test pass.
>
> Odd. I'll go drop that from 5.10.y and 5.15.y now, thanks.

I mean, it's always the same every time there is a regression: users jump the gun and explain what OUGHT to have been done, except that, unsurprisingly, they were not there to do it at the time either. I don't know when everyone will understand that maintaining a working kernel is a collective effort, and that when there's a failure it's a collective failure.

> If I can't hotfix a regression without letting in a bunch of unrelated code, I'll never converge to a kernel that's safe to ship.

There are two safe possibilities for this:
- either you take the identified bad commit, ask its author what he thinks about removing it, and do that;
- or you roll back to the last known-good kernel. Upgrades are frequent enough to allow rollbacks. Seriously...
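The first option, dropping the identified bad commit, is mechanically a revert. A toy sketch, with an invented repository and commit messages standing in for the ext4 backport:

```python
# Toy sketch of reverting an identified bad commit on a stable branch.
# The repository, file, and messages are invented for illustration.
import pathlib
import subprocess
import tempfile

def git(*args, cwd):
    subprocess.run(["git", *args], cwd=cwd, check=True, capture_output=True)

repo = pathlib.Path(tempfile.mkdtemp())
git("init", "-q", cwd=repo)
git("config", "user.email", "demo@example.invalid", cwd=repo)
git("config", "user.name", "Demo", cwd=repo)

(repo / "ext4.txt").write_text("known good\n")
git("add", "ext4.txt", cwd=repo)
git("commit", "-q", "-m", "last known-good state", cwd=repo)

# The backport later identified as the cause of the regression:
(repo / "ext4.txt").write_text("regressed\n")
git("commit", "-qam", "backport that caused the regression", cwd=repo)

# Drop it with a revert, which records the removal as a new commit
# and keeps the history visible for the commit's author to review:
git("revert", "--no-edit", "HEAD", cwd=repo)
print((repo / "ext4.txt").read_text().strip())  # back to known good
```

The advantage over rebuilding the branch by hand is that the revert itself is a reviewable commit the original author can comment on.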

And in both cases it's important to insist on getting a fixed version, so that the people involved have their say on the topic (including "take this fix instead, it's ugly but safer for now"). What matters in the end is end users' safety, so picking a bunch of fixes that have not yet been subjected to all these tests is not a good solution at all. And by the way, the problem was found during the test period, which proves that testing is useful and effective at finding some regressions. It's "just" that the rest of the process messed up here.

> Stable kernels are aggressively advertised as the only safe kernels to run, but there's plenty of evidence that they aren't safe, and the stable maintainers tend to denigrate and dismiss users' attempts to point out the structural problems

No, not at all. There's no such thing as "they are safe" or "they aren't safe". Safety is not a boolean, it's a metric. And maintainers do not dismiss users' attempts; on the contrary, these attempts are welcome and adopted when they prove to be useful, such as all the tests that are run for each and every release. It's just that there's a huge difference between proposing solutions and whining. Saying "you should have done that" or "if I were doing your job I would certainly not do it this way" is just whining. Saying "give me one extra day to run some more advanced tests myself" can definitely be part of a solution to improve the situation (and then you will be among those criticized for messing up from time to time).

Ext4 data corruption in stable kernels

Posted Dec 13, 2023 4:59 UTC (Wed) by bgilbert (subscriber, #4738) (1 response)

> Yes but if you read Greg's response, it's obvious there has been a misunderstanding, and noone else jumped on that thread to ask for the other kernels. Sh*t happens:

Yup, agreed. Process failures happen; they should lead to process improvements. Asking for more testers isn't going to solve this one.

> And in both cases it's important to insist on having a fixed version so that the involved people have their say on the topic (including "take this fix instead, it's ugly but safer for now").

I think we're talking past each other here. The fix for 4.14.96 had already landed in 4.14.97. I backported one patch from it, rather than taking the entire release.

> And maintainers do not dismiss whatever users' attempts, on the opposite, these attempts are welcome and adopted when they prove to be useful, such as all the tests that are run for each and every release. It's just that there's a huge difference between proposing solutions and whining. Saying "you should have done that" or "if I were doing your job I would certainly not do it this way" is just whining. Saying "let me one extra day to run some more advanced tests myself" can definitely be part of a solution to improve the situation (and then you will be among those criticized for messing up from time to time).

Every open-source maintainer gets complaints that the software is not meeting users' needs. Those users often aren't in a position to fix the software themselves, they may have suggestions which don't account for the full complexity of the problem, and they may not even fully understand their own needs. Even when a maintainer needs to reject a suggestion (and they should, often!) the feedback is still a great source of information about where improvements might be useful. And sometimes a suggestion contains the seed of a good idea. Even if the people in this comment section are wrong about a lot of the details, I'm sure there's at least one idea here that's worth exploring.

As you said in another subthread, the existing stable kernel process has worked remarkably well for its scale. But processes don't scale forever, and processes can't be improved without the participation (and probably the active commitment) of the actual maintainers. BitKeeper and then Git allowed kernel development to scale to today's levels, but those tools could never have succeeded if key maintainers hadn't actively embraced them and encouraged their use. At the end of the day, while a lot of the day-to-day work can be handled by any skilled contributor, the direction of a project must be set by its maintainers.

Ext4 data corruption in stable kernels

Posted Dec 13, 2023 5:44 UTC (Wed) by wtarreau (subscriber, #51152)

> I think we're talking past each other here. The fix for 4.14.96 had already landed in 4.14.97. I backported one patch from it, rather than taking the entire release.

OK, got it. And yes, for such rare cases where the fix is already accepted by maintainers and validated, I agree that it remains a reasonable approach.

> But processes don't scale forever, and processes can't be improved without the participation (and probably the active commitment) of the actual maintainers.

That's totally true, but it's also important to keep in mind that maintainers are scarce and already overloaded, and that asking them to experiment with random changes is the best way to waste their time or make them feel their work is useless. Coming with a PoC and saying "don't you think something like this could improve your work?" is a lot different from "you should just do this or that". Maintainers do not lack suggestions; those come from everywhere, all the time. Remember how many times it was suggested that Linus switch to SVN before Git appeared? If all those who suggested it had actually tried it before speaking, they would have had their answer and avoided looking like fools.

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 10:19 UTC (Tue) by geert (subscriber, #98403) (3 responses)

Playing the devil's advocate (which can be considered appropriate for v6.6.6 ;-)

> Here you're speaking about cherry-picking fixes. That's something extremely dangerous that nobody must ever do [...]

But stable is also cherry-picking some changes and not others?!?!? Nobody knows if they work well together or if another important patch is missing...

The only solution is to follow mainline ;-)

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 18:05 UTC (Tue) by wtarreau (subscriber, #51152) (2 responses)

> But stable is also cherry-picking some changes, but not others?!?!? Nobody knows if they work well together or if another important patch is missing...

That has always been the case. For a long time I used to say myself that the kernels I was releasing were certainly full of bugs (otherwise there would be no need for future releases), but the difference from the ones people build in their garage is that the official stable ones are the result of:
- reviews from all the patch authors
- tests from various teams and individuals.

I.e. they are much better known than other combinations.

One would say that patch authors do not send a lot of feedback, but there are regularly one or two responses per series, either asking for another patch if one is picked, or suggesting not to pick one, so that works as well. And the tests are invaluable.

When I picked up 2.6.32 and Greg insisted that I now had to follow the whole review process, I was really annoyed because it doubled my work. But seeing suggestions to fix around 10 patches per series based on review and testing showed me the garbage I used to provide before this process. That's why I'm saying: people complain, but the process works remarkably well given the number of patches and the number of regressions. I remember the era of early 2.6, where you would have been foolish to run a stable version before .10 or so. I've had on one of my machines a 4.9.2 that I never updated for 5 years for some reason, and it never messed up on me. I'm not advocating for not updating, but I mean that mainline is much stabler than it used to be, and stable sees very few regressions.

> The only solution is to follow mainline ;-)

That's what Linus sometimes says as well. That's where you get the latest fixes and the latest bugs as well. It doesn't mean the balance is necessarily bad, but it's more adventurous :-)

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 18:49 UTC (Tue) by farnz (subscriber, #17727) (1 response)

As an aside, I've noted more than once in my career that there's a deep tradeoff in dependency handling here:

  • I can stick with an old version, and keep trying to patch it to have fewer bugs but no new features. This is less work week-by-week, but when I do hit a significant bug where I can't find a fix myself, upstream is unlikely to be much help (because I'm based on an "ancient" codebase from their point of view).
  • I can keep up to date with the latest version, with new features coming in all the time, doing more work week-by-week, but not having the "big leaps" to make, and having upstream much more able to help me fix any bugs I find, because I'm basing my use on a codebase that they work on every day.

For example, keeping up with latest Fedora releases is harder week-by-week than keeping up with RHEL major releases; but getting support from upstreams for the versions of packages in Fedora is generally easier than getting support for something in the last RHEL major release, because it's so much closer to their current code; further, it's generally easier to go from a "latest Fedora" version of something to "latest upstream development branch" than to go from "latest RHEL release" to "latest upstream development branch" and find patches yourself that way.

Ext4 data corruption in stable kernels

Posted Dec 13, 2023 5:54 UTC (Wed) by wtarreau (subscriber, #51152)

For me that's a tooling problem above all. For users' convenience, most of the time you can just upgrade to the latest version, since it's supposedly better. The fact is that it's *often* better but not always, and for some users, the few cases where it's not better are so much worse that they'd prefer not to *take the risk* of updating. This is exactly the root cause of the problem.

What I'm suggesting is to update in small jumps, but not to the very latest version, which still lacks feedback. If you see, say, 6.1.66 being released, and you consider that the one-month-old 6.1.62 looks correct because nobody complained about it, and nobody has recently called for a critical, urgent update that absolutely everyone must apply, then you could just update to 6.1.62 (or any surrounding release that was reported to work pretty well). This leaves one month of feedback on these kernels for you to choose from, doesn't require too-frequent updates, and doesn't require living on the bleeding edge (i.e. less risk of regressions).
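That "small jumps with soak time" policy can be sketched in a few lines, assuming you track each release's date (the dates below are illustrative, not authoritative):

```python
# Sketch of the update policy described above: pick the newest release
# that has had enough soak time and is not known to be broken.
# Release dates here are illustrative, not authoritative.
from datetime import date

def pick_update(releases, today, soak_days=30, broken=()):
    """Return the newest release at least `soak_days` old and not known
    broken, or None if nothing qualifies yet."""
    candidates = [(d, v) for v, d in releases.items()
                  if v not in broken and (today - d).days >= soak_days]
    return max(candidates)[1] if candidates else None

releases = {
    "6.1.62": date(2023, 11, 3),
    "6.1.64": date(2023, 11, 28),
    "6.1.66": date(2023, 12, 8),
}

# On Dec 12, only 6.1.62 has a month of user feedback behind it:
print(pick_update(releases, date(2023, 12, 12)))  # 6.1.62
# A release tagged as broken is skipped even once it is old enough:
print(pick_update(releases, date(2024, 1, 15), broken=("6.1.64",)))  # 6.1.66
```

The `broken` set is where a "dontuse"-style tag, as mentioned earlier in the thread, would plug in.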

That's obviously not rocket science and will not always work, but this approach lets you skip big regressions with immediate impact, and generally saves you from having to update twice in a row.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds