Linux 5.12's very bad, double ungood day
Posted Mar 8, 2021 23:18 UTC (Mon) by roc (subscriber, #30627)
In reply to: Linux 5.12's very bad, double ungood day by airlied
Parent article: Linux 5.12's very bad, double ungood day
Posted Mar 9, 2021 4:26 UTC (Tue) by airlied (subscriber, #9104) [Link] (18 responses)

Posted Mar 9, 2021 12:01 UTC (Tue) by roc (subscriber, #30627) [Link] (17 responses)
I am constantly frustrated by the kernel testing culture, or lack of it. rr has approximately zero resources behind it and our automated tests are still responsible for detecting an average of about one kernel regression every release.
Every time I bring this up, people have a string of excuses, like how hard it is to test drivers, etc. Some of those are fair, but the bug in this article and pretty much all the regressions found by rr aren't in drivers; they're in core kernel code that can easily be tested, even from user space.
Posted Mar 9, 2021 13:27 UTC (Tue) by pizza (subscriber, #46) [Link] (16 responses)
Okay, so... when exactly?
There are 10K commits (give or take a couple of thousand) that land in every -rc1. Indeed, until -rc1 lands, nobody can really be sure whether a given pull request (or even a specific patch) will be accepted. This is why nearly all upstream tooling treats "-rc1" as the "time to start looking for regressions" inflection point [1], and the following *two months* are spent fixing whatever comes up. This has been the established process for over a decade now.
So what if there's a (nasty) bug that takes down a test rig? That's what the test rigs are for! The only thing unusual about this bug is that it leads to silent corruption, to the point where "testing" in and of itself wasn't enough; the test would have had to be robust enough to ensure nothing unexpected was written anywhere on the disk. That's a deceptively hairy testing scenario, arguably going well beyond the tests folks developing filesystems run.
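To make that concrete, a check of that kind would have to verify that a region nothing was supposed to touch is still bit-for-bit identical after the workload. A minimal sketch, purely illustrative (the image file, loop device, and workload here are placeholders, not anything a real kernel test rig runs):

    # Create a canary device that the workload should never write to.
    dd if=/dev/urandom of=canary.img bs=1M count=256
    LOOPDEV=$(losetup --find --show canary.img)   # attach it, but never mount or swap on it
    sha256sum "$LOOPDEV" > before.sum

    # ... run the workload under test against a *different* device ...

    sha256sum "$LOOPDEV" > after.sum
    diff before.sum after.sum && echo "canary untouched" || echo "unexpected writes detected"

Even that only catches corruption on the canary; proving that a filesystem holding real data wasn't silently damaged is a much taller order.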
Note I'm not making excuses here; it is a nasty bug, and clearly the tests that its developers ran were insufficient. But it is ridiculous to expect "release-quality" regression testing to be completed at the start of the designated testing period.
[1] Indeed, many regressions are due to combinations of unrelated changes in a given -rc1; each of those 10K patches is fine on its own, but (eg) patch #3313 could lead to data loss, though only with a specific kernel option enabled, on a system containing an old 3Ware RAID controller and a specific motherboard with a PCI-X bridge that can't pass through MSI interrupts due to how it was physically wired up. [2] [3]
[2] It's sitting about four feet away from me as I type this.
[3] Kernel bugzilla #43074
Posted Mar 9, 2021 14:11 UTC (Tue) by epa (subscriber, #39769) [Link]

Posted Mar 9, 2021 15:12 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (11 responses)
Posted Mar 9, 2021 15:22 UTC (Tue) by geert (subscriber, #98403) [Link] (1 responses)
$ git tag --contains 48d15436fde6
next-20210128
next-20210129
next-20210201
next-20210202
next-20210203
next-20210204
next-20210205
next-20210208
next-20210209
next-20210210
next-20210211
next-20210212
next-20210215
next-20210216
next-20210217
next-20210218
next-20210219
next-20210222
next-20210223
next-20210224
next-20210225
next-20210226
next-20210301
next-20210302
next-20210303
next-20210304
next-20210305
next-20210309
v5.12-rc1
v5.12-rc1-dontuse
v5.12-rc2

Three weeks passed between the buggy commit entering linux-next and upstream.
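For anyone who wants to redo that arithmetic, the dates are easy to pull out of git itself (assuming a checkout with both the mainline and linux-next tags fetched; the commit id is the one from the listing above):

    $ git tag --contains 48d15436fde6 | head -1             # earliest linux-next tag containing it
    $ git log -1 --format=%cd --date=short 48d15436fde6     # committer date of the commit
    $ git log -1 --format=%cd --date=short v5.12-rc1        # date of the first mainline tag containing it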
Posted Mar 9, 2021 15:35 UTC (Tue) by pizza (subscriber, #46) [Link]

> Three weeks passed between the buggy commit entering linux-next and upstream.

So the "problem" here isn't that nothing was being tested; it's just that none of the tests run during that window caught this particular issue. It's also not clear that there was even a test out there that could have caught it, except by pure happenstance.
But that's the reality of software work: a bug turns up, you write a test to catch it (and hopefully others of the same class), you add it to the test suite (which runs as often as your available resources allow)... and repeat endlessly.
Posted Mar 9, 2021 15:25 UTC (Tue) by pizza (subscriber, #46) [Link] (8 responses)
Not that it will stop folks complaining when "5.32-alpha0-rc4-pre3" fails to boot on their production system, obviously because it should have been tested first, and we need a pre-pre-pre-pre-pre release snapshot to start testing against.
Posted Mar 9, 2021 15:26 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

Posted Mar 9, 2021 17:47 UTC (Tue) by Wol (subscriber, #4433) [Link] (5 responses)
Horse to water and all that ...
Cheers,
Wol
Posted Mar 9, 2021 19:55 UTC (Tue) by mathstuf (subscriber, #69389) [Link]
But this kind of one-off code is itself annoying to test, and someone will script adding it to their boot command lines anyway.
Posted Mar 10, 2021 21:58 UTC (Wed) by sjj (guest, #2020) [Link] (3 responses)
I don’t think I’ve built a kernel in 10 years, or maybe that one time 7-8 years ago.
Posted Mar 10, 2021 22:22 UTC (Wed) by roc (subscriber, #30627) [Link] (1 responses)

Posted Mar 11, 2021 8:43 UTC (Thu) by pbonzini (subscriber, #60935) [Link]
Posted Mar 10, 2021 23:20 UTC (Wed) by Wol (subscriber, #4433) [Link]
You clearly don't run gentoo :-)
Cheers,
Wol
Posted May 2, 2021 2:58 UTC (Sun) by pizza (subscriber, #46) [Link]
I saw this scroll by when I upgraded this system to Fedora 34:
$ rpm -q icedtea-web
icedtea-web-2.0.0-pre.0.3.alpha16.patched1.fc34.3.x86_64
Posted Mar 9, 2021 22:44 UTC (Tue) by dbnichol (subscriber, #39622) [Link] (1 responses)

Posted Mar 10, 2021 2:43 UTC (Wed) by roc (subscriber, #30627) [Link]
Posted Mar 10, 2021 2:34 UTC (Wed) by roc (subscriber, #30627) [Link]
In practice, large projects often try to maximise bang-for-the-buck by dividing tests into tiers, e.g. tier 1 tests run on every push, tier 2 every day, maybe a tier 3 that runs less often. Many projects use heuristics or machine learning to choose which tests to run in each run of tier 1.
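A toy illustration of the tiering idea (not any particular project's real CI configuration; the trigger variable and script names are invented):

    # Pick a test tier based on how the CI run was triggered.
    case "$CI_TRIGGER" in
      push)    ./run-tests.sh --suite smoke ;;        # tier 1: minutes, on every push
      nightly) ./run-tests.sh --suite functional ;;   # tier 2: hours, once a day
      weekly)  ./run-tests.sh --suite stress ;;       # tier 3: long-running, less often
    esac

The heuristic/ML part then amounts to trimming tier 1 further, e.g. running only the tests whose historical failures correlate with the files touched by the push.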
Yes, I understand that it's difficult to thoroughly test weird hardware and configuration combinations. Ideally organizations that produce hardware with Linux support would contribute testing on that hardware. But even if we ignore all those bugs, there are still lots of core kernel bugs not being caught by kernel CI.