Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 17:30 UTC (Mon) by syrjala (subscriber, #47399)
Parent article: Linux 5.12's very bad, double ungood day

I've not seen a definite statement on whether the patch was tested at all by the author. I don't think it's unreasonable to expect changes to swapfile.c to be tested using a swapfile? "I was too busy/lazy to run any tests whatsoever" is not a valid excuse IMO.

An excuse I would accept is that appropriate tests were run but they simply did not catch the problem. In which case I would expect to see improvements to the tests (+ making sure someone is actually running them in their CI system) to make sure this same bug doesn't happen ever again.

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 19:28 UTC (Mon) by mathstuf (subscriber, #69389) [Link] (5 responses)

Well, the state of patch status visibility on mailing lists is…low. Long mail delivery time doesn't help the issue. I'm fine with mailing lists as a development nexus, but there needs to be some infrastructure around them to indicate the status of the patches as superceded, reviewed, rejected, queued, etc.

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 20:34 UTC (Mon) by flussence (guest, #85566) [Link] (4 responses)

That's what https://patchwork.kernel.org/ appears to already address.

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 21:56 UTC (Mon) by mathstuf (subscriber, #69389) [Link] (3 responses)

Well, it *tries*. Look at this pull request:

https://patchwork.kernel.org/project/keyrings/patch/13239...

Its state is "New", but it has been dropped by the submitter. How would one change the state from "New" to "Dropped"?

This one is tracked as "New", but has been merged (as noted by the pr-tracker-bot). How did Patchtracker miss this?

https://patchwork.kernel.org/project/keyrings/patch/13228...

And this is for pull requests.

This one is "New", but has a "Reviewed-by" someone who can collect it for the subsystem. Where is it on its way into the appropriate tree(s)?

https://patchwork.kernel.org/project/keyrings/patch/20210...

And that's just the one list I follow loosely. How can one trust this site for actual tracking of arbitrary kernel patches with this quality of tracking? Sure, this list might not *use* patchtracker as much as other lists, but how does one determine *that* bit of information?

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 16:12 UTC (Tue) by andy_shev (subscriber, #75870) [Link] (1 responses)

Unfortunately patchwork is disconnected from the target repository. That's why Gerrit is better, for example.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 20:29 UTC (Tue) by marcH (subscriber, #57642) [Link]

> Unfortunately patchwork is disconnected from the target repository. That's why Gerrit is better, for example.

patchwork tries to find code submissions not addressed to it directly. All other code review solutions require everything to be sent to them directly, they act as gateways. That's the very simple reason why they have all the information and all work better.

The icing on the cake is of course recipient-controlled email notifications instead of sender-control notifications (a.k.a... "spam"). In other words people routinely subscribe and unsubscribe to/from individual code reviews; not possible with a crude mailing list.

None of this is rocket science, it's all very easy to see and understand except when obsessing about the "email versus web" user interface questions; these debates are not "wrong" but they hide the much more important backend and database design issues.

You could totally have a gateway type code review tool entirely driven by email. In fact requesting all submissions, review comments and approvals or rejections to be sent (by email) to patchwork directly and putting patchwork in better control of the git repo would get at least half-way there.

Linux 5.12's very bad, double ungood day

Posted Mar 11, 2021 16:04 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

Another instance I saw today was that a version of the patch that went in was never on the list (a whitespace fix). While such a tiny change is probably OK in practice, I vastly prefer having a bot attached to the service that stashes away every version of the patchset for posterity.

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 20:19 UTC (Mon) by cesarb (subscriber, #6266) [Link]

> I don't think it's unreasonable to expect changes to swapfile.c to be tested using a swapfile?

Even though it's called "swapfile.c", that file has the code for both swap files and swap partitions, including things like the swapon and swapoff system calls. Surprisingly little of that code is specific to either swap files only or swap partitions only, nearly all of it being in the setup and teardown of the swap areas (setting the block size on swapon for partitions, reading the swap extents from the filesystem on swapon for regular files). Search that file for S_ISBLK and S_ISREG to see the few places where the code diverges.

Linux 5.12's very bad, double ungood day

Posted Mar 8, 2021 23:17 UTC (Mon) by tome (subscriber, #3171) [Link] (4 responses)

> An excuse I would accept is that appropriate tests were run but they simply did not catch the problem. In which case I would expect to see improvements to the tests

Indeed this bug could pass tests on a swap file accidentally by just clobbering unused locations . If so those tests need more fuzz.

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 8:21 UTC (Tue) by matthias (subscriber, #94967) [Link] (1 responses)

If the test does not verify that the data is written to the correct location in the swapfile, the test will probably pass. The data can be read back fine. Even if you hit some used location this will manifest as random filesystem corruption. You only notice this if you verify the filesystem afterwards, but why should a test of the swapping code verify the correctness of a totally unrelated filesystem?

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 8:44 UTC (Tue) by marcH (subscriber, #57642) [Link]

Tests may pass, but some time later the machine will behave erratically which will raise suspicion. Pretty much what happened but too far away from the origin of the patch and too late.

So back to the original question: which "swaptorture" test suite was used to validate this patch and did that test suite include configuration(s) with swap file(s)? The main article wonders "how things could have been done better" and asks some interesting code review questions but does not seem to mention anything like "test gap".

Linux 5.12's very bad, double ungood day

Posted Mar 9, 2021 16:43 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (1 responses)

Here's an idea:

- initialize a swapfile
- do things with it
- copy it to a new, fresh FS
- restore from it

That at least gets the "everything was written to the swapfile" (rather than willy-nilly across the hosting FS). Doesn't guarantee that *extra* writes didn't go out. Some form of writing a given bitpattern to the entire FS and then ensuring that post-FS init and swap file creation, nothing else changes (modulo mtime/inode updates) might suffice? I have no idea if such a "simple" FS exists though. Or create a single file which takes up the remaining FS space with a given bitpattern reads properly after using the swapfile. Wrecking any data or metadata would presumably be detectable then, no?

Linux 5.12's very bad, double ungood day

Posted Mar 10, 2021 1:47 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

This is (hypothetically) a test. It doesn't need to be a reasonable FS in the first place. Tell the swapfile code that the file begins at such-and-such offset and occupies such-and-such size, and fill the rest of the image with 0xDEADBEEF or whatever sentinel value you like. I'm sure they could make an extremely stupid filesystem that works that way internally (and just returns EROFS or whatever if the user tries to create files or do other things it doesn't like), so you don't even need to properly mock out any of the FS code.