Linux 5.12's very bad, double ungood day
On January 26, Christoph Hellwig posted a 17-patch series containing cleanups to the code dealing with the allocation of the BIO structures used to represent block-I/O requests. The final patch in that series simplified the allocation of requests for swap files in particular. The series was applied by block maintainer Jens Axboe one day later. The change was included in a pull request sent on February 17, and landed in the mainline on February 21 as part of the massive set of pulls done by Torvalds once his power was restored.
"Swapping" is how the kernel pushes anonymous pages (those which are not backed up by a file on disk — program data, in other words) out to persistent storage when memory gets tight. Linux can swap directly to a partition on a block device; that is how systems are traditionally set up. But the kernel can also swap to a file within a mounted filesystem with something close to the same performance. Swapping to a file gives administrators some additional flexibility; it is an easy way to give some relief to a system that is running short of memory without disturbing the existing workload, for example.
The problematic patch works just fine for swap activity to a partition. But, when a swap file is in use, the offset to the location of its blocks within the filesystem would be lost. That had the result of redirecting swap traffic to an essentially random location on the filesystem — a location that was not intended for swapping and may well have contained other information that somebody would have preferred to keep. In other words, the filesystem containing the swap file would be corrupted. (Interestingly, a similar bug was introduced in 2014 and lurked undetected because it only affects extremely rare configurations.)
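The failure mode can be illustrated with a toy model. This is a sketch under loud assumptions, not the kernel's code: the device is a short list of block labels, the swap file is assumed to occupy a single contiguous run of blocks (real swap files may be fragmented), and every name here (`make_device`, `swap_out`, `keep_offset`) is invented for the illustration.

```python
# Toy model of a lost swap-file offset; hypothetical names, not kernel code.

BLOCKS_ON_DEVICE = 16


def make_device(swap_start, swap_len):
    """Return block labels: 'data' everywhere except inside the swap file."""
    device = ["data"] * BLOCKS_ON_DEVICE
    for block in range(swap_start, swap_start + swap_len):
        device[block] = "swap-free"
    return device


def swap_out(device, swap_start, page_index, keep_offset):
    """Write one swapped page and return the device block that was hit.

    With keep_offset=False the swap file's starting block is dropped,
    so the write goes to a block that belongs to something else.
    """
    base = swap_start if keep_offset else 0
    target = base + page_index
    device[target] = "swapped-page"
    return target


# Assumed layout: the swap file occupies blocks 8 through 11.
device = make_device(swap_start=8, swap_len=4)
good_block = swap_out(device, swap_start=8, page_index=2, keep_offset=True)
bad_block = swap_out(device, swap_start=8, page_index=2, keep_offset=False)
```

In this model the first write lands inside the swap file, while the second overwrites a block that was labeled `data`. A swap partition corresponds to `swap_start` being zero, in which case both calls hit the same block; that is consistent with the bug being invisible when swapping to a partition.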
On March 2, a fix was applied to the mainline repository. One day later, Torvalds sent a warning to the linux-kernel mailing list describing the problem and noting that: "This is what we in the industry call 'double ungood'". He removed the 5.12-rc1 tag from his repository (or, rather, renamed it to "5.12-rc1-dontuse") and generally made it clear that this release should be avoided. He also had a couple of requests for maintainers.
One had to do with the 5.12-rc1 tag; he can remove it from his repository, but that does not change anybody else's repository. So, he said, maintainers should manually remove that tag themselves just to prevent any accidental use of that release.
Beyond that, it is common for subsystem maintainers to base new branches on an -rc release, even -rc1. But any such branches pose a particular hazard for the development and testing community in general, because they will not contain the fix for this problem. The git bisect tool, which is frequently used to find the patch that caused a regression, can land on any location in the development history; branches containing the buggy commit (but not the fix) create extended ranges of history with the potential to destroy filesystems far into the future. This, also, seems "double ungood".
The answer is to not base development branches on 5.12-rc1, but instead to use either an older release or to wait until -rc2. That will not be entirely helpful for subsystem maintainers who have already published branches with an -rc1 base, as some certainly have. They have a choice between keeping the dangerous branch or rebasing it onto a safer branch point, possibly creating difficulties for anybody else who has built on that branch. That sort of rebasing would normally be frowned upon, but Torvalds made it clear that it is permissible — and expected — this time.
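The reason an -rc1-based branch stays hazardous even after it is merged can be sketched with a small in-memory commit graph. This is an illustration only: the commit names and the `parents` map are hypothetical, and the helper functions are not part of any git tooling. A commit is treated as dangerous when its history contains the buggy change but not the fix.

```python
# Sketch: which commits would be unsafe to boot during a bisection?
# The graph and all names below are hypothetical.

def ancestors(parents, commit):
    """Return the set holding `commit` and every commit reachable from it."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def is_dangerous(parents, commit, bug, fix):
    """True if `commit` includes the buggy change but not its fix."""
    history = ancestors(parents, commit)
    return bug in history and fix not in history


# Each commit maps to its parent commits.
parents = {
    "v5.11": [],
    "bug": ["v5.11"],
    "rc1": ["bug"],
    "fix": ["rc1"],
    "rc2": ["fix"],
    "topic-1": ["rc1"],            # maintainer branch based on -rc1
    "topic-2": ["topic-1"],
    "merge": ["rc2", "topic-2"],   # branch later merged into the mainline
}

dangerous = sorted(
    commit for commit in parents if is_dangerous(parents, commit, "bug", "fix")
)
```

Here `merge` itself is safe, since it has the fix in its history, but `topic-1` and `topic-2` remain dangerous, and a bisection between `v5.11` and `merge` is free to land on either of them. Rebasing the topic branch onto `rc2` would put the fix into the history of every commit on it. In a real repository, `git merge-base --is-ancestor` can answer the same ancestry question for a pair of commits.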
The kernel community is mostly treating this as a case of "bugs happen" and moving on. One can indeed argue that the process has worked the way it is supposed to: a catastrophic bug was caught early during the stabilization period and will never make it anywhere near a production system. That said, there may be room for a bit of introspection and thinking about how things could have been done better.
For example, one might argue that the review process could have worked better. The patch in question was only posted once, and was applied to the block repository the next day. The patch in the mainline repository contains two Reviewed-by tags and one Acked-by tag, but the mailing-list thread shows those tags all being offered for a different patch in the series. Perhaps all three reviewers offered their tags off-list but, from the public record, it's not clear how much review the problematic patch actually received.
There is no guarantee that more eyeballs would have found this bug before it was committed, but it seems that those eyeballs were not given much of a chance here. It is tempting to see "cleanup" patches as being inherently safe and needing less review, but those patches can contain subtle bugs as well.
A part of the process that could have worked more quickly was warning the community as a whole of the problem. In the message linked above, Torvalds acknowledged that he should have acted more quickly rather than waiting to be sure that the proposed fix actually made the problem go away. One can only imagine that the warning will go out more quickly the next time.
There is one other little problem here that surprisingly few people are talking about. The health of the mailing lists hosted on vger.kernel.org has not been optimal for a while; at times, posts to the mailing lists have been delayed for a day or two or dropped altogether. Torvalds's warning was not delayed that badly, but it still took several hours to get to recipients of linux-kernel. Whether the delay in the delivery of that warning caused more testers to be bitten by this bug is not clear, but it certainly did not help.
The good news is that work is currently being done to put the vger lists on a more solid, well-supported footing. With luck, the mailing-list problems will be behind us in the relatively near future.
Even then, though, a warning sent just to linux-kernel has a real chance of disappearing into the noise. The traffic on that list, which routinely exceeds 1,000 messages per day, can drown out important messages and has caused many developers to unsubscribe completely. It may well be that we need better communication channels for this kind of urgent message.
In the end, the actual pain caused by this bug was probably relatively small; there have not been massive numbers of "it ate my filesystem!" messages going around. The process did mostly work the way it was intended to. With luck, lessons learned from this incident can help that process to work even better the next time.
