A warning about 5.12-rc1
"But I want everybody to be aware of because _if_ it bites you, it bites you hard, and you can end up with a filesystem that is essentially overwritten by random swap data. This is what we in the industry call 'double ungood'." Additionally, Linus Torvalds is asking maintainers not to start branches from 5.12-rc1, so that people do not land in the buggy code later while bisecting problems.
From: Linus Torvalds <torvalds-AT-linux-foundation.org>
To: Linux Kernel Mailing List <linux-kernel-AT-vger.kernel.org>
Subject: A note on the 5.12-rc1 tag
Date: Wed, 03 Mar 2021 12:53:18 -0800
Message-ID: <CAHk-=wjnzdLSP3oDxhf9eMTYo7GF-QjaNLBUH1Zk3c4A7X75YA@mail.gmail.com>
Hey peeps - some of you may have already noticed that in my public git tree, the "v5.12-rc1" tag has magically been renamed to "v5.12-rc1-dontuse". It's still the same object, it still says "v5.12-rc1" internally, and it is still signed by me, but the user-visible name of the tag has changed.

The reason is fairly straightforward: this merge window, we had a very innocuous code cleanup and simplification that raised no red flags at all, but had a subtle and very nasty bug in it: swap files stopped working right. And they stopped working in a particularly bad way: the offset of the start of the swap file was lost.

Swapping still happened, but it happened to the wrong part of the filesystem, with the obvious catastrophic end results.

Now, the good news is even if you do use swap (and hey, that's nowhere near as common as it used to be), most people don't use a swap *file*, but a separate swap *partition*. And the bug in question really only happens for when you have a regular filesystem, and put a file on it as a swap.

And, as far as I know, all the normal distributions set things up with swap partitions, not files, because honestly, swapfiles tend to be slower and have various other complexity issues.

The bad news is that the reason we support swapfiles in the first place is that they do end up having some flexibility advantages, and so some people do use them for that reason. If so, do not use rc1. Thus the renaming of the tag.

Yes, this is very unfortunate, but it really wasn't a very obvious bug, and it didn't even show up in normal testing, exactly because swapfiles just aren't normal. So I'm not blaming the developers in question, and it also wasn't due to the odd timing of the merge window, it was just simply an unusually nasty bug that did get caught and is fixed in the current tree.

But I want everybody to be aware of because _if_ it bites you, it bites you hard, and you can end up with a filesystem that is essentially overwritten by random swap data. This is what we in the industry call "double ungood".

Now, there's a couple of additional reasons for me writing this note other than just "don't run 5.12-rc1 if you use a swapfile". Because it's more than just "ok, we all know the merge window is when all the new scary code gets merged, and rc1 can be a bit scary and not work for everybody". Yes, rc1 tends to be buggier than later rc's, we are all used to that, but honestly, most of the time the bugs are much smaller annoyances than this time.

And in fact, most of our rc1 releases have been so solid over the years that people may have forgotten that "yeah, this is all the new code that can have nasty bugs in it".

One additional reason for this note is that I want to not just warn people to not run this if you have a swapfile - even if you are personally not impacted (like I am, and probably most people are - swap partitions all around) - I want to make sure that nobody starts new topic branches using that 5.12-rc1 tag. I know a few developers tend to go "Ok, rc1 is out, I got all my development work into this merge window, I will now fast-forward to rc1 and use that as a base for the next release". Don't do it this time. It may work perfectly well for you because you have the common partition setup, but it can end up being a horrible base for anybody else that might end up bisecting into that area.

And the *final* reason I want to just note this is a purely git process one: if you already pulled my git tree, you will have that "v5.12-rc1" tag, and the fact that it no longer exists in my public tree under that name changes nothing at all for you. Git is distributed, and me removing that tag and replacing it with another name doesn't magically remove it from other copies unless you have special mirroring code.

So if you have a kernel git tree (and I'm here assuming "origin" points to my trees), and you do

    git fetch --tags origin

you _will_ now see the new "v5.12-rc1-dontuse" tag. But git won't remove the old v5.12-rc1 tag, because while git will see that it is not upstream, git will just assume that that simply means that it's your own local tag. Tags, unlike branch names, are a global namespace in git.

So you should additionally do a "git tag -d v5.12-rc1" to actually get rid of the original tag name.

Of course, having the old tag doesn't really do anything bad, so this git process thing is entirely up to you. As long as you don't _use_ v5.12-rc1 for anything, having the tag around won't really matter, and having both 'v5.12-rc1' _and_ 'v5.12-rc1-dontuse' doesn't hurt anything either, and seeing both is hopefully already sufficient warning of "let's not use that then".

Sorry for this mess,
Linus
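The git steps described in the note can be sketched as a small shell helper. This is only an illustration: the function names `drop_local_tag` and `sync_renamed_tag` are made up here, and the sketch assumes, as the note does, that the remote called "origin" points at Linus's tree.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this sketch).

# Delete a local tag if it is present; do nothing otherwise.
# $1 = path to the repository, $2 = tag name
drop_local_tag() {
    repo=$1
    tag=$2
    if git -C "$repo" rev-parse -q --verify "refs/tags/$tag" >/dev/null; then
        # "git tag -d" only touches the local clone, never the remote.
        git -C "$repo" tag -d "$tag" >/dev/null && echo "removed $tag"
    else
        echo "no local tag $tag"
    fi
}

# Fetch the renamed tag, then drop the stale local name.
# $1 = path to the repository (assumes "origin" is Linus's tree)
sync_renamed_tag() {
    git -C "$1" fetch --tags origin &&
        drop_local_tag "$1" v5.12-rc1
}
```

Calling `sync_renamed_tag ~/src/linux` (the path is just an example) would fetch "v5.12-rc1-dontuse" and remove the old local name. As the note says, keeping both tags is harmless, so running this is optional.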
Posted Mar 4, 2021 20:33 UTC (Thu)
by kunitz (subscriber, #3965)
[Link] (26 responses)
Posted Mar 4, 2021 22:22 UTC (Thu)
by MatejLach (guest, #84942)
[Link] (21 responses)
Posted Mar 5, 2021 0:16 UTC (Fri)
by JMB (guest, #74439)
[Link] (18 responses)
Posted Mar 5, 2021 12:12 UTC (Fri)
by Creideiki (subscriber, #38747)
[Link] (7 responses)
Posted Mar 5, 2021 12:56 UTC (Fri)
by leromarinvit (subscriber, #56850)
[Link]
Someone should open an issue for this. I'm not really in a position to do it, because I've never even heard of this thing before, and a drive-by "your installer sucks" ticket seems kind of rude. But I really think they aren't doing their users a service by doing this.
Posted Mar 5, 2021 16:57 UTC (Fri)
by NYKevin (subscriber, #129325)
[Link] (3 responses)
Posted Mar 5, 2021 17:06 UTC (Fri)
by rsidd (subscriber, #2582)
[Link] (2 responses)
Posted Mar 5, 2021 19:14 UTC (Fri)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
Posted Mar 6, 2021 7:12 UTC (Sat)
by cesarb (subscriber, #6266)
[Link]
Having first used Linux in a computer where the size of the main memory was a low number of megabytes doesn't help; eight megabytes of swap was both a reasonable amount and a significant chunk of disk space.
Posted Mar 7, 2021 19:19 UTC (Sun)
by mss (subscriber, #138799)
[Link]
Installing, updating and removing packages on a system is the responsibility of the distro's package management system.
Posted Mar 17, 2021 1:29 UTC (Wed)
by SteveClement (guest, #61839)
[Link]
By chance and perhaps ignorance I missed that the target location is bad.
Thanks for pointing it out and opening an issue.
Steve
Posted Mar 5, 2021 14:33 UTC (Fri)
by hackan (guest, #145281)
[Link] (4 responses)
So sit back and relax... 'Cause every little thing, is gonna be alright.
Posted Mar 6, 2021 5:54 UTC (Sat)
by tajyrink (subscriber, #2750)
[Link] (3 responses)
To clarify the situation a bit further: Ubuntu LTS still used a partition in 18.04. Existing users who installed that or an older release with default settings and have since upgraded to, e.g., 20.04 are still using a partition. New 20.04 desktop installations, however, use a file. Server might still default to a partition; it uses a separate installer.
Myself, I've always used a swap file everywhere on all distributions (Debian, SUSE, Ubuntu) for as long as I've known about the possibility. I hate splitting things into partitions and then at some point needing to rethink whether they were the right size. Even in the 2000s I always thought that a separate /home was just a mantra, supposed to be the "right thing" while resting on assumptions that do not matter to most people.
Posted Mar 6, 2021 23:11 UTC (Sat)
by technophobian (guest, #145315)
[Link] (2 responses)
Posted Mar 7, 2021 17:09 UTC (Sun)
by Gaelan (guest, #145108)
[Link] (1 responses)
Posted Mar 8, 2021 8:58 UTC (Mon)
by Kamiccolo (subscriber, #95159)
[Link]
(o´・_・)っ
Posted Mar 5, 2021 15:50 UTC (Fri)
by zlynx (guest, #2285)
[Link] (4 responses)
I built a new workstation PC recently with a Ryzen 3900 (now 5950), 64G RAM and an Optane boot drive. I installed Ubuntu 18.04 LTS and used all of the defaults.
As a result my Ubuntu 20.04 is using /swapfile on ext4 on one giant partition because I simply used the Ubuntu defaults. Including GNOME. Pretty much the only change I made was to make Wayland the default.
My opinion is that doing too much customization to a Linux install is a waste of time.
Posted Mar 5, 2021 17:07 UTC (Fri)
by rsidd (subscriber, #2582)
[Link] (3 responses)
Posted Mar 5, 2021 17:38 UTC (Fri)
by nivedita76 (subscriber, #121790)
[Link]
Posted Mar 5, 2021 20:00 UTC (Fri)
by clump (subscriber, #27801)
[Link] (1 responses)
Posted Mar 5, 2021 21:45 UTC (Fri)
by smurf (subscriber, #17840)
[Link]
Posted Mar 5, 2021 16:03 UTC (Fri)
by ecree (guest, #95790)
[Link] (1 responses)
Yeah, it has considerable excess kurtosis.
(I'll get my coat.)
Posted Mar 5, 2021 16:39 UTC (Fri)
by amacater (subscriber, #790)
[Link]
Posted Mar 5, 2021 11:21 UTC (Fri)
by gerdesj (subscriber, #5446)
[Link] (2 responses)
Posted Mar 18, 2021 10:00 UTC (Thu)
by mgedmin (subscriber, #34497)
[Link] (1 responses)
Here's a short ubuntu-devel@ thread from 2016 discussing this: https://lists.ubuntu.com/archives/ubuntu-devel/2016-Novem...
Posted Mar 18, 2021 10:45 UTC (Thu)
by zdzichu (subscriber, #17118)
[Link]
Posted Mar 6, 2021 8:34 UTC (Sat)
by kunitz (subscriber, #3965)
[Link]
Posted Mar 5, 2021 2:56 UTC (Fri)
by rioting_pacifist (guest, #134765)
[Link] (16 responses)
If Linus still insists on not having public unit tests because they allow for lazy development, it would be good to at least hear that a private integration test will be added for swapfiles.
Posted Mar 5, 2021 6:57 UTC (Fri)
by adobriyan (subscriber, #30858)
[Link] (5 responses)
Posted Mar 5, 2021 10:01 UTC (Fri)
by rioting_pacifist (guest, #134765)
[Link] (4 responses)
I assume hibernation is tested on an RC before it's released, just not using a swapfile
Posted Mar 5, 2021 21:50 UTC (Fri)
by smurf (subscriber, #17840)
[Link] (3 responses)
Bottom line: even if it seems to work (I don't know, not having tested that for exactly the above reason) … don't. Not if you want to keep your file system consistent, that is.
Posted Mar 5, 2021 23:00 UTC (Fri)
by mjg59 (subscriber, #23239)
[Link] (2 responses)
Posted Mar 6, 2021 13:39 UTC (Sat)
by geert (subscriber, #98403)
[Link] (1 responses)
Posted Mar 6, 2021 16:19 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Mar 5, 2021 8:01 UTC (Fri)
by gregkh (subscriber, #8)
[Link] (6 responses)
Posted Mar 5, 2021 10:01 UTC (Fri)
by adobriyan (subscriber, #30858)
[Link] (3 responses)
Posted Mar 5, 2021 10:20 UTC (Fri)
by gregkh (subscriber, #8)
[Link]
Posted Mar 8, 2021 15:02 UTC (Mon)
by willy (subscriber, #9762)
[Link] (1 responses)
If your filesystem had been destroyed at the end of it, the test failed.
Posted Mar 8, 2021 15:11 UTC (Mon)
by gregkh (subscriber, #8)
[Link]
Posted Mar 5, 2021 11:53 UTC (Fri)
by rioting_pacifist (guest, #134765)
[Link] (1 responses)
Posted Mar 6, 2021 13:58 UTC (Sat)
by khim (subscriber, #9252)
[Link]
It's not “blaming the users”. It's mostly explaining who is affected and when. A pretty reasonable thing to do. Yes, there are also words designed to make the developers who made that mistake feel better (which is, actually, a pretty nice thing to do, too). But talking about future plans? At this stage? It would be as if Bush had recalled all the heads of various departments early on September 11 and directed them to start planning the USA Patriot Act, instead of, you know, saving the people who could still be saved and putting out fires. I'm really glad Linus does not talk about it. That mail is not about future plans and shouldn't be about future plans.
Posted Mar 6, 2021 13:48 UTC (Sat)
by khim (subscriber, #9252)
[Link] (2 responses)
I'm really, really, REALLY glad Linus said nothing about that. This shows me, yet again, that he is a competent manager of that whole process. Why? Because “we have a mess, let's try to mitigate the consequences of said mess” and “we had a mess, let's try to think about how to prevent it from happening again” are two entirely different things and you should never conflate them. Fixing the mess involves using existing procedures to the fullest extent. Changing them on the fly tends to just make the mess worse. Not something you need when things are already bad. And time is of the utmost importance: the more you think about a “proper” “long-term” plan, the more people are affected. Preventing the mess from returning involves careful changes to the procedures and planning. You don't want to have them changed to produce some other kind of mess, after all.
Posted Mar 6, 2021 14:58 UTC (Sat)
by Wol (subscriber, #4433)
[Link] (1 responses)
And at the end of the day, you cannot be expected to help idiots stop harming themselves. I notice Linus said "this is a heads-up for people who might rebase off of rc1". Isn't that something Linus objects to, rather strongly, to the extent he often bounces patches from people who do it?
In other words, this is normal breakage, that the process EXPECTS, there are procedures in place to handle it, and if idiots screw themselves then sorry, they get a Darwin award, or whatever it's called.
Cheers,
Posted Mar 9, 2021 19:58 UTC (Tue)
by khim (subscriber, #9252)
[Link]
Lots of developers are doing that. The last bug that could silently corrupt your system happened… I don't even remember when that was. 10 years ago? Sure, lockups, hangups and many other things are expected. Silent corruption is not (well... unless you use experimental filesystems, but that's another kettle of fish). So taking it seriously was the right thing to do. Changes to the process… not sure. Precisely because the last time it happened was so long ago… maybe we should conclude that the process is still OK.
Posted Mar 5, 2021 12:23 UTC (Fri)
by agruen (subscriber, #6613)
[Link] (1 responses)
Posted Mar 5, 2021 16:35 UTC (Fri)
by willy (subscriber, #9762)
[Link]
Posted Mar 6, 2021 1:13 UTC (Sat)
by bojan (subscriber, #14302)
[Link] (1 responses)
Examples:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/li...
https://git.kernel.org/pub/scm/linux/kernel/git/stable/li...
Not sure what to make of it. Unless these are not relevant commits...
Posted Mar 8, 2021 14:25 UTC (Mon)
by nix (subscriber, #2304)
[Link]
Posted Mar 7, 2021 10:00 UTC (Sun)
by geuder (subscriber, #62854)
[Link] (8 responses)
I remember having read that for a while now there has been absolutely no speed difference between swap in a filesystem and swap in a partition. No idea how reliable that source was. I haven't checked the code, but I assume that should be possible if the swap file were guaranteed to be contiguous. All you need is the block layer start address.
Posted Mar 7, 2021 11:35 UTC (Sun)
by zdzichu (subscriber, #17118)
[Link] (2 responses)
Posted Mar 8, 2021 8:52 UTC (Mon)
by geuder (subscriber, #62854)
[Link] (1 responses)
(Not that it would have any practical meaning for me. No device except my phone swaps. And that uses zram.)
Posted Mar 8, 2021 10:13 UTC (Mon)
by Wol (subscriber, #4433)
[Link]
Anyways, Linus took a look at the spaghetti that was the original swap code that had been there since forever, threw his toys out the pram, threw the code out of Linux, and waited for the dust to settle :-)
So we got a nice, clean, well-written and well-thought-out memory/swap system roundabout that point in time. It wasn't pretty while it was happening, though ...
Cheers,
Posted Mar 7, 2021 13:22 UTC (Sun)
by ailiop (subscriber, #128014)
[Link] (4 responses)
During swapon on a file, the logical block address of that file is obtained from the filesystem and used to submit block io when needed. This means that after swapfile initialization, the filesystem is completely bypassed and the underlying block device is directly addressed (that's at least for local filesystems, it's different for e.g. nfs).
In case of fragmented files, the filesystem is still bypassed: during swapon the block ranges of all extent maps are obtained from the fs, and maintained in an in-memory rbtree within the swap code that is looked up every time swap needs to read from or write to the file. This may add some overhead, depending on how fragmented the file is (i.e. on the number of extents / height of tree).
The only other difference is that in case of swapfiles the request_queue to the underlying blockdev is shared between the fs and the swap bio submissions at the block layer.
Posted Mar 8, 2021 9:02 UTC (Mon)
by geuder (subscriber, #62854)
[Link]
Ah, so while there is no difference in cycles, heavy filesystem activity while swapping might lead to additional latency in the case of a swap file? If the block scheduling gives a swap partition good priority, which one would assume.
Posted Mar 8, 2021 22:54 UTC (Mon)
by neilbrown (subscriber, #359)
[Link] (2 responses)
While this is largely correct, it isn't quite the full story.
This only works when the filesystem provides a "bmap" interface, and doesn't provide a "swap_activate" interface.
Many local filesystems provide bmap - and so get good swap performance for free.
Posted Mar 9, 2021 0:10 UTC (Tue)
by ailiop (subscriber, #128014)
[Link]
This doesn't affect the actual swap page IO performance during runtime (swap in/out) though, as it only pertains to the swapfile initialization phase. In both variants (bmap and swap_activate) the local filesystems simply provide the blockmaps of all the extents that make up the swapfile, which are fed into add_swap_extent() and maintained in the swap_info_struct/swap_extent_root rbtree.
In either case, it is the same swap code that submits IO directly to the underlying blockdev, and after initialization the filesystem is completely out of the way and unaware that the mapped file blocks are being modified under it.
NFS is unique in that it both implements swap_activate and swap IO always goes through it (via the direct_IO address space op), which is why I mentioned it is different.
Posted Mar 11, 2021 4:11 UTC (Thu)
by dgc (subscriber, #6611)
[Link]
No, swap_activate does not mean the filesystems take responsibility for swap IO - all it changes is how the swap code maps the swapfile backing store into the swapfile's internal extent map. Both end up reporting contiguous regions of the file to the swapfile code via the add_swap_extent() function, hence there is no difference in performance between the two types of swapfile mapping mechanisms at all.
The difference is that the bmap method (generic_swapfile_activate()) only maps a block at a time and does not support files with unwritten extents. That means you can't do "fallocate 4g swapfile; swapon swapfile" because bmap will report the unwritten extents as holes in the file and so the swap code rejects those ranges as not usable. Hence to add a swapfile on a filesystem that only supports ->bmap you have to physically zero the file first. That's a problem if you are already in OOM conditions - the IO can push the system over the edge and/or take a long time to run and so the system goes off the cliff before you can activate the swapfile.
Being able to use fallocate to preallocate the swapfile means you can add tens of gigabytes of swapfile on filesystems like XFS in just a few milliseconds with minimal IO, CPU and RAM overhead and activate it straight away. This makes dynamic swapfile management (e.g. resizing) practical and much more useful compared to the old ->bmap based method for mapping that required physical zeroing before activation.
-Dave.
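The two ways of preparing a swapfile contrasted above can be sketched as a dry-run helper that only prints the commands instead of running them. The function name `swapfile_plan` and its arguments are invented for this sketch; whether the preallocated variant is accepted by swapon depends on the filesystem, as the comment explains, so treat this as an illustration and not a recipe.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands for creating and
# enabling a swapfile.
# $1 = path, $2 = size in MiB, $3 = "prealloc" or "zero"
swapfile_plan() {
    path=$1
    mib=$2
    method=$3
    case $method in
        # Preallocate unwritten extents: almost no I/O, but only usable
        # where the filesystem can map unwritten extents for swap.
        prealloc) echo "fallocate -l ${mib}MiB $path" ;;
        # Physically write zeroes: works with ->bmap-only filesystems,
        # at the cost of writing the whole file first.
        zero)     echo "dd if=/dev/zero of=$path bs=1M count=$mib" ;;
        *)        echo "unknown method: $method" >&2; return 1 ;;
    esac
    echo "chmod 600 $path"
    echo "mkswap $path"
    echo "swapon $path"
}
```

For example, `swapfile_plan /swapfile 4096 prealloc` prints the four commands for a 4 GiB preallocated swapfile. Printing instead of executing keeps the sketch safe to try, since the real commands need root and modify the system.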
But people who know Linux will not use the automatic install (including its default partitioning). They will at least use an OS partition, a data partition and a swap partition for sure (even in the mid-1990s), and most people I have seen won't use GNOME either, so they use not Ubuntu itself but one of its flavours. So there is not one Ubuntu version (even when fixing e.g. 'Focal Fossa' = 20.04 LTS) but a big variety of them ... and this is typical for Debian and its derivatives.
But nevertheless - if an option is possible, it should work. And as swap files do exist they have to work and of course should be tested - otherwise they should be deprecated and deleted ... and I am sure there are use cases for swap files ... but not for a professional workstation or server ...
Personally, I need at least 3 days to fully install and configure a new system for my needs - with all automation, adjustments and settings ... this is not pre-installed trash you are not allowed to change - it is a professional environment - and you are well advised to make it fit YOUR workflow and not to feel obliged to adopt the workflow of some strange developers.
If something is possible, someone may use it under GNU/Linux ... and this is an important ingredient of freedom!
And this is the freedom of the user!
> and I am sure there are use cases for swap files ... but not for a professional workstation or server ...
You never know when someone makes a questionable decision. For example, and as yet another reason why you really, really shouldn't do curl | sudo bash blindly, take a look at this install script for a security product, which says, in part:
if [ $? == 2 ]; then
# In case you get "internal compiler error: Killed (program cc1plus)"
# You ran out of memory.
# Create some swap
sudo dd if=/dev/zero of=/var/swap.img bs=1024k count=4000
sudo mkswap /var/swap.img
sudo swapon /var/swap.img
# And compile again
Even Windows now tries to move to a centralized package management system with Microsoft Store.
Then we will know what to expect now, without resorting to 5-year-old mail discussions.
Should be true for swapfiles created during initial installation, but may not be true when creating a swapfile on a system that's been in use for a while.
Nope. The hibernation code writes the location of the blocks in a "linked list". So the file doesn't have to be contiguous.
> Pretty weak handling of this, I get that it's a bug that got through, but instead of spending most of the email justifying it getting through it would be nice if Linus said what is going to change to stop such a bug getting through.
Wol
> If you run an rc1 on a system with anything important (yes I know people do extremely unwise things) then you are an idiot.
( BRANCH=HEAD;
git merge-base --is-ancestor 48d15436fde6 $BRANCH &&
! git merge-base --is-ancestor caf6912f3f4a $BRANCH
) && echo 'please rebase onto f69d02e37a85 or later'
http://lkml.iu.edu/hypermail/linux/kernel/0507.0/1690.html
See the last paragraph.
The email is from 2005. There have been no performance differences between swap partitions and files since kernel 2.6.
Wol
Network filesystems (NFS) and some local filesystems (btrfs, f2fs, xfs) provide swap_activate which effectively means that they take full responsibility for SWAP IO. Whether they then perform better or worse than the direct "bmap" approach I cannot say. All I know is that it is different code paths.
> which effectively means that they take full responsibility for SWAP IO. Whether they then
> perform better or worse than the direct "bmap" approach I cannot say. All I know is that
> it is different code paths