
A warning about 5.12-rc1

Linus Torvalds has sent out a note telling people not to install the recent 5.12-rc1 development kernel; this is especially true for anybody running with swap files. "But I want everybody to be aware of because _if_ it bites you, it bites you hard, and you can end up with a filesystem that is essentially overwritten by random swap data. This is what we in the industry call 'double ungood'." Additionally, he is asking maintainers to not start branches from 5.12-rc1 to avoid future situations where people land in the buggy code while bisecting problems.
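For readers unsure whether the warning applies to them, here is a minimal sketch (not part of the original note) for checking whether a running system swaps to a file. It assumes the usual /proc/swaps layout, where the second column reads "file" or "partition"; list_swap_files is a hypothetical helper name.

```shell
#!/bin/sh
# list_swap_files: read /proc/swaps-style text on stdin and print the
# paths of active swap areas whose Type column is "file".
# (Hypothetical helper; assumes the standard header line plus
# whitespace-separated columns: Filename Type Size Used Priority.)
list_swap_files() {
    awk 'NR > 1 && $2 == "file" { print $1 }'
}

# Typical use on a running Linux system:
#   list_swap_files < /proc/swaps
```

Empty output means no swap file is active; a path such as /swapfile means the system is in the affected group.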


From:  Linus Torvalds <torvalds-AT-linux-foundation.org>
To:  Linux Kernel Mailing List <linux-kernel-AT-vger.kernel.org>
Subject:  A note on the 5.12-rc1 tag
Date:  Wed, 03 Mar 2021 12:53:18 -0800
Message-ID:  <CAHk-=wjnzdLSP3oDxhf9eMTYo7GF-QjaNLBUH1Zk3c4A7X75YA@mail.gmail.com>

Hey peeps - some of you may have already noticed that in my public git
tree, the "v5.12-rc1" tag has magically been renamed to
"v5.12-rc1-dontuse". It's still the same object, it still says
"v5.12-rc1" internally, and it is still is signed by me, but the
user-visible name of the tag has changed.

The reason is fairly straightforward: this merge window, we had a very
innocuous code cleanup and simplification that raised no red flags at
all, but had a subtle and very nasty bug in it: swap files stopped
working right.  And they stopped working in a particularly bad way:
the offset of the start of the swap file was lost.

Swapping still happened, but it happened to the wrong part of the
filesystem, with the obvious catastrophic end results.

Now, the good news is even if you do use swap (and hey, that's nowhere
near as common as it used to be), most people don't use a swap *file*,
but a separate swap *partition*. And the bug in question really only
happens for when you have a regular filesystem, and put a file on it
as a swap.

And, as far as I know, all the normal distributions set things up with
swap partitions, not files, because honestly, swapfiles tend to be
slower and have various other complexity issues.

The bad news is that the reason we support swapfiles in the first
place is that they do end up having some flexibility advantages, and
so some people do use them for that reason. If so, do not use rc1.
Thus the renaming of the tag.

Yes, this is very unfortunate, but it really wasn't a very obvious
bug, and it didn't even show up in normal testing, exactly because
swapfiles just aren't normal. So I'm not blaming the developers in
question, and it also wasn't due to the odd timing of the merge
window, it was just simply an unusually nasty bug that did get caught
and is fixed in the current tree.

But I want everybody to be aware of because _if_ it bites you, it
bites you hard, and you can end up with a filesystem that is
essentially overwritten by random swap data. This is what we in the
industry call "double ungood".

Now, there's a couple of additional reasons for me writing this note
other than just "don't run 5.12-rc1 if you use a swapfile". Because
it's more than just "ok, we all know the merge window is when all the
new scary code gets merged, and rc1 can be a bit scary and not work
for everybody". Yes, rc1 tends to be buggier than later rc's, we are
all used to that, but honestly, most of the time the bugs are much
smaller annoyances than this time.

And in fact, most of our rc1 releases have been so solid over the
years that people may have forgotten that "yeah, this is all the new
code that can have nasty bugs in it".

One additional reason for this note is that I want to not just warn
people to not run this if you have a swapfile - even if you are
personally not impacted (like I am, and probably most people are -
swap partitions all around) - I want to make sure that nobody starts
new topic branches using that 5.12-rc1 tag. I know a few developers
tend to go "Ok, rc1 is out, I got all my development work into this
merge window, I will now fast-forward to rc1 and use that as a base
for the next release". Don't do it this time. It may work perfectly
well for you because you have the common partition setup, but it can
end up being a horrible base for anybody else that might end up
bisecting into that area.

And the *final* reason I want to just note this is a purely git
process one: if you already pulled my git tree, you will have that
"v5.12-rc1" tag, and the fact that it no longer exists in my public
tree under that name changes nothing at all for you. Git is
distributed, and me removing that tag and replacing it with another
name doesn't magically remove it from other copies unless you have
special mirroring code.

So if you have a kernel git tree (and I'm here assuming "origin"
points to my trees), and you do

     git fetch --tags origin

you _will_ now see the new "v5.12-rc1-dontuse" tag. But git won't
remove the old v5.12-rc1 tag, because while git will see that it is
not upstream, git will just assume that that simply means that it's
your own local tag. Tags, unlike branch names, are a global namespace
in git.

So you should additionally do a "git tag -d v5.12-rc1" to actually get
rid of the original tag name.

Of course, having the old tag doesn't really do anything bad, so this
git process thing is entirely up to you. As long as you don't _use_
v5.12-rc1 for anything, having the tag around won't really matter, and
having both 'v5.12-rc1' _and_ 'v5.12-rc1-dontuse' doesn't hurt
anything either, and seeing both is hopefully already sufficient
warning of "let's not use that then".

Sorry for this mess,
             Linus
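The tag cleanup described in the note can be sketched as a small shell function. This is an illustration only: drop_stale_rc1_tag is a hypothetical helper name, and the remote is assumed to be called "origin" and to point at Linus's tree, as in the note.

```shell
#!/bin/sh
# drop_stale_rc1_tag: fetch the renamed tag, then remove the old local name.
# Hypothetical helper; assumes "origin" is the upstream remote.
drop_stale_rc1_tag() {
    old_tag="v5.12-rc1"

    # Fetching tags adds v5.12-rc1-dontuse but never deletes local tags.
    git fetch --tags origin || return 1

    # Delete the old name only if it still exists locally.
    if git rev-parse -q --verify "refs/tags/$old_tag" >/dev/null; then
        git tag -d "$old_tag"
    else
        echo "$old_tag is not present locally"
    fi
}

# Run from inside a kernel checkout:
#   drop_stale_rc1_tag
```

Running it a second time is harmless: the fetch is a no-op and the function only reports that the old tag is already gone.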



A warning about 5.12-rc1

Posted Mar 4, 2021 20:33 UTC (Thu) by kunitz (subscriber, #3965) [Link] (26 responses)

Ubuntu's default install process creates a swapfile in the root directory. Apparently it is not a "normal" distribution.

A warning about 5.12-rc1

Posted Mar 4, 2021 22:22 UTC (Thu) by MatejLach (guest, #84942) [Link] (21 responses)

There are reasons I would dare to call Ubuntu not a 'normal' distribution, Snaps included, but this indeed isn't one of them.

A warning about 5.12-rc1

Posted Mar 5, 2021 0:16 UTC (Fri) by JMB (guest, #74439) [Link] (18 responses)

Ubuntu (better Debian, but for companies they tend to use Ubuntu LTS) is the desktop standard (not only but especially true for gaming).
But people who know Linux will not use the automatic install (including its partitioning) - they will at least use an OS partition, a data partition and a swap partition for sure (even in the mid-1990s), and most people I have seen won't use GNOME either - so not Ubuntu itself but one of its flavours is used. So there is not one Ubuntu version (even when fixing on e.g. 'Focal Fossa' = 20.04 LTS) but a big variety of them ... and this is typical for Debian and its derivatives.
But nevertheless - if an option is possible, it should work. And as swap files do exist they have to work and of course should be tested - otherwise they should be deprecated and deleted ... and I am sure there are use cases for swap files ... but not for a professional workstation or server ...
Personally, I need at least 3 days to fully install and configure a new system for my needs - with all automation, adjustments and settings ... this is not pre-installed trash you are not allowed to change - it is a professional environment - and you are well advised to make it fit YOUR workflow rather than feeling obliged to adopt the workflow of some strange developers.
If something is possible, someone may use it under GNU/Linux ... and this is an important ingredient of freedom!
And this is the freedom of the user!

A warning about 5.12-rc1

Posted Mar 5, 2021 12:12 UTC (Fri) by Creideiki (subscriber, #38747) [Link] (7 responses)

> and I am sure there are use cases for swap files ... but not for a professional workstation or server ...
You never know when someone makes a questionable decision. For example, and as yet another reason why you really, really shouldn't do curl | sudo bash blindly, take a look at this install script for a security product, which says, in part:
if [ $? == 2 ]; then
    # In case you get "internal compiler error: Killed (program cc1plus)"
    # You ran out of memory.
    # Create some swap
    sudo dd if=/dev/zero of=/var/swap.img bs=1024k count=4000
    sudo mkswap /var/swap.img
    sudo swapon /var/swap.img
    # And compile again

A warning about 5.12-rc1

Posted Mar 5, 2021 12:56 UTC (Fri) by leromarinvit (subscriber, #56850) [Link]

This seems like a case of well-intentioned "automatic remediation" gone overboard. It's fine if the installer checks if the environment is suitable, it's also fine if it offers suggestions what to change if things go wrong. But changing a system setting (that's not directly required to run the program), without even so much as mentioning it, much less asking beforehand, is way out of line IMHO.

Someone should open an issue for this. I'm not really in a position to do it, because I've never even heard of this thing before, and a drive-by "your installer sucks" ticket seems kind of rude. But I really think they aren't doing their users a service by doing this.

A warning about 5.12-rc1

Posted Mar 5, 2021 16:57 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (3 responses)

Possibly dumb question: In what universe is ~2.8 floppy disks of extra swap space enough to fix an out of memory problem? Is this software from 1995 or something?

A warning about 5.12-rc1

Posted Mar 5, 2021 17:06 UTC (Fri) by rsidd (subscriber, #2582) [Link] (2 responses)

I calculate that at 4GB, quite a bit more than 2.8 floppy disks?

A warning about 5.12-rc1

Posted Mar 5, 2021 19:14 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (1 responses)

Gah, I just can't math. Also, the weird mixed-decimal-and-binary units inclined me to think of floppy disks.

A warning about 5.12-rc1

Posted Mar 6, 2021 7:12 UTC (Sat) by cesarb (subscriber, #6266) [Link]

If it makes you feel better, I had a similar thought at first. The 'k' at the end of the bs= is really easy to miss, especially when you're used to always specifying the bs= in bytes (and both bs=512 and bs=1024 are common values, making it even more likely to miss the 'k' in this case).

Having first used Linux in a computer where the size of the main memory was a low number of megabytes doesn't help; eight megabytes of swap was both a reasonable amount and a significant chunk of disk space.
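The size confusion in this sub-thread comes down to one suffix, and the arithmetic is easy to check. The sketch below is an illustration, not part of the quoted install script; dd_bytes is a hypothetical helper that only understands a plain byte count or a 'k' suffix (1024 bytes per unit).

```shell
#!/bin/sh
# dd_bytes BS COUNT: print the total number of bytes dd would write.
# Hypothetical helper; BS is a byte count, optionally with a 'k' suffix.
dd_bytes() {
    bs=$1
    count=$2
    case $bs in
        *k) bs=$(( ${bs%k} * 1024 )) ;;   # 1024k -> 1048576
    esac
    echo $(( bs * count ))
}

dd_bytes 1024k 4000   # what the script asks for: 4194304000
dd_bytes 1024 4000    # what you read if you miss the 'k': 4096000
```

The first figure is 4000 MiB, about 3.9 GiB. The second, divided by the 1,474,560 bytes of a "1.44 MB" floppy, gives roughly 2.8 disks, which is where the floppy estimate came from.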

A warning about 5.12-rc1

Posted Mar 7, 2021 19:19 UTC (Sun) by mss (subscriber, #138799) [Link]

One more reason why software maintainers should not try to implement their own installers (remember bumblebee install script?).

Installing, updating and removing packages on a system is a responsibility of distro's package management system.
Even Windows now tries to move to a centralized package management system with Microsoft Store.

A warning about 5.12-rc1

Posted Mar 17, 2021 1:29 UTC (Wed) by SteveClement (guest, #61839) [Link]

I actually wrote that script.

By chance and perhaps ignorance I missed that the target location is bad.

Thanks for pointing it out and opening an issue.

Steve

A warning about 5.12-rc1

Posted Mar 5, 2021 14:33 UTC (Fri) by hackan (guest, #145281) [Link] (4 responses)

Guys, Ubuntu is nowhere near this kernel! Neither is Debian (even sid is around 5.10), and both distros are actively maintained so that I seriously doubt a faulty kernel would make its way into them. Even more so, I have yet to see an RC kernel landing naturally in those repos.

So sit back and relax... 'Cause every little thing, is gonna be alright.

A warning about 5.12-rc1

Posted Mar 6, 2021 5:54 UTC (Sat) by tajyrink (subscriber, #2750) [Link] (3 responses)

The problem is tinkerers may install a new kernel manually from https://kernel.ubuntu.com/~kernel-ppa/mainline/ - an awesome service as such.

To clarify the situation a bit further: Ubuntu LTS still used a swap partition in 18.04. Users who installed that or an older release with default settings and have since upgraded to e.g. 20.04 are still using a partition. New 20.04 desktop installations, however, use a swap file. Server might still default to a partition; it's a separate installer.

Myself, I've always used a swap file everywhere on all distributions (Debian, SUSE, Ubuntu) for as long as I've known about the possibility. I hate splitting things into partitions and then needing at some point to rethink whether those were the right size. Even in the 2000s I always thought a separate /home was just a mantra, supposedly the "right thing" but really based on assumptions that do not matter to most people.

A warning about 5.12-rc1

Posted Mar 6, 2021 23:11 UTC (Sat) by technophobian (guest, #145315) [Link] (2 responses)

I feel like most tinkerers will opt for distros like Arch (in which case the more obvious option on the wiki is a swap partition) instead of Ubuntu, but either way you are always taking a risk when installing the latest stuff. I would only use rc1 if I am prepared for bugs.

A warning about 5.12-rc1

Posted Mar 7, 2021 17:09 UTC (Sun) by Gaelan (guest, #145108) [Link] (1 responses)

I imagine there are people (myself included, to be honest) who consider themselves "prepared for bugs" enough to try a new kernel, but wouldn't expect that to include "corrupt the file system." Perhaps they *should* have expected that, but that's beside the point.

A warning about 5.12-rc1

Posted Mar 8, 2021 8:58 UTC (Mon) by Kamiccolo (subscriber, #95159) [Link]

Yup, You're not the only one.

(o´・_・)っ

A warning about 5.12-rc1

Posted Mar 5, 2021 15:50 UTC (Fri) by zlynx (guest, #2285) [Link] (4 responses)

I think I know Linux pretty well. I've been using it since about 1997.

I built a new workstation PC recently with a Ryzen 3900 (now 5950), 64G RAM and an Optane boot drive. I installed Ubuntu 18.04 LTS and used all of the defaults.

As a result my Ubuntu 20.04 is using /swapfile on ext4 on one giant partition because I simply used the Ubuntu defaults. Including GNOME. Pretty much the only change I did was to make Wayland the default.

My opinion is that doing too much customization to a Linux install is a waste of time.

A warning about 5.12-rc1

Posted Mar 5, 2021 17:07 UTC (Fri) by rsidd (subscriber, #2582) [Link] (3 responses)

I would use zfs instead of ext4 for the /home partition. It's worth it. And, while I'm about it, would set up a separate swap partition.

A warning about 5.12-rc1

Posted Mar 5, 2021 17:38 UTC (Fri) by nivedita76 (subscriber, #121790) [Link]

If you use ZFS, you absolutely must use a swap partition. Swap files, or swap on zvol is prone to lockups.

A warning about 5.12-rc1

Posted Mar 5, 2021 20:00 UTC (Fri) by clump (subscriber, #27801) [Link] (1 responses)

Not going to touch ZFS on Linux until it gets upstream. I trust Oracle will do the right thing and fix the license, of course.

A warning about 5.12-rc1

Posted Mar 5, 2021 21:45 UTC (Fri) by smurf (subscriber, #17840) [Link]

Oracle? Doing the right thing? With licensing? Dream on.

A warning about 5.12-rc1

Posted Mar 5, 2021 16:03 UTC (Fri) by ecree (guest, #95790) [Link] (1 responses)

> There are reasons I would dare to call Ubuntu not a 'normal' distribution

Yeah, it has considerable excess kurtosis.

(I'll get my coat.)

A warning about 5.12-rc1

Posted Mar 5, 2021 16:39 UTC (Fri) by amacater (subscriber, #790) [Link]

Given the timings of Ubuntu LTS releases - isn't each release a "Poisson d'Avril" distribution anyway?

A warning about 5.12-rc1

Posted Mar 5, 2021 11:21 UTC (Fri) by gerdesj (subscriber, #5446) [Link] (2 responses)

I have just sampled a few of my Ubuntu servers - all use a partition. They were all installed from the minimal installer with defaults.

A warning about 5.12-rc1

Posted Mar 18, 2021 10:00 UTC (Thu) by mgedmin (subscriber, #34497) [Link] (1 responses)

I believe the default depends on whether you chose to use LVM or not. LVM systems get a swap partition; other systems get a swap file.

Here's a short ubuntu-devel@ thread from 2016 discussing this: https://lists.ubuntu.com/archives/ubuntu-devel/2016-Novem...

A warning about 5.12-rc1

Posted Mar 18, 2021 10:45 UTC (Thu) by zdzichu (subscriber, #17118) [Link]

The best information should just come from the source. Can someone provide a link to Ubuntu's installer preset / profile or its source code?
Then we will know what to expect now, without resorting to 5-year-old mail discussions.

A warning about 5.12-rc1

Posted Mar 6, 2021 8:34 UTC (Sat) by kunitz (subscriber, #3965) [Link]

Just for clarification: I'm talking about Ubuntu Desktop 20.04 LTS.

A warning about 5.12-rc1

Posted Mar 5, 2021 2:56 UTC (Fri) by rioting_pacifist (guest, #134765) [Link] (16 responses)

Pretty weak handling of this, I get that it's a bug that got through, but instead of spending most of the email justifying it getting through it would be nice if Linus said what is going to change to stop such a bug getting through.

If Linus still insists on not having public unit tests because they allow for lazy development, it would be good to at least hear that a private integration test will be added for swapfiles.

A warning about 5.12-rc1

Posted Mar 5, 2021 6:57 UTC (Fri) by adobriyan (subscriber, #30858) [Link] (5 responses)

I don't think there is even an interface to force swap a page.

A warning about 5.12-rc1

Posted Mar 5, 2021 10:01 UTC (Fri) by rioting_pacifist (guest, #134765) [Link] (4 responses)

Hibernation will always make use of swap no?

I assume hibernation is tested on an RC before it's released, just not using a swapfile

A warning about 5.12-rc1

Posted Mar 5, 2021 21:50 UTC (Fri) by smurf (subscriber, #17840) [Link] (3 responses)

You need to mount the file system to use a swap file. The hibernation code however didn't cleanly unmount it. Mounting replays the file system log – which the resumed system assumes to be non-replayed.

Bottom line: even if it seems to work (I don't know, not having tested that for exactly the above reason) … don't. Not if you want to keep your file system consistent, that is.

A warning about 5.12-rc1

Posted Mar 5, 2021 23:00 UTC (Fri) by mjg59 (subscriber, #23239) [Link] (2 responses)

The hibernation code supports being given a block device and the offset of the start of the swap file, so if you stash that before hibernation you don't need to mount the filesystem to get at it.

A warning about 5.12-rc1

Posted Mar 6, 2021 13:39 UTC (Sat) by geert (subscriber, #98403) [Link] (1 responses)

Which assumes the blocks allocated to the file are stored contiguously on the block device.
Should be true for swapfiles created during initial installation, but may not be true when creating a swapfile on a system that's been in use for a while.

A warning about 5.12-rc1

Posted Mar 6, 2021 16:19 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link]

> Which assumes the blocks allocated to the file are stored contiguously on the block device.
Nope. The hibernation code writes the location of the blocks in a "linked list". So the file doesn't have to be contiguous.

A warning about 5.12-rc1

Posted Mar 5, 2021 8:01 UTC (Fri) by gregkh (subscriber, #8) [Link] (6 responses)

`make kselftest` seems to do what you want today, if there are any gaps the kernel developers are glad to take more tests.

A warning about 5.12-rc1

Posted Mar 5, 2021 10:01 UTC (Fri) by adobriyan (subscriber, #30858) [Link] (3 responses)

I'm talking about kernel interface. The basic test involves deterministic swapping of a page, reopening backing store and verifying that the page is more or less where it is supposed to be.

A warning about 5.12-rc1

Posted Mar 5, 2021 10:20 UTC (Fri) by gregkh (subscriber, #8) [Link]

There's nothing preventing someone from writing such a test, we have an in-kernel testing framework now, with more tests being written and merged using it every release. I'm sure we would be glad to take a patch adding such a test from you if you submit it.

A warning about 5.12-rc1

Posted Mar 8, 2021 15:02 UTC (Mon) by willy (subscriber, #9762) [Link] (1 responses)

No, the basic test is "check that swapping works". That's just mmap(MAP_PRIVATE) twice the amount of memory in the VM, write the address of the page to the first word of each page, then read the first word of each page to see if it matches.

If your filesystem had been destroyed at the end of it, the test failed.

A warning about 5.12-rc1

Posted Mar 8, 2021 15:11 UTC (Mon) by gregkh (subscriber, #8) [Link]

Detecting if a filesystem really has been changed is an exercise left for the reader :)

A warning about 5.12-rc1

Posted Mar 5, 2021 11:53 UTC (Fri) by rioting_pacifist (guest, #134765) [Link] (1 responses)

Thanks I'll take a look, my criticism was more of the comms though, obviously bugs get through, but blaming users or config options instead of saying what will be done to prevent it happening again is a bad way to address problems and it's disappointing to see Linus do it.

A warning about 5.12-rc1

Posted Mar 6, 2021 13:58 UTC (Sat) by khim (subscriber, #9252) [Link]

It's not “blaming the users”. It's mostly explaining who is affected and when. A pretty reasonable thing to do. Yes, there are also words designed to make the developers who made that mistake feel better (which is, actually, a pretty nice thing to do, too).

But talking about future plans? At this stage? It would be as if Bush had recalled the heads of various departments early on September 11 and directed them to start planning the USA PATRIOT Act, instead of, you know, saving people who could still be saved and putting out fires.

I'm really glad Linus does not talk about it. That mail is not about future plans and shouldn't be about future plans.

A warning about 5.12-rc1

Posted Mar 6, 2021 13:48 UTC (Sat) by khim (subscriber, #9252) [Link] (2 responses)

> Pretty weak handling of this, I get that it's a bug that got through, but instead of spending most of the email justifying it getting through it would be nice if Linus said what is going to change to stop such a bug getting through.

I'm really, really, REALLY glad Linus said nothing about that. This shows me, yet again, that he is a competent manager of that whole process.

Why? Because “we have a mess, let's try to mitigate consequences of said mess” and “we had a mess, let's try to think about how to prevent it from happening again” are two entirely different things and you should never conflate them.

Fixing the mess involves using existing procedures to the fullest extent. Changing them on the fly tends to just make the mess worse. Not something you need when things are already bad. And time is of the utmost importance: the more you think about a “proper” “long-term” plan, the more people are affected.

Preventing the mess from returning involves careful changes to the procedures, and planning. You don't want them changed in a way that produces some other kind of mess, after all.

A warning about 5.12-rc1

Posted Mar 6, 2021 14:58 UTC (Sat) by Wol (subscriber, #4433) [Link] (1 responses)

Bear in mind this is an rc-1. Things are *EXPECTED* to break. If you run an rc1 on a system with anything important (yes I know people do extremely unwise things) then you are an idiot.

And at the end of the day, you cannot be expected to help idiots stop harming themselves. I notice Linus said "this is a heads-up for people who might rebase off of rc1". Isn't that something Linus objects to, rather strongly, to the extent he often bounces patches from people who do it?

In other words, this is normal breakage, that the process EXPECTS, there are procedures in place to handle it, and if idiots screw themselves then sorry, they get a Darwin award, or whatever it's called.

Cheers,
Wol

A warning about 5.12-rc1

Posted Mar 9, 2021 19:58 UTC (Tue) by khim (subscriber, #9252) [Link]

> If you run an rc1 on a system with anything important (yes I know people do extremely unwise things) then you are an idiot.

Lots of developers are doing that. The last bug that could silently corrupt your system happened… I don't even remember when that was. 10 years ago? Sure, lockups, hangups and many other things are expected. Silent corruption is not (well... if you don't use experimental filesystems, but that's another kettle of fish).

So taking it seriously was the right thing to do. Changes to the process… not sure. Precisely because the last time it happened was so long ago… maybe we should conclude that the process is still OK.

A warning about 5.12-rc1

Posted Mar 5, 2021 12:23 UTC (Fri) by agruen (subscriber, #6613) [Link] (1 responses)

( BRANCH=HEAD;
  git merge-base --is-ancestor 48d15436fde6 $BRANCH &&
  ! git merge-base --is-ancestor caf6912f3f4a $BRANCH
) && echo 'please rebase onto f69d02e37a85 or later'

A warning about 5.12-rc1

Posted Mar 5, 2021 16:35 UTC (Fri) by willy (subscriber, #9762) [Link]

Don't we really want people basing on a (merge) commit from Linus that does not contain 48d15436fde6, eliminating the chance that a bisect ends up here? I'm sure there's a clever git command to find the merge that brought in 48d15436fde6, and from that it's trivial to choose the parent that doesn't contain it.
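One common recipe for the "find the merge that brought in a commit" question is to intersect the mainline's first-parent chain with the ancestry path from the commit. The sketch below illustrates that recipe and is not a command anyone in the thread posted; find_merge is a hypothetical helper name, and the branch argument is whatever ref tracks the mainline in your checkout.

```shell
#!/bin/sh
# find_merge COMMIT [BRANCH]: print the oldest commit on BRANCH's
# first-parent chain that descends from COMMIT. For a commit that arrived
# through a pull, this is the merge that brought it into the mainline.
# (If COMMIT was made directly on the mainline, the result is simply its
# first-parent child, not a merge.)
find_merge() {
    commit=$1
    branch=${2:-HEAD}

    # Commits on the mainline itself, newest first.
    mainline=$(git rev-list --first-parent "$commit..$branch") || return 1
    [ -n "$mainline" ] || return 1

    # Descendants of COMMIT that are ancestors of BRANCH, newest first;
    # keep only those on the mainline and take the oldest one.
    git rev-list --ancestry-path "$commit..$branch" |
        grep -Fx -e "$mainline" |
        tail -n 1
}

# Example, assuming origin/master tracks Linus's tree:
#   merge=$(find_merge 48d15436fde6 origin/master)
#   git rev-parse "$merge^1"   # mainline state just before that merge
```

The first parent of the reported merge is the mainline as it stood before the pull, which is the candidate base the comment is asking about.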

A warning about 5.12-rc1

Posted Mar 6, 2021 1:13 UTC (Sat) by bojan (subscriber, #14302) [Link] (1 responses)

Very interesting that the commit from 5.12 that fixes this (to my understanding) also landed in all other branches too. In some of them, there is an explanation that the exposure is reduced to only certain block devices, but some branches do not have this text.

Examples:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/li...

https://git.kernel.org/pub/scm/linux/kernel/git/stable/li...

Not sure what to make of it. Unless these are not relevant commits...

A warning about 5.12-rc1

Posted Mar 8, 2021 14:25 UTC (Mon) by nix (subscriber, #2304) [Link]

From my reading, they're relevant: the bug is old (2014) but only affected swapfiles on filesystems that are, frankly, quite unlikely to be used to house swapfiles (who on earth would put a swapfile on zmem?!) before a simplification in 5.12-rc made it affect, well, more or less everything. (Not that I can track down the latter change in five minutes of grepping around.)

A warning about 5.12-rc1

Posted Mar 7, 2021 10:00 UTC (Sun) by geuder (subscriber, #62854) [Link] (8 responses)

> because honestly, swapfiles tend to be slower

I remember having read that for a while now there has been absolutely no speed difference between swap in a filesystem and in a partition. No idea how reliable that source was. Haven't checked the code, but I assume that if it were guaranteed that the swap file would be contiguous, that should be possible. All you need is the block layer start address.

A warning about 5.12-rc1

Posted Mar 7, 2021 11:35 UTC (Sun) by zdzichu (subscriber, #17118) [Link] (2 responses)

You can bookmark Andrew Morton as reliable source:
http://lkml.iu.edu/hypermail/linux/kernel/0507.0/1690.html
See the last paragraph.
The email is from 2005. There have been no performance differences between swap partitions and files since kernel 2.6.

A warning about 5.12-rc1

Posted Mar 8, 2021 8:52 UTC (Mon) by geuder (subscriber, #62854) [Link] (1 responses)

Ah, even that old. I thought it's just been a couple of years. Interesting that Linus claims the opposite.

(Not that it would have any practical meaning for me. No device except my phone swaps. And that uses zram.)

A warning about 5.12-rc1

Posted Mar 8, 2021 10:13 UTC (Mon) by Wol (subscriber, #4433) [Link]

Yup. There was a massive rewrite of the swap setup in the 2.4/2.6 era. I always thought it was 2.4.10-ish, maybe that's why I could never find anything to back me up, maybe it was 2.6.10.

Anyways, Linus took a look at the spaghetti that was the original swap code that had been there since forever, threw his toys out the pram, threw the code out of Linux, and waited for the dust to settle :-)

So we got a nice, clean, well-written and well-thought-out memory/swap system roundabout that point in time. It wasn't pretty while it was happening, though ...

Cheers,
Wol

A warning about 5.12-rc1

Posted Mar 7, 2021 13:22 UTC (Sun) by ailiop (subscriber, #128014) [Link] (4 responses)

Indeed there should be no performance difference between swapping to a blockdev vs to a contiguous swapfile on top of a fs.

During swapon on a file, the logical block address of that file is obtained from the filesystem and used to submit block io when needed. This means that after swapfile initialization, the filesystem is completely bypassed and the underlying block device is directly addressed (that's at least for local filesystems, it's different for e.g. nfs).

In case of fragmented files, the filesystem is still bypassed: during swapon the block ranges of all extent maps are obtained from the fs, and maintained in an in-memory rbtree within the swap code that is looked up every time swap needs to read from or write to the file. This may add some overhead, depending on how fragmented the file is (i.e. on the number of extents / height of tree).

The only other difference is that in case of swapfiles the request_queue to the underlying blockdev is shared between the fs and the swap bio submissions at the block layer.

A warning about 5.12-rc1

Posted Mar 8, 2021 9:02 UTC (Mon) by geuder (subscriber, #62854) [Link]

> The only other difference is that in case of swapfiles the request_queue to the underlying blockdev is shared between the fs and the swap bio

Ah, so while there is no difference in cycles, heavy filesystem activity while swapping might lead to additional latency in the case of a swap file? If the block scheduling gives a swap partition good priority, which one would assume.

A warning about 5.12-rc1

Posted Mar 8, 2021 22:54 UTC (Mon) by neilbrown (subscriber, #359) [Link] (2 responses)

> Indeed there should be no performance difference between swapping to a blockdev vs to a contiguous swapfile on top of a fs.

While this is largely correct, it isn't quite the full story.

This only works when the filesystem provides a "bmap" interface, and doesn't provide a "swap_activate" interface.

Many local filesystems provide bmap - and so get good swap performance for free.
Network filesystems (NFS) and some local filesystems (btrfs, f2fs, xfs) provide swap_activate which effectively means that they take full responsibility for SWAP IO. Whether they then perform better or worse than the direct "bmap" approach I cannot say. All I know is that it is different code paths.

A warning about 5.12-rc1

Posted Mar 9, 2021 0:10 UTC (Tue) by ailiop (subscriber, #128014) [Link]

> This only works when the filesystem provides a "bmap" interface, and doesn't provide a "swap_activate" interface.

This doesn't affect the actual swap page IO performance during runtime (swap in/out) though, as it only pertains to the swapfile initialization phase. In both variants (bmap and swap_activate) the local filesystems simply provide the blockmaps of all the extents that make up the swapfile, which are fed into add_swap_extent() and maintained in the swap_info_struct/swap_extent_root rbtree.

In either case, it is the same swap code that submits IO directly to the underlying blockdev, and after initialization the filesystem is completely out of the way and unaware that the mapped file blocks are being modified under it.

NFS is unique in that it both implements swap_activate and swap IO always goes through it (via the direct_IO address space op), which is why I mentioned it is different.

A warning about 5.12-rc1

Posted Mar 11, 2021 4:11 UTC (Thu) by dgc (subscriber, #6611) [Link]

> Network filesystems (NFS) and some local filesystems (btrfs, f2fs, xfs) provide swap_activate
> which effectively means that they take full responsibility for SWAP IO. Whether they then
> perform better or worse than the direct "bmap" approach I cannot say. All I know is that
> it is different code paths

No, swap_activate does not mean the filesystems take responsibility for swap IO - all it changes is how the swap code maps the swapfile backing store into the swapfile's internal extent map. Both end up reporting contiguous regions of the file to the swapfile code via the add_swap_extent() function, hence there is no difference in performance between the two types of swapfile mapping mechanisms at all.

The difference is that the bmap method (generic_swapfile_activate()) only maps a block at a time and does not support files with unwritten extents. That means you can't do "fallocate 4g swapfile; swapon swapfile" because bmap will report the unwritten extents as holes in the file and so the swap code rejects those ranges as not usable. Hence to add a swapfile on a filesystem that only supports ->bmap you have to physically zero the file first. That's a problem if you are already in OOM conditions - the IO can push the system over the edge and/or take a long time to run and so the system goes off the cliff before you can activate the swapfile.

Being able to use fallocate to preallocate the swapfile means you can add tens of gigabytes of swapfile on filesystems like XFS in just a few milliseconds with minimal IO, CPU and RAM overhead and activate it straight away. This makes dynamic swapfile management (e.g. resizing) practical and much more useful compared to the old ->bmap based method for mapping that required physical zeroing before activation.

-Dave.
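The two ways of creating a swap file contrasted in this comment can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, sizes are given in MiB, and the second function needs the fallocate utility from util-linux. Whether swapon accepts a preallocated file depends on the filesystem and kernel, as the comment explains.

```shell
#!/bin/sh
# make_zeroed_swapfile PATH SIZE_MIB: physically write zeroes to every
# block, which is what a ->bmap-only filesystem needs before swapon.
# Costs SIZE_MIB megabytes of write IO. (Hypothetical helper.)
make_zeroed_swapfile() {
    dd if=/dev/zero of="$1" bs=1024k count="$2" 2>/dev/null &&
        chmod 600 "$1"
}

# make_preallocated_swapfile PATH SIZE_MIB: reserve the blocks as
# unwritten extents without writing any data. (Hypothetical helper.)
make_preallocated_swapfile() {
    fallocate -l "${2}MiB" "$1" &&
        chmod 600 "$1"
}

# As root, either file would then be activated with:
#   mkswap PATH && swapon PATH
```

The zeroing variant is the one that hurts when memory is already short, because its IO scales with the size of the file; the preallocating variant does almost no IO.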


Copyright © 2021, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds