A warning about 5.12-rc1

Posted Mar 8, 2021 22:54 UTC (Mon) by neilbrown (subscriber, #359)
In reply to: A warning about 5.12-rc1 by ailiop
Parent article: A warning about 5.12-rc1

> Indeed there should be no performance difference between swapping to a blockdev vs to a contiguous swapfile on top of a fs.

While this is largely correct, it isn't quite the full story.

This only works when the filesystem provides a "bmap" interface, and doesn't provide a "swap_activate" interface.

Many local filesystems provide bmap - and so get good swap performance for free.
Network filesystems (NFS) and some local filesystems (btrfs, f2fs, xfs) provide swap_activate, which effectively means that they take full responsibility for swap IO. Whether they then perform better or worse than the direct "bmap" approach I cannot say. All I know is that they are different code paths.

A warning about 5.12-rc1

Posted Mar 9, 2021 0:10 UTC (Tue) by ailiop (subscriber, #128014)

> This only works when the filesystem provides a "bmap" interface, and doesn't provide a "swap_activate" interface.

This doesn't affect the actual swap page IO performance during runtime (swap in/out) though, as it only pertains to the swapfile initialization phase. In both variants (bmap and swap_activate) the local filesystems simply provide the blockmaps of all the extents that make up the swapfile, which are fed into add_swap_extent() and maintained in the swap_info_struct/swap_extent_root rbtree.

In either case, it is the same swap code that submits IO directly to the underlying blockdev, and after initialization the filesystem is completely out of the way and unaware that the mapped file blocks are being modified under it.
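
To make the extent bookkeeping concrete, here is a minimal user-space sketch in Python of the idea described above. It is not the kernel code: the names SwapExtent, SwapExtentMap, add_extent and page_to_block are hypothetical, a sorted list stands in for the rbtree, blocks are assumed to be page sized, and extents are assumed to arrive in increasing swap-offset order. It only illustrates that once the extents are recorded, translating a swap page to a device block needs no help from the filesystem.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class SwapExtent:
    start_page: int   # first swap page offset covered by this extent
    nr_pages: int     # number of contiguous pages in the extent
    start_block: int  # first device block (page-sized units, an assumption)


class SwapExtentMap:
    """Toy stand-in for a per-swapfile extent map (hypothetical names)."""

    def __init__(self):
        self._extents = []  # kept sorted by start_page

    def add_extent(self, start_page, nr_pages, start_block):
        # Merge with the previous extent when both the swap offsets and
        # the device blocks continue exactly where it ended.
        if self._extents:
            last = self._extents[-1]
            if (last.start_page + last.nr_pages == start_page
                    and last.start_block + last.nr_pages == start_block):
                self._extents[-1] = SwapExtent(
                    last.start_page, last.nr_pages + nr_pages, last.start_block)
                return
        self._extents.append(SwapExtent(start_page, nr_pages, start_block))

    def page_to_block(self, page):
        # Find the last extent whose start_page is <= page.
        starts = [e.start_page for e in self._extents]
        i = bisect_right(starts, page) - 1
        if i < 0:
            raise KeyError(page)
        extent = self._extents[i]
        if page >= extent.start_page + extent.nr_pages:
            raise KeyError(page)
        return extent.start_block + (page - extent.start_page)


# Illustrative use with made-up block numbers.
swap_map = SwapExtentMap()
swap_map.add_extent(0, 4, 1000)   # pages 0-3 live at blocks 1000-1003
swap_map.add_extent(4, 4, 1004)   # contiguous on disk, so it is merged
swap_map.add_extent(8, 2, 5000)   # a separate fragment elsewhere
print(swap_map.page_to_block(5), swap_map.page_to_block(9))
```

Whether the extents came from a bmap probe or from a swap_activate callback makes no difference to the lookup in this model, which is the point being made above.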

NFS is unique in that it both implements swap_activate and handles all swap IO itself (via the direct_IO address space op), which is why I mentioned that it is different.

A warning about 5.12-rc1

Posted Mar 11, 2021 4:11 UTC (Thu) by dgc (subscriber, #6611)

> Network filesystems (NFS) and some local filesystems (btrfs, f2fs, xfs) provide swap_activate
> which effectively means that they take full responsibility for SWAP IO. Whether they then
> perform better or worse than the direct "bmap" approach I cannot say. All I know is that
> it is different code paths

No, swap_activate does not mean the filesystems take responsibility for swap IO - all it changes is how the swap code maps the swapfile backing store into the swapfile's internal extent map. Both end up reporting contiguous regions of the file to the swapfile code via the add_swap_extent() function, hence there is no difference in performance between the two types of swapfile mapping mechanisms at all.

The difference is that the bmap method (generic_swapfile_activate()) only maps a block at a time and does not support files with unwritten extents. That means you can't do "fallocate 4g swapfile; swapon swapfile" because bmap will report the unwritten extents as holes in the file and so the swap code rejects those ranges as not usable. Hence to add a swapfile on a filesystem that only supports ->bmap you have to physically zero the file first. That's a problem if you are already in OOM conditions - the IO can push the system over the edge and/or take a long time to run and so the system goes off the cliff before you can activate the swapfile.
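
A rough user-space model of the block-at-a-time probing may help here. This is only a sketch under stated assumptions, not the kernel's generic_swapfile_activate(): the function name bmap_activate and the bmap callback are hypothetical, blocks are assumed to be page sized, a return value of 0 stands for "hole or unwritten", and the first such block aborts activation in this model.

```python
def bmap_activate(bmap, nr_blocks):
    """Probe a file one logical block at a time and build extents.

    bmap(i) is a hypothetical callback returning the physical block for
    logical block i, or 0 when the block is a hole or unwritten.
    Returns a list of [logical_start, length, physical_start] entries.
    """
    extents = []
    for logical in range(nr_blocks):
        physical = bmap(logical)
        if physical == 0:
            # A preallocated but unwritten range looks like a hole here.
            raise ValueError("swapfile has holes")
        if extents and extents[-1][2] + extents[-1][1] == physical:
            extents[-1][1] += 1          # physically contiguous: grow extent
        else:
            extents.append([logical, 1, physical])
    return extents


# A file whose blocks were all physically written (made-up block numbers).
written = {0: 100, 1: 101, 2: 102, 3: 500}
print(bmap_activate(lambda i: written.get(i, 0), 4))

# A file that was only preallocated: every probe reports 0.
try:
    bmap_activate(lambda i: 0, 4)
except ValueError as err:
    print("activation failed:", err)
```

The loop makes one probe per block, so the cost of activation grows with the file size, and a file that was only preallocated never gets as far as producing a single extent.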

Being able to use fallocate to preallocate the swapfile means you can add tens of gigabytes of swapfile on filesystems like XFS in just a few milliseconds with minimal IO, CPU and RAM overhead and activate it straight away. This makes dynamic swapfile management (e.g. resizing) practical and much more useful compared to the old ->bmap based method for mapping that required physical zeroing before activation.
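
For illustration, the preallocation step can be sketched from user space in Python. The helper name preallocate and the file name are made up for this example, and it uses os.posix_fallocate() rather than calling fallocate(2) directly. Note that on filesystems without native fallocate support the C library may fall back to writing the blocks out, so the speed benefit depends on the filesystem. The later mkswap and swapon steps need root and are not run here.

```python
import os
import tempfile


def preallocate(path, size_bytes):
    """Reserve size_bytes of space for path without writing zeros ourselves.

    Returns the resulting file size in bytes.
    """
    # Mode 0600 because a swapfile should not be readable by other users.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)
    return os.stat(path).st_size


# Small demonstration in a temporary directory (16 MiB, not tens of GB).
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "swapfile")   # hypothetical file name
    demo_size = preallocate(demo_path, 16 * 1024 * 1024)
    print("preallocated", demo_size, "bytes")
    # On a real system, and as root, mkswap and swapon would follow.
```

On a filesystem that only offers ->bmap, the file produced this way would still have to be zeroed before it could be activated, which is exactly the cost described above.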

-Dave.