New tricks for XFS

Posted Feb 22, 2018 20:19 UTC (Thu) by dcg (subscriber, #9198)
Parent article: New tricks for XFS

What about subvolumes inside of subvolumes? I guess they are possible, but it will require to coordinate ENOSPC management between all of them. I guess that too many embedded subvolumes will be a worst case for this model?

Also, what are your next plans for XFS - add some kind of internal raid capabilities, and then fully become a ZFS-like filesystem? :P

New tricks for XFS

Posted Feb 22, 2018 22:12 UTC (Thu) by nix (subscriber, #2304) [Link] (3 responses)

As far as I can see, you can make subvolumes and subsubvolumes and subsubsubvolumes in a single image (with one 'containing' and one 'contained' fs) by just CoWing the data making up the subvolume and asking the container to CoW the metadata (which to it is data). After all, you don't need to CoW entire files at once: you can CoW *bits* of them, so XFS can just CoW the pieces of itself that contain the metadata you are making a subvolume out of. I *think* you don't need multiple levels (which means you don't need mount rights or root privileges, unless it is decided that you need them in order to ask the containing fs to do a CoW) but they should work just as well -- you make a sparse image the size of the fs it's contained in, and allocations trickle out through all the layers, as do trim requests when deallocations happen (which turn that bit of the non-sparse file sparse again).

I am bubbling over thinking of things you can do with this. It's so powerful!

New tricks for XFS

Posted Feb 23, 2018 0:19 UTC (Fri) by nix (subscriber, #2304) [Link]

OK, having seen the talk, it's clear that what is currently implemented requires whole new fs images: the entire-subvolume is the unit of snapshotting. However... I don't see why that is necessarily the case. (Though having two files which are partially shared blocks and partially CoWed blocks, which would be a requirement for doing it this way, *is* genuinely new and might have to wait for xfs v6. Or v-infinity if the idea is simply insane. :) ).

New tricks for XFS

Posted Feb 23, 2018 15:47 UTC (Fri) by dcg (subscriber, #9198) [Link] (1 responses)

I am not sure I understand. What I was trying to say is that, as I understand it, there is a metadata structure that the copy-on-write mechanism can't share: space management. As you create subvolumes inside of subvolumes, each one of these "matryoshka" subvolumes will have to handle its internal space (eg the free space tree) in some way. At first, the CoW mechanism will share the blocks that store the free space information, of course. But once you start using the filesystems, they will differ quickly. For Nth subvolumes, you will have Nth free space trees for each one (plus the host).

Now let's imagine that you make a large file in the last subvolume. It will requires to allocate blocks for its image residing in the previous filesystem. If the allocation is big enough, it may require also to allocate blocks for that previous subvolume, and would require to allocate blocks for him too - all the way to the host filesystem. Allocation performance will suffer heavily. In short, XFS needs to replicate space management structures for each embedded subvolume (and potentially need to update *all* of them in some extreme cases). In ZFS/btrfs, space management is shared by all subvolumes so it does not have these problems.

This is what leads me to believe that excessive subvolume embedding would be a scalability issue for this model, but may be I'm not understanding it right

New tricks for XFS

Posted Feb 23, 2018 21:26 UTC (Fri) by nix (subscriber, #2304) [Link]

Agreed, that might be problematic if you have huge numbers of subsubsubvolumes -- but wouldn't that be problematical for other similar filesystems too? (The volume of log updates in all these layers is the other possible problem.)

(I'll admit that what I want isn't subvolumes at all -- it's snapshotting on a directory granularity level. No subvolumes, no notion of "this is like a smaller subfilesystem", just each subdir can be snapshotted and the snapshots can be forked freely. That's still, I think, something not that hard with a CoW tree fs but incredibly clunky with Dave's scheme. One filesystem image per directory? I don't think that'll scale too well... a complete log and fs structures per directory, uh, no. :) )