The Btrfs inode-number epic (part 2: solutions)
Posted Aug 25, 2021 11:26 UTC (Wed) by taladar (subscriber, #68407)
In reply to: The Btrfs inode-number epic (part 2: solutions) by mezcalero
Parent article: The Btrfs inode-number epic (part 2: solutions)
I also don't really see the use case outside of filesystems like btrfs, which do everything differently mainly to be different. It is not as if they couldn't have split the 64 bits available to them into two numbers, which would be more than enough for the number of files you find on a filesystem times the number of subvolumes.
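For illustration, here is a minimal sketch of what such a split encoding could look like, assuming a hypothetical 24-bit subvolume field and a 40-bit per-subvolume inode field (this is not how btrfs actually assigns st_ino):

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical split of a 64-bit st_ino into a 24-bit subvolume ID
     * and a 40-bit per-subvolume inode number.  Purely illustrative;
     * btrfs does not encode its inode numbers this way. */
    #define SUBVOL_BITS 24
    #define INODE_BITS  40
    #define INODE_MASK  ((UINT64_C(1) << INODE_BITS) - 1)

    static uint64_t encode_ino(uint32_t subvol, uint64_t ino)
    {
            return ((uint64_t)subvol << INODE_BITS) | (ino & INODE_MASK);
    }

    int main(void)
    {
            uint64_t st_ino = encode_ino(257, 123456);

            printf("st_ino=%#llx subvol=%llu inode=%llu\n",
                   (unsigned long long)st_ino,
                   (unsigned long long)(st_ino >> INODE_BITS),
                   (unsigned long long)(st_ino & INODE_MASK));
            return 0;
    }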
Posted Aug 25, 2021 21:02 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
Posted Aug 26, 2021 1:21 UTC (Thu)
by zblaxell (subscriber, #26385)
It might be harder than it looks? So far none of btrfs, XFS, bcachefs, ZFS, or overlayfs has done this.
bcachefs seems to have painted itself into the same corner as btrfs: 32-bit subvolume IDs and 64-bit inodes, so making a snapshot duplicates all existing inode numbers in the subvolume. XFS experimented with subvols, gave up, and now recommends bcachefs instead. ZFS duplicates inode numbers--despite using only 48 bits of ino_t--and apologizes to no one.
Overlayfs takes up at least one bit of its own, which can interfere with any other filesystem's attempt to use all 64 bits of ino_t (indeed the btrfs NFS support patch reserves some bits for that). Overlayfs only does that sometimes--the rest of the time, it lets inode numbers from different lowerdirs collide freely.
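To make the collision concrete, here is a hedged sketch of the general problem, assuming a hypothetical stacking layer that claims the top bit of st_ino as its own tag (this is not overlayfs's actual encoding): once the lower filesystem already uses that bit, tagging no longer distinguishes anything.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical: a stacking layer claims the top bit of st_ino to mark
     * which layer an inode came from.  If the lower filesystem already
     * hands out numbers with that bit set, the tag adds no information
     * and two distinct inodes can end up with the same visible number. */
    #define LAYER_TAG (UINT64_C(1) << 63)

    int main(void)
    {
            uint64_t lower_a = UINT64_C(0x1234);             /* top bit clear */
            uint64_t lower_b = LAYER_TAG | UINT64_C(0x1234); /* top bit set   */

            /* The layer tags lower_a; lower_b already carries the "tag". */
            uint64_t visible_a = lower_a | LAYER_TAG;
            uint64_t visible_b = lower_b;

            printf("visible_a=%#llx visible_b=%#llx collision=%s\n",
                   (unsigned long long)visible_a,
                   (unsigned long long)visible_b,
                   visible_a == visible_b ? "yes" : "no");
            return 0;
    }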
Posted Aug 26, 2021 1:52 UTC (Thu)
by neilbrown (subscriber, #359)
One reason that it is harder for btrfs is that btrfs never reuses inode numbers (well ... almost never). Nor does it reuse subvolume numbers.
So if you create a snapshot every minute you'll use 24 bits of subvolume numbers in 31 years - even if you only keep a few around.
If you create 100 new files per second, you'll use 40 bits of inode numbers in 348 years - no matter how many you keep.
These creation rates are high. Are they unrealistically high? Maybe. How long do we expect a filesystem to last? 348 years is probably unrealistic - is 31?
If you were a filesystem developer, would you feel comfortable limiting subvolumes to 24 bits and inodes to 40 bits? 64 bits allows you to create one every microsecond and still survive for half a million years. That is much easier for a filesystem developer to live with.
I would like btrfs to re-use these numbers and impose these limits. This is far from straightforward, but it is almost certainly technically possible without excessive cost (though with a non-zero cost). It can be hard, though, to motivate effort to protect against uncertain future problems (.... I'm sure there is a well-known example I could point to...).
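For anyone who wants to check those figures, here is a quick back-of-the-envelope calculation (a minimal sketch; the allocation rates are just the hypothetical ones from the comment above):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            const double sec_per_year = 365.25 * 24 * 3600;

            /* One snapshot per minute against a 24-bit subvolume-ID space. */
            double subvol_years = (double)(UINT64_C(1) << 24) * 60 / sec_per_year;

            /* 100 new files per second against a 40-bit inode space. */
            double inode_years = (double)(UINT64_C(1) << 40) / 100 / sec_per_year;

            /* One allocation per microsecond against the full 64-bit space. */
            double full_years = 18446744073709551616.0 / 1e6 / sec_per_year;

            printf("24-bit subvol IDs at 1/min: ~%.0f years\n", subvol_years); /* ~31     */
            printf("40-bit inodes at 100/s:     ~%.0f years\n", inode_years);  /* ~348    */
            printf("64-bit space at 1/us:       ~%.0f years\n", full_years);   /* ~584000 */
            return 0;
    }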
Posted Sep 7, 2021 14:11 UTC (Tue)
by nye (subscriber, #51576)
It's hard to make a direct comparison given the fundamental differences in the model of subvolumes vs ZFS' various dataset types, but FWIW, I have a running system - at home, so not exactly enterprise scale - where the total number of ZFS snapshots that have been made across filesystems/volumes in the pool over the last decade is probably around 15 million. Getting pretty close to 24 bits.
I don't know enough about btrfs to know if the equivalent setup to those filesystems and volumes would be based on some shared root there and competing for inodes, or entirely separate. I guess what that boils down to is that I don't know if the rough equivalent to a btrfs filesystem is a ZFS filesystem or a ZFS *pool*. Either way, once you're used to nearly-free snapshots, you can find yourself using a *lot* of them.
Posted Aug 27, 2021 6:10 UTC (Fri)
by mezcalero (subscriber, #45103)
Lennart