The Btrfs inode-number epic (part 2: solutions)
Posted Aug 25, 2021 11:26 UTC (Wed) by taladar (subscriber, #68407)
In reply to: The Btrfs inode-number epic (part 2: solutions) by mezcalero
Parent article: The Btrfs inode-number epic (part 2: solutions)
I also don't really see the use case outside of filesystems like btrfs, which do everything differently mainly to be different. It is not as if they couldn't have split the 64 bits available to them into two numbers, which would be more than enough for the number of files you find on a filesystem times the number of subvolumes.
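For illustration, here is a minimal sketch of what such a split encoding could look like, assuming a hypothetical 24-bit subvolume field and a 40-bit per-subvolume inode field (this is not how btrfs actually assigns st_ino):

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical split of a 64-bit st_ino into a 24-bit subvolume ID
     * and a 40-bit per-subvolume inode number.  Purely illustrative;
     * btrfs does not encode its inode numbers this way. */
    #define SUBVOL_BITS 24
    #define INODE_BITS  40
    #define INODE_MASK  ((UINT64_C(1) << INODE_BITS) - 1)

    static uint64_t encode_ino(uint32_t subvol, uint64_t ino)
    {
            return ((uint64_t)subvol << INODE_BITS) | (ino & INODE_MASK);
    }

    int main(void)
    {
            uint64_t st_ino = encode_ino(257, 123456);

            printf("st_ino=%#llx subvol=%llu inode=%llu\n",
                   (unsigned long long)st_ino,
                   (unsigned long long)(st_ino >> INODE_BITS),
                   (unsigned long long)(st_ino & INODE_MASK));
            return 0;
    }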
Posted Aug 25, 2021 21:02 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
Posted Aug 26, 2021 1:21 UTC (Thu)
by zblaxell (subscriber, #26385)
It might be harder than it looks? So far none of btrfs, XFS, bcachefs, ZFS, or overlayfs has done this.
bcachefs seems to have painted itself into the same corner as btrfs: 32-bit subvolume IDs and 64-bit inodes, so making a snapshot duplicates all existing inode numbers in the subvolume. XFS experimented with subvols, gave up, and now recommends bcachefs instead. ZFS duplicates inode numbers--despite using only 48 bits of ino_t--and apologizes to no one.
Overlayfs takes up at least one bit of its own, which can interfere with any other filesystem's attempt to use all 64 bits of ino_t (indeed the btrfs NFS support patch reserves some bits for that). Overlayfs only does that sometimes--the rest of the time, it lets inode numbers from different lowerdirs collide freely.
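To make the collision concrete, here is a hedged sketch of the general problem, assuming a hypothetical stacking layer that claims the top bit of st_ino as its own tag (this is not overlayfs's actual encoding): once the lower filesystem already uses that bit, tagging no longer distinguishes anything.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical: a stacking layer claims the top bit of st_ino to mark
     * which layer an inode came from.  If the lower filesystem already
     * hands out numbers with that bit set, the tag adds no information
     * and two distinct inodes can end up with the same visible number. */
    #define LAYER_TAG (UINT64_C(1) << 63)

    int main(void)
    {
            uint64_t lower_a = UINT64_C(0x1234);             /* top bit clear */
            uint64_t lower_b = LAYER_TAG | UINT64_C(0x1234); /* top bit set   */

            /* The layer tags lower_a; lower_b already carries the "tag". */
            uint64_t visible_a = lower_a | LAYER_TAG;
            uint64_t visible_b = lower_b;

            printf("visible_a=%#llx visible_b=%#llx collision=%s\n",
                   (unsigned long long)visible_a,
                   (unsigned long long)visible_b,
                   visible_a == visible_b ? "yes" : "no");
            return 0;
    }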
Posted Aug 26, 2021 1:52 UTC (Thu)
by neilbrown (subscriber, #359)
One reason that it is harder for btrfs is that btrfs never reuses inode numbers (well ... almost never). Nor does it reuse subvolume numbers.
So if you create a snapshot every minute you'll use 24 bits of subvolume numbers in 31 years - even if you only keep a few around.
If you create 100 new files per second, you'll use 40 bits of inode numbers in 348 years - no matter how many you keep.
These creation rates are high. Are they unrealistically high? Maybe. How long do we expect a filesystem to last? 348 years is probably unrealistic - is 31?
If you were a filesystem developer, would you feel comfortable limiting subvolumes to 24 bits and inodes to 40 bits? 64 bits allows you to create one every microsecond and still survive for half a million years. That is much easier for a filesystem developer to live with.
I would like btrfs to re-use these numbers and impose these limits. This is far from straightforward, but it is almost certainly technically possible without excessive cost (though with a non-zero cost). It can be hard, though, to motivate effort to protect against uncertain future problems (.... I'm sure there is a well-known example I could point to...).
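For anyone who wants to check those figures, here is a quick back-of-the-envelope calculation (a minimal sketch; the allocation rates are just the hypothetical ones from the comment above):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            const double sec_per_year = 365.25 * 24 * 3600;

            /* One snapshot per minute against a 24-bit subvolume-ID space. */
            double subvol_years = (double)(UINT64_C(1) << 24) * 60 / sec_per_year;

            /* 100 new files per second against a 40-bit inode space. */
            double inode_years = (double)(UINT64_C(1) << 40) / 100 / sec_per_year;

            /* One allocation per microsecond against the full 64-bit space. */
            double full_years = 18446744073709551616.0 / 1e6 / sec_per_year;

            printf("24-bit subvol IDs at 1/min: ~%.0f years\n", subvol_years); /* ~31     */
            printf("40-bit inodes at 100/s:     ~%.0f years\n", inode_years);  /* ~348    */
            printf("64-bit space at 1/us:       ~%.0f years\n", full_years);   /* ~584000 */
            return 0;
    }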
Posted Sep 7, 2021 14:11 UTC (Tue)
by nye (subscriber, #51576)
It's hard to make a direct comparison given the fundamental differences in the model of subvolumes vs ZFS' various dataset types, but FWIW, I have a running system - at home, so not exactly enterprise scale - where the total number of ZFS snapshots that have been made across filesystems/volumes in the pool over the last decade is probably around 15 million. Getting pretty close to 24 bits.
I don't know enough about btrfs to know if the equivalent setup to those filesystems and volumes would be based on some shared root there and competing for inodes, or entirely separate. I guess what that boils down to is that I don't know if the rough equivalent to a btrfs filesystem is a ZFS filesystem or a ZFS *pool*. Either way, once you're used to nearly-free snapshots, you can find yourself using a *lot* of them.
Posted Aug 27, 2021 6:10 UTC (Fri)
by mezcalero (subscriber, #45103)
Lennart