The two sides of reflink()

Posted May 11, 2009 6:09 UTC (Mon) by nix (subscriber, #2304)
In reply to: The two sides of reflink() by giraffedata
Parent article: The two sides of reflink()

What? Directory entries and inodes aren't tied together in the fs model at
all, except that each directory entry increases i_nlink in the
corresponding inode by one. Reflinks would simply ensure that i_nlink was
*at least* one but would not increment it (probably by maintaining a
separate i_reflink count), and the semantics of unlink() would change to
ensure that a reflink()/unlink() sequence had the same net effect on the
link count (none) as link()/unlink().
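
For reference, the existing link()/unlink() accounting can be observed from
userspace. This is a minimal sketch, assuming a local filesystem that
supports hard links; the helper name link_counts is made up for
illustration, and it shows only today's i_nlink behaviour (as st_nlink),
not the proposed reflink() call.

```python
import os
import tempfile


def link_counts():
    """Return st_nlink after create, after link(), and after unlink()."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        with open(a, "w") as f:
            f.write("data")
        created = os.stat(a).st_nlink   # one directory entry so far
        os.link(a, b)                   # second entry for the same inode
        linked = os.stat(a).st_nlink
        same_inode = os.stat(a).st_ino == os.stat(b).st_ino
        os.unlink(b)                    # remove the second entry again
        unlinked = os.stat(a).st_nlink
        return created, linked, unlinked, same_inode


if __name__ == "__main__":
    # On a typical local filesystem this prints (1, 2, 1, True).
    print(link_counts())
```

The link()/unlink() pair leaves the count where it started, which is the
property the paragraph above wants reflink()/unlink() to share.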

You could no longer rely on unlink() decrementing i_nlink, but I don't
know of *anything* that depends on this (some things doubtless do but it
can't be common).

It breaks updating running software because that involves unlinking files
that are in use, and because the update process generally consists of
creating a file with a temporary name, filling it out, and rename()ing it
over the original (that's an implicit unlink right there, and it does not
fail). If you break that you break every package manager on the face of
the earth.
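
The update pattern described above can be sketched as follows. The function
names replace_atomically and demo are hypothetical, and real package
managers do considerably more (ownership, permissions, directory fsync);
this only illustrates the temporary-name-then-rename() step and the fact
that a process holding the old file open keeps seeing the old contents.

```python
import os
import tempfile


def replace_atomically(path, data):
    """Write data under a temporary name, then rename() it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # same directory, same filesystem
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # rename(2): implicitly unlinks the old file
    except BaseException:
        os.unlink(tmp)
        raise


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "prog")
        with open(path, "wb") as f:
            f.write(b"old")
        with open(path, "rb") as running:  # stands in for the running program
            old_ino = os.fstat(running.fileno()).st_ino
            replace_atomically(path, b"new")
            still_old = running.read()     # the old inode is still readable
        with open(path, "rb") as f:
            now = f.read()
        return still_old, now, os.stat(path).st_ino != old_ino


if __name__ == "__main__":
    print(demo())
```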

Posted May 11, 2009 7:37 UTC (Mon) by giraffedata (guest, #1954)

> What? Directory entries and inodes aren't tied together in the fs model at all, except that each directory entry increases i_nlink in the corresponding inode by one.

That's a pretty tight bond, especially since i_nlink controls when the inode/file gets deleted. Also, you can't make the kernel create an inode without also creating a directory entry, and except temporarily, an inode cannot exist without at least one directory entry associated with it. Those are the bonds that it would be nice to get away from, as pretty much every OS except Unix does.
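
The "except temporarily" case can be shown from userspace. This is a small sketch with a made-up helper name, orphan_inode, assuming a local POSIX filesystem (network filesystems such as NFS may behave differently): once the only directory entry is removed, the inode survives with a link count of zero until the last open descriptor is closed.

```python
import os
import tempfile


def orphan_inode():
    """Unlink an open file and keep using it through its descriptor."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "scratch")
        fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            os.unlink(path)                 # remove the only directory entry
            links = os.fstat(fd).st_nlink   # link count of the open inode
            os.write(fd, b"still here")     # the inode is still usable
            os.lseek(fd, 0, os.SEEK_SET)
            data = os.read(fd, 100)
            return links, data, os.path.exists(path)
        finally:
            os.close(fd)                    # last reference: the kernel frees it


if __name__ == "__main__":
    print(orphan_inode())
```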

> Reflinks simply would ...

We must be talking about different things. I was just talking about what Unix should do instead of what it has always done as a fundamental design point. Nothing to do with reflinks. And I'm also not claiming it would be compatible with any existing Unix application, but I do believe every application could be done at least as well with a kernel that had neither automatic file deletion nor directories.

Posted May 11, 2009 18:32 UTC (Mon) by nix (subscriber, #2304)

*And directories*? You're dreaming. Directories are in practice essential
for scalability. If they weren't in the kernel, they'd need to be in some
userspace library (ew).

Posted May 12, 2009 1:22 UTC (Tue) by giraffedata (guest, #1954)

> If they weren't in the kernel, they'd need to be in some userspace library (ew).

They work better in user space -- there's more flexibility there and the basic concept of a directory has nothing to do with resource allocation between users, which is what the kernel is for. Many OSes implement them outside the kernel. The only reason they have to be in the kernel in Unix is that the kernel deletes files implicitly based on directory references. And as I've been saying, we'd be better off without that.

Posted May 12, 2009 19:59 UTC (Tue) by nix (subscriber, #2304)

Putting directories outside the kernel also means that a whole pile of
things POSIX guarantees become, as near as I can tell, impossible to
provide. I can't see any way to keep cross-directory rename() atomic, for
instance.
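
For concreteness, a cross-directory rename() today is a single call into
the kernel. The sketch below (helper name move_between_dirs is made up,
and both directories are assumed to sit on the same filesystem) cannot
demonstrate atomicity by itself; it only shows that the same inode moves
from one directory to the other with no copy and no intermediate name.

```python
import os
import tempfile


def move_between_dirs():
    """Rename a file from one directory into another on the same fs."""
    with tempfile.TemporaryDirectory() as root:
        src_dir = os.path.join(root, "a")
        dst_dir = os.path.join(root, "b")
        os.mkdir(src_dir)
        os.mkdir(dst_dir)
        src = os.path.join(src_dir, "file")
        dst = os.path.join(dst_dir, "file")
        with open(src, "w") as f:
            f.write("payload")
        ino = os.stat(src).st_ino
        os.rename(src, dst)  # one rename(2) call updates both directories
        return os.path.exists(src), os.path.exists(dst), os.stat(dst).st_ino == ino


if __name__ == "__main__":
    print(move_between_dirs())
```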

Also it's a grotesque security hole: now you can't keep stuff secret by
hiding it in unreadable directories anymore.

Periodically there are proposals to introduce an open()-by-inode-number
syscall. They are always shot down. I don't know what sort of system
you're thinking of, but it isn't Unix.

(And if you're going to go that route, make the inums 1024 bits long and
bingo, you've got a capability-based system.)

Posted May 11, 2009 15:38 UTC (Mon) by butlerm (subscriber, #13312)

There is no practical way for a filesystem to implement "reflinks" such that
the reflink shares the same inode. The ownership, permissions, and file data
of both the original file and the new file all have to be modifiable
independently. To make any sense, they would also need separate inode
numbers.
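
The separate-inode point can be checked on current Linux kernels, where
file cloning is exposed through the FICLONE ioctl rather than the
reflink() call discussed in this thread. The following is a rough sketch
under several assumptions: the ioctl number is hard-coded for x86/ARM
Linux, the helper names try_reflink and demo are made up, and the
filesystem must support cloning (Btrfs or XFS with reflink enabled, for
example); elsewhere the call fails and the demo returns None.

```python
import fcntl
import os
import tempfile

# Assumption: _IOW(0x94, 9, int) as encoded on x86 and ARM Linux.
FICLONE = 0x40049409


def try_reflink(src, dst):
    """Clone src to dst; return True on success, False if unsupported."""
    with open(src, "rb") as s, open(dst, "wb") as d:
        try:
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            return True
        except OSError:
            os.unlink(dst)  # do not leave an empty file behind
            return False


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "orig")
        dst = os.path.join(tmp, "clone")
        with open(src, "wb") as f:
            f.write(b"shared blocks")
        if not try_reflink(src, dst):
            return None  # filesystem (or platform) cannot clone
        with open(dst, "ab") as f:
            f.write(b"!")  # modify only the clone
        with open(src, "rb") as f:
            src_unchanged = f.read() == b"shared blocks"
        a, b = os.stat(src), os.stat(dst)
        return a.st_ino != b.st_ino, a.st_nlink, b.st_nlink, src_unchanged


if __name__ == "__main__":
    print(demo())
```

Where cloning is supported, the two files report different inode numbers
and each has a link count of one, and a write to the clone leaves the
original untouched.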