The two sides of reflink()

Posted May 5, 2009 21:47 UTC (Tue) by martinfick (subscriber, #4455)
Parent article: The two sides of reflink()

Wouldn't such a modification be filesystem specific? Would this require all supported linux filesystems to be patched? What happens when say an EXT3 partition with files like this is used on a kernel which does not support this and one of the copies is written to?


The two sides of reflink()

Posted May 5, 2009 23:21 UTC (Tue) by adj (subscriber, #7401) [Link]

I haven't read the patch (or any relevant standards document lately), but getting ENOSYS back in errno seems like a reasonable result. But I'm probably wrong.

The two sides of reflink()

Posted May 5, 2009 23:31 UTC (Tue) by martinfick (subscriber, #4455) [Link]

Uh, perhaps you are thinking that I asked what would happen when attempting to use reflink() on an unsupported FS or a kernel which does not support it. I was asking what would happen if someone tried to modify a file with a kernel that does not understand this block sharing (assuming it was created by one that does). Would it end up simply overwriting the blocks on disk that belong to both copies, effectively making it not COW, but just a plain hard link?

The two sides of reflink()

Posted May 6, 2009 0:05 UTC (Wed) by adj (subscriber, #7401) [Link]

Yeah, I did miss that. I'm going to have to imagine that a reflink-supporting ext3 filesystem would have a new feature bit set in the superblock, and hopefully would not be mountable on a non-reflink-supporting kernel. Two inodes sharing the same data blocks isn't something that any traditional UNIXy filesystem is going to understand. MS-DOS-type filesystems surely don't support it either (cross-linked files should sound familiar to anyone who used a DOS system in the 1980s or early 1990s).

The two sides of reflink()

Posted May 6, 2009 0:09 UTC (Wed) by clugstj (subscriber, #4020) [Link]

How would you get into this situation? The call is implemented in the file system. If the file system doesn't support it, the file isn't shared.

The two sides of reflink()

Posted May 6, 2009 0:14 UTC (Wed) by martinfick (subscriber, #4455) [Link]

Yes, but I can run multiple kernels on the same machine (at different times). So if I create the shared file with a modern spiffy new kernel which supports this feature and then boot an older kernel and write to one of the files, what happens?

The two sides of reflink()

Posted May 6, 2009 0:49 UTC (Wed) by JoeBuck (guest, #2330) [Link]

You'd need flags in the filesystem that would prevent the filesystem from being mounted by a kernel that lacks the needed feature.

The two sides of reflink()

Posted May 6, 2009 5:59 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

Specifically the ext filesystem family contains two sets of bit flags for this purpose. Each bit represents a feature which a particular implementation may or may not be aware of. Implementations are supposed to check one set before attempting to mount the filesystem at all, and another set in addition if the mount is read-write.

It also contains per-inode flags, so that implementations can be warned that they're missing a feature needed to read or update a particular file; in that case the implementation should fail open() for that file.

Of course, poor-quality implementations from third parties may be missing some or all of these checks. Fortunately, the worst implementation I'm aware of as of this moment is read-only, so any problems only occur when reading files with that implementation, and if/when they reboot into Linux everything is fine again.
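
The checks described in this comment can be sketched roughly as follows. This is a minimal model, not the ext code: the flag names and bit values (`INCOMPAT_REFLINK`, `RO_COMPAT_EXAMPLE`, `INODE_FLAG_SHARED`) and the helper functions are hypothetical and chosen only for illustration.

```python
# Hypothetical feature bits; real ext2/3/4 bit assignments differ.
INCOMPAT_REFLINK = 0x1000    # must be understood to mount at all
RO_COMPAT_EXAMPLE = 0x0001   # must be understood to mount read-write
INODE_FLAG_SHARED = 0x0100   # per-inode flag: file needs a special feature


class MountRefused(Exception):
    """Raised when the superblock advertises a feature we do not know."""


def check_mount(sb_incompat, sb_ro_compat, read_write,
                known_incompat, known_ro_compat):
    """Refuse the mount if the superblock has feature bits we lack."""
    # First set: any unknown bit here forbids mounting, even read-only.
    unknown = sb_incompat & ~known_incompat
    if unknown:
        raise MountRefused(f"unknown incompat features {unknown:#x}")
    # Second set: unknown bits here only matter for a read-write mount.
    unknown_ro = sb_ro_compat & ~known_ro_compat
    if read_write and unknown_ro:
        raise MountRefused(f"unknown ro_compat features {unknown_ro:#x}")


def may_open(inode_flags, known_inode_flags):
    """Return False if the inode carries a flag this implementation lacks."""
    return (inode_flags & ~known_inode_flags) == 0
```

Under these assumptions, an older implementation (one whose `known_incompat` lacks the hypothetical reflink bit) would refuse the mount outright rather than write through shared blocks.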

The two sides of reflink()

Posted May 6, 2009 19:31 UTC (Wed) by clugstj (subscriber, #4020) [Link]

You are f***ed. I would suggest you don't do something this crazy. It would not be trivial to add a feature like this to a file system and ensure that the older version knows about it.

a reflink would be a new type of inode

Posted May 6, 2009 3:00 UTC (Wed) by xoddam (subscriber, #2322) [Link]

Hard links are not distinct inodes (as reflinks must be); rather they are multiple directory entries pointing to a single inode. Symlinks are (in most posixish filesystems) a special kind of inode. Reflinks will be yet another special kind of inode; if the filesystem code does not recognise an inode type it will return an error when you attempt to open it (maybe -ENOSYS, I'm not sure).

You could also specify flags in the superblock as others have suggested, so as to prevent a filesystem with reflinks from being mounted at all by a kernel which does not support them.
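
The hard-link half of that distinction can be observed from user space. The sketch below uses only standard-library calls; an ordinary copy stands in for a reflink purely to show a second, distinct inode, since it shares no blocks. The function name `inode_report` is made up for this example.

```python
import os
import shutil


def inode_report(directory):
    """Create a file, a hard link to it, and a copy; compare inode numbers."""
    original = os.path.join(directory, "original")
    hard = os.path.join(directory, "hardlink")
    copy = os.path.join(directory, "copy")

    with open(original, "w") as f:
        f.write("data")
    os.link(original, hard)            # second directory entry, same inode
    shutil.copyfile(original, copy)    # new inode with its own data

    st_orig, st_hard, st_copy = (os.stat(p) for p in (original, hard, copy))
    return {
        "hardlink_same_inode": st_orig.st_ino == st_hard.st_ino,
        "copy_same_inode": st_orig.st_ino == st_copy.st_ino,
        "link_count": st_orig.st_nlink,
    }
```

On a filesystem that supports hard links, the hard link should report the same inode number as the original, while the copy should not.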

a reflink would be a new type of inode

Posted May 6, 2009 8:55 UTC (Wed) by epa (subscriber, #39769) [Link]

There seems to be some asymmetry. If you make another hard link to a file, then the two links are equal in status and you can't see which is the original. But a reflink is to be a special inode type and different somehow from the original version of the file.

a reflink would be a new type of inode

Posted May 6, 2009 9:46 UTC (Wed) by xoddam (subscriber, #2322) [Link]

Yes, like a symlink.

a reflink would be a new type of inode

Posted May 6, 2009 12:24 UTC (Wed) by corbet (editor, #1) [Link]

A reflink would be a new type of inode only in so far as the filesystem must track the fact that it has blocks shared with another inode. There is no difference, though, between an inode created by a reflink and the file's original inode; they both become reflink inodes. In Btrfs, I believe, things are even less different; the tracking of the shared blocks is done at the extent level.

a reflink would be a new type of inode

Posted May 6, 2009 16:07 UTC (Wed) by masoncl (subscriber, #47138) [Link]

It is more accurate to say (for both btrfs and ocfs2) that the result of the reflink is an entirely new file. It has a known starting point (the contents and permissions of the original).

The two files can be changed independently without affecting each other. One could be deleted, truncated, expanded, chmoded, have new acls set, etc.

The actual block sharing is just an implementation detail...it could be implemented as a lazy copy for example.
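
A toy model may make the "entirely new file, sharing is an implementation detail" point concrete. Everything here (`BlockStore`, `File`, `write_block`) is hypothetical and far simpler than what btrfs or ocfs2 do; it only shows reference-counted blocks that are copied on the first write.

```python
class BlockStore:
    """Pool of data blocks with per-block reference counts."""

    def __init__(self):
        self.data = {}    # block id -> bytes
        self.refs = {}    # block id -> number of files using the block
        self.next_id = 0

    def alloc(self, payload):
        bid = self.next_id
        self.next_id += 1
        self.data[bid] = payload
        self.refs[bid] = 1
        return bid


class File:
    """A file is its own object: a block list plus its own metadata."""

    def __init__(self, store, blocks, mode):
        self.store = store
        self.blocks = list(blocks)   # private list, possibly shared blocks
        self.mode = mode

    @classmethod
    def create(cls, store, payloads, mode=0o644):
        return cls(store, [store.alloc(p) for p in payloads], mode)

    def reflink(self):
        """Return a new file that starts with the same contents and mode."""
        for bid in self.blocks:
            self.store.refs[bid] += 1
        return File(self.store, self.blocks, self.mode)

    def write_block(self, index, payload):
        bid = self.blocks[index]
        if self.store.refs[bid] > 1:
            # Block is shared: leave it alone and point at a fresh one.
            self.store.refs[bid] -= 1
            self.blocks[index] = self.store.alloc(payload)
        else:
            # Sole owner: overwrite in place.
            self.store.data[bid] = payload

    def read(self):
        return b"".join(self.store.data[b] for b in self.blocks)
```

In this model, writing to one file after a reflink unshares only the block that was touched, and changing `mode` on one file leaves the other untouched.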

a reflink would be a new type of inode

Posted May 6, 2009 19:40 UTC (Wed) by adj (subscriber, #7401) [Link]

That leaves the "link" part of the interface name sounding terribly misleading.

Surely there's a better name for a make-a-copy-of-this-inode-and-all-its-data-and-maybe-do-some-cool-COW-magic system call. It's too bad that the "dup" family of system call names is already used for something with a completely separate meaning.

madcow()?

Posted May 6, 2009 21:47 UTC (Wed) by AnswerGuy (subscriber, #1256) [Link]

magically-allocated-data-copy-on-write ... :)

a reflink would be a new type of inode

Posted May 7, 2009 10:12 UTC (Thu) by epa (subscriber, #39769) [Link]

"A reflink would be a new type of inode only in so far as the filesystem must track the fact that it has blocks shared with another inode. There is no difference, though, between an inode created by a reflink and the file's original inode; they both become reflink inodes."

Ah, so it is symmetric, somewhat like making a hard link with 'ln'.

If one of the two reflink inodes is then removed, does the other one revert back to being a normal file? If not, is there any difference between a lone reflink inode and a normal one? Couldn't all files be reflinks?

a reflink would be a new type of inode

Posted May 7, 2009 22:59 UTC (Thu) by dlang (subscriber, #313) [Link]

I believe that one of the features of a reflink is that it tells you what else it's linked with, so that you can find it to break the COW.

If this isn't the case, it should be, for just this reason.

As I understand things, if a file is changed it may break the linking entirely (copying the entire file), or it may break the link partially (still sharing the common parts of the file, but with the differences being separated), at the option of the filesystem.

The two sides of reflink()

Posted May 9, 2009 14:25 UTC (Sat) by sdbrady (guest, #56894) [Link]

Hmm, surely you'd expect ENOTSUP if the filesystem doesn't support the operation, and ENOSYS if the kernel doesn't support it?
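
From the caller's side, the distinction might be handled along these lines. This is only a sketch: `reflink_fn` is a hypothetical callable standing in for whatever binding to the proposed call exists, assumed to raise `OSError` with the errno values discussed in this thread.

```python
import errno
import shutil

# Errors that mean "cannot reflink here", as opposed to a real failure.
UNSUPPORTED = {errno.ENOSYS, errno.ENOTSUP, errno.EOPNOTSUPP}


def clone_or_copy(src, dst, reflink_fn):
    """Try reflink_fn(src, dst); fall back to a plain copy if unsupported."""
    try:
        reflink_fn(src, dst)
        return "reflinked"
    except OSError as exc:
        # ENOSYS: the kernel lacks the call. ENOTSUP: the filesystem does.
        if exc.errno not in UNSUPPORTED:
            raise  # permissions, missing file, etc. are real errors
    shutil.copyfile(src, dst)
    return "copied"
```

Either error leads to the same fallback here; a caller that cares could log which one it saw before copying.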


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds