The two sides of reflink()

Posted May 5, 2009 21:12 UTC (Tue) by flewellyn (subscriber, #5047)
Parent article: The two sides of reflink()

I'm a little fuzzy on what benefit such a system call has in the first place. Has this been covered previously?



The two sides of reflink()

Posted May 5, 2009 21:21 UTC (Tue) by martinfick (subscriber, #4455) [Link]

Copy on write has huge benefits in space savings, not just in disk space, but more importantly in memory, particularly for virtualized systems. For example, the vserver project already implements a solution to this which allows many virtual servers to share the same files securely. This means that if you have 1000 servers running the same copy of apache, not only can you have only one copy on disk, but the kernel will also only keep one copy in memory (of the shared stuff like program text, of course). While you could achieve a similar sharing with hard links, this would be less secure since a breach in one system would allow the file to be modified in all the other systems. With COW, this is avoided.
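The difference between hard links and copy-on-write sharing described above can be sketched in a few lines. This is only an illustration under stated assumptions: the file names (`httpd`, `guest1-httpd`, `guest2-httpd`) are hypothetical, and an ordinary `shutil.copyfile` stands in for a reflink, since a reflinked file would behave like a private copy on write while sharing blocks on disk.

```python
import os
import shutil
import tempfile


def demo_sharing():
    """Return (original after a write via hard link, original after a write via copy)."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "httpd")  # hypothetical shared binary
        with open(original, "w") as f:
            f.write("trusted binary")

        # Hard link: both names refer to the same inode, so an in-place
        # write through one name is visible through the other.
        linked = os.path.join(tmp, "guest1-httpd")
        os.link(original, linked)
        with open(linked, "r+") as f:
            f.write("TAMPERED")
        with open(original) as f:
            after_link_write = f.read()

        # Restore the original contents before the second experiment.
        with open(original, "w") as f:
            f.write("trusted binary")

        # Private copy (stand-in for a reflink): writes to the copy
        # never reach the original.
        copied = os.path.join(tmp, "guest2-httpd")
        shutil.copyfile(original, copied)
        with open(copied, "r+") as f:
            f.write("TAMPERED")
        with open(original) as f:
            after_copy_write = f.read()

    return after_link_write, after_copy_write
```

With a hard link, the first element of the result shows the tampering; with the copy, the second element is unchanged.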

The two sides of reflink()

Posted May 5, 2009 21:29 UTC (Tue) by flewellyn (subscriber, #5047) [Link]

I see. That IS beneficial. Thanks very much.

The two sides of reflink()

Posted May 6, 2009 14:25 UTC (Wed) by wilreichert (subscriber, #17680) [Link]

How is this different from deduplication at the filesystem level?

The two sides of reflink()

Posted May 6, 2009 15:09 UTC (Wed) by dlang (subscriber, #313) [Link]

It sounds like it's one mechanism that could be used for deduplication.

The two sides of reflink()

Posted May 6, 2009 17:03 UTC (Wed) by elanthis (guest, #6227) [Link]

To the filesystem, a cp isn't a copy -- it's one process reading from one file and writing to another. Figuring out that that is supposed to be a copy is very non-trivial and expensive, especially when taking into account metadata operations which aren't part of the regular file stream. I'm not sure it's even plausible to do without a second pass, e.g. a "combine files" daemon, which would still just be extra overhead.

If on the other hand cp says "this is a copy" to the kernel then the filesystem can just do the right thing. Of course, other applications will need to be modified to take advantage of the new feature, but such is the truth of most progress.
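As a rough sketch of what "cp says this is a copy" could look like from user space: the reflink() call discussed in this thread was a proposal, so the example below swaps in the Linux `FICLONE` ioctl, a later interface that asks the filesystem to share blocks between two files. The function name `clone_or_copy` is hypothetical, and the numeric value of `FICLONE` is an assumption that holds on common architectures such as x86 and ARM.

```python
import shutil

try:
    import fcntl
except ImportError:  # platform without fcntl; only the plain copy is available
    fcntl = None

# Assumption: Linux FICLONE request number, _IOW(0x94, 9, int), as encoded
# on common architectures (x86, ARM). Other architectures may differ.
FICLONE = 0x40049409


def clone_or_copy(src, dst):
    """Ask the filesystem to share src's blocks with dst; else copy bytes.

    Returns True if the clone request succeeded, False if an ordinary
    copy was made instead.
    """
    if fcntl is not None:
        try:
            with open(src, "rb") as s, open(dst, "wb") as d:
                # The destination fd receives a copy-on-write clone of the source.
                fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            return True
        except OSError:
            # Filesystem does not support cloning, files are on different
            # filesystems, etc. Fall through to a regular copy.
            pass
    shutil.copyfile(src, dst)
    return False
```

The fallback matters: on a filesystem without copy-on-write support the request simply fails, and the caller still ends up with a usable copy.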

The two sides of reflink()

Posted May 7, 2009 21:03 UTC (Thu) by anton (subscriber, #25547) [Link]

For shared stuff like program text, all servers could use the same binaries (through mount, mount --bind, or hard links), so that's not a good justification for reflinks, either (and if you don't trust the other servers not to write to the file, why would you trust them with access to the device at all?). Writable files that would mostly or completely be the same on both VMs would be a better example, but no concrete example comes to my mind.

The two sides of reflink()

Posted May 7, 2009 21:13 UTC (Thu) by martinfick (subscriber, #4455) [Link]

"why would you trust them with access to the device at all?"

You don't. Usually the host system mounts a portion of the filesystem into a separate chroot for each guest server. The guests typically then have a limited root capability that does not include making device nodes, so they really do not have access to the device, only the filesystem.

The two sides of reflink()

Posted May 10, 2009 18:42 UTC (Sun) by anton (subscriber, #25547) [Link]

"The guests typically then have a limited root capability that does not include making device nodes so they really do not have access to the device, only the filesystem."

With the limits on the root capabilities, the binaries can surely be made read-only even for the guest roots, so no reflinks are needed for the binaries.

The two sides of reflink()

Posted May 10, 2009 19:09 UTC (Sun) by martinfick (subscriber, #4455) [Link]

    "The guests typically then have a limited root capability that does not include making device nodes so they really do not have access to the device, only the filesystem."

"With the limits on the root capabilities, the binaries can surely be made read-only even for the guest roots, so no reflinks are needed for the binaries."

Sure, but if you make the binaries read-only you no longer have independent guest systems that can be administered without knowledge of the host or other guests. In other words, if I now want to upgrade the apache server in one guest, I can't, since the binary is read-only to my guest root user. With COW, no problem: as a guest admin I do not even know that my apache binary is shared with others. It is only relevant to the host (the host unifies the various guest binaries, not the guest).

The two sides of reflink()

Posted May 6, 2009 14:33 UTC (Wed) by MarkWilliamson (subscriber, #30166) [Link]

Some more possible uses:

Those folks who are fortunate enough to have their home directory on a netapp filer
have for years been able to "cd ~/.snapshot/" and find a special directory of
historical versions of their files. These are stored efficiently because of the nature of
the WAFL filesystem. With reflink, it would be possible to create a lightweight
version of historical snapshotting: you'd have a daemon run every night (for
instance) and recursively reflink the current state of all your files into a directory
tree at ~/.old-versions/<date>/ - then, if you ever needed to go back to an old
version of a file you could just look in there.

With reflinks this would be very fast and would not use up loads of disk space
(though there would still be quota concerns). It would make time-machine or Netapp
.snapshot-like functionality easy to implement efficiently on single disk systems.
Probably the most quoted reference for stuff like this is the Elephant research
filesystem, about which there are a number of decent research papers.
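A minimal sketch of the nightly snapshot idea, under stated assumptions: the function name `snapshot_tree` and its parameters are hypothetical, and the per-file copy is pluggable. It defaults to `shutil.copy2`, which makes full copies; on a filesystem with reflink support, a reflinking function would be passed in instead so that each snapshot shares blocks with the live files.

```python
import datetime
import os
import shutil


def snapshot_tree(src_root, versions_root, copy_file=shutil.copy2, today=None):
    """Mirror src_root into versions_root/<date>/ and return that path.

    copy_file(src, dst) is called once per regular file; pass a
    reflinking function here to make the snapshot space-efficient.
    """
    stamp = (today or datetime.date.today()).isoformat()
    dest_root = os.path.join(versions_root, stamp)
    versions_abs = os.path.abspath(versions_root)

    for dirpath, dirnames, filenames in os.walk(src_root):
        # Do not descend into the snapshot directory itself.
        dirnames[:] = [
            d for d in dirnames
            if os.path.abspath(os.path.join(dirpath, d)) != versions_abs
        ]
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dest_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            # Only regular files are snapshotted in this sketch.
            if os.path.isfile(src) and not os.path.islink(src):
                copy_file(src, os.path.join(target_dir, name))
    return dest_root
```

A cron job or daemon could call something like `snapshot_tree(home, os.path.join(home, ".old-versions"))` once a night; pruning old snapshots and handling quotas are left out here.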

Another use that I've seen mentioned is the ability to make checkouts / clones in
modern version control systems go faster and be more lightweight in terms of disk
storage - for instance, cloning a git repository could transparently share all the
underlying data (including the working directory!) using reflinks. Similar tricks would be
possible for the other VCSes.

Finally, you could probably have a daemon that rummages around the system, finds
identical files and unifies them on disk using reflink in order to save space.
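The "rummaging" half of such a daemon might look roughly like the sketch below. The names `file_digest` and `find_duplicates` are hypothetical. It groups files by size first, then by SHA-256 digest, and only reports the groups; actually unifying them with a reflink-style call is deliberately left out, and a careful tool would compare bytes before sharing anything rather than trusting the hash alone.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return sorted lists of paths under root with identical digests."""
    # Cheap first pass: files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        # Second pass: hash only the files that share a size.
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Each returned group is a candidate set for unification: one file would be kept and the others replaced by copy-on-write references to it.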

Loads of cool stuff :-)

The two sides of reflink()

Posted May 6, 2009 14:34 UTC (Wed) by MarkWilliamson (subscriber, #30166) [Link]

Ugh, what happened to my line endings? :-( Maybe my browser did something evil ...
somehow.

The two sides of reflink()

Posted May 6, 2009 16:38 UTC (Wed) by cdarroch (subscriber, #26812) [Link]

Yes -- that .snapshot directory is incredibly convenient. Deleted a file by accident? No problem; there's an hourly backup in .snapshot. Rogue program deleted 1 TB of data overnight? Just reach into .snapshot and pull it all out again. Having equivalent functionality on non-NetApp hardware would be awfully nice.

The two sides of reflink()

Posted May 6, 2009 16:59 UTC (Wed) by MarkWilliamson (subscriber, #30166) [Link]

Indeed.

rdiff-backup (http://www.gnu.org/savannah-checkouts/non-gnu/rdiff-backup/) gives
somewhat similar snapshotting convenience but you have to interact with it through a
command line app. Also, it does use up extra space (although if you're backing up to
another machine / another drive for redundancy then that's just fine!).

archfs (http://code.google.com/p/archfs/) provides a Fuse interface to browse rdiff-backup
repositories. Last time I tried it it wasn't really suitable for large repositories but this may
have been fixed since then. rdiff-backup's page on related info has some other solutions:
http://www.gnu.org/savannah-checkouts/non-gnu/rdiff-backu...

.snapshot is a very nice user interface to have to old revisions.

The two sides of reflink()

Posted May 9, 2009 5:07 UTC (Sat) by TRS-80 (subscriber, #1804) [Link]

rdiffWeb is a nice web interface to rdiff-backup. At work we're using rdiff-backup for weekly snapshots to complement our nightly amanda tape; a 1TB drive lasted us a year.

Line endings - make sure you select HTML not plain text, as the latter doesn't do wrapping for some reason.

The two sides of reflink()

Posted May 11, 2009 0:21 UTC (Mon) by vonbrand (guest, #4458) [Link]

Please don't.

I suffered through DOS's "you can undelete files whenever you fatfingered DEL". Most of the time it worked, but Murphy's Law ensured that when you really needed to get something back, it would usually be gone for good. Unix's idea of "rm is final" is harsh, but you learn not to misplace stuff in the first place. Makes for a better experience in the long run.

The two sides of reflink()

Posted May 11, 2009 1:10 UTC (Mon) by MarkWilliamson (subscriber, #30166) [Link]

Netapp's .snapshot and the similar functionality reflinks can provide will give you semantics similar to a backup (a version of the file from a particular point in time, which will stay there until your backup regime removes it as too old). So it's a big improvement on DOS's "maybe you'll be able to grab the data back before the space is recycled by the filesystem". So it should at least have reliable, predictable semantics for things like accidental deletion.

Although in practice it's going to get used to undo rm occasionally, it seems to me only sensible to have something like this available so I'm able to roll back important documents and settings to previous states if I make the wrong modification, or if some program barfs over everything and corrupts things.

Users will probably have to be repeatedly reminded that, yes, they do need an independent backup on another disk somewhere because reflinks won't save you if your computer explodes. But most folks don't do proper backups *anyhow*, so I doubt it'll make that aspect of user behaviour much worse!


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds