The two sides of reflink()

Posted May 6, 2009 14:33 UTC (Wed) by MarkWilliamson (subscriber, #30166)
In reply to: The two sides of reflink() by flewellyn
Parent article: The two sides of reflink()

Some more possible uses:

Those folks who are fortunate enough to have their home directory on a NetApp filer
have for years been able to "cd ~/.snapshot/" and find a special directory of
historical versions of their files. These are stored efficiently because of the nature of
the WAFL filesystem. With reflink, it would be possible to create a lightweight
version of historical snapshotting: you'd have a daemon run every night (for
instance) and recursively reflink the current state of all your files into a directory
tree at ~/.old-versions/<date>/ - then, if you ever needed to go back to an old
version of a file you could just look in there.

With reflinks this would be very fast and would not use up loads of disk space
(though there would still be quota concerns). It would make Time Machine or NetApp
.snapshot-like functionality easy to implement efficiently on single disk systems.
Probably the most quoted reference for stuff like this is the Elephant research
filesystem, about which there are a number of decent research papers.
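
To make the nightly-snapshot idea concrete, here is a rough sketch in Python. It is only an illustration under some assumptions: the proposed reflink() call is stood in for by the Linux FICLONE ioctl (available on filesystems such as Btrfs and XFS), with a plain copy as the fallback elsewhere, and the names clone_file and snapshot_tree are made up for this example.

```python
import datetime
import fcntl
import os
import shutil

# FICLONE ioctl request number from <linux/fs.h>: _IOW(0x94, 9, int).
# Assumption: used here as a stand-in for the proposed reflink() call.
FICLONE = 0x40049409


def clone_file(src, dst):
    """Reflink src to dst if the filesystem allows it, else do a plain copy.

    Returns True if the data was shared via a clone, False if it was copied.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            cloned = True
        except OSError:
            # Filesystem (or OS) cannot clone: fall back to copying bytes.
            shutil.copyfileobj(fsrc, fdst)
            cloned = False
    shutil.copystat(src, dst)  # keep timestamps and permission bits
    return cloned


def snapshot_tree(src_root, snap_root, stamp=None):
    """Mirror src_root into snap_root/<stamp>/, cloning every regular file."""
    stamp = stamp or datetime.date.today().isoformat()
    target = os.path.join(snap_root, stamp)
    snap_abs = os.path.abspath(snap_root)
    for dirpath, dirnames, filenames in os.walk(src_root):
        # Do not descend into the snapshot area itself.
        dirnames[:] = [d for d in dirnames
                       if os.path.abspath(os.path.join(dirpath, d)) != snap_abs]
        rel = os.path.relpath(dirpath, src_root)
        out_dir = os.path.normpath(os.path.join(target, rel))
        os.makedirs(out_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            # Symlinks and special files are skipped in this sketch.
            if os.path.isfile(src) and not os.path.islink(src):
                clone_file(src, os.path.join(out_dir, name))
    return target
```

A nightly cron job could then call something like snapshot_tree(os.path.expanduser("~"), os.path.expanduser("~/.old-versions")). Symlinks, ownership and files that change mid-snapshot are all ignored here, so treat it as a starting point only.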

Another use that I've seen mentioned is the ability to make checkouts / clones in
modern version control systems go faster and be more lightweight in terms of disk
storage - for instance, cloning a git repository could transparently share all the
underlying data (including the working directory!) using reflinks. Similar tricks would be
possible for the other VCSes.

Finally, you could probably have a daemon that rummages around the system, finds
identical files and unifies them on disk using reflink in order to save space.
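
A sketch of what such a rummaging daemon might do, again in Python and again only under assumptions: the Linux FICLONE ioctl stands in for reflink(), and find_duplicates / reflink_over are hypothetical names for this example. Files are grouped by size, then by SHA-256, and compared byte for byte before one is replaced by a clone of the other.

```python
import fcntl
import filecmp
import hashlib
import os
import shutil
from collections import defaultdict

# FICLONE ioctl request number from <linux/fs.h>: _IOW(0x94, 9, int).
# Assumption: used here as a stand-in for the proposed reflink() call.
FICLONE = 0x40049409


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents hash identically."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)
    groups = []
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue  # nothing to gain from empty or unique-sized files
        by_hash = defaultdict(list)
        for p in paths:
            by_hash[file_digest(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


def reflink_over(keep, dup):
    """Replace dup with a clone of keep. Returns True only if that happened."""
    if not filecmp.cmp(keep, dup, shallow=False):
        return False  # contents differ after all; leave both alone
    tmp = dup + ".reflink-tmp"
    try:
        with open(keep, "rb") as fsrc, open(tmp, "wb") as fdst:
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
        shutil.copystat(dup, tmp)  # preserve the duplicate's own metadata
        os.replace(tmp, dup)       # atomic swap within one directory
        return True
    except OSError:
        # Cloning unsupported here: clean up and leave dup untouched.
        if os.path.exists(tmp):
            os.remove(tmp)
        return False
```

This does nothing about files being modified between the comparison and the swap, nor about ownership or hard links, so a real daemon would need to be a good deal more careful.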

Loads of cool stuff :-)



The two sides of reflink()

Posted May 6, 2009 14:34 UTC (Wed) by MarkWilliamson (subscriber, #30166)

Ugh, what happened to my line endings? :-( Maybe my browser did something evil ...
somehow.

The two sides of reflink()

Posted May 6, 2009 16:38 UTC (Wed) by cdarroch (subscriber, #26812)

Yes -- that .snapshot directory is incredibly convenient. Deleted a file by accident? No problem; there's an hourly backup in .snapshot. Rogue program deleted 1 TB of data overnight? Just reach into .snapshot and pull it all out again. Having equivalent functionality on non-NetApp hardware would be awfully nice.

The two sides of reflink()

Posted May 6, 2009 16:59 UTC (Wed) by MarkWilliamson (subscriber, #30166)

Indeed.

rdiff-backup (http://www.gnu.org/savannah-checkouts/non-gnu/rdiff-backup/) gives
somewhat similar snapshotting convenience but you have to interact with it through a
command line app. Also, it does use up extra space (although if you're backing up to
another machine / another drive for redundancy then that's just fine!).

archfs (http://code.google.com/p/archfs/) provides a FUSE interface to browse rdiff-backup
repositories. Last time I tried it it wasn't really suitable for large repositories but this may
have been fixed since then. rdiff-backup's page on related info has some other solutions:
http://www.gnu.org/savannah-checkouts/non-gnu/rdiff-backu...

.snapshot is a very nice user interface for getting at old revisions.

The two sides of reflink()

Posted May 9, 2009 5:07 UTC (Sat) by TRS-80 (subscriber, #1804)

rdiffWeb is a nice web interface to rdiff-backup. At work we're using rdiff-backup for weekly snapshots to complement our nightly amanda tape; a 1TB drive lasted us a year.

Line endings - make sure you select HTML not plain text, as the latter doesn't do wrapping for some reason.

The two sides of reflink()

Posted May 11, 2009 0:21 UTC (Mon) by vonbrand (guest, #4458)

Please don't.

I suffered through DOS's "you can undelete files whenever you fatfingered DEL". Most of the time it worked, but Murphy's Law ensured that when you really needed to get something back, it would usually be gone for good. Unix' idea of "rm is final" is harsh, but you learn not to misplace stuff in the first place. Makes for a better experience in the long run.

The two sides of reflink()

Posted May 11, 2009 1:10 UTC (Mon) by MarkWilliamson (subscriber, #30166)

NetApp's .snapshot, and the similar functionality reflinks can provide, give you semantics similar to a backup: a version of the file from a particular point in time, which stays there until your backup regime removes it as too old. That is a big improvement on DOS's "maybe you'll be able to grab the data back before the space is recycled by the filesystem", and it should at least have reliable, predictable semantics for things like accidental deletion.
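
The "removes it as too old" part could be as simple as the following Python sketch. It assumes snapshots live in directories named by ISO date (for example 2009-05-06) under a single root; prune_snapshots and keep_days are hypothetical names for this illustration, not part of any existing tool.

```python
import datetime
import os
import shutil


def prune_snapshots(snap_root, keep_days, today=None):
    """Delete dated snapshot directories older than keep_days.

    Only directories whose names parse as YYYY-MM-DD are considered;
    anything else under snap_root is left alone. Returns the names removed.
    """
    today = today or datetime.date.today()
    removed = []
    for name in sorted(os.listdir(snap_root)):
        path = os.path.join(snap_root, name)
        try:
            taken = datetime.date.fromisoformat(name)
        except ValueError:
            continue  # not a dated snapshot directory
        if os.path.isdir(path) and (today - taken).days > keep_days:
            shutil.rmtree(path)
            removed.append(name)
    return removed
```

Because every snapshot is an ordinary directory tree, the retention rule is predictable: a version exists until this function says otherwise, which is exactly what DOS undelete never promised.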

Although in practice it's going to get used to undo rm occasionally, it seems to me only sensible to have something like this available so I'm able to roll back important documents and settings to previous states if I make the wrong modification, or if some program barfs over everything and corrupts things.

Users will probably have to be repeatedly reminded that, yes, they do need an independent backup on another disk somewhere because reflinks won't save you if your computer explodes. But most folks don't do proper backups *anyhow*, so I doubt it'll make that aspect of user behaviour much worse!


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds