
de-duplication in filesystems

Posted Jul 8, 2006 17:44 UTC (Sat) by giraffedata (subscriber, #1954)
In reply to: Interesting work - and some ideas for the future by dion
Parent article: The 2006 Linux Filesystems Workshop (Part III)

The tricky part of de-duplication is identifying the duplicate files.

Users today create multiple copies of files because it's easier than sharing. The idea of de-duplication is that the users maintain that ease, but get the benefits of sharing because the system stores only one copy anyhow.

The copy-on-write technology is pretty much the same as what is used today for snapshot copies. But identifying duplicate files (or, in some proposals, duplicate blocks) is something I have yet to see done with demonstrable gain.
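To make the identification step concrete, here is a minimal sketch of whole-file duplicate detection. It is an illustration only, not something described in this thread: the function names `file_digest` and `find_duplicate_files` are hypothetical, and the choice of SHA-256 and of grouping by size first (so that only files with a matching size get read and hashed) are assumptions.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_files(root):
    """Return lists of paths under `root` whose contents hash identically."""
    # Pass 1: group regular files by size; a file with a unique size
    # cannot have a duplicate, so it never needs to be read.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with another file.
    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_digest[(size, file_digest(path))].append(path)

    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Even this simple version has to read every candidate file in full, which hints at why the cost of identification can eat into the gain. A real system would presumably also compare candidates byte for byte before sharing storage, rather than trusting the hash alone.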



de-duplication in filesystems

Posted Jul 10, 2006 18:58 UTC (Mon) by martinfick (subscriber, #4455)

Check out the vserver work on vhashify.

de-duplication in filesystems

Posted Jul 15, 2006 11:26 UTC (Sat) by nix (subscriber, #2304)

That's sort of similar, except I'm trying to work at the block level. The hardest part is detecting cases where, say, someone takes a big text file and inserts one byte at the front of it: the rest should still be detected as a duplicate, even if the original file and the new file are not version-related. (If they are version-related, detecting the duplicate is feasible.) Doing that for arbitrary unrelated files without storing ridiculously many hashes is tricky. More generally, modifications that are not multiples of the block size should not cause the unmodified portions of duplicated files to be un-duplicated.
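One known way to attack the one-byte-insertion problem, offered here only as a sketch and not as what the poster is building, is content-defined chunking: instead of cutting at fixed offsets, cut wherever a rolling hash of the most recent bytes matches a pattern, so that boundaries move along with the data. Everything below is an assumption made for illustration: the names `content_defined_chunks`, `chunk_digests` and `fixed_block_digests`, the Gear-style rolling hash, and the roughly 4 KiB average chunk size. Practical implementations usually also enforce minimum and maximum chunk sizes, which this sketch omits.

```python
import hashlib

# Deterministic table of 256 pseudo-random 64-bit values, one per byte value.
GEAR = [
    int.from_bytes(hashlib.sha256(bytes([i])).digest()[:8], "big")
    for i in range(256)
]
MASK64 = (1 << 64) - 1
# Cut when the top 12 bits of the hash are zero: about one cut per 4096 bytes.
BOUNDARY_MASK = ((1 << 12) - 1) << 52


def content_defined_chunks(data):
    """Split `data` at positions chosen by a rolling hash of recent bytes."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Each step shifts older bytes left, so after 64 steps a byte no
        # longer influences h: the hash depends only on the last 64 bytes.
        h = ((h << 1) + GEAR[byte]) & MASK64
        if (h & BOUNDARY_MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def chunk_digests(data):
    """Set of SHA-256 digests of the content-defined chunks of `data`."""
    return {hashlib.sha256(c).hexdigest() for c in content_defined_chunks(data)}


def fixed_block_digests(data, block_size=4096):
    """Set of SHA-256 digests of fixed-size blocks, for comparison."""
    return {
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    }
```

Comparing `chunk_digests(data)` with `chunk_digests(b"X" + data)` should show that nearly all chunks after the first are shared, while the same comparison with `fixed_block_digests` on non-repetitive data shares essentially nothing, because every block boundary has shifted by one byte. The price is variable-sized chunks, which map less neatly onto filesystem blocks, and the number of hashes to store still grows with the amount of data.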

de-duplication in filesystems

Posted Jul 22, 2006 3:43 UTC (Sat) by JumpJoe (guest, #39288)

I'm not sure at what level the deduplication is being done, but see:
www.datadomain.com

Other companies are doing deduplication above the filesystem layer (content-addressed storage, CAS):

http://searchstorage.techtarget.com/originalContent/0,289...

Yes, it would be great to have compression and deduplication built into a filesystem.
