de-duplication in filesystems

Posted Jul 15, 2006 11:26 UTC (Sat) by nix (subscriber, #2304)
In reply to: de-duplication in filesystems by martinfick
Parent article: The 2006 Linux Filesystems Workshop (Part III)

That's sort of similar, except I'm trying to work at the block level. The hardest part is detecting cases where, say, someone has a big text file and inserts one byte at the front of it: the rest should still be detected as a duplicate, even if the original file and the new file are not version-related. (If they are version-related, detecting the duplicate is feasible.) Doing that for arbitrary unrelated files without storing ridiculously many hashes is tricky. More generally, modifications that are not multiples of the block size should not cause unmodified portions of duplicated files to be un-duplicated.
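
To make the one-byte-insertion problem concrete, here is a minimal Python sketch. It is not the approach being worked on in this comment. It only contrasts hashing fixed-size blocks with content-defined chunking, a separate, well-known technique in which cut points are chosen by a rolling hash over the data. The 4096-byte block size, the 48-byte window, the mask, and all function names are assumptions chosen for illustration.

```python
import hashlib
import random

def fixed_blocks(data, size=4096):
    """Cut at fixed offsets, the way block-level storage sees a file."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_defined_chunks(data, window=48, mask=0x3FF):
    """Cut wherever a rolling hash of the last `window` bytes matches `mask`.

    Boundaries follow the content rather than the offset, so they tend to
    realign shortly after an insertion. All parameters are illustrative.
    """
    base, mod = 257, (1 << 31) - 1
    top = pow(base, window, mod)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * base + byte) % mod
        if i >= window:
            h = (h - data[i - window] * top) % mod
        # Require at least `window` bytes per chunk before allowing a cut.
        if i + 1 - start >= window and (h & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def shared_pieces(old, new, split):
    """Count pieces of `new` whose hash already exists among pieces of `old`."""
    known = {hashlib.sha256(p).digest() for p in split(old)}
    return sum(hashlib.sha256(p).digest() in known for p in split(new))

rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
modified = b"X" + original  # one byte inserted at the front

print("fixed blocks shared:  ", shared_pieces(original, modified, fixed_blocks))
print("content chunks shared:", shared_pieces(original, modified, content_defined_chunks))
```

With fixed offsets, the inserted byte shifts every block, so the block hashes of the two files stop matching. With content-defined cuts, only the chunks near the insertion should differ, and one hash is stored per chunk. The catch for a filesystem is that the chunks are variable-sized and so do not map neatly onto fixed-size disk blocks, so this illustrates the problem more than it solves it.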


de-duplication in filesystems

Posted Jul 22, 2006 3:43 UTC (Sat) by JumpJoe (guest, #39288)

Not sure at what level the deduplication is being done, however:
www.datadomain.com

Other companies are doing deduplication above the filesystem layer, using content-addressed storage (CAS):

http://searchstorage.techtarget.com/originalContent/0,289...

Yes, it would be great to have compression/deduplication built into a filesystem.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds