Poettering: Revisiting the fragmentation
Posted Sep 2, 2014 21:35 UTC (Tue) by martin.langhoff (subscriber, #61417) In reply to: Poettering: Revisiting the fragmentation by mezcalero
Parent article: Poettering: Revisiting how we put together Linux systems
Is this up to date? https://btrfs.wiki.kernel.org/index.php/Deduplication -- it seems fairly limited. Yes, you can run "hardlink"-style programs that tell btrfs the files are dupes instead of hardlinking them. However, that does not scale well at all: (a) to get savings across VMs/containers you need to see "everything", and (b) "everything" in a large system is far too many files for this strategy.
NetApp filers keep a fast-and-small hash for each block, computed and saved at write time, and use those hashes as hints for dedupe candidates. This solves the problem of finding dedupe candidates across large volumes without needing a "user land" that "can see everything". The cost is a ~7% slowdown in writes...
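
To make the write-time hash idea concrete, here is a toy sketch in Python. It is not how NetApp (or btrfs) actually implements anything: `BlockStore`, its methods and the choice of CRC32 as the "fast-and-small" hash are all assumptions made up for illustration. The point is only that the index is built as a side effect of writes, so nothing ever has to walk the whole volume, and that a hash match is treated as a hint to be confirmed by comparing the bytes.

```python
import zlib
from collections import defaultdict


class BlockStore:
    """Toy block store (hypothetical) that indexes a small checksum per block at write time."""

    def __init__(self):
        self.blocks = {}                 # block id -> block contents (bytes)
        self.by_hash = defaultdict(set)  # checksum -> ids of blocks with that checksum

    def write(self, block_id, data):
        # If the block is being overwritten, drop its old checksum entry first.
        old = self.blocks.get(block_id)
        if old is not None:
            self.by_hash[zlib.crc32(old)].discard(block_id)
        # The checksum is computed here, on the write path; this is the extra write cost.
        self.blocks[block_id] = data
        self.by_hash[zlib.crc32(data)].add(block_id)

    def dedupe_candidates(self):
        """Yield lists of block ids whose contents are confirmed byte-identical."""
        for ids in self.by_hash.values():
            if len(ids) < 2:
                continue  # a checksum seen once cannot be a duplicate
            # A small checksum can collide, so group by the actual bytes to confirm.
            groups = defaultdict(list)
            for block_id in sorted(ids):
                groups[self.blocks[block_id]].append(block_id)
            for same in groups.values():
                if len(same) > 1:
                    yield same
```

Blocks written by different VMs or containers end up in the same index, so duplicates across them show up without any userspace tool having to scan every file.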