Yeah, I agree that dedupe is overrated, for most applications.
Also keep in mind that the more compressed and de-duped your data is, the more likely it is that you'll lose data when there's a hardware problem. Some filesystems, like HDFS, actually write out the data three times or more, which is a kind of anti-deduplication.
Posted Jan 23, 2012 14:29 UTC (Mon) by jezuch (subscriber, #52988)
> Yeah, I agree that dedupe is overrated, for most applications.
On the other hand, cp --reflink is quite awesome.
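For anyone who wants the same effect from a script, here is a minimal sketch in Python, assuming Linux and a filesystem that supports block sharing (Btrfs, for instance). The helper name `clone_or_copy` is made up for illustration. It issues the `FICLONE` ioctl, which is roughly what `cp --reflink` relies on, and falls back to an ordinary copy when the clone is refused, similar in spirit to `cp --reflink=auto`.

```python
import fcntl
import shutil

# Linux ioctl request number for FICLONE, i.e. _IOW(0x94, 9, int) in linux/fs.h.
FICLONE = 0x40049409


def clone_or_copy(src, dst):
    """Try to make dst share src's blocks; fall back to a byte-for-byte copy.

    Returns True if the reflink clone succeeded, False if a regular copy
    was made instead. (Hypothetical helper, not part of any library.)
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the filesystem to make dst reference the same extents as src.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return True
        except OSError:
            # Unsupported filesystem, cross-device copy, etc.
            pass
    # Both files are closed here; do a plain data copy instead.
    shutil.copyfile(src, dst)
    return False
```

Either way the destination ends up with the same contents; the return value only tells you whether the blocks are shared.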
> Also keep in mind that the more compressed and de-duped your data is, the more likely it is that you'll lose data when there's a hardware problem. Some filesystems, like HDFS, actually write out the data three times or more, which is a kind of anti-deduplication.
I guess that native RAID-ing in the filesystem is expected to offset this risk, in any "normal" situation at least.
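A back-of-the-envelope sketch of the trade-off being discussed, assuming (unrealistically) that each copy of a block fails independently with the same probability. The function names and the numbers in the usage note are illustrative only, not measurements.

```python
def block_loss_probability(p_fail, replicas):
    """Chance that every replica of one block is lost.

    Assumes independent failures with the same per-copy probability p_fail.
    """
    return p_fail ** replicas


def files_damaged_by_one_lost_block(files_sharing_block):
    """With dedupe, every file referencing the lost block is damaged."""
    return files_sharing_block
```

Under those assumptions, going from one copy to three (HDFS-style replication, or mirroring in the filesystem) cuts the per-block loss probability from `p_fail` to `p_fail ** 3`, while dedupe multiplies the damage from any block that is lost by the number of files sharing it. The two knobs are independent, which is why redundancy underneath can offset the risk.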