Desmond: Out-Tridging Tridge

Posted Sep 2, 2013 19:53 UTC (Mon) by ttonino (subscriber, #4073)
In reply to: Desmond: Out-Tridging Tridge by ikm
Parent article: Desmond: Out-Tridging Tridge

When I heard about deduplicating in a way that is not block-oriented, I immediately thought of cutting the source into variable-length blocks at a specific marker or set of markers. If the file is cut at a 2-byte sequence such as 0x3456, the average chunk size will be 64 KB, assuming the bytes are roughly uniformly random (a given 2-byte value then turns up about once every 2^16 positions).

This is obviously not random enough on real data (binary and text behave differently), so a single marker would give very uneven chunks. So I would propose having perhaps 512 different 3-byte markers, for an average block size of 32 KB (2^24 / 512 = 2^15 bytes). These blocks can then be deduplicated with little processing by computing each block's hash and perhaps its length.
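
To make the idea concrete, here is a minimal Python sketch of marker-based cutting followed by deduplication on hash and length. It is only an illustration under assumptions of mine: the function names split_at_markers and dedup are made up, SHA-256 is an arbitrary choice of hash, and the naive byte-by-byte scan is there for clarity, not speed.

```python
import hashlib


def split_at_markers(data, markers, marker_len):
    """Cut data into variable-length chunks, ending each chunk right after a marker.

    markers is a set of byte strings, all of length marker_len (hypothetical API).
    """
    chunks = []
    start = 0
    i = 0
    while i + marker_len <= len(data):
        if data[i:i + marker_len] in markers:
            end = i + marker_len
            chunks.append(data[start:end])
            start = end
            i = end  # resume scanning after the marker
        else:
            i += 1
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk that did not end in a marker
    return chunks


def dedup(chunks):
    """Store each distinct chunk once, keyed by (SHA-256 digest, length)."""
    store = {}   # key -> chunk bytes, one entry per distinct chunk
    recipe = []  # ordered list of keys needed to rebuild the input
    for chunk in chunks:
        key = (hashlib.sha256(chunk).digest(), len(chunk))
        store.setdefault(key, chunk)
        recipe.append(key)
    return store, recipe


if __name__ == "__main__":
    data = b"aaXYbbbXYaaXYcc"
    chunks = split_at_markers(data, {b"XY"}, 2)
    store, recipe = dedup(chunks)
    print(chunks)
    print(len(store), "distinct chunks for", len(recipe), "chunk references")
```

The point of cutting at content-defined markers is that an insertion near the start of the file only changes the chunk it lands in: the later cut points move with the data, so the following chunks still hash the same. With fixed-size blocks, every block after the insertion would change.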

Desmond: Out-Tridging Tridge

Posted Sep 3, 2013 4:28 UTC (Tue) by khc (guest, #45209)

In practice it works well and is fast (although you usually want to compare a full word and not 2 bytes).

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds