|
|
Log in / Subscribe / Register

Merkle trees

Merkle trees

Posted Jun 29, 2017 16:45 UTC (Thu) by tau (subscriber, #79651)
Parent article: Distributing filesystem images and updates with casync

I was thinking about a similar problem recently. Why not use a Merkle tree store for disk images? Kind of like what dm-verity does, except use the Merkle tree for deduplication instead of integrity. After all, files (which may be present in many images within the store) within a filesystem tend to be aligned on block boundaries, and the leaves of the tree would successfully deduplicate no matter what order the files' underlying storage blocks are allocated in. The leaves within the store would also be aligned on disk block boundaries themselves, allowing for efficient read-only mounting of a virtual block device view in real time.

Was going to implement a prototype but I'm too busy these days.


to post comments

Merkle trees

Posted Jul 10, 2017 22:46 UTC (Mon) by helsleym (guest, #92730) [Link]

Just be careful about your block size. If the data you need to dedup is offset by 1, 2, ... blocksize-1 bytes then you're out of luck. The larger the block size the less likely you will be able to deduplicate more -- I think this is a corollary to Zipf's law. Yet higher block sizes reduce overhead of storing the merkle tree. Could still be interesting and worthwhile though!


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds