
Distributing filesystem images and updates with casync


Posted Jun 29, 2017 8:22 UTC (Thu) by mezcalero (subscriber, #45103)
Parent article: Distributing filesystem images and updates with casync

Quick correction: "…which is then recorded in an index together with the chunk size and a filename". That's not entirely correct. While the hash value and the chunk size are recorded, no filename is recorded. The idea of casync is, after all, to not care so much about file boundaries.
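To make that concrete, here is a minimal sketch of an index that holds only a hash and a size per chunk. It is not casync's actual on-disk format: the function name `build_index`, the use of SHA-256, and the hand-picked cut points are all assumptions for illustration.

```python
import hashlib

def build_index(stream, boundaries):
    """Return one (digest, size) pair per chunk. No filename is stored.

    `boundaries` are hypothetical cut offsets into the stream; a real
    chunker would derive them from the content itself.
    """
    index = []
    start = 0
    for end in list(boundaries) + [len(stream)]:
        chunk = stream[start:end]
        index.append((hashlib.sha256(chunk).hexdigest(), len(chunk)))
        start = end
    return index

# Two made-up "files" laid out back to back in one serialized stream.
stream = b"contents of file A" + b"contents of file B"

# The second chunk (offsets 10..25) straddles the boundary between the
# two files at offset 18, and the index has no way of knowing or caring.
index = build_index(stream, [10, 25])
for digest, size in index:
    print(digest[:16], size)
```

The sizes alone are enough to walk the stream again, and the hashes are enough to look each chunk up in a store.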

"Buzhash is a hash-based search algorithm". Well, buzhash is a hash function. People use it for different things, one prominent use case being searching. But you can use it for other stuff as well, for example chunking. So saying that buzhash itself is a "search algorithm" isn't precisely right.
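For illustration, a small sketch of buzhash as a plain hash function over a window of bytes, plus the rolling update that makes it cheap to slide the window by one byte. The 32-bit width and the table built from a fixed random seed are assumptions made here; real implementations ship their own fixed table, and this is not casync's code.

```python
import random

MASK32 = 0xFFFFFFFF

# Assumption: a substitution table built from a fixed seed, for the demo only.
_rng = random.Random(0)
TABLE = [_rng.getrandbits(32) for _ in range(256)]

def rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def buzhash(window):
    """Hash a whole window from scratch."""
    h = 0
    for b in window:
        h = rotl(h, 1) ^ TABLE[b]
    return h

def roll(h, out_byte, in_byte, window_size):
    """Slide the window one byte: drop out_byte, append in_byte."""
    return rotl(h, 1) ^ rotl(TABLE[out_byte], window_size) ^ TABLE[in_byte]

data = bytes(range(100))
n = 16
h = buzhash(data[:n])
for i in range(n, len(data)):
    h = roll(h, data[i - n], data[i], n)
    # The rolled value matches hashing the new window from scratch.
    assert h == buzhash(data[i - n + 1:i + 1])
```

Nothing in the function itself is about searching or chunking; those are just two things you can build on top of the rolling property.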

Thank you very much for putting together this article!

Lennart



Distributing filesystem images and updates with casync

Posted Jun 29, 2017 13:42 UTC (Thu) by nix (subscriber, #2304)

Yeah. The rolling hash mechanism is an ingenious way to convert a hash or checksum (like, say, BuzHash or Adler32) into a content-sensitive chunk-boundary detector. It amazes me that, with this algorithm out there for decades, people are *still* writing deduplicators and other chunkers that just split into N-byte-sized blocks; a rolling hash is almost always a better way. (With a modern memory hierarchy it's nearly free, too -- you need to incur delays to get the thing you're working over into cache no matter what, and compared to that the overhead of rolling-hashing vanishes in the memory latencies.)
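A rough sketch of the difference, under stated assumptions: the rolling hash here is a simple polynomial (Rabin-Karp style) one rather than BuzHash or Adler32, and the window size, bit mask, and function names are arbitrary choices for the demo. A cut is made wherever the low bits of the hash over the last `WINDOW` bytes are all ones, so the cut points follow the content rather than the offset.

```python
import random

WINDOW = 16                       # bytes covered by the rolling hash
BASE = 257
MOD = (1 << 31) - 1
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest byte in the window
CUT_MASK = (1 << 6) - 1           # cut when the low 6 bits are all ones

def cdc_chunks(data):
    """Split data at content-defined boundaries."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        if i >= WINDOW:
            # Remove the byte that just left the window.
            h = (h - data[i - WINDOW] * TOP) % MOD
        h = (h * BASE + b) % MOD
        # Enforce a minimum chunk length of WINDOW bytes.
        if i + 1 - start >= WINDOW and (h & CUT_MASK) == CUT_MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def fixed_chunks(data, size=64):
    """Split data into N-byte blocks, the approach criticized above."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def shared(a, b):
    """Number of distinct chunks that appear in both lists."""
    return len(set(a) & set(b))

rng = random.Random(1)
original = bytes(rng.getrandbits(8) for _ in range(20000))
edited = b"!" + original          # insert a single byte at the front

print("fixed-size blocks reused:", shared(fixed_chunks(original), fixed_chunks(edited)))
print("content-defined chunks reused:", shared(cdc_chunks(original), cdc_chunks(edited)))
```

With fixed-size blocks, the one-byte insertion shifts every later block, so a deduplicator finds essentially nothing to reuse. With the rolling hash, the boundaries realign shortly after the edit and the remaining chunks come out identical.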

Distributing filesystem images and updates with casync

Posted Jun 29, 2017 17:00 UTC (Thu) by andrewsh (subscriber, #71043)

I guess it’s just because it’s a concept that’s a bit difficult to wrap your head around, so people just think ‘scrap it, I’m just going to split the input into equally sized bits and be done with it’.

