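This is obviously not random enough (binary and text behave differently). So I would propose using perhaps 512 different 3-byte markers, which gives an average block size of 32 KB: there are 2^24 possible 3-byte values, so any given position matches one of the markers with probability 512/2^24 = 2^-15, i.e. roughly once every 32,768 bytes. These blocks can then be deduplicated with little processing by computing each block's hash (and perhaps its length).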
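A minimal Python sketch of the proposal, under assumptions of my own: the 512 markers are drawn pseudo-randomly from a fixed seed (the comment doesn't say how they should be chosen), a block ends right after any occurrence of a marker, and blocks are keyed by their SHA-256 digest plus length. The names `make_markers`, `chunk` and `dedup` are hypothetical.

```python
import hashlib
import random


def make_markers(count=512, seed=0):
    """Pick `count` distinct 3-byte values to act as block boundaries.

    Assumption: markers are drawn pseudo-randomly from a fixed seed.
    """
    rng = random.Random(seed)
    markers = set()
    while len(markers) < count:
        markers.add(rng.getrandbits(24).to_bytes(3, "big"))
    return frozenset(markers)


def chunk(data, markers):
    """Cut `data` right after every 3-byte window that is a marker.

    Boundaries depend only on local content, so inserting bytes early in
    the data shifts, but does not change, the blocks after the next marker.
    """
    chunks = []
    start = 0
    for i in range(2, len(data)):
        if data[i - 2:i + 1] in markers:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing block without a marker
    return chunks


def dedup(data, markers, store):
    """Store each distinct block once, keyed by (SHA-256 hex, length).

    Returns the list of keys needed to reassemble `data` from `store`.
    """
    recipe = []
    for block in chunk(data, markers):
        key = (hashlib.sha256(block).hexdigest(), len(block))
        store.setdefault(key, block)
        recipe.append(key)
    return recipe
```

For example, deduplicating 1 MiB of random bytes and then the same bytes with a short prefix stores almost all blocks only once, since only the block(s) before the first marker differ; `b"".join(store[k] for k in recipe)` reassembles the original. The sketch enforces no minimum or maximum block size, so individual blocks vary widely around the 32 KB mean.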
Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds