
Integration into file formats.

Posted Jan 17, 2026 22:45 UTC (Sat) by cesarb (subscriber, #6266)
In reply to: Integration into file formats. by martinfick
Parent article: Format-specific compression with OpenZL

> If the file/object data is already compressed when it is inserted, then it makes it much harder to perform any sort of cross file or version deltafication, such as what git can do. [...] Another problem you will encounter, perhaps even worse, is with content addressable object stores, here once again git comes to mind. Inserting already compressed data makes it almost impossible to improve upon the original compression, and thus freezes/ossifies the compression since any hashes of the content would be performed on the compressed content instead of the raw data.

Funny you mention git. Very early in the git history, it worked exactly like that: the object identifier was the hash of the *compressed* data. See https://github.com/git/git/commit/d98b46f8d9a3daf965a39f8... ("Do SHA1 hash _before_ compression.") and https://github.com/git/git/commit/f18ca731663191477613645... ("The recent hash/compression switch-over missed the blob creation."), where it was changed to the current behavior of using the hash of the *uncompressed* data.
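
To make the difference concrete, here is a minimal Python sketch. It assumes the present-day loose-object layout for SHA-1 repositories (a "blob <size>\0" header followed by the content, deflated with zlib). The function names are made up for illustration, and hash_of_compressed() only mimics the idea of the early scheme, not the exact code from those commits.

```python
import hashlib
import zlib


def _with_header(content: bytes) -> bytes:
    # Git prefixes a blob with "blob <size>\0" before hashing and storing it.
    return b"blob %d\x00" % len(content) + content


def git_blob_id(content: bytes) -> str:
    # Current behavior: the ID is the hash of the *uncompressed* header + data.
    return hashlib.sha1(_with_header(content)).hexdigest()


def stored_bytes(content: bytes, level: int) -> bytes:
    # What ends up on disk: the same bytes, deflated at some zlib level.
    return zlib.compress(_with_header(content), level)


def hash_of_compressed(content: bytes, level: int) -> str:
    # Illustration of the early idea: hash whatever the compressor produced.
    return hashlib.sha1(stored_bytes(content, level)).hexdigest()


if __name__ == "__main__":
    data = b"hello world\n" * 100
    for level in (1, 9):
        print(level, git_blob_id(data), hash_of_compressed(data, level))
```

The first hash is the same for both levels, while the second one changes whenever the compressor output changes. That is the ossification described in the quoted comment: with the old scheme, recompressing an object would have changed its name.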




Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds