chunked files
Posted Dec 12, 2018 0:22 UTC (Wed) by anarcat (subscriber, #66354)
In reply to: splitting the large CVE list in the security tracker by JoeBuck
Parent article: Large files with Git: LFS and git-annex
This didn't make it into the final text, but chunking could be an interesting lead for fixing the problem in git itself. Many backup programs (like restic, borg and bup) use a "rolling checksum" system (think rsync, but for storage) to decide where the "chunks" to be stored begin and end, instead of splitting the stored data only on file boundaries. This makes it possible to deduplicate across multiple versions of the same files more efficiently and transparently.
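To make the idea concrete, here is a minimal sketch of content-defined chunking with a rolling hash. It is only an illustration: the window size, the cut mask, the polynomial hash and the names `chunk` and `chunk_ids` are all assumptions picked for readability, not what restic, borg or bup actually use (each has its own hash function and parameters).

```python
import hashlib
from typing import List, Set

# All constants below are illustrative assumptions, not taken from any real tool.
WINDOW = 16            # number of bytes covered by the rolling hash
MASK = (1 << 8) - 1    # cut when the low 8 bits are zero (roughly 256-byte chunks)
BASE = 257
MOD = (1 << 31) - 1
OUT = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window


def chunk(data: bytes) -> List[bytes]:
    """Split data where the hash of the last WINDOW bytes matches a pattern."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Slide the window: remove the contribution of the oldest byte.
            h = (h - data[i - WINDOW] * OUT) % MOD
        h = (h * BASE + byte) % MOD
        # Only cut once the window is full, so chunks are at least WINDOW bytes.
        if i - start + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def chunk_ids(data: bytes) -> Set[str]:
    """Content addresses (SHA-256) of the chunks that make up data."""
    return {hashlib.sha256(c).hexdigest() for c in chunk(data)}
```

Because the boundaries depend on the content rather than on offsets, inserting a few bytes in the middle of a file should only change the chunks around the edit: most of `chunk_ids(old)` would also show up in `chunk_ids(new)`, and only the new chunks need to be stored.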
Incidentally, git-annex supports bup as a backend. And so when I asked joeyh about implementing chunking support in the git-annex backend (it already supports chunked transfers), that's what he answered of course. :)
That would be the ultimate git killer feature, in my opinion, as it would permanently solve the large file problem. But having worked on the actual implementation of such rolling checksum backup software, I can tell you it is *much* harder to wrap your head around that data structure than git's more elegant design.
Maybe it could be a new pack format?
Posted Dec 14, 2018 1:13 UTC (Fri) by pixelpapst (guest, #55301)
The chunking approach and on-disk data structure seem solid; git would probably use a standard casync chunk store, but a git-specific index file.
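As a rough picture of that split between a shared chunk store and a per-file index, here is a toy sketch. It is not casync's actual on-disk format: the `ChunkStore` class, the in-memory dict and the tiny fixed-size chunks are all assumptions made to keep the example short (a real store would use content-defined chunk boundaries and files on disk).

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # tiny fixed-size chunks, for illustration only


class ChunkStore:
    """Toy content-addressed store: chunks are keyed by their SHA-256 digest."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> List[str]:
        """Store data and return its index: the ordered list of chunk keys."""
        index = []
        for off in range(0, len(data), CHUNK_SIZE):
            piece = data[off:off + CHUNK_SIZE]
            key = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(key, piece)  # identical chunks are stored once
            index.append(key)
        return index

    def get(self, index: List[str]) -> bytes:
        """Reassemble the original data from an index."""
        return b"".join(self.chunks[key] for key in index)
```

Storing `b"aaaabbbbcccc"` and then `b"aaaaXXXXcccc"` in the same store yields two separate indexes but only four stored chunks, since the first and last chunks are shared between the two versions.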
(Just for giggles, I've even been meaning to evaluate how much space would be shared when backing up a casync-ified .git directory (including its chunk store) and the checked-out objects to a different, common casync chunk store.)
I cannot wait to see to what new heights git-annex would grow in a world where every ordinary git user already had basic large-file interoperability with it. 
(Anarcat, thank you for educating people about git-annex and all your documentation work.) 
     