SparkleShare 1.0 released
Posted Dec 10, 2012 18:00 UTC (Mon) by oever (subscriber, #987)
The bup project deals with large files more efficiently. It uses the same pack format as git, with an extension (a new blob type) that stores fragments of files instead of whole files. Because the fragment boundaries are chosen with a rolling checksum algorithm, similar files share many of their fragments.
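To illustrate the idea, here is a minimal sketch of splitting a file at boundaries picked by a rolling hash. It is not bup's actual code: the function name `split_chunks`, the polynomial hash, and the constants `WINDOW`, `BASE`, `MOD` and `MASK` are all assumptions made up for this example.

```python
import random

# All constants below are illustrative assumptions, not bup's real parameters.
WINDOW = 48                   # bytes covered by the rolling hash
BASE = 257                    # polynomial base
MOD = (1 << 31) - 1           # prime modulus
MASK = (1 << 10) - 1          # cut when the low 10 bits are zero (~1 KiB average)
TOP = pow(BASE, WINDOW, MOD)  # weight of the byte that leaves the window


def split_chunks(data):
    """Split data where a rolling hash of the last WINDOW bytes hits a pattern."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Slide the window: add the new byte, drop the oldest one.
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * TOP) % MOD
        # The cut depends only on the bytes in the window, not on the offset.
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


if __name__ == "__main__":
    rng = random.Random(0)
    original = bytes(rng.getrandbits(8) for _ in range(100_000))
    # Insert a few bytes near the start; every later byte shifts position.
    edited = original[:500] + b"a few inserted bytes" + original[500:]
    a, b = split_chunks(original), split_chunks(edited)
    print(len(a), len(b), len(set(a) & set(b)))
```

Since a cut is decided by the window contents alone, the boundaries settle back onto the same data shortly after the inserted bytes, so nearly all fragments of the two versions are identical and would only need to be stored once. Fixed-size blocks would not have this property, because an insertion shifts every later block.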
Posted Dec 11, 2012 17:22 UTC (Tue) by drag (subscriber, #31333)
People probably expect it to work properly and 'do the right thing' with no manual intervention.
That is, if they accidentally drag-n-drop a 5GB movie file to the sparkle client and don't notice that it took a few hours to upload, they should be able to delete it or just leave it there without any degradation in performance, and without git massively inflating the size of the repository to 10 or 15G when the file is moved around.
Posted Dec 13, 2012 8:36 UTC (Thu) by oever (subscriber, #987)
If the file system implements copy-on-write, then it is possible to keep a spare copy of the file relatively cheaply; however, git does not take advantage of that. On Linux, copy-on-write file systems are not common yet. Nevertheless, an optimization in git could be to keep large files unaltered in the .git repository with a side-car file.
The git repository will not inflate in size when the 5GB file is moved around. But every time the file changes, the storage requirement will grow by 5GB. Luckily, large user files are usually media files, and these do not change a lot. Using git for video or audio editing data is not a good idea. I have no idea how e.g. Box or Dropbox deal with such data.
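The reason a move is free while a change is not can be sketched in a few lines. Git names a blob by the SHA-1 of a short header plus the file contents (assuming the default SHA-1 object format); the path is not part of the name. The helper `git_blob_id` and the sample data are made up for this example.

```python
import hashlib


def git_blob_id(content):
    """Return the object ID git assigns to a blob with these contents."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    movie = b"stand-in for the contents of a large movie file"
    # Hypothetical paths: the ID is computed from the contents only,
    # so both paths refer to one and the same stored object.
    tree_before = {"Videos/movie.mkv": git_blob_id(movie)}
    tree_after = {"Archive/movie.mkv": git_blob_id(movie)}
    print(tree_before["Videos/movie.mkv"] == tree_after["Archive/movie.mkv"])

    # Changing even one byte yields a different ID, hence a new object.
    print(git_blob_id(movie) == git_blob_id(movie + b"!"))
```

A rename therefore only adds a small tree entry that points at the existing blob, whereas any edit produces a new blob. Git can delta-compress similar objects when it packs them, but that tends to help little with already compressed media data.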
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds