SparkleShare 1.0 released

Posted Dec 10, 2012 18:00 UTC (Mon) by oever (subscriber, #987)
In reply to: SparkleShare 1.0 released by xxiao
Parent article: SparkleShare 1.0 released

How sane do you want it? You can commit binary files, and this has been possible since the first version of git. git-annex moves the big files out of the git repository. This simply complicates things, since now your files are spread over two systems. If you are afraid that the repository will grow too large, you can use a shallow clone ('git clone --depth 1') or truncate the history of the repository.

The project bup deals with large files more efficiently. It uses the same pack format as git with an extension (a new blob type) that stores fragments of files instead of whole files. Because the fragment boundaries are chosen with a rolling checksum algorithm, similar files share many of their fragments.
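To make the rolling checksum idea concrete, here is a minimal sketch of content-defined chunking. It is not bup's actual algorithm: the polynomial hash, the 64-byte window, the 11-bit boundary mask and the names `split` and `shared_fraction` are all assumptions chosen for illustration. The point is only that a boundary depends on the last few bytes of content, so inserting data early in a file does not shift every later fragment.

```python
import random

WINDOW = 64                      # bytes covered by the rolling hash (assumed value)
MASK = (1 << 11) - 1             # cut when the low 11 bits are all ones (~2 KiB chunks)
BASE = 257                       # multiplier of the polynomial hash (assumed value)
MOD = 1 << 32
OUT = pow(BASE, WINDOW, MOD)     # weight of the byte that leaves the window


def split(data: bytes) -> list:
    """Cut data after every position where the rolling hash matches MASK."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD                  # shift the new byte in
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * OUT) % MOD   # drop the oldest byte
        if (h & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                  # trailing partial chunk
    return chunks


def shared_fraction(old: bytes, new: bytes) -> float:
    """Fraction of the chunks of `old` that also occur among the chunks of `new`."""
    old_chunks, new_chunks = split(old), set(split(new))
    return sum(c in new_chunks for c in old_chunks) / len(old_chunks)


rng = random.Random(0)
movie = bytes(rng.getrandbits(8) for _ in range(200_000))   # stand-in for a large file
edited = b"XYZ" + movie                                     # three bytes inserted at the front
print(f"{shared_fraction(movie, edited):.0%} of the chunks are shared")
```

With fixed-size fragments the same three-byte insertion would change every fragment, which is why a content-defined boundary matters here.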


SparkleShare 1.0 released

Posted Dec 11, 2012 17:22 UTC (Tue) by drag (subscriber, #31333)

> How sane do you want it?

People probably expect it to work properly and 'do the right thing' with no manual intervention.

That is, if they accidentally drag-and-drop a 5GB movie file into the SparkleShare client and don't notice that it took a few hours to upload, they should be able to delete it or just leave it there without performance degrading, and without git inflating the repository to 10 or 15GB when the file is moved around.

SparkleShare 1.0 released

Posted Dec 13, 2012 8:36 UTC (Thu) by oever (subscriber, #987)

In the plain git implementation, adding a 5GB file takes some time, because the file is written as a git blob to the .git directory. The blob is compressed with zlib, but since large files tend to be binary (and often already compressed), it will still take up almost 5GB. That means that simply adding the file to git doubles the disk usage. This is the only way that allows a user to restore the file even when the computer is offline.
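A rough way to see why compression does not help is to deflate two payloads the way a loose git object is written (a `blob <size>` header, a NUL byte, then the content, all passed through zlib). This is only a sketch: the helper name `loose_blob_size` is made up, and random bytes stand in for an already-compressed media file, which is an assumption about what the large file looks like.

```python
import os
import zlib


def loose_blob_size(content: bytes) -> int:
    """Bytes needed for content stored as a zlib-deflated blob object."""
    header = b"blob %d\0" % len(content)     # object header used for blobs
    return len(zlib.compress(header + content))


media = os.urandom(1_000_000)                # stand-in for compressed video (assumption)
text = b"some plain text\n" * 62_500         # 1,000,000 bytes of repetitive text

for name, payload in (("media", media), ("text", text)):
    ratio = loose_blob_size(payload) / len(payload)
    print(f"{name}: stored at {ratio:.1%} of the original size")
```

The repetitive text shrinks to a small fraction of its size, while the random payload stays at roughly its full size, so the working copy plus the blob cost about twice the file.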

If the file system implements copy-on-write, it is possible to keep a spare copy of the file relatively cheaply; however, git does not take advantage of that. On Linux, copy-on-write file systems are not common yet. Nevertheless, an optimization in git could be to keep large files unaltered in the .git directory, each with a side-car file.

The git repository will not inflate in size when the 5GB file is moved around. But every time the file changes, the storage requirement grows by 5GB. Luckily, large user files are usually media files, and these do not change much. Using git for video or audio editing data is not a good idea. I have no idea how, e.g., Box or Dropbox deal with such data.
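The reason a move is free while a change is not can be sketched with git's blob naming: a blob id is the SHA-1 of a `blob <size>` header, a NUL byte, and the content, so the path plays no part in it. The `ToyStore` class below is a hypothetical stand-in for the object directory, not git's real storage code, and it ignores packing entirely.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 over 'blob <size>', a NUL byte, and the content, as git names blobs."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ToyStore:
    """Hypothetical stand-in for .git/objects: blobs keyed by their id."""

    def __init__(self):
        self.objects = {}    # blob id -> content
        self.paths = {}      # path -> blob id

    def add(self, path: str, content: bytes) -> None:
        blob_id = git_blob_id(content)
        self.objects[blob_id] = content      # same content, same key, no new copy
        self.paths[path] = blob_id

    def move(self, old: str, new: str) -> None:
        self.paths[new] = self.paths.pop(old)    # only the path mapping changes

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.objects.values())


store = ToyStore()
movie = b"\x00" * 5_000                      # tiny stand-in for the 5GB file
store.add("movie.mkv", movie)
store.move("movie.mkv", "videos/movie.mkv")
print("after move:  ", store.stored_bytes())
store.add("videos/movie.mkv", movie[:-1] + b"\x01")    # one byte changed
print("after change:", store.stored_bytes())
```

Moving the file leaves the stored size untouched, while changing a single byte produces a new id and a second full-size object.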

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds