For projects where pretty pictures and movies are a big part of the data, it seems like any version control system really needs to handle large binaries.
So far I've heard two reasons why git is slow on binaries:
1. git normally rescans the entire file during operations like "git diff".
For huge binaries, this gets expensive.
I wonder if git could use the file's mtime to determine whether to scan it for changes. Or does it already?
2. The git format still has some limitations with large files.
Those seem fixable. I wonder if anyone is working on this.
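To see why point 1 gets expensive for big binaries: git names a file's contents with a SHA-1 hash over a short header plus every byte of the file, so producing that name means reading the whole file. Below is a minimal Python sketch. It assumes no clean filters or line-ending conversion are configured, since git applies those before hashing. `git_blob_id` is a hypothetical helper name, not part of git.

```python
import hashlib
import os


def git_blob_id(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-1 object ID git would give this file's raw contents.

    Git hashes the header b"blob <size>\\0" followed by the file bytes, so
    every byte must be read. That is what makes huge files costly to rescan.
    """
    size = os.path.getsize(path)
    h = hashlib.sha1()
    h.update(b"blob %d\0" % size)
    with open(path, "rb") as f:
        # Stream in chunks so a multi-gigabyte file is never held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

On the same bytes this should agree with `git hash-object <path>`. Streaming keeps memory flat, but the time cost still grows with file size.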
Posted Apr 8, 2010 0:31 UTC (Thu) by dlang (✭ supporter ✭, #313)
1. I believe git first compares the stored hashes of the two files (actually of the two trees, so if the trees are identical it doesn't bother checking the individual files). Only if the hashes differ does it actually run the diff.
2. This has been discussed, and most of the design work has been done for a couple of different possible solutions.
The first is to store the files separately and keep only a reference (how to retrieve each file) inside the existing git records. The design work has mostly been done, but nobody has taken the time to code it (GSoC interest, anyone? ;-)
The second is to modify the pack format to handle larger objects. There are people working on this, but since it would be a very invasive set of changes, they are trying to anticipate all possible improvements, so it is moving very slowly.
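The short-circuit in dlang's point 1 can be sketched as a recursive walk that compares hashes before descending. If two subtrees carry the same hash, nothing inside them is examined. `Node` is a hypothetical in-memory stand-in for tree and blob entries, not git's actual data structures, and the hash strings in the usage example are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a git tree entry (not git's real structures).

    `oid` plays the role of the stored object hash; `children` is a dict for
    directories (trees) and None for files (blobs).
    """
    oid: str
    children: Optional[Dict[str, "Node"]] = None


def is_file(node: Optional[Node]) -> bool:
    return node is not None and node.children is None


def changed_paths(a: Optional[Node], b: Optional[Node], prefix: str = "") -> List[str]:
    """List paths whose contents differ between two trees, pruning by hash."""
    # Equal hashes mean equal contents, so the whole subtree is skipped
    # without looking at any file inside it.
    if a is not None and b is not None and a.oid == b.oid:
        return []
    changes: List[str] = []
    if is_file(a) or is_file(b):
        # A file differs, appeared, or vanished here. Only at this point
        # would a real tool read the contents and produce a diff.
        changes.append(prefix)
    a_kids = a.children if a is not None and a.children is not None else {}
    b_kids = b.children if b is not None and b.children is not None else {}
    for name in sorted(a_kids.keys() | b_kids.keys()):
        path = f"{prefix}/{name}" if prefix else name
        changes.extend(changed_paths(a_kids.get(name), b_kids.get(name), path))
    return changes
```

With an unchanged `assets` directory shared by both trees, only the edited source file is reported. The large movie inside `assets` is never visited.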
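The first proposal in dlang's point 2 (store large files outside the normal object records and commit only a reference to them) can be sketched as a content-addressed side store plus a small pointer text. Everything here is hypothetical and for illustration only. The `large-object sha256:` pointer format, the store directory layout, and the helper names are assumptions, not the design the git developers discussed.

```python
import hashlib
import shutil
from pathlib import Path

# Hypothetical pointer format; nothing here is an existing git feature.
POINTER_PREFIX = "large-object sha256:"


def store_large_file(src: Path, store_dir: Path) -> str:
    """Copy src into a content-addressed store; return the small pointer text
    that would be committed to the repository in place of the file."""
    h = hashlib.sha256()
    with src.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    digest = h.hexdigest()
    dest = store_dir / digest
    if not dest.exists():  # identical contents are stored only once
        store_dir.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
    return f"{POINTER_PREFIX}{digest}\n"


def resolve_pointer(pointer: str, store_dir: Path) -> Path:
    """Turn committed pointer text back into the path of the stored file."""
    if not pointer.startswith(POINTER_PREFIX):
        raise ValueError("not a large-file pointer")
    return store_dir / pointer[len(POINTER_PREFIX):].strip()
```

The appeal of this design is that the repository's own history only ever sees a pointer of under 100 bytes, however large the movie it stands for.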
Support large repositories!
Posted Apr 8, 2010 16:35 UTC (Thu) by Spudd86 (guest, #51683)
In fact, I think it checks the file's mtime before even computing the hash.
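This matches the idea behind git's index, which records stat data such as mtime and size for each tracked file. Below is a minimal Python sketch of the technique. It assumes a hypothetical in-memory `cache` dict in place of the real index and compares only mtime and size (git records more fields). `CacheEntry`, `file_digest`, and `is_modified` are illustrative names.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Dict


@dataclass
class CacheEntry:
    mtime_ns: int
    size: int
    digest: str


def file_digest(path: str) -> str:
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def is_modified(path: str, cache: Dict[str, CacheEntry]) -> bool:
    """Cheap stat check first; read and hash the file only if stat data changed."""
    st = os.stat(path)
    entry = cache.get(path)
    if entry is not None and entry.mtime_ns == st.st_mtime_ns and entry.size == st.st_size:
        # Stat data unchanged: assume contents unchanged, no bytes are read.
        return False
    # Stat data changed (or file not yet cached): fall back to hashing.
    digest = file_digest(path)
    modified = entry is None or entry.digest != digest
    cache[path] = CacheEntry(st.st_mtime_ns, st.st_size, digest)
    return modified
```

A real implementation must also handle one subtlety: a file rewritten within the same timestamp tick can keep an identical mtime and size. Git calls this the "racy git" problem and has special handling for it, which this sketch omits.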