Yeah. In my experience Git doesn't handle huge trees very well. The Linux kernel isn't very big. For example, a company might have the kernel, glibc, and 30+ other large custom and open source libraries and applications in the same versioned tree, with tens or hundreds of thousands of revisions.
This isn't feasible with Git. I once let a git-svn clone run--over a LAN--for 2 days straight before giving up. Not just Git's fault, granted, but a problem all the same.
The Git solution is to use sub-modules. Whether this is better or worse is fairly debatable, but it's not practical to shift huge SVN trees over to Git.
Posted Apr 3, 2010 21:36 UTC (Sat) by maks (subscriber, #32426)
git-svn import is not the way when switching from svn to git (Google how Gnome did it. :)
git svn is nice when you are stuck with svn for whatever reason: you can commit to it and still have almost all the power of git.
Subversion considered obsolete
Posted Apr 4, 2010 0:55 UTC (Sun) by jengelh (subscriber, #33263)
>I once let a git-svn clone run--over a LAN--for 2 days straight before giving up. Not just Git's fault,
I don't see git at fault; how SVN is organized and used is the bottleneck. If you had to make one HTTP request for every changed file in every revision (the repo reaching approximately 2 million objects this year), you might not be done downloading linux-2.6.git in two days either. You can easily test that: unpack the packs into separate objects, create the HTTP metadata, and let it clone. Then again, you could probably pull it off, given that git does not have to calculate any diffs.
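To put a rough number on the one-request-per-object argument, here is a back-of-the-envelope sketch. The 2 million objects come from the comment above; the 100 ms per request is purely an assumed figure, and the function name is made up for illustration.

```python
# Rough estimate: one sequential HTTP round trip per object.
# The per-request latency is an assumption, not a measurement.

SECONDS_PER_DAY = 24 * 60 * 60


def clone_days(num_objects: int, seconds_per_request: float) -> float:
    """Days needed if every object costs one sequential HTTP request."""
    return num_objects * seconds_per_request / SECONDS_PER_DAY


if __name__ == "__main__":
    # 2 million objects at an assumed 100 ms each
    print(f"{clone_days(2_000_000, 0.1):.2f} days")
```

Under that assumption the transfer alone takes a bit over two days, before any diff computation on either side.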
Subversion considered obsolete
Posted Apr 4, 2010 3:38 UTC (Sun) by nbd (subscriber, #14393)
The solution to that is to let one central server keep an up to date git svn clone around (run git svn rebase in a cron job).
Then you can use that to speed up clones for working on. I have a small script for cloning a git tree then adding the necessary information to sync against the svn server it was cloned from: http://nbd.name/gsc
Subversion considered obsolete
Posted Apr 4, 2010 6:59 UTC (Sun) by smurf (subscriber, #17840)
Since when is git-svn's speed relevant for evaluating git? If nothing else, you import only once, then use the destination VCS for the rest of the project's life (or until something better comes along :-p ).
Importing into git can be _fast_ and is limited (on the git side) by the write speed of your disk, and your CPU's zip and sha computing.
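For a sense of what the "zip and sha" work looks like per file, here is a minimal sketch of how a blob is hashed and compressed as a loose object in a SHA-1 repository. The function names are mine, and packfiles (which add delta compression on top) are not covered.

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """Object ID for a blob: SHA-1 over a 'blob <size>\\0' header plus the content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def git_loose_object(content: bytes) -> bytes:
    """Bytes stored on disk for a loose blob: the same header + content, deflated."""
    header = b"blob %d\0" % len(content)
    return zlib.compress(header + content)


if __name__ == "__main__":
    print(git_blob_id(b"hello world\n"))
```

Both steps are a single pass over the file contents, which is why the import speed ends up bounded by CPU and disk rather than by anything protocol-related.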
On the other hand, last time I checked, git-svn essentially does an HTTP request for every revision it pulls. That's hardly fast under the best circumstances. Sorry to say, I have zero interest in finding out whether this could be sped up.
Subversion considered obsolete
Posted Apr 4, 2010 9:34 UTC (Sun) by epa (subscriber, #39769)
I think you are right that because a git clone includes all history, it doesn't work well for really huge repositories. I did see a project (don't remember its name) that let you use git in a more svn-like style, creating 'working copies' which do not include their own history, and committing from there back to a different repository.
Subversion considered obsolete
Posted Apr 4, 2010 11:16 UTC (Sun) by peschmae (guest, #32292)
First of all you don't have to clone the entire history (see git-clone --depth)
Secondly, the space used by git for storing the entire history is, in most cases, less than the space used by a working copy. E.g. my linux kernel clone (with complete history going back to 2.6.12) currently takes 400 MB for the .git directory as opposed to 450 MB for the actual checkout. Not really much of an issue in practice.
Subversion considered obsolete
Posted Apr 4, 2010 17:25 UTC (Sun) by RCL (guest, #63264)
Now try using gamedev repositories full of binary files (where code is a
tiny fraction of overall repo) and ... git can't even handle that (see my
thread of comments below).
SHA calculation is what kills it. As I wrote, I didn't succeed in creating
a 22GB (modest by gamedev standards) repo with git/bzr/hg...
Versioning really-big files
Posted Apr 4, 2010 20:33 UTC (Sun) by smurf (subscriber, #17840)
Hmm. I can think of a simple way to fix the SHA1 problem (hash all (blockno-contents-of-block) tuples separately and XOR the result, or whatever; needs editor support to be effective).
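A minimal sketch of that idea, assuming a fixed block size and SHA-1 per block (both are illustrative choices, and the function names are made up). It only shows why a single-block edit becomes cheap to rehash; whether an XOR-combined digest is strong enough for a VCS is a separate question this does not address.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def block_digest(block_no: int, block: bytes) -> int:
    """Hash one (block number, block contents) tuple to an integer."""
    data = block_no.to_bytes(8, "big") + block
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def file_digest(content: bytes) -> int:
    """XOR the digests of all blocks together."""
    result = 0
    for offset in range(0, len(content), BLOCK_SIZE):
        block = content[offset:offset + BLOCK_SIZE]
        result ^= block_digest(offset // BLOCK_SIZE, block)
    return result


def update_digest(old_digest: int, block_no: int,
                  old_block: bytes, new_block: bytes) -> int:
    """Update the digest after one block changed, without rehashing the rest.

    XOR is its own inverse, so the old block's contribution is removed
    and the new block's contribution is added.
    """
    return (old_digest
            ^ block_digest(block_no, old_block)
            ^ block_digest(block_no, new_block))
```

The "needs editor support" caveat shows up in `update_digest`: something has to know which block changed and what it used to contain.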
The larger problem, however, is that you want a way to carry multiple versions of slowly-changing multi-GB files in your repo -- without paying the storage price of (a compressed version of) the whole blob, each time you check in a single-byte change. Same for network traffic when sending that change to a remote repository.
This is essentially a solved problem (rsync does it all the time) and just needs integration into the VCS-of-the-day. This problem is quite orthogonal to the question of whether said VCS-of-the-day is distributed or central, or whether it is named git or hg or bzr or whatever.
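As a rough illustration of the block-level delta idea, here is a simplified sketch. It matches blocks only at fixed offsets; real rsync uses a rolling checksum so it can also find blocks that have shifted position. The block size, names, and delta format are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size


def signatures(old: bytes) -> dict:
    """Map the hash of each block of the old version to its block index."""
    return {
        hashlib.sha1(old[i:i + BLOCK_SIZE]).digest(): i // BLOCK_SIZE
        for i in range(0, len(old), BLOCK_SIZE)
    }


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as references to blocks of `old` plus literal data."""
    known = signatures(old)
    delta = []
    for i in range(0, len(new), BLOCK_SIZE):
        block = new[i:i + BLOCK_SIZE]
        index = known.get(hashlib.sha1(block).digest())
        if index is not None:
            delta.append(("copy", index))   # block already exists in old
        else:
            delta.append(("data", block))   # block must be stored or sent
    return delta


def apply_delta(old: bytes, delta: list) -> bytes:
    """Rebuild the new version from the old version and the delta."""
    out = bytearray()
    for kind, value in delta:
        if kind == "copy":
            start = value * BLOCK_SIZE
            out += old[start:start + BLOCK_SIZE]
        else:
            out += value
    return bytes(out)
```

For a single-byte change, only one block ends up as literal data; everything else is a small reference, which covers both the storage and the network side of the problem.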
Yes, I know that the SVN people seem to have gotten this one mostly-right ("mostly" because their copy of the original file is not compressed). Hopefully, somebody will do the work for git or hg or whatever. It's not exactly rocket science.
Versioning really-big (binary) files
Posted Apr 6, 2010 18:47 UTC (Tue) by vonbrand (subscriber, #4458)
git uses delta compression by default (and has done so for a long time now), so the "huge binary files that change a bit" shouldn't be a problem. Please check with the latest version.
Versioning really-big (binary) files
Posted Apr 6, 2010 23:12 UTC (Tue) by dlang (✭ supporter ✭, #313)
the real problem is that in many cases when people say 'huge binary file that changes a bit' they really mean 'huge binary file where the meaning changes a little, but the actual file contents change a lot', usually due to a compression algorithm being used
even for images and audio, if you were to check them in uncompressed the git delta functionality would work well and diff the files against each other, but if you compress the file (jpeg, mp3, or even png) before checking it in, a small change to the uncompressed data results in a huge change to the compressed data. If it's a lossless compression (i.e. png) then it would be possible to have git uncompress it before checking for differences, but if it's a lossy compression you can't do this.
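The effect is easy to reproduce. The sketch below flips one byte in some compressible data and counts how many bytes differ before and after zlib compression; the sample data and function names are invented for the demonstration, and zlib stands in for whatever format-specific compression is actually in use.

```python
import zlib


def differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions where a and b differ, plus any difference in length."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + abs(len(a) - len(b))


def compare_raw_and_compressed(raw: bytes, position: int):
    """Flip one byte and return (raw difference, compressed difference)."""
    changed = bytearray(raw)
    changed[position] ^= 0xFF
    changed = bytes(changed)
    raw_diff = differing_bytes(raw, changed)
    compressed_diff = differing_bytes(zlib.compress(raw), zlib.compress(changed))
    return raw_diff, compressed_diff


if __name__ == "__main__":
    # Made-up sample data standing in for an uncompressed asset
    sample = b"".join(b"pixel row %06d\n" % i for i in range(20000))
    print(compare_raw_and_compressed(sample, 10))
```

The raw difference is always exactly one byte, while the compressed streams typically diverge from the point of the change onward, which leaves a delta algorithm very little to reuse.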
Versioning really-big (binary) files
Posted Apr 7, 2010 7:53 UTC (Wed) by paulj (subscriber, #341)
The real problem is people thinking such files are suitable for checking
into an SCM. Just archive them somewhere.
Versioning really-big (binary) files
Posted Apr 12, 2010 1:14 UTC (Mon) by vonbrand (subscriber, #4458)
Not really.
If the contents need version control, they should be handled by a VCS. The size or format of the files could be a technical hurdle, sure; but it shouldn't be an excuse for not solving the problem.
Subversion considered obsolete
Posted Apr 4, 2010 20:55 UTC (Sun) by simlo (subscriber, #10866)
Now you should not put (large) binaries into a VCS. A VCS is for source code, not any form of compilation output. What you store in the VCS is a script which can pull the right, uniquely defined (version, sha1, etc.) binary from some server, or a build script which produces the binary.
Where I work, we use subversion (and I use git svn :-). We have scripts which pull the tar.gz files for various packages in specific versions from a server, then unpack, patch and cross-compile them for our target. The only things we have in subversion are the scripts and the files we have changed.
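The "uniquely defined" part of such a script might look roughly like this. It is a sketch, not the scripts described above: the manifest format and names are hypothetical, and the download step is left out so that only the pinning check is shown.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 of a downloaded artifact."""
    return hashlib.sha1(data).hexdigest()


def verify_artifact(name: str, data: bytes, manifest: dict) -> bytes:
    """Return data if its SHA-1 matches the value pinned in the manifest.

    `manifest` maps artifact names to expected SHA-1 digests; it is the
    small text-like thing that would live in the VCS (hypothetical format).
    """
    expected = manifest[name]
    actual = sha1_hex(data)
    if actual != expected:
        raise ValueError(f"{name}: expected {expected}, got {actual}")
    return data
```

Only the manifest and the script are versioned; the tarball itself stays on the server, and a mismatch fails the build instead of silently using a different binary.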
For the Linux kernel we tried to have the full thing in subversion, but it was way too much for subversion to handle, so now we only have a makefile which clones a git repository when the source is needed.
Subversion considered obsolete
Posted Apr 5, 2010 3:12 UTC (Mon) by RCL (guest, #63264)
Please understand that some software products rely on large amounts of data which is essentially a part of program and should be versioned together with source code.
It's almost like having very large firmware blobs in Linux kernels, much larger than they are currently...
See comments below where I elaborate on this interdependency between code and data in games.
Subversion considered obsolete
Posted Apr 5, 2010 14:00 UTC (Mon) by simlo (subscriber, #10866)
It depends on whether you edit those data or not. Firmware files you don't edit. You grab a specific version that someone compiled, labeled and released for you.
On the other hand, you can have data like maps and icons, which are also "source code" and belong in the VCS if you edit them using some program (some map editor, GIMP or whatever). But the overall system is badly designed if these files are "large". They ought to be separated into small files, each containing separate parts of the information, and then "compiled" into larger files. This will usually make a more flexible and maintainable system (besides making life easier for the VCS). It is the same with C code: you don't make one big file but smaller ones, separated by functionality.
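A toy version of that "compile small files into a large one" step, assuming a made-up length-prefixed bundle format (nothing here corresponds to a real game asset pipeline):

```python
import struct


def build_bundle(parts: dict) -> bytes:
    """Concatenate named parts into one blob.

    Parts are written in sorted name order so the same inputs always
    produce byte-identical output.
    """
    out = bytearray()
    for name in sorted(parts):
        encoded = name.encode("utf-8")
        # Two big-endian 32-bit lengths: name length, data length
        out += struct.pack(">II", len(encoded), len(parts[name]))
        out += encoded + parts[name]
    return bytes(out)


def read_bundle(blob: bytes) -> dict:
    """Split a bundle produced by build_bundle back into its parts."""
    parts = {}
    offset = 0
    while offset < len(blob):
        name_len, data_len = struct.unpack_from(">II", blob, offset)
        offset += 8
        name = blob[offset:offset + name_len].decode("utf-8")
        offset += name_len
        parts[name] = blob[offset:offset + data_len]
        offset += data_len
    return parts
```

The small part files are what would be edited and versioned; the bundle is build output and can be regenerated at any time.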
Subversion considered obsolete
Posted Apr 5, 2010 3:22 UTC (Mon) by martinfick (subscriber, #4455)
"We can do anything better than you"..."not X"..."you shouldn't do X", hmmm?
Subversion considered obsolete
Posted Apr 8, 2010 10:08 UTC (Thu) by epa (subscriber, #39769)
It's remarkable how quickly 'my preferred tool is unable to do X' turns into the moral imperative 'you should not do X'.
Subversion considered obsolete
Posted Apr 5, 2010 12:12 UTC (Mon) by peschmae (guest, #32292)
I agree that in these cases (blobs that you'd like to have versioned along with your source code) git may not be the right choice for you. Still, it does very well for large source code repositories.
Looks like you're stuck with perforce :-p
Subversion considered obsolete
Posted Apr 5, 2010 17:21 UTC (Mon) by chad.netzer (✭ supporter ✭, #4257)
"git clone --depth" has some severe drawbacks, in that you cannot push or fetch from it. So any commits you make have to be made into patches and mailed to someone with a full repo (ie, a team of developers still all need full repos to work effectively with each other).
But yes, in practice, git stores source code repos very compactly, and by doing branch operations in place (rather than using separate directory copies for each "branch"), it uses much less space per client than SVN checkouts for a busy developer. And it's also *much* faster for the same reason.
Subversion considered obsolete
Posted Apr 4, 2010 17:19 UTC (Sun) by lacostej (guest, #2760)
* android uses git (under the hood). I don't see a problem with 'large'
trees. Maybe it isn't large enough for you. Seems to work for many.
* git-svn taking time is SVN's fault, not git's. git svn is a bridge that
will check out the revisions using the svn client, one by one. As said
later in the comments, keep the git-svn clone on the server; once that is
done, people can clone from that one instead.
* from my experience a full git history takes less space than the latest
svn checkout. And you can fully work offline, do branches, etc.
To me, SVN is an outdated technology. Git is harder to learn though, and
people can (& do) make mistakes until they properly grasp the DVCS
concepts.
Note: hg is a compelling alternative, and some people might want to look at http://hginit.com/ for an introduction.