SCM innovation in git

Posted Apr 21, 2005 12:03 UTC (Thu) by kevinbsmith (guest, #4778)
Parent article: A very quick guide to starting with git

Some folks may question the value of git, given that several free SCM tools already exist. I did at first, too. But increasingly it appears that Linus has managed to advance the state of the art in a few ways. It's too early to know for sure, but git appears to be an impressive piece of design work.

It relies on a simpler core than most SCMs, and has a very coherent philosophy. It fully separates the "engine" or "filesystem" layer from the user-oriented SCM layer. I expect there will be at least a couple of front ends to git (cogito being one), sharing compatible back-end data. Because the core is so small, it should be easy for people to experiment with better merging algorithms and other SCM-level features.

One of the most interesting design choices is not to track file renames. At first, this seems like a step backward, since most of the other tools use rename tracking to support commands like "blame", where you can track the history of a piece of code, even if the module it is in has been renamed. If you don't track renames, aren't you back in the CVS world where history gets lost? Fortunately, it looks like that is not the case.

Linus claims that a rename (or move) is merely a special case of a more interesting problem: Text moving from one file to another. He points out that it is pretty common to cut and paste a function from one module to another. Or to split a file into two or three pieces. If you only track renames, you will lose history in any of those cases.

Instead, he has laid out the design of a tool that would allow you to point to some code in the current version, and track that code backward through time, even as it moved from file to file. Even better, this tool would not need any patch metadata to do its work; it would rely solely on the tree snapshots (or diffs).

The trick is that when the tool wants to know where some text came from, it doesn't have to search the entire directory tree. It only has to search those files that have been touched by this changeset. So it can be fast, even without requiring any "hints" to be written at commit time.
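To make the idea concrete, here is a minimal Python sketch. It is not git's code, and it rests on assumptions of my own: each snapshot is just a dict mapping a path to the file's text, and the helper names `touched_files` and `find_origin` are made up for illustration. Given a block of text, it looks for that text in the old versions of only those files the changeset touched.

```python
def touched_files(old_tree, new_tree):
    """Paths added, removed, or modified between two snapshots."""
    paths = set(old_tree) | set(new_tree)
    return sorted(p for p in paths if old_tree.get(p) != new_tree.get(p))


def find_origin(snippet, old_tree, new_tree):
    """Old paths, among the touched files only, that contained the snippet."""
    return [
        path
        for path in touched_files(old_tree, new_tree)
        if snippet in old_tree.get(path, "")
    ]


# Hypothetical snapshots: helper() is cut from util.c and pasted into main.c.
old = {
    "util.c": "int helper(void)\n{\n    return 42;\n}\n",
    "main.c": "int main(void)\n{\n    return 0;\n}\n",
    "README": "docs\n",
}
new = {
    "util.c": "",
    "main.c": "int helper(void)\n{\n    return 42;\n}\n"
              "int main(void)\n{\n    return 0;\n}\n",
    "README": "docs\n",
}

snippet = "int helper(void)\n{\n    return 42;\n}\n"
origin = find_origin(snippet, old, new)
print(origin)
```

README is never searched because it is identical in both snapshots. A real tool would presumably need fuzzy matching rather than an exact substring test, since moved code is often edited on the way.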

Very cool! I have never heard of an SCM that does this. Certainly not any of the free distributed SCM systems.

SCM innovation in git

Posted Apr 21, 2005 15:05 UTC (Thu) by zooko (guest, #2589) [Link]

I look forward to seeing how these ideas pan out.

I haven't noticed any evidence that Linus, Pasky, or the other git developers had studied the other Free Software alternatives before they launched into git. Except for Linus's original post about having tried monotone and found it too slow.

It's entirely possible that git will advance the state of the art, even if only because it is new, small, simple, and has The Linus Factor.

However, I am really curious, if only for historical reasons, what sort of research the git developers have done into the extant alternatives.

SCM innovation in git

Posted Apr 21, 2005 16:34 UTC (Thu) by bos (guest, #6154) [Link]

He's looked at arch (too slow, too fragile), darcs (too slow), and monotone (too slow). He hasn't said anything about other alternatives, like svk or bzr, but it's a reasonably safe bet they weren't overlooked.

All of the open source alternatives to date have, in any case, been so far behind BitKeeper in terms of functionality, stability and performance, it isn't even funny.

SCM innovation in git

Posted Apr 21, 2005 18:37 UTC (Thu) by flewellyn (subscriber, #5047) [Link]

I hear tell that the DARCS folks want to use git as their new backend.

SCM innovation in git

Posted Apr 21, 2005 20:47 UTC (Thu) by bos (guest, #6154) [Link]

They do indeed. As does Tom Lord, for arch.

SCM innovation in git

Posted Apr 28, 2005 20:46 UTC (Thu) by huaz (guest, #10168) [Link]

You know, he has used BitKeeper for 3 years.

Let's give Larry some credit.

SCM innovation in monotone

Posted Apr 22, 2005 7:31 UTC (Fri) by xoddam (subscriber, #2322) [Link]

"it appears that Linus has managed to advance the state of the art in a few ways. It's too early to know for sure, but git appears to be an impressive piece of design work."

It's impressive design work because it's essentially the same design as monotone, just stripped back and implemented from scratch with simplicity and speed as primary objectives. Monotone didn't disregard either simplicity or speed, but recent work on the (cool, interesting, complicated) netsync protocol introduced a lot of redundant sanity checks which hit performance hard. Monotone is not inherently slower than git (which uses rsync, which is cool, interesting, complicated and already debugged), and the fundamental ideas are the same.

The fact that everyone now wants to use git as a backend is as much a tip of the hat to Graydon Hoare as to Linus Torvalds.

am I subscribed or not?

Posted Apr 22, 2005 7:42 UTC (Fri) by xoddam (subscriber, #2322) [Link]

I just resubscribed, I can read the Weekly Edition and post comments.
But I remain marked as a guest! I have logged out-and-in-again now.

SCM innovation in monotone

Posted Apr 22, 2005 12:48 UTC (Fri) by kevinbsmith (guest, #4778) [Link]

Absolutely, Graydon and the monotone folks deserve lots of credit. As do Tom Lord and the arch folks for their pioneering work with free distributed SCM. I'm sure there are many other people and projects whose ideas are present in git.

But you say git's design is merely "stripped back and implemented from scratch with simplicity and speed as primary objectives". Simplicity is a *very* valuable design aspect, and one that few SCM architects have managed to achieve. In fact, it is one of git's major innovations. I already mentioned a couple of other unique design aspects of git, so I won't repeat them here.

SCM innovation in git

Posted Apr 25, 2005 12:14 UTC (Mon) by karath (subscriber, #19025) [Link]

It is interesting how the git project has taken off. Obviously, one major contributing factor is that Linus Torvalds started it. But he also started sparse and, while it is successful [1], it has not taken off in the same way [2]. So there must be much more to git than just the name of the founder.

The fact that, so soon after inception, git will have to handle the patch flow of the Linux kernel project (which in volume terms has to be the biggest of any free or open source project) has several implications. There is the technical performance aspect, but also the human scalability: Linux is too big and too open to tolerate a centralised approach to commit privileges (and the resulting flame wars and forks over loss of commit privileges). And of course, Linus has some very definite (but evolving) opinions about what makes a good SCM and the processes that he wants around such an SCM.

Another interesting area is the separation of the core filesystem-like plumbing from the front ends. git-pasky (soon to be cogito - "git inside") looks to be the primary front end, but there are others, including several separate web front ends. However, while the front ends can certainly choose some areas of policy, so many design choices are embedded in the filesystem-like plumbing of the git core that it is not clear how much separation of policy and plumbing there really is.

Already, at least two other SCM projects are seriously looking at some level of integration with git.

For darcs, <> announces a patch for "A darcs that can pull from git". The same message suggests that there is a further independent investigation of how to integrate darcs and git.

And Tom Lord, not often known for accepting other people's ideas about SCM, never mind their code, announced (see <>) that "'git' technology will form the basis of a new archive/revlib/cache format and the basis of new network transports".

While progress appears to have been very rapid, there have been some negatives for the early adopters. There have been two file format changes that have led to problems for some (one was to regularise date formats and the other changed the order of SHA-1 hashing and compression to improve performance).
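To see why swapping the order of hashing and compression is a format change at all, here is a small Python sketch. It does not reproduce git's actual object format; the function names are hypothetical and it makes no claim about which order git used before or after the change. It only shows that the two orders name the same content differently.

```python
import hashlib
import zlib


def id_hash_then_compress(data: bytes):
    """Name the object by the SHA-1 of the raw data, then compress for storage."""
    return hashlib.sha1(data).hexdigest(), zlib.compress(data)


def id_compress_then_hash(data: bytes, level: int = 6):
    """Compress first, then name the object by the SHA-1 of the compressed bytes."""
    stored = zlib.compress(data, level)
    return hashlib.sha1(stored).hexdigest(), stored


data = b"hello, git\n" * 100

# Hashing the raw data: the id can be rechecked after decompression.
raw_id, stored = id_hash_then_compress(data)
print(raw_id == hashlib.sha1(zlib.decompress(stored)).hexdigest())

# Hashing the compressed bytes: the id depends on the compressor settings.
id_fast, _ = id_compress_then_hash(data, level=1)
id_best, _ = id_compress_then_hash(data, level=9)
print(id_fast == id_best)
```

Either way, every object gets a new name when the order changes, so repositories written under one scheme cannot be read under the other without conversion.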

And for the larger kernel process, there is a hiatus while the new system is being developed and matured [3]. But there is a good chance that adopting any of the existing SCM systems would have had a similar delay while the major protagonists got used to it. My memory is that (leaving aside the flame wars), there was a much longer period of transition when BK was first adopted.

There is an interesting message at <> where Linus suggests that a tool such as quilt may be more effective for the messy early stages of development of a patch (patchset). git keeps all of the history and does not allow retrospective editing. Therefore, the only way to clean up the history of a patch is to export a patch from the dirty/messy git tree and then import it into a clean tree [4]. So, despite all the progress, it looks like there is still some way to go before there can be a single integrated tool supporting the Linux kernel workflow.
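The export-and-reimport step can be pictured with a short Python sketch using the standard `difflib` module. This is only an illustration under my own assumptions: the file contents, the name `demo.c`, and the helper `export_patch` are invented, and real trees hold many files. The point is that diffing the starting text directly against the final text yields one clean patch, with the intermediate mess left behind.

```python
import difflib


def export_patch(base: str, final: str, path: str) -> str:
    """Unified diff from the base text straight to the final text."""
    return "".join(
        difflib.unified_diff(
            base.splitlines(keepends=True),
            final.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


# Hypothetical messy history of one file: a typo, a debug line, then the fix.
history = [
    "int x = 0;\n",
    "int x = 0;\nint y = 1\n",
    "int x = 0;\nint y = 1;\nprintf(\"debug\");\n",
    "int x = 0;\nint y = 1;\n",
]

# Only the first and last states matter for the clean patch.
patch = export_patch(history[0], history[-1], "demo.c")
print(patch)
```

The resulting patch could then be applied to a fresh tree as a single change, which is roughly what a quilt-style workflow gives you from the start.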


[1] Successful in that it has driven several campaigns of type-checking cleanup in the kernel. I do not know if it is used outside the kernel (i.e. I'm too lazy to research this).

[2] For example, if my memory is correct, Linus suggested on a couple of occasions that he would like to see someone take sparse all the way through to actually generating runnable kernel code to "prove" that it was correct.

[3] It is interesting that transitioning of the SCM came hard on the heels of the "sucker" super-stable kernel tree process change. I suspect that when all has settled down, "history" will note this time as one of huge change in the Linux kernel process.

[4] There was a similar message from Larry McVoy last year, see <>, where he agreed that the quilt type workflow was valid and he did not yet know how to integrate it into BK.

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds