Git approaches 1.0

[Posted July 27, 2005 by corbet]

On April 5, 2005, it was announced that BitMover would "focus exclusively" on its commercial BitKeeper offering and withdraw the free-beer client used by a number of free software developers. This was a nervous moment; BitKeeper had become an integral part of the Linux kernel development process. Nobody wanted to go back to the old days - when no source code management system was used at all - but there was no clear successor to BitKeeper on offer.

And where might such a successor have been expected to come from? We had been told many times that the development of BitKeeper required numerous person-years of work and millions of dollars of funding. The free software community was simply not up to the task of creating a tool with that sort of capability - especially not in a hurry. The kernel development community, having lost a tool it relied upon heavily, appeared doomed to a long, painful period of adjustment.

Two full days later, Linus announced the first release of a tool called "git." It was, he said, "_really_ nasty," but it was a starting point. On April 20, fifteen days after the withdrawal of BitKeeper, the 2.6.12-rc3 kernel prepatch, done entirely with git, was released. The git tool, in those days, was clearly suitable only for early adopters, but, even then, it was also clearly going somewhere.

Git brings with it some truly innovative concepts; it is not a clone of any other source code management system. Indeed, at its core, it is not really an SCM at all. What git offers is a content-addressable object filesystem. If you store a file in git, it does not really have a name; instead, it can be looked up using its contents (as represented by an SHA hash). A hierarchical grouping of files - a particular kernel release, for example - is represented by a separate "tree" object listing which files are part of the group and where they are to be found. Files do not have any history - they simply exist or not, and two versions of the same file are only linked by virtue of being in the same place in two different tree objects.
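The idea can be illustrated with a minimal sketch. This is a toy in-memory model, not git's actual object format or on-disk layout; the names `ObjectStore` and `write_tree`, and the one-line-per-file tree encoding, are assumptions made purely for illustration.

```python
import hashlib


class ObjectStore:
    """Toy content-addressable store: objects are found by the hash of their bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # The object's only "name" is derived from its contents.
        key = hashlib.sha1(data).hexdigest()
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


def write_tree(store: ObjectStore, files: dict) -> str:
    """Store each file as a blob, then store a tree object listing blob id and path."""
    lines = []
    for path in sorted(files):
        blob_id = store.put(files[path])
        lines.append(f"{blob_id} {path}")
    return store.put("\n".join(lines).encode())


store = ObjectStore()
release_a = write_tree(store, {
    "Makefile": b"all:\n",
    "init/main.c": b"int main(void) { return 0; }\n",
})
release_b = write_tree(store, {
    "Makefile": b"all:\n",
    "init/main.c": b"int main(void) { return 1; }\n",
})
```

In this sketch the unchanged `Makefile` is stored only once, and the two versions of `init/main.c` are related only because both trees list an entry at the same path.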

This way of organizing things is hard to grasp, initially, but it makes some interesting things possible. One of the harder problems in many SCM systems - handling the renaming of files - requires no special care with git. A single git repository can hold any number of branches or parallel trees without confusion. File integrity checking is built into the basic lookup mechanism, so that corruption will be detected automatically, and, if desired, kernel releases can be cryptographically signed easily. Perhaps most importantly, however: git makes certain operations, such as the merging of patches, very fast.
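Two of these properties follow directly from naming objects by their contents, as the following rough sketch suggests. It is an illustration under simplifying assumptions (plain dictionaries standing in for the object store and for trees, hypothetical file paths), not a description of git's real code.

```python
import hashlib


def object_id(data: bytes) -> str:
    """Name an object by the SHA-1 hash of its contents."""
    return hashlib.sha1(data).hexdigest()


def checked_read(objects: dict, key: str) -> bytes:
    """Return the stored bytes, re-hashing them to confirm they still match the key."""
    data = objects[key]
    if object_id(data) != key:
        raise ValueError(f"object {key} is corrupt")
    return data


contents = b"static int foo;\n"
objects = {object_id(contents): contents}

# A "rename" changes only the path recorded in a tree; the blob id is identical.
tree_before = {"drivers/old_name.c": object_id(contents)}
tree_after = {"drivers/new_name.c": object_id(contents)}
same_blob = tree_before["drivers/old_name.c"] == tree_after["drivers/new_name.c"]

# Simulated corruption: the stored bytes change but the key does not.
corrupted = {object_id(contents): b"static int f00;\n"}
```

Here `checked_read(objects, ...)` returns the original bytes, while the same call against `corrupted` raises an error, because the lookup key doubles as a checksum.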

It's worth noting that git is not a clone of BitKeeper, or of any other SCM. Certainly it incorporates lessons learned from years of use of BitKeeper and other tools; it supports changesets, for example, and is designed to be used in a distributed mode. But git is something new; it brings a unique approach to the problem.

Watching the git development process snowball over the last few months has been fascinating. A large and active development community coalesced around git in short order; interestingly, relatively few of the core git developers were significant kernel contributors. In a short period of time, git has acquired most of the features expected from an SCM, its rough edges have been smoothed, it has picked up a variety of graphical interfaces, and it is widely used in the kernel development community. Git is clearly a success.

The git developers are now working toward a 1.0 release. As part of that process, Linus has now handed git over to a new maintainer: Junio Hamano. Junio has been an active git developer for some time; he will now attempt to take the project forward as its leader. He will have plenty of work ahead of him as git moves into a more stable (though still fast-moving) phase.

Git is an example of how well the free software process can work. Linus has shown us, once again, that he knows how to get a successful free software project started: put out a minimal (but well thought out) core that begins to solve a problem, then let the community run with it. The result is a vibrant, living project which incorporates the best of what has been learned before while simultaneously breaking new ground. The creator of the Linux kernel appears to have launched another winner.

But, then, some things still seem to surprise even Linus:

August 25, 1991: "I'm doing a (free) operating system (just a hobby, won't be big and professional like gnu) for 386(486) AT clones."
July 26, 2005: "...this thing ended up being a bit bigger and more professional than I originally even envisioned."

Let this be a lesson to all free software developers out there: the humblest of projects can, with the right ideas and participation, become far more "big and professional" than one might ever imagine.

And don't forget Mercurial

Posted Jul 28, 2005 15:00 UTC (Thu) by bos (guest, #6154)

Jonathan, perhaps you could do an article on Mercurial, since it's the other major distributed revision control system to have appeared since BK became unavailable to non-paying customers.

It doesn't have the same size of galloping horde behind it that git does, but it has a devoted following in the kernel development community, and is seeing intense interest outside that sphere.

But I'm biased, because I work on it a lot :-)

http://www.selenic.com/mercurial
http://www.serpentine.com/mercurial

And don't forget Mercurial

Posted Jul 28, 2005 15:10 UTC (Thu) by corbet (editor, #1)

It's on my list. I went to Matt's session at OLS, and I'm meaning to play with it some, when I get a chance...

Renames

Posted Jul 28, 2005 15:07 UTC (Thu) by vmole (guest, #111)

"One of the harder problems in many SCM systems - handling the renaming of files - requires no special care with git."

That's because git completely ignores the problem. Renaming "foo" to "bar" is treated exactly the same as deleting "foo" and then creating "bar". By this definition cvs handles renames just fine, too. Linus handwaves this problem by saying you can compare content of "foo" and "bar" and guess that the transition was a rename. I personally don't want my SCM to be "guessing" about what has happened.

Which is not to diss git. It does what Linus wants it to do, and does it very quickly. It's an impressive piece of work, and especially so considering the timeframe. But it's not suitable for every development project or style.

Renames

Posted Aug 1, 2005 17:17 UTC (Mon) by bronson (subscriber, #4806)

It's a little more subtle... During development, files are always being created and destroyed and great swaths of code moved between them. The filename is just a temporary label. It's the content that is key.

Git will tell you "File A in tree 1 is 78% the same as File B in tree 2." The developer then knows that file B derives heavily from file A. One day git will also be able to tell you that "File C is 95% the same as parts of file A, and file D is 98% the same as parts of file A." This makes it pretty clear that file A was split into files C and D. Git just follows the content, no more, no less.
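A score of this kind can be approximated in a few lines. The sketch below uses Python's standard `difflib` line matching as a stand-in; it is not the algorithm git uses, and the `similarity` and `best_match` helpers and the 0.5 threshold are hypothetical choices for illustration only.

```python
import difflib


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 score for how much line content two files share."""
    matcher = difflib.SequenceMatcher(None, a.splitlines(), b.splitlines())
    return matcher.ratio()


def best_match(deleted: dict, added_text: str, threshold: float = 0.5):
    """Pick the deleted file most similar to a newly added one, if it clears the threshold."""
    scored = [(similarity(text, added_text), path) for path, text in deleted.items()]
    score, path = max(scored, default=(0.0, None))
    return (path, score) if score >= threshold else (None, score)


file_a = "one\ntwo\nthree\nfour\n"
file_b = "one\ntwo\nthree\nfive\n"
match = best_match({"foo.c": file_a, "other.c": "x\ny\n"}, file_b)
```

With these inputs, `match` names `foo.c` as the likely origin of the new file, since three of its four lines are shared.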

"Sure," you say, "svn mv and cp can show this and it's much easier to use!" (or Arx or insert favorite CMS here) So let's consider more real-world problems. What if you scatter the functions in file A across 5 different files, 3 of which already exist? Consider, for instance, the great USB reorg. Git still happily tells you exactly what happened, whereas file-based CMSes fall flat or, at the very least, need a colossal amount of hand-holding. Git encourages broad refactoring. By locking the filename to particular content, other CMSes tend to discourage it.

Git tells you _exactly_ what happened. Where did you get the idea that it guesses?

"But it's not suitable for every development project or style."

I doubt anybody would disagree with this!

Renames

Posted Aug 2, 2005 15:58 UTC (Tue) by karath (subscriber, #19025)

I am interested to know how GIT does the fragment tracking as described in the previous message. I have followed the GIT mailing list closely and have seen a hint from Linus that fragment tracking is what he sees a need for in the future.

However, my understanding is that, while GIT is layered on a content-addressable "filesystem", the content addressing system used is the SHA1 sum of the entire content of the file. So, without specific tools that search for fragment matches in different files, I cannot see how GIT does fragment tracking.

BTW, GIT now has commands to explicitly track renames.

regards,
Charles

Git developers aren't kernel developers?

Posted Aug 4, 2005 7:24 UTC (Thu) by Wol (subscriber, #4433)

"A large and active development community coalesced around git in short order; interestingly, relatively few of the core git developers were significant kernel contributors."

That shouldn't actually be a surprise. Like many people, I like to follow kernel development (but, as I don't run linux much :-( I don't really contribute).

But what I'm really involved in, and would be more involved in if I could find the time, is database development. It's what I do professionally (use and program databases) and, inasmuch as I do Free Software development, it's database engine stuff.

Cheers,
Wol