By Jake Edge
August 22, 2007
Development using Git, with its decentralized model, is gaining
proponents for projects beyond its Linux kernel heritage. Some recent
threads
on the kde-core-devel mailing list have been discussing how Git might be
used by some developers without disrupting the Subversion (svn) infrastructure that is
used by KDE. That conversation has broadened to consider how a large
project like KDE might reorganize to take advantage of Git's strengths. It
does not look like KDE is really considering a switch – they
converted from CVS a little over two years ago – but the discussion
is useful to anyone thinking about using Git.
There are really two separate discussions taking place: the first concerns
using Git without disrupting svn, while the second covers the larger issues of how to
structure and use Git for a larger project. The two are intertwined, as
the "best practice" for a KDE-sized project is to convert incrementally.
Smaller sub-projects, a particular KDE application for example, would use
Git while still committing the changes back to the svn repository. Trying
to do a wholesale conversion of a project the size of KDE, with many
developers, testers, translators and users – not to mention millions of
lines of code – would be something approaching impossible.
For tracking an svn repository while using Git locally, the
git-svn tool is indispensable.
It can use any of the svn access protocols to check out a
repository, optionally including its branches and tags, and imports the result
as a Git repository. A developer then uses Git commands locally, running
git-svn again when ready to update from, or push changes to, the svn
repository. It is not a perfect fit (complaints about losing history in
the conversion have been heard), but it does give Git users a way to
interact with svn.
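The clone, update and push cycle described above maps onto three git-svn subcommands. The sketch below is a minimal illustration, assuming a modern Git where the tool is invoked as `git svn` and the svn repository uses the standard trunk/branches/tags layout; the URL and directory name are hypothetical placeholders. It only prints the commands unless `dry_run` is turned off. Between the `rebase` and `dcommit` steps, the developer would make ordinary local Git commits.

```python
import shlex
import subprocess
from typing import List, Sequence

# Hypothetical placeholder; substitute the real svn repository URL.
SVN_URL = "svn://svn.example.org/project"


def git_svn_workflow(svn_url: str, workdir: str) -> List[List[str]]:
    """Return the git-svn commands for one clone / update / push cycle."""
    return [
        # Import trunk, branches and tags (standard svn layout) into a new Git repository.
        ["git", "svn", "clone", "--stdlayout", svn_url, workdir],
        # Later: fetch new svn revisions and replay local commits on top of them.
        ["git", "-C", workdir, "svn", "rebase"],
        # When ready: send each local commit back to svn as its own revision.
        ["git", "-C", workdir, "svn", "dcommit"],
    ]


def run(commands: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("$", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(git_svn_workflow(SVN_URL, "project-git"))
```

Keeping the commands as argument lists, rather than shell strings, avoids quoting problems if the URL or directory contains spaces.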
The decentralized nature of the Git development model is always a
stumbling block for projects that are used to the single, central
repository model of svn and other revision control systems. Adam Treat
invited a rather well-known expert on Git, with some small experience in
applying it to large projects, to comment on some of the questions he and
others had. Linus Torvalds, who is also a KDE user, responded,
at length, with some very useful insights.
Breaking the project into sub-projects is the first step:
So I'm hoping that if you guys are seriously considering git, you'd also
split up the KDE repository so that it's not one single huge one, but with
multiple smaller repositories (ie kdelibs might be one, and each major app
would be its own), and then using the git "submodule" support to tie it
all together.
Using the git-submodule
command, a project can be broken up into many pieces, each with their own
Git repository. Those separate repositories can then be stitched together
into a "superproject" that understands how to handle a collection of
repositories. If a change affects multiple modules, it can still be
handled in an atomic way:
What happens is that you do a single commit in each submodule that is
atomic to that *private* copy of that submodule (and nobody will ever see
it on its own, since you'd not push it out), and then in the supermodule
you make *another* commit that updates the supermodule to all the changes
in each submodule.
See? It's totally atomic. Anybody that updates from the supermodule will
get one supermodule commit, when that in turn fetches all the
submodule changes, you never have any inconsistent state.
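Why the supermodule commit makes a cross-module change atomic can be shown with a toy model. This is a simplified sketch, not Git's implementation: it assumes each superproject commit simply records one pinned commit ID per submodule (real Git stores these as "gitlink" entries in the superproject's tree), and the module names and commit-ID format are hypothetical. Anyone updating from the superproject receives a whole set of pins at once, so they never see one submodule's change without the other.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Submodule:
    """Toy submodule: an ordered list of commit IDs."""
    name: str
    history: List[str] = field(default_factory=list)

    def commit(self, message: str) -> str:
        # Hypothetical ID format; real Git uses content hashes.
        commit_id = f"{self.name}:{len(self.history)}:{message}"
        self.history.append(commit_id)
        return commit_id


@dataclass
class Superproject:
    """Each commit pins exactly one commit per submodule."""
    history: List[Dict[str, str]] = field(default_factory=list)

    def commit(self, pins: Dict[str, str]) -> int:
        self.history.append(dict(pins))  # copy so later edits cannot leak in
        return len(self.history) - 1

    def checkout(self, index: int = -1) -> Dict[str, str]:
        """What someone updating from the superproject gets: a complete set of pins."""
        return dict(self.history[index])


kdelibs = Submodule("kdelibs")
app = Submodule("app")
sup = Superproject()
sup.commit({"kdelibs": kdelibs.commit("initial"), "app": app.commit("initial")})

# A change spanning both modules: one private commit in each submodule...
new_lib = kdelibs.commit("change API")
new_app = app.commit("adapt to API")
# ...then a single superproject commit that moves both pins together.
sup.commit({"kdelibs": new_lib, "app": new_app})

state = sup.checkout()  # both new commits, or (via checkout(0)) neither
```

The intermediate state, with the library changed but the application not yet adapted, exists only in the submodule histories and is never recorded as a superproject commit.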
Users of a development tree have differing needs, which Git supports by not
requiring a central repository that all users must interact with. Torvalds
believes that the development organization, not the tool, should determine
which repositories are central:
I certainly agree that almost any project will want a "central" repository
in the sense that you want to have one canonical default source base that
people think of as the "primary" source base.
But that should not be a *technical* distinction, it should be a *social*
one, if you see what I mean. The reason? Quite often, certain groups would
know that there is a primary archive, but for various reasons would want
to ignore that knowledge.
For Linux, his kernel Git tree is the center, but for a variety of other
users (those following the "stable" tree or a distribution's kernel tree, for example),
those repositories are the source. Those trees do update from the main tree
from time to time, but their maintainers decide when, and the users of
those trees don't have to care.
On the subject of mapping the current KDE practices to Git, Torvalds is, characteristically, not shy about expressing
his opinion:
Hey, you can use your old model if you want to. git doesn't *force* you to
change. But trust me, once you start noticing how different groups can
have their own experimental branches, and can ask people to test stuff
that isn't ready for mainline yet, you'll see what the big deal is all
about.
Centralized _works_. It's just *inferior*.
There is a clash of development models going on, and Torvalds is
pushing the kernel's model. His reasons are good, though they may not
convince everyone, which is why Git tries hard to avoid forcing any
particular style. As he did with open source development, Torvalds is
trying to lead by example, while not forcing anyone to change.
Reading the full threads, including the entire posting by Torvalds, will be very
interesting to those who follow source code management issues. This
culture clash (centralized and somewhat bureaucratic versus decentralized and
freewheeling) will come up again and again over the next few years.
Torvalds seems to think the Git model will work almost everywhere, and his
track record for making smart choices is good. It will be interesting to
watch.