Ten-year timeline part 6: almost to the present
Posted Feb 27, 2008 20:46 UTC (Wed) by nix (subscriber, #2304)Parent article: Ten-year timeline part 6: almost to the present
git was, for me, final proof of the `release early, release often' idea. The initial git was very nearly unusable by mortals and was certainly far too disk-space-inefficient to actually be used for long on a project with a churn rate as high as the kernel's. But because the *representation* was right, the other problems could be solved later (and were, fast enough that nobody's disks filled up), and a whole bunch of really nasty ones that had bedeviled other VC systems for ages just ceased to exist, like rename tracking (what? you need to track renames? why not just search for similar content when you pack? that way you can merge stuff that's similar whether or not it originated in a rename.) And it keeps improving. My kernel git repo actually uses *less* space now than it did in the 2.6.18 era because of repacker improvements...
Posted Feb 27, 2008 21:51 UTC (Wed)
by zooko (guest, #2589)
[Link] (16 responses)
Posted Feb 28, 2008 0:04 UTC (Thu)
by ncm (guest, #165)
[Link] (8 responses)
Posted Feb 28, 2008 0:16 UTC (Thu)
by zooko (guest, #2589)
[Link] (7 responses)
Posted Feb 28, 2008 0:27 UTC (Thu)
by zooko (guest, #2589)
[Link] (5 responses)
Posted Feb 28, 2008 1:00 UTC (Thu)
by drag (guest, #31333)
[Link] (4 responses)
Posted Feb 29, 2008 0:05 UTC (Fri)
by graydon (guest, #5009)
[Link] (3 responses)
Posted Feb 29, 2008 12:21 UTC (Fri)
by daniel (guest, #3181)
[Link] (2 responses)
Posted Feb 29, 2008 12:46 UTC (Fri)
by zooko (guest, #2589)
[Link]
Posted Feb 29, 2008 17:14 UTC (Fri)
by graydon (guest, #5009)
[Link]
Posted Feb 28, 2008 1:43 UTC (Thu)
by ncm (guest, #165)
[Link]
Posted Feb 28, 2008 17:26 UTC (Thu)
by zooko (guest, #2589)
[Link] (6 responses)
Posted Feb 28, 2008 21:15 UTC (Thu)
by nix (subscriber, #2304)
[Link] (5 responses)
Posted Feb 28, 2008 21:51 UTC (Thu)
by zooko (guest, #2589)
[Link] (3 responses)
Posted Feb 29, 2008 1:48 UTC (Fri)
by nix (subscriber, #2304)
[Link] (2 responses)
Posted Mar 1, 2008 21:07 UTC (Sat)
by xtifr (guest, #143)
[Link] (1 responses)
Posted Mar 4, 2008 6:16 UTC (Tue)
by xoddam (subscriber, #2322)
[Link]
Posted Feb 29, 2008 4:23 UTC (Fri)
by njs (subscriber, #40338)
[Link]
Posted Feb 28, 2008 7:55 UTC (Thu)
by jamesh (guest, #1159)
[Link] (2 responses)
Posted Feb 28, 2008 10:23 UTC (Thu)
by nix (subscriber, #2304)
[Link] (1 responses)
Posted Feb 28, 2008 20:18 UTC (Thu)
by njs (subscriber, #40338)
[Link]
Ten-year timeline part 6: almost to the present
It continues to bother me that Linus didn't give credit to Monotone in his git release
announcement. Git at its inception was, as I understand it, a clone of a subset of monotone,
perhaps with the addition of some tweaks that I haven't understood. That's fine -- it's a
wonderful thing to draw ideas from other people and fit them into your needs, and to extend
them to work better. It's more wonderful when you give them credit -- that's a useful part of
the scientific process, and people should treat it as a moral obligation to do so as well as
they can.
Ten-year timeline part 6: almost to the present
Instead Linus still trash-talks Monotone, even though its implementation is completely
different than when he raided it, and he (evidently) hasn't looked at it since.
Ten-year timeline part 6: almost to the present
Reference, please?
Ten-year timeline part 6: almost to the present
Oh, by the way, Graydon Hoare -- author of monotone -- posted, at the time, his summary of his
discussion with Linus and reasons why he thinks Linus rejected monotone:
http://www.mail-archive.com/monotone-devel@nongnu.org/msg...
I greatly appreciate the way Graydon is precise and to the point while also being soft-spoken
and charitable towards others. Not to put too fine a point on it, but I find myself
noticeably happier at the prospect of reading something Graydon has written than something
Linus has written.
A related story to "Linus rejects monotone" is "Linus rejects Mercurial". There is an
interesting thread on lkml about that. Here is a climactic point where Linus seems to be
wavering about adopting Mercurial instead of git:
http://lkml.org/lkml/2005/5/3/2
I couldn't find the next inflection point -- where Linus decided to keep git.
I think maybe the last word on the subject belongs to the long-lost ntk.net:
http://www.ntk.net/2005/04/29/
"Given the surfeit of next generation systems - including darcs, codeville, arch, monotone,
bazaar, bazaar-ng, vesta, svk, ArX, aegis, we suspect that the winner will be git, just out of
the Mighty Power Of Fanboyism."
:-)
Ten-year timeline part 6: almost to the present
Well, speed was a very, very high priority for Linus with Git. His opinion is that it puts a
different dynamic on a feature if it's slow vs fast. That is, people will take much more
advantage of features and use them in creative ways if they are fast, and if they are very
slow it makes them very much less useful in real-world situations.
Seems like he felt that people were caring about features and code structure (i.e. being a bit
too academic) rather than concentrating on making quick and essential functionality.
I dunno. The stuff you guys are talking about is probably much improved, but git exists now and
it's gotten popular, so I doubt there is a reason to switch now.
I don't know how this compares to other things like Monotone, BK, or any other 'third
generation' version control system like that, but it is also important for him that each
developer can have his own private local tree for his own work. That there is no enforced
hierarchy, no secondary players in git-land. No centralized anything. Everything is
distributed. Each personal repository is on equal footing with everybody else's and all you
essentially have is a multitude of separate git trees that can share code equally.
There is a lovely talk here:
http://video.google.com/videoplay?docid=-2199332044603874737
Also keep in mind that jibes and insults matter differently in different contexts. Friends often
are very insulting to each other where I am from; it's actually quite friendly because they
know that they can behave like that and trust each other not to take it personally. That is, as
long as it stays 'good-natured'. Reassuringly friendly sometimes. (This does not mix well with
alcohol, though. Not at all.)
I notice that often people don't understand that, and they come from a place where politeness
and careful attention to social sensitivities are very important. This is fine, it's just
different. Depending on context this can also be misinterpreted as a lack of trust/empathy and
taken as a sort of indicator of a person with a self-superior attitude.
It's not really that important, but it's something to keep in mind I guess. Of course Linus
can be abrasive and he is proud of it, so lots of his BS just should be ignored completely.
Ten-year timeline part 6: almost to the present
You're wrong, but it doesn't matter. The only thing that really matters -- to anyone outside
those few of us mildly offended by being misrepresented -- is that the idea, and some
implementation of it, is now spreading like wildfire.
The fact that git didn't invent the idea is one of those easily-overlooked juicy details lost
in history. It has technical and political momentum to dominate, so ... run with it.
Ten-year timeline part 6: almost to the present
Hi Graydon,
Nice to see you here, and nice to see you getting credit for the considerable advances you
brought about in the state of this art.
I wonder if anybody will ever chronicle my part in the story?
Regards,
Daniel
Ten-year timeline part 6: almost to the present
Do tell. I don't know your part of the story. Nor your last name.
Ten-year timeline part 6: almost to the present
Nice to see you again too! But I must admit to not knowing all the twists and turns of your
part of the story to chronicle them correctly. I know some bits but I'd probably blurt them
out wrong.
I didn't mean to imply that I invented the interesting ideas in monotone. Merely that it, as a
social and researchy development project, both discovered a few fresh ideas and consolidated /
refined many others, and has subsequently been a ripe source of ideas for its successors. I
did some of the work, but also made a ton of mistakes; the key theoretical work we stumbled
through during the course of monotone development was mostly the doing of others. Jerome
Fisher, Nathaniel Smith, Derek Scherger, Bram and Ross Cohen, Timothy Brownawell, Christof
Petig, Richard Levitte, Zack Weinberg, Peter Simons, Daniel Phillips, Emile Snyder, Markus
Schiltknecht, Paul Crowley ... and a long list of others who I am probably implicitly
insulting by not mentioning here (sorry, limited comment space).
We really lucked out, for whatever reason, in drawing together a group of exceptional people
to mull over the problem and push around potential solutions in code, without anyone getting
too pissy about "being right". It's been a really enjoyable and open community.
Ten-year timeline part 6: almost to the present
I was referring to
http://lwn.net/Articles/249457/
Ten-year timeline part 6: almost to the present
I think part of what bothers me about this story is our Loyal Editor's assertion: "Git was not
the first free distributed revision control system, but it was the first to be employed on
such a massive scale. In a real sense, git launched a new era of free software development."
The first sentence is true, as far as it goes, but it was Linux switching to a Free Software,
decentralized revision control tool that was so important, not the invention of yet another
Free Software, decentralized revision control tool. The combination of Linus not giving
credit where credit was due and of Linus-fans subsequently misattributing monotone's
innovations to git bothers me.
The use of a decentralized, Free, revision control tool for the kernel was a major step
forward. The invention of git was a minor tweak to the state of the art -- an exploration of
other parts of the design space.
Don't get me wrong -- I like git, and I'm glad it exists. I like diversity and redundancy. I
like exploring the design space widely instead of everyone congregating on the first part of
the design space that is Good Enough.
And I'm sure that git serves the needs of linux kernel developers -- and of many other people
-- as well or better than various alternatives would. But I don't like for the history of
scientific invention to be obscured by enthusiasm for Linus's personality.
What did our Loyal Editor mean by writing that git launched a new era of free software
development?
Ten-year timeline part 6: almost to the present
You're apparently reading things into what I wrote that I didn't say.
I didn't say `Linus invented all this stuff'. I said that he got the
representation right, not that nobody else had ever done the same. I'm not
such a fool as to imagine that nobody else ever tried content-addressable
storage in version control systems before. (However, I'm fairly sure
nobody ever released a VCS in such an embryonic state before: generally
release schedules for VCSes are quite conservative because people hate
losing their work. It's impressive that git has managed to go so long with
an aggressive release policy with so few incidents of significant data
loss.)
Ten-year timeline part 6: almost to the present
Sorry -- I didn't really mean *you*. Your point about releasing a version control system was
an interesting and valid point, I thought. I didn't really mean any specific person on this
thread -- more the general folklore that I imagine exists in which people think that Linus
took a break from his kernel hacking in order to singlehandedly move forward the state of the
art of revision control.
Ten-year timeline part 6: almost to the present
Ah, OK. Damn English: why can't we have visibly distinct singular and
plural second person pronouns anyway?
Ten-year timeline part 6: almost to the present
(Totally off-topic)
There are regional dialects which make the distinction at least in part. For example, the
American South offers us the term "y'all", which is universally used (among those who use it)
as a second-person plural. Although I'm not from the South, I find the term useful enough
that I occasionally drop it into informal speech or writing. Unfortunately, I don't know of
any equivalent that is unambiguously singular.
quasi-English plural and singular forms for 'you'
Scots offers 'youse' as another effective second-person plural (this has also become common in
Australian vernacular in recent decades).
There is no modern-sounding English pronoun that is unambiguously singular, but the archaic
'thou' and accusative 'thee' (some Northern English dialects preserved this usage up to the
1950s) will do, if you don't mind sounding vaguely biblical.
If you do use these, please please conjugate your funny old verb forms (thou dost, she doth)
correctly!
Ten-year timeline part 6: almost to the present
>I didn't say `Linus invented all this stuff'. I said that he got the representation right,
not that nobody else had ever done the same. I'm not such a fool as to imagine that nobody
else ever tried content-addressable storage in version control systems before. (However, I'm
fairly sure nobody ever released a VCS in such an embryonic state before: generally release
schedules for VCSes are quite conservative because people hate losing their work. It's
impressive that git has managed to go so long with an aggressive release policy with so few
incidents of significant data loss.)
Point of history: Git got the representation (by which I'm assuming you mean the core
file/tree/commit blob design) right because it copied the hard parts from Monotone. Monotone
got the representation right because it started with a good idea (i.e., "hey, let's use
content-addressing to decouple storage and history representations"), and then evolved it over
several years (including two major representation rewrites, one of which added the crucial
"commit" object), using monthly time-based releases (i.e., "oops, it's Monday, time to ship
whatever's in trunk"), and was self-hosting from ~the very beginning -- I think before Graydon
had even received a single outside patch. It did all this with continuous field upgrades for
all storage/representation changes, minimal segfault bugs -- I'm remembering ~2-4? (mtn is
written in C++) -- and no reported data loss by any users.
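For anyone who hasn't seen the file/tree/commit idea spelled out, here is a rough sketch of
content-addressed storage in Python. It is only an illustration: the ObjectStore class, its
method names and the byte layout are made up for this comment and are not the on-disk format
of either monotone or git.

```python
import hashlib


class ObjectStore:
    """Toy in-memory content-addressed store (hypothetical layout)."""

    def __init__(self):
        self.objects = {}

    def put(self, kind, payload):
        # The key is derived only from the bytes stored, so identical
        # content always ends up under the same key.
        data = kind.encode() + b"\0" + payload
        key = hashlib.sha1(data).hexdigest()
        self.objects[key] = data
        return key

    def put_file(self, content):
        return self.put("file", content)

    def put_tree(self, entries):
        # entries maps a name to the key of a file (or tree) object.
        lines = [f"{name} {key}" for name, key in sorted(entries.items())]
        return self.put("tree", "\n".join(lines).encode())

    def put_commit(self, tree, parents, message):
        # A commit names one tree plus its parent commits, so history is
        # kept separately from how file contents are stored.
        lines = [f"tree {tree}"] + [f"parent {p}" for p in parents] + ["", message]
        return self.put("commit", "\n".join(lines).encode())


store = ObjectStore()
readme = store.put_file(b"hello\n")
tree1 = store.put_tree({"README": readme})
c1 = store.put_commit(tree1, [], "initial")

# Renaming the file produces a new tree, but the file object is reused.
tree2 = store.put_tree({"README.txt": readme})
c2 = store.put_commit(tree2, [c1], "rename README")
```

Note that the rename costs one new tree and one new commit; the file content is stored once,
whatever it happens to be called.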
None of which is to say that git is unimpressive -- I don't subscribe to the peculiar notion
whereby any achievement seen once becomes unimpressive forever after -- and git is a
well-done, well-maintained project containing other innovations and that is making a lot of
users happy. That's always impressive :-). ATM, in fact, it's doing a better job of it than
monotone is -- probably as a result of Linus's emphasis on building a tool that would be
immediately useful under extreme conditions, while monotone was noodling around trying to
invent about three different novel technologies. Turns out that one suffices. Oops. OTOH,
it's not like this is the first time an idea had to jump projects to move from research to
mainstream; they reward different approaches. It's entirely possible that if monotone had
started out with the attitude that made git so successful, neither would exist at this point.
So... *shrug*.
(You just *wait* 'til we get those other two nailed down, though! Muahaha!)
Ten-year timeline part 6: almost to the present
> and a whole bunch of really nasty ones that had bedeviled other VC
> systems for ages just ceased to exist, like rename tracking (what?
> you need to track renames? why not just search for similar content
> when you pack? that way you can merge stuff that's similar whether
> or not it originated in a rename.)
The primary reason for wanting to track renames in a version control system is not storage
efficiency: it is to correctly merge a branch even if files have been moved around.
Git's answer to the merge problem is to infer the renames based on the file content, which
works quite well except for a few cases where it doesn't (some cases with renamed directories
and added files come to mind).
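To make the inference idea concrete, here is a minimal sketch in Python. It is not git's
actual algorithm: the infer_renames function, the 0.6 threshold and the use of difflib are
all assumptions picked for illustration. It pairs each path that disappeared with the most
similar path that appeared.

```python
from difflib import SequenceMatcher


def infer_renames(old_tree, new_tree, threshold=0.6):
    """Guess renames between two snapshots that map path -> text content."""
    removed = {p: c for p, c in old_tree.items() if p not in new_tree}
    added = {p: c for p, c in new_tree.items() if p not in old_tree}
    renames = {}
    for old_path, old_content in sorted(removed.items()):
        best_path, best_score = None, threshold
        for new_path, new_content in sorted(added.items()):
            if new_path in renames.values():
                continue  # already claimed by another removed file
            score = SequenceMatcher(None, old_content, new_content).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames[old_path] = best_path
    return renames


old = {
    "util.c": "int add(int a, int b)\n{\n    return a + b;\n}\n",
    "README": "docs\n",
}
new = {
    "lib/math.c": "int add(int a, int b)\n{\n    return a + b; /* sum */\n}\n",
    "README": "docs\n",
    "NEWS": "release notes\n",
}
print(infer_renames(old, new))
```

The moved-and-edited file is matched on content alone, while NEWS is left as a plain addition
because nothing that was removed looks like it.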
In contrast, VCS's that do track renames generally only use the information they captured when
performing the merge. This can break down in cases where a file has been split in two, since
they'll often try to apply the changes to just one of the halves.
One thing to note is that the data model of a VCS that tracks renames does not preclude
performing git-style content based merging, while the reverse is not true. Perhaps the ideal
merge algorithm will turn out to be a combination of tracking renames and content based
heuristics.
As both approaches have their faults, I'd prefer to keep my data in a form that tracks the
renames. 5 to 10 years down the track, if it turns out that content based merges are state of
the art I can always throw that information away.
Ten-year timeline part 6: almost to the present
It's true about the merging stuff, I quite forgot about that thanks to using VCs that had
largely solved this problem for so long now :)
Tracking renames explicitly works fine until your users forget to tell the VCS as well as the
filesystem about the rename. Then it breaks. I find the users forget to do this all the damn
time.
I suspect manual rename tracking (as opposed to content-detection inference at some stage)
will work properly only when the FS and VCS are merged, and look at ClearCase for an example
of how ugly *that* can get. (Also, manual rename tracking intrinsically can't handle the case
you mentioned of 'oh that file got cut in half', let alone more complex cases.)
Ten-year timeline part 6: almost to the present
It's easy for the VCS to detect when the user has forgotten to record a rename -- there's an
inconsistency between the VCS's idea of what the tree should look like and what the tree
actually looks like on disk. (This is true both when the VCS thinks a rename should have
occurred but none did, or vice-versa.) Most VCSes already refuse to allow a commit when
there's a "missing file" situation like this. So just add some code to that routine that says
"such and such files are missing -- hmm, but it looks like very similar files are in an
'unknown' state over there. (Do you want me to record the following renames to fix things
up?)/(Do you want me to rename the following files on disk to fix things up?): a -> b, c ->
d".
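A sketch of what that routine might look like, in Python. Everything here is hypothetical:
check_before_commit, format_prompt and the 0.8 threshold are invented for this comment and
are not taken from any real VCS. It compares what the VCS last recorded with what is on disk
and offers renames for missing files that closely match unknown ones.

```python
from difflib import SequenceMatcher


def check_before_commit(manifest, working_tree, threshold=0.8):
    """Return (renames_to_offer, still_missing).

    manifest:     path -> content as last recorded by the VCS
    working_tree: path -> content currently on disk
    """
    missing = sorted(p for p in manifest if p not in working_tree)
    unknown = sorted(p for p in working_tree if p not in manifest)
    offers, still_missing = [], []
    for old in missing:
        scored = [
            (SequenceMatcher(None, manifest[old], working_tree[new]).ratio(), new)
            for new in unknown
        ]
        score, new = max(scored, default=(0.0, None))
        if new is not None and score >= threshold:
            offers.append((old, new))
            unknown.remove(new)  # each unknown file is offered at most once
        else:
            still_missing.append(old)
    return offers, still_missing


def format_prompt(offers):
    pairs = ", ".join(f"{old} -> {new}" for old, new in offers)
    return f"Do you want me to record the following renames to fix things up? {pairs}"


manifest = {
    "a": "alpha\nbeta\ngamma\n",
    "c": "one\ntwo\nthree\n",
    "e": "unrelated text\n",
}
working_tree = {
    "b": "alpha\nbeta\ngamma\n",
    "d": "one\ntwo\nthree\nfour\n",
}
offers, still_missing = check_before_commit(manifest, working_tree)
print(format_prompt(offers))
```

The renames are only offered, never recorded silently, so the result stays explicit and
predictable; a file like "e" that matches nothing is still reported as missing.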
This combines the ease-of-use of the automatic content tracking systems with the
predictability of explicit rename tracking. It still doesn't help with files getting cut in
half, though, sure. (Handling that case in an explicit and predictable manner looks hopeless,
though, which is why I personally would prefer that such heuristics only be used as part of a
manual merge with a human checking the results. YMMV.)
Hooking into the FS doesn't really help, and probably hurts, because a VCS "rename" is at a
higher semantic level than a FS "rename". It's perfectly common for people to, say, copy a
file, edit the copy, and then replace/delete the original. rename(2) is a way to move bits
around, "myvcs rename" is a way to record intentions.