Re: Index/hash order
[Posted April 13, 2005 by corbet]
From: Linus Torvalds <torvalds-AT-osdl.org>
To: "H. Peter Anvin" <hpa-AT-zytor.com>
Subject: Re: Index/hash order
Date: Wed, 13 Apr 2005 10:24:53 -0700 (PDT)
Cc: Ingo Molnar <mingo-AT-elte.hu>, git-AT-vger.kernel.org
On Wed, 13 Apr 2005, H. Peter Anvin wrote:
>
> I see what you mean. Do remember, however, that the fact that the blobs
> are compressed is part of the argument as to why there is no need to do
> xdelta-type incremental storage.
No.
The reason for not doing deltas is not "because we compress stuff we don't
need it".
The reason for not doing deltas is purely about consistency, speed, and
distribution. Compression is not it.
The reason I rejected deltas out-of-hand in the design was:
- I want top-of-tree to be fast. And by "fast" I mean so frigging
unbelievably fast that I feel confident that nothing that gives the
same kind of consistency guarantees can top it (that said, I'll also
freely admit that my definition of "fast" is "fast for things _I_ care
about" ;)
This means that a delta format just isn't acceptable. Either you have
to build up the top based on history (forward-moving deltas), which
clearly does not scale performance-wise, or you have to re-base the
deltas and keep the top up-to-date and make the slowdown happen for
old revisions.
Making older revisions slower is fine by me, but it fails my second
basic requirement:
- I want things to distribute well. This means that it has to be based
on an "append data" model, where historical data never changes, and you
only append on top of it (either by adding totally new files, or by
just letting the files grow).
This works in a forward-delta environment (which is fundamentally based
on the notion of "we know the old version, we're adding new stuff on
top of it"), but does _not_ work in the backwards model of "we keep the
old history as a delta against the new".
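The trade-off described in the two points above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `(position, deleted_length, inserted_text)` delta format and the `ForwardDeltaStore` and `ReverseDeltaStore` classes are hypothetical and do not correspond to anything in git or in any particular SCM. It shows that forward deltas keep history append-only but make the top cost one replay step per revision, while reverse deltas make the top free but rewrite an existing record on every commit.

```python
def apply_delta(text, delta):
    """Apply a hypothetical (position, deleted_length, inserted_text) edit."""
    pos, ndel, ins = delta
    return text[:pos] + ins + text[pos + ndel:]


class ForwardDeltaStore:
    """Append-only: one full base text, then one delta per revision."""

    def __init__(self, base):
        self.records = [("full", base)]

    def commit(self, delta):
        # History is never touched; we only append.
        self.records.append(("delta", delta))

    def top(self):
        # The top has to be rebuilt by replaying every delta in order.
        text = self.records[0][1]
        steps = 0
        for _, delta in self.records[1:]:
            text = apply_delta(text, delta)
            steps += 1
        return text, steps


class ReverseDeltaStore:
    """Top is stored whole; older revisions are deltas against newer ones."""

    def __init__(self, base):
        self.records = [("full", base)]
        self.rewrites = 0

    def commit(self, delta):
        old_top = self.records[-1][1]
        new_top = apply_delta(old_top, delta)
        pos, ndel, ins = delta
        # Inverse edit that turns new_top back into old_top.
        inverse = (pos, len(ins), old_top[pos:pos + ndel])
        # An existing record is rewritten: history is no longer append-only.
        self.records[-1] = ("delta", inverse)
        self.rewrites += 1
        self.records.append(("full", new_top))

    def top(self):
        # The top is available without replaying anything.
        return self.records[-1][1], 0


fwd = ForwardDeltaStore("a")
rev = ReverseDeltaStore("a")
for i in range(1000):
    edit = (i + 1, 0, "b")  # append one "b" per revision
    fwd.commit(edit)
    rev.commit(edit)

print("forward replay steps:", fwd.top()[1])
print("reverse replay steps:", rev.top()[1], "rewritten records:", rev.rewrites)
```

Storing every revision whole, as the email argues for, avoids both costs in this model: nothing is replayed and nothing already stored is modified.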
In other words, I don't dislike deltas per se. But they are fundamentally
incompatible with the very purpose of "git", so git does not use them.
Now, it's quite possibly a _wonderful_ idea to use deltas for git-to-git
synchronization. For example, one of the nice properties of "git" is
exactly the fact that the data involved _fully_ determines all objects. So
let's say that you already have the parent version of a commit: you do not
have to send the full object database to synchronize, you really _can_
send just the diff of the data and the file structure (modes) and the
exact commit object (*), and the receiving side can then re-create the
rest from the git database it already has.
And this is all possible exactly because git does not pollute the git
objects with _anything_ other than their contents and has a fixed method
for re-creating them.
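A minimal sketch of that idea follows, assuming a toy object store. The names `object_id`, `make_delta`, `apply_delta` and `receive` are hypothetical, the delta is a naive common-prefix/common-suffix encoding, and hashing the raw bytes with SHA-1 is a simplification: git's actual object encoding is not reproduced here. The point is only that a receiver holding the parent can rebuild the new object from a small delta and confirm the result by recomputing its name.

```python
import hashlib


def object_id(content: bytes) -> str:
    """Name an object purely by its contents (simplified: SHA-1 of raw bytes)."""
    return hashlib.sha1(content).hexdigest()


def make_delta(old: bytes, new: bytes):
    """Naive delta: shared prefix length, shared suffix length, new middle."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return prefix, suffix, new[prefix:len(new) - suffix]


def apply_delta(old: bytes, delta) -> bytes:
    prefix, suffix, middle = delta
    return old[:prefix] + middle + old[len(old) - suffix:]


def receive(store: dict, parent_id: str, delta, expected_id: str) -> str:
    """Rebuild an object from a parent we already hold, then verify its name."""
    rebuilt = apply_delta(store[parent_id], delta)
    if object_id(rebuilt) != expected_id:
        raise ValueError("rebuilt object does not match the announced ID")
    store[expected_id] = rebuilt
    return expected_id


# Sender side: both versions are known locally.
old = b"hello world\n"
new = b"hello git world\n"
delta = make_delta(old, new)

# Receiver side: only the parent is present before the transfer.
receiver = {object_id(old): old}
receive(receiver, object_id(old), delta, object_id(new))
print("bytes sent as delta:", len(delta[2]), "of", len(new))
```

Because the name depends on nothing but the contents, the receiver needs no trust in the delta itself: a wrong reconstruction simply fails the hash check.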
So if you want to do a "git-sync" protocol that sends deltas back and
forth, that is quite possible, and is totally independent of the fact that
the git database itself is designed to be totally stable.
In fact, the total stability of the git database is a huge boon. It means
that while a "git-sync" is going on, the synchronization process in _no_
way needs to worry about any writes happening to the git database on
either end. In other words, you can synchronize a git database with no
locking _what-so-ever_.
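One way to picture lock-free synchronization is the sketch below. It assumes a hypothetical flat directory with one file per object, named by the SHA-1 of its contents, and uses a temporary file plus `os.replace` so that an object only becomes visible once it is complete. The `write_object` and `sync` helpers and the `tmp_` prefix are illustrative assumptions, not git's actual layout or protocol.

```python
import hashlib
import os
import tempfile


def write_object(db_dir: str, content: bytes) -> str:
    """Store content under its own hash; existing objects are never rewritten."""
    oid = hashlib.sha1(content).hexdigest()
    path = os.path.join(db_dir, oid)
    if not os.path.exists(path):
        # Write to a temporary name first, then rename into place, so a
        # reader never sees a half-written object under its final name.
        fd, tmp_path = tempfile.mkstemp(dir=db_dir, prefix="tmp_")
        with os.fdopen(fd, "wb") as handle:
            handle.write(content)
        os.replace(tmp_path, path)
    return oid


def sync(src_dir: str, dst_dir: str) -> list:
    """Copy every object that dst lacks. No locks are taken on either side."""
    copied = []
    for name in sorted(os.listdir(src_dir)):
        if name.startswith("tmp_"):
            continue  # in-progress write by someone else; skip it
        if os.path.exists(os.path.join(dst_dir, name)):
            continue  # same name means same contents, nothing to do
        with open(os.path.join(src_dir, name), "rb") as handle:
            content = handle.read()
        if write_object(dst_dir, content) != name:
            raise ValueError("object %s does not match its name" % name)
        copied.append(name)
    return copied


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    first = write_object(src, b"first object\n")
    second = write_object(src, b"second object\n")
    write_object(dst, b"first object\n")  # dst already has this one
    print("copied:", sync(src, dst) == [second])
    print("second pass copies nothing:", sync(src, dst) == [])
```

A concurrent writer on either side can only add new names, so a directory listing taken at any moment is a valid (if slightly stale) snapshot, and running `sync` again picks up whatever was added in the meantime.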
Trust me, not needing locking is a huge boon. I don't think people realize
just how much thought I've put into my database selection and what the
implications are.
It's perfect, I tell you.
Linus