Google has announced
that it has put up a (read-only) git.kernel.org mirror at kernel.googlesource.com.
"kernel.googlesource.com is served out of multiple Google data
centers, utilizing facilities in Asia, the United States and Europe to
provide speedy access from almost anywhere in the world." (Thanks
to several LWN readers).
What goes around, comes around
Posted Apr 25, 2012 15:13 UTC (Wed) by alonz (subscriber, #815)
Payback for kernel.org's hosting of Android?
For what it's worth, I (being far from the continental US) applaud this move.
What goes around, comes around
Posted Apr 25, 2012 15:16 UTC (Wed) by dlang (✭ supporter ✭, #313)
it's actually amazing that the git hosting at kernel.org has been operating without outside mirrors for so long.
What goes around, comes around
Posted Apr 25, 2012 18:10 UTC (Wed) by dmitrij.ledkov (subscriber, #63320)
Why does being far from the US matter? They are pushing the mirror over a CDN from many locations.
What goes around, comes around
Posted Apr 25, 2012 19:14 UTC (Wed) by dlang (✭ supporter ✭, #313)
the normal http- and ftp-accessible data is mirrored widely, but not the git repositories.
git.kernel.org mirror at Google
Posted Apr 25, 2012 16:34 UTC (Wed) by jgg (guest, #55211)
The downloading was fast, but it took forever to figure out what to fetch.. Using https for the kernel git doesn't seem entirely awesome :|
git.kernel.org mirror at Google
Posted Apr 25, 2012 16:57 UTC (Wed) by juliank (subscriber, #45896)
Do they really not use git's smart http mode?
git.kernel.org mirror at Google
Posted Apr 25, 2012 18:26 UTC (Wed) by spearce (guest, #61702)
> Do they really not use git's smart http mode?
kernel.googlesource.com only supports the Git smart HTTP mode. The backend isn't a traditional filesystem, so the older "dumb" HTTP protocol isn't feasible.
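For readers unfamiliar with the distinction: in Git's documented HTTP protocol, a smart client starts by requesting info/refs with a service query parameter, and a smart server replies with a dedicated media type. The following is only a sketch of that discovery step in Python; the function names are made up for illustration, the repository path is the one quoted elsewhere in this thread, and nothing here contacts the network.

```python
# Sketch of smart-HTTP ref discovery as described in Git's HTTP protocol docs.
# Function names are hypothetical; no request is actually sent.

SMART_CONTENT_TYPE = "application/x-git-upload-pack-advertisement"


def info_refs_url(repo_url):
    """Build the URL a smart client requests first when fetching or cloning."""
    return repo_url.rstrip("/") + "/info/refs?service=git-upload-pack"


def is_smart_response(content_type):
    """True if the Content-Type header marks a smart-HTTP ref advertisement."""
    media_type = content_type.split(";")[0].strip().lower()
    return media_type == SMART_CONTENT_TYPE


if __name__ == "__main__":
    repo = "https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux"
    print(info_refs_url(repo))
```

A dumb-HTTP client would request info/refs without the query string and then walk static files, which is the part that presumes an ordinary filesystem layout on the server.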
git.kernel.org mirror at Google
Posted Apr 25, 2012 18:32 UTC (Wed) by spearce (guest, #61702)
> The downloading was fast, but it took forever
> to figure out what to fetch..
It's a mirror. The path names are the same as on git.kernel.org, just swap out the host. And yes, Linus' repository is buried under a lot of paths as pub/scm/linux/kernel/git/torvalds/linux, and unfortunately doesn't even have a description set.
> Using https for the kernel git doesn't seem entirely awesome :|
Without SSL there isn't an easy way to validate the references received match what the source site has to send. It's only ~500 MiB to clone a kernel git. Clients can usually decrypt this faster than the network can transit the data.
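The "just swap out the host" advice can be sketched as a small URL rewrite. This is an illustration only: the helper name is hypothetical, the path is the one quoted in the comment above, and the choice to always emit https follows the statement that the mirror requires SSL.

```python
# Sketch: map a git.kernel.org URL onto the mirror by swapping the host.
# to_mirror_url is a hypothetical helper, not part of any tool.
from urllib.parse import urlsplit, urlunsplit

MIRROR_HOST = "kernel.googlesource.com"


def to_mirror_url(kernel_org_url):
    """Keep the path unchanged, replace the host, and force https."""
    parts = urlsplit(kernel_org_url)
    if parts.hostname != "git.kernel.org":
        raise ValueError("not a git.kernel.org URL: " + kernel_org_url)
    return urlunsplit(("https", MIRROR_HOST, parts.path, "", ""))


if __name__ == "__main__":
    print(to_mirror_url(
        "git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux"))
```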
git.kernel.org mirror at Google
Posted Apr 25, 2012 19:26 UTC (Wed) by apoelstra (subscriber, #75205)
>It's only ~500 MiB to clone a kernel git. Clients can usually decrypt this faster than the network can transit the data.
I would expect the concern to be that the server can't encrypt fast enough to saturate its pipe. At home, I can't scp over a gigabit link at full speed (and top shows sshd pinning the CPU).
git.kernel.org mirror at Google
Posted Apr 25, 2012 22:02 UTC (Wed) by Cyberax (✭ supporter ✭, #52523)
Truecrypt with AES-NI turned on benchmarks at about 1Gb/sec on my Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz.
git.kernel.org mirror at Google
Posted Apr 26, 2012 15:05 UTC (Thu) by spearce (guest, #61702)
>> It's only ~500 MiB to clone a kernel git. Clients can usually decrypt this faster than the network can transit the data.
>
> I would expect the concern to be that the server can't
> encrypt fast enough to saturate its pipe
We actually find we have sufficient server CPU to do the SSL encryption, but there is never enough bandwidth between the client endpoint and the remote server. Our servers won't permit this experiment (they require use of SSL), but doing a git clone over http:// runs at the same bandwidth as https://, as the choke points aren't the server CPU, but instead limited bandwidth on network links between client and server.
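The bottleneck argument can be made concrete with a little arithmetic: a transfer runs at the rate of the slowest stage. The sketch below uses the ~500 MiB clone size from this thread; the 20 Mbit/s link speed and the 1 Gbit/s crypto rate are assumptions picked purely for illustration, not measurements.

```python
# Sketch: transfer time is set by the slowest stage (network link or crypto).
# The link speed and crypto rate used below are illustrative assumptions.

MIB = 1024 * 1024


def transfer_seconds(size_bytes, link_bits_per_s, crypto_bits_per_s):
    """Seconds needed to move size_bytes at the rate of the slower stage."""
    effective = min(link_bits_per_s, crypto_bits_per_s)
    return size_bytes * 8 / effective


if __name__ == "__main__":
    clone_size = 500 * MIB
    over_https = transfer_seconds(clone_size, 20e6, 1e9)
    over_http = transfer_seconds(clone_size, 20e6, float("inf"))
    print(round(over_https, 1), round(over_http, 1))
```

Under these assumed numbers both cases take the same time, because the link, not the encryption, is the limiting stage.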
The Google side of the network is obviously shared with other services the company offers, and this bulk data transfer traffic may be prioritized lower than other data that users want immediately, such as web search results.
git.kernel.org mirror at Google
Posted Apr 25, 2012 23:02 UTC (Wed) by jgg (guest, #55211)
No, I mean literally: 'git fetch' sat for about 30 seconds before it printed anything and then it went (slowly) through more phases than normal for a git: URL, and then it finally started downloading (fairly quickly)
Posted Apr 26, 2012 8:55 UTC (Thu) by juliank (subscriber, #45896)
They most likely do not use standard git on the server side, but store the repository in big table and have their own git implementation.
git.kernel.org mirror at Google
Posted Apr 26, 2012 15:19 UTC (Thu) by spearce (guest, #61702)
> They most likely do not use standard git on the server side,
We don't use git-core, no. :-)
> but store the repository in big table
Almost. We store some information in BigTable, and most data directly in Google's filesystem. The bulk of the data is actually in relatively normal Git pack files, but they are stored differently than git-core would do.
> and have their own git implementation.
Not really. We use JGit (http://www.eclipse.org/jgit) with the DFS storage package (org.eclipse.jgit.storage.dfs) and some glue to connect that code to BigTable and the Google filesystem. We have no custom patches to JGit, everything was upstreamed already months ago.
We haven't yet figured out how to open source the glue code. It's non-trivial in size, which indicates the stock org.eclipse.jgit.storage.dfs package is not sufficient on its own to run a service like this. But most of the glue code calls Google-specific APIs, like the BigTable client library. We could port this code to another database, but that would just be code thrown over the wall. What is in JGit right now is at least exactly what we run, and thus something we maintain and use every day.
Google is slow
Posted Apr 26, 2012 0:33 UTC (Thu) by ras (subscriber, #33059)
I use Google's dl.google.com for Debian (googleearth adds it to your sources.list), and for Android (as an Eclipse upgrade site). Slow doesn't quite capture it.
In the case of Debian, it takes longer to issue the GET and get the "it hasn't changed" response from Google than it does to download the megabytes of Debian package index data from all other sources combined. As others have observed, once the data starts flowing it moves at a reasonable clip, but it can literally take a minute to get the first byte of the response.
It has been like this for months. One day I am going to run out of patience and get rid of google-earth.list out of sources.list.d. It has a real impact on the time it takes me to run through a Debian package development cycle, which involves updating the package index.
Google is slow
Posted Apr 26, 2012 0:47 UTC (Thu) by whiprush (subscriber, #23428)
This is a known problem they're working on, see here:
Posted Apr 26, 2012 0:54 UTC (Thu) by ras (subscriber, #33059)
> This is a known problem they're working on
Thanks, that is handy to know.
Google is slow
Posted Apr 26, 2012 15:12 UTC (Thu) by spearce (guest, #61702)
> I use Google's dl.google.com for Debian (googleearth adds it to your sources.list), and for Android (as an Eclipse upgrade site). Slow doesn't quite capture it.
dl.google.com is different from kernel.googlesource.com. I don't even think they have the same IP address, let alone other things in common. You can call dl.google.com slow, but I'm not sure I would apply the same label (yet anyway) to kernel.googlesource.com. :-)
We usually see about 400ms latency from user to server for kernel.googlesource.com, and most of this is in the SSL setup cost. git clone/fetch/pull/ls-remote all need to create a new SSL connection before they can obtain the list of references from the repository.
We have a modified Git protocol helper for HTTPS that we need to get open sourced; it enables reuse of SSL connections across git command invocations. With this helper I see response times below 80ms from kernel.googlesource.com. We will probably release it GPLv2 into git.git's contrib/ directory, but it's an odd duck because the helper is written in Go.
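To get a feel for what connection reuse buys over a series of git commands, here is a rough back-of-the-envelope sketch using the 400ms and 80ms figures quoted above. It assumes, purely for illustration, that the first command still pays the full setup cost and every later one hits the reused connection; the helper's real behaviour may differ.

```python
# Sketch: cumulative latency over several git invocations, with and without
# SSL connection reuse. The cost model is an assumption for illustration.


def total_latency_ms(invocations, setup_ms=400, reused_ms=80, reuse=False):
    """Total latency if each call pays setup_ms, or only the first one does."""
    if invocations <= 0:
        return 0
    if not reuse:
        return invocations * setup_ms
    return setup_ms + (invocations - 1) * reused_ms


if __name__ == "__main__":
    print(total_latency_ms(10), total_latency_ms(10, reuse=True))
```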
git.kernel.org mirror at Google
Posted Apr 26, 2012 5:18 UTC (Thu) by laroche (subscriber, #24463)
Enabling rsync access on kernel.googlesource.com would be another plus,
so that further mirroring could also happen. It would also be good to
know if rsync or git is used to keep up this mirror.
best regards,
Florian La Roche
git.kernel.org mirror at Google
Posted Apr 26, 2012 14:55 UTC (Thu) by spearce (guest, #61702)
> Enabling rsync access on kernel.googlesource.com would be another plus
This is not possible for us, for a number of reasons.
The backend storage is not the standard Git filesystem format. A client that rsync copied our data would have some Git-looking stuff, but it may also appear to be corrupt. This is because we sort of use the Git file formats, but we have modified them slightly to work around some limitations in our storage system. We patch up the format on the fly to be Git standard when speaking with a client using the Git protocol. Doing that same patch up work with rsync would be difficult.
The other reason this won't work is our network only routes HTTP. We can't route rsync from the client into the backend server that is handling the request.
> It would also be good to know if rsync or git is used
> to keep up this mirror
The mirror uses git://git.kernel.org/ to fetch repositories. It has a complete list of the available repositories, and polls each one with git fetch every 30 minutes. Because we use git to update the mirror, the mirror is always "Git consistent"; it never exposes transient corruption to a client. This is unlike git.kernel.org, which uses rsync to update itself, and can fail client requests because of remote corruption where rsync didn't get everything in a single pass.
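For anyone wanting to run a similar mirror with stock git, the polling scheme described above might look roughly like the sketch below. This is not Google's code (their glue is unpublished): the function names, the repository paths and the remote name "origin" are all assumptions, and the sketch only decides which repositories are due and builds the command, without running anything.

```python
# Sketch of a 30-minute polling schedule for a git-based mirror.
# All names and paths are hypothetical; nothing here runs git or touches disk.
from datetime import datetime, timedelta

POLL_INTERVAL = timedelta(minutes=30)


def fetch_command(git_dir):
    """Command line that would update one mirrored repository via git fetch."""
    return ["git", "--git-dir", git_dir, "fetch", "origin"]


def repos_due(last_fetched, now, interval=POLL_INTERVAL):
    """Return, sorted, the repositories whose last fetch is at least interval old."""
    return sorted(path for path, when in last_fetched.items()
                  if now - when >= interval)


if __name__ == "__main__":
    now = datetime(2012, 4, 26, 15, 0)
    state = {
        "/mirror/torvalds/linux.git": now - timedelta(minutes=45),
        "/mirror/example/other.git": now - timedelta(minutes=5),
    }
    for repo in repos_due(state, now):
        print(" ".join(fetch_command(repo)))
```

Updating through git fetch rather than a file copy is what gives the consistency property mentioned above: refs only move once the objects they point to have arrived.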