git.kernel.org mirror at Google
Posted Apr 25, 2012 16:57 UTC (Wed) by juliank (subscriber, #45896)
Posted Apr 25, 2012 18:26 UTC (Wed) by spearce (guest, #61702)
kernel.googlesource.com only supports the Git smart HTTP mode. The backend isn't a traditional filesystem, so the older "dumb" HTTP protocol isn't feasible.
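The distinction matters because the "dumb" protocol serves a repository as plain static files, which requires a real filesystem layout. A minimal local sketch (using a scratch repository path, not anything on kernel.googlesource.com):

```shell
# The "dumb" HTTP protocol serves a repository as static files; the server
# must run update-server-info so clients can discover refs and packs.
git init -q --bare /tmp/dumb-http-demo.git
git -C /tmp/dumb-http-demo.git update-server-info
ls /tmp/dumb-http-demo.git/info/refs   # static ref listing for dumb clients

# Smart HTTP instead talks to git-upload-pack through a single endpoint:
#   GET .../info/refs?service=git-upload-pack
# so no static filesystem layout is needed on the backend.
```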
Posted Apr 25, 2012 18:32 UTC (Wed) by spearce (guest, #61702)
It's a mirror. The path names are the same as on git.kernel.org; just swap out the host. And yes, Linus' repository is buried under a long path, pub/scm/linux/kernel/git/torvalds/linux, and unfortunately doesn't even have a description set.
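Since the paths match, the mirror URL can be derived mechanically. A small illustration (the sed rewrite is just one way to do the host swap):

```shell
# Derive the Google mirror URL from a git.kernel.org URL by swapping
# the host; the repository paths are identical on both services.
url="git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux"
mirror=$(printf '%s\n' "$url" | sed 's|^git://git.kernel.org/|https://kernel.googlesource.com/|')
echo "$mirror"
# -> https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
# git clone "$mirror"   # then clone from the mirror instead
```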
> Using https for the kernel git doesn't seem entirely awesome :|
Without SSL there isn't an easy way to validate that the references received match what the source site has to send. It's only ~500 MiB to clone a kernel git repository. Clients can usually decrypt this faster than the network can transit the data.
Posted Apr 25, 2012 19:26 UTC (Wed) by apoelstra (subscriber, #75205)
I would expect the concern to be that the server can't encrypt fast enough to saturate its pipe. At home, I can't scp over a gigabit link at full speed (and top shows sshd pinning the CPU).
Posted Apr 25, 2012 22:02 UTC (Wed) by Cyberax (✭ supporter ✭, #52523)
Posted Apr 26, 2012 15:05 UTC (Thu) by spearce (guest, #61702)
We actually find we have sufficient server CPU to do the SSL encryption, but there is never enough bandwidth between the client endpoint and the remote server. Our servers won't permit the experiment (they require use of SSL), but a git clone over http:// would run at the same bandwidth as https://, as the choke point isn't server CPU but limited bandwidth on the network links between client and server.
The Google side of the network is obviously shared with other services the company offers, and this bulk data transfer traffic may be prioritized lower than other data that users want immediately, such as web search results.
Posted Apr 25, 2012 23:02 UTC (Wed) by jgg (guest, #55211)
https:// seems to require more work than git://
remote: Counting objects: 2677, done
remote: Finding sources: 100% (2120/2120)
remote: Getting sizes: 100% (820/820)
remote: Compressing objects: 100% (263/263)
remote: Total 2120 (delta 1610), reused 1865 (delta 1525)
Receiving objects: 100% (2120/2120), 367.23 KiB, done.
Resolving deltas: 100% (1779/1779), completed with 549 local objects.
(almost 1 min of run time)
remote: Counting objects: 6421, done.
remote: Compressing objects: 100% (1130/1130), done.
remote: Total 4935 (delta 4470), reused 4248 (delta 3790)
Receiving objects: 100% (4935/4935), 698.54 KiB | 504 KiB/s, done.
Resolving deltas: 100% (4470/4470), completed with 1068 local objects.
(about 7 seconds run time)
Posted Apr 26, 2012 8:55 UTC (Thu) by juliank (subscriber, #45896)
Posted Apr 26, 2012 15:19 UTC (Thu) by spearce (guest, #61702)
We don't use git-core, no. :-)
> but store the repository in big table
Almost. We store some information in BigTable, and most data directly in Google's filesystem. The bulk of the data is actually in relatively normal Git pack files, but they are stored differently from how git-core would store them.
> and have their own git implementation.
Not really. We use JGit (http://www.eclipse.org/jgit) with the DFS storage package (org.eclipse.jgit.storage.dfs) and some glue to connect that code to BigTable and the Google filesystem. We have no custom patches to JGit, everything was upstreamed already months ago.
We haven't yet figured out how to open source the glue code. It's non-trivial in size, which indicates the stock org.eclipse.jgit.storage.dfs package is not sufficient on its own to run a service like this. But most of the glue code calls Google-specific APIs, like the BigTable client library. We could port this code to another database, but that would just be code thrown over the wall. What is in JGit right now is at least exactly what we run, and thus something we maintain and use every day.
Google is slow
Posted Apr 26, 2012 0:33 UTC (Thu) by ras (subscriber, #33059)
In the case of Debian, it takes longer to issue the GET and receive the "it hasn't changed" response from Google than it does to download the megabytes of Debian package index data from all the other sources combined. As others have observed, once the data starts flowing it moves at a reasonable clip, but it can take literally a minute to get the first byte of the response.
It has been like this for months. One day I am going to run out of patience and remove google-earth.list from sources.list.d. It has a real impact on the time it takes me to run through a Debian package development cycle, which involves updating the package index.
Posted Apr 26, 2012 0:47 UTC (Thu) by whiprush (subscriber, #23428)
Posted Apr 26, 2012 0:54 UTC (Thu) by ras (subscriber, #33059)
Thanks, that is handy to know.
Posted Apr 26, 2012 15:12 UTC (Thu) by spearce (guest, #61702)
dl.google.com is different from kernel.googlesource.com. I don't even think they have the same IP address, let alone other things in common. You can call dl.google.com slow, but I'm not sure I would apply the same label (yet anyway) to kernel.googlesource.com. :-)
We usually see about 400ms latency from user to server for kernel.googlesource.com, and most of this is in the SSL setup cost. git clone/fetch/pull/ls-remote all need to create a new SSL connection before they can obtain the list of references from the repository.
We have a modified Git protocol helper for HTTPS that we need to get open sourced; it enables reuse of SSL connections across git command invocations. With this helper I see response times below 80ms from kernel.googlesource.com. We will probably release it under GPLv2 into git.git's contrib/ directory, but it's an odd duck because the helper is written in Go.
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds