git.kernel.org mirror at Google

Posted Apr 25, 2012 16:34 UTC (Wed) by jgg (guest, #55211)
Parent article: git.kernel.org mirror at Google

The downloading was fast, but it took forever to figure out what to fetch.. Using https for the kernel git doesn't seem entirely awesome :|


git.kernel.org mirror at Google

Posted Apr 25, 2012 16:57 UTC (Wed) by juliank (subscriber, #45896) [Link]

Do they really not use git's smart http mode?

git.kernel.org mirror at Google

Posted Apr 25, 2012 18:26 UTC (Wed) by spearce (guest, #61702) [Link]

> Do they really not use git's smart http mode?

kernel.googlesource.com only supports the Git smart HTTP mode. The backend isn't a traditional filesystem, so the older "dumb" HTTP protocol isn't feasible.
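For readers unfamiliar with the distinction, here is a minimal sketch of how a smart HTTP client starts ref discovery: it requests `info/refs` with a `service` query parameter and reads pkt-line framed data back. The function names are illustrative and not part of git or JGit, the parser ignores protocol v2 special packets, and the sample bytes in the usage note are hand-built rather than captured from kernel.googlesource.com.

```python
from urllib.parse import urlencode


def discovery_url(repo_url: str, service: str = "git-upload-pack") -> str:
    """Build the smart HTTP ref-discovery URL for a repository (illustrative helper)."""
    return repo_url.rstrip("/") + "/info/refs?" + urlencode({"service": service})


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as a pkt-line: 4 hex digits of total length, then the payload."""
    return b"%04x" % (len(payload) + 4) + payload


def parse_pkt_lines(data: bytes) -> list:
    """Split a pkt-line stream into payloads; None marks a flush packet ("0000")."""
    payloads = []
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)
        if length == 0:
            payloads.append(None)  # flush packet carries no payload
            pos += 4
            continue
        if length < 4:
            # Lengths 1-3 are special packets in newer protocol versions; not handled here.
            raise ValueError("unsupported pkt-line length")
        payloads.append(data[pos + 4:pos + length])
        pos += length
    return payloads
```

A smart server's reply to that request begins with a pkt-line announcing the service, followed by a flush packet; `parse_pkt_lines(pkt_line(b"# service=git-upload-pack\n") + b"0000")` returns that announcement and then `None`.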

git.kernel.org mirror at Google

Posted Apr 25, 2012 18:32 UTC (Wed) by spearce (guest, #61702) [Link]

> The downloading was fast, but it took forever
> to figure out what to fetch..

It's a mirror. The path names are the same as on git.kernel.org; just swap out the host. And yes, Linus' repository is buried under a lot of paths as pub/scm/linux/kernel/git/torvalds/linux, and unfortunately doesn't even have a description set.
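The host swap can be sketched as a small URL rewrite. This is only an illustration: `to_google_mirror` is a hypothetical helper, and it assumes the path is carried over unchanged and that the mirror is reached over https, as described in this thread.

```python
from urllib.parse import urlsplit, urlunsplit

MIRROR_HOST = "kernel.googlesource.com"


def to_google_mirror(url: str) -> str:
    """Rewrite a git.kernel.org URL to the same path on the Google mirror."""
    parts = urlsplit(url)
    if parts.hostname != "git.kernel.org":
        raise ValueError("not a git.kernel.org URL: " + url)
    # Keep the path as-is; only the scheme and host change.
    return urlunsplit(("https", MIRROR_HOST, parts.path, "", ""))
```

For example, `git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable` maps to `https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable`.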

> Using https for the kernel git doesn't seem entirely awesome :|

Without SSL there isn't an easy way to validate that the references received match what the source site has to send. It's only ~500 MiB to clone a kernel git. Clients can usually decrypt this faster than the network can transmit the data.

git.kernel.org mirror at Google

Posted Apr 25, 2012 19:26 UTC (Wed) by apoelstra (subscriber, #75205) [Link]

> It's only ~500 MiB to clone a kernel git. Clients can usually decrypt this faster than the network can transmit the data.

I would expect the concern to be that the server can't encrypt fast enough to saturate its pipe. At home, I can't scp over a gigabit link at full speed (and top shows sshd pinning the CPU).

git.kernel.org mirror at Google

Posted Apr 25, 2012 22:02 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

Truecrypt with AES-NI turned on benchmarks at about 1Gb/sec on my Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz.

git.kernel.org mirror at Google

Posted Apr 26, 2012 15:05 UTC (Thu) by spearce (guest, #61702) [Link]

>> It's only ~500 MiB to clone a kernel git. Clients can usually decrypt this faster than the network can transmit the data.
>
> I would expect the concern to be that the server can't
> encrypt fast enough to saturate its pipe

We actually find we have sufficient server CPU to do the SSL encryption, but there is never enough bandwidth between the client endpoint and the remote server. Our servers won't permit this experiment (they require use of SSL), but doing a git clone over http:// runs at the same bandwidth as https://, as the choke points aren't the server CPU, but instead limited bandwidth on network links between client and server.

The Google side of the network is obviously shared with other services the company offers, and this bulk data transfer traffic may be prioritized lower than other data that users want immediately, such as web search results.
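The bottleneck argument can be put as back-of-the-envelope arithmetic: the transfer runs at the slower of the network rate and the cipher rate. The sketch below is a simplification under assumed numbers; only the ~500 MiB clone size comes from this thread, and the link and cipher rates in the usage note are hypothetical.

```python
MIB = 1024 * 1024


def clone_seconds(size_mib: float, link_mbit_per_s: float, cipher_mib_per_s: float) -> float:
    """Estimate transfer time when the slower of network and cipher sets the pace."""
    link_bytes_per_s = link_mbit_per_s * 1_000_000 / 8  # megabits/s to bytes/s
    cipher_bytes_per_s = cipher_mib_per_s * MIB
    rate = min(link_bytes_per_s, cipher_bytes_per_s)  # the bottleneck
    return size_mib * MIB / rate
```

With a hypothetical 20 Mbit/s path and a cipher running at 100 MiB/s, `clone_seconds(500, 20, 100)` is governed entirely by the network; raising the cipher rate further does not change the result.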

git.kernel.org mirror at Google

Posted Apr 25, 2012 23:02 UTC (Wed) by jgg (guest, #55211) [Link]

No, I mean literally: 'git fetch' sat for about 30 seconds before it printed anything and then it went (slowly) through more phases than normal for a git: URL, and then it finally started downloading (fairly quickly)

https:// seems to require more work than git://

eg:

remote: Counting objects: 2677, done
remote: Finding sources: 100% (2120/2120)
remote: Getting sizes: 100% (820/820)
remote: Compressing objects: 100% (263/263)
remote: Total 2120 (delta 1610), reused 1865 (delta 1525)
Receiving objects: 100% (2120/2120), 367.23 KiB, done.
Resolving deltas: 100% (1779/1779), completed with 549 local objects.
From https://kernel.googlesource.com/pub/scm/linux/kernel/git/...

(almost 1 min of run time)

VS

remote: Counting objects: 6421, done.
remote: Compressing objects: 100% (1130/1130), done.
remote: Total 4935 (delta 4470), reused 4248 (delta 3790)
Receiving objects: 100% (4935/4935), 698.54 KiB | 504 KiB/s, done.
Resolving deltas: 100% (4470/4470), completed with 1068 local objects.
From git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable

(about 7 seconds run time)

git.kernel.org mirror at Google

Posted Apr 26, 2012 8:55 UTC (Thu) by juliank (subscriber, #45896) [Link]

They most likely do not use standard git on the server side, but store the repository in big table and have their own git implementation.

git.kernel.org mirror at Google

Posted Apr 26, 2012 15:19 UTC (Thu) by spearce (guest, #61702) [Link]

> They most likely do not use standard git on the server side,

We don't use git-core, no. :-)

> but store the repository in big table

Almost. We store some information in BigTable, and most data directly in Google's filesystem. The bulk of the data is actually in relatively normal Git pack files, but they are stored differently than git-core would do.

> and have their own git implementation.

Not really. We use JGit (http://www.eclipse.org/jgit) with the DFS storage package (org.eclipse.jgit.storage.dfs) and some glue to connect that code to BigTable and the Google filesystem. We have no custom patches to JGit; everything was upstreamed months ago.

We haven't yet figured out how to open source the glue code. It's non-trivial in size, which indicates the stock org.eclipse.jgit.storage.dfs package is not sufficient on its own to run a service like this. But most of the glue code calls Google-specific APIs, like the BigTable client library. We could port this code to another database, but that would just be code thrown over the wall. What is in JGit right now is at least exactly what we run, and thus something we maintain and use every day.

Google is slow

Posted Apr 26, 2012 0:33 UTC (Thu) by ras (subscriber, #33059) [Link]

I use Google's dl.google.com for Debian (googleearth adds it to your sources.list), and for Android (as an Eclipse upgrade site). Slow doesn't quite capture it.

In the case of Debian, it takes longer to issue the GET and get the "it hasn't changed" response from Google than it does to download the megabytes of Debian package index data from all other sources combined. As others have observed, once the data starts flowing it moves at a reasonable clip, but it can take literally a minute to get the first byte of the response.

It has been like this for months. One day I am going to run out of patience and remove google-earth.list from sources.list.d. It has a real impact on the time it takes me to run through a Debian package development cycle, which involves updating the package index.

Google is slow

Posted Apr 26, 2012 0:47 UTC (Thu) by whiprush (subscriber, #23428) [Link]

This is a known problem they're working on, see here:

http://code.google.com/p/chromium/issues/detail?id=93409

Google is slow

Posted Apr 26, 2012 0:54 UTC (Thu) by ras (subscriber, #33059) [Link]

> This is a known problem they're working on

Thanks, that is handy to know.

Google is slow

Posted Apr 26, 2012 15:12 UTC (Thu) by spearce (guest, #61702) [Link]

> I use Google's dl.google.com for Debian (googleearth adds it to your sources.list), and for Android (as an Eclipse upgrade site). Slow doesn't quite capture it.

dl.google.com is different from kernel.googlesource.com. I don't even think they have the same IP address, let alone other things in common. You can call dl.google.com slow, but I'm not sure I would apply the same label (yet anyway) to kernel.googlesource.com. :-)

We usually see about 400ms latency from user to server for kernel.googlesource.com, and most of this is in the SSL setup cost. git clone/fetch/pull/ls-remote all need to create a new SSL connection before they can obtain the list of references from the repository.

We have a modified Git protocol helper for HTTPS that we need to get open sourced; it enables reuse of SSL connections across git command invocations. With this helper I see response times below 80ms from kernel.googlesource.com. We will probably release it under GPLv2 into git.git's contrib/ directory, but it's an odd duck because the helper is written in Go.
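To put those two figures side by side, here is a rough model of the cumulative latency over several git invocations. It is only a sketch: the 400 ms and 80 ms defaults are the numbers quoted above, and the assumption that the first invocation still pays the full SSL setup cost when connections are reused is mine, not something stated about the helper.

```python
def total_latency_ms(invocations: int, setup_ms: float = 400.0,
                     reused_ms: float = 80.0, reuse: bool = False) -> float:
    """Sum per-command latency, with or without SSL connection reuse."""
    if invocations <= 0:
        return 0.0
    if not reuse:
        # Every git command opens a fresh SSL connection.
        return invocations * setup_ms
    # Assumed: the first command pays full setup, later ones reuse the connection.
    return setup_ms + (invocations - 1) * reused_ms
```

Under this model, ten fetches cost `total_latency_ms(10)` milliseconds of round-trip overhead without reuse and `total_latency_ms(10, reuse=True)` with it.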

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds