git.kernel.org mirror at Google

Posted Apr 25, 2012 18:32 UTC (Wed) by spearce (guest, #61702)
In reply to: git.kernel.org mirror at Google by jgg
Parent article: git.kernel.org mirror at Google

> The downloading was fast, but it took forever
> to figure out what to fetch..

It's a mirror. The path names are the same as on git.kernel.org; just swap out the host. And yes, Linus' repository is buried under a lot of path components (pub/scm/linux/kernel/git/torvalds/linux), and unfortunately it doesn't even have a description set.

> Using https for the kernel git doesn't seem entirely awesome :|

Without SSL there isn't an easy way to validate that the references received match what the source site sent. It's only ~500 MiB to clone a kernel git. Clients can usually decrypt this faster than the network can transit the data.
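To put "~500 MiB" in perspective, here is a back-of-the-envelope calculation of raw transfer time. It is a sketch only: the link speeds are assumed examples, and TCP/TLS framing overhead is ignored.

```python
# Back-of-the-envelope transfer time for a ~500 MiB clone.
# The link speeds are illustrative assumptions; TCP/TLS overhead is ignored.

MIB = 1024 * 1024
CLONE_BYTES = 500 * MIB  # the "~500 MiB" figure from the comment


def transfer_seconds(size_bytes: int, link_bits_per_sec: float) -> float:
    """Seconds needed to move size_bytes over a link of the given speed."""
    return size_bytes * 8 / link_bits_per_sec


for label, bps in [("10 Mbit/s", 10e6), ("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)]:
    print(f"{label:>10}: {transfer_seconds(CLONE_BYTES, bps):6.1f} s")
```

At 10 Mbit/s that is roughly 419 s (about 7 minutes) of pure transfer. A client decrypting at an assumed 100 MB/s would spend about 5 s of CPU on the same 524,288,000 bytes, which is the sense in which decryption keeps ahead of the network.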

git.kernel.org mirror at Google

Posted Apr 25, 2012 19:26 UTC (Wed) by apoelstra (subscriber, #75205)

>Its only ~500 MiB to clone a kernel git. Clients can usually decrypt this faster than the network can transit the data.

I would expect the concern to be that the server can't encrypt fast enough to saturate its pipe. At home, I can't scp over a gigabit link at full speed (and top shows sshd pinning the CPU).

git.kernel.org mirror at Google

Posted Apr 25, 2012 22:02 UTC (Wed) by Cyberax (✭ supporter ✭, #52523)

TrueCrypt with AES-NI enabled benchmarks at about 1 Gb/s on my Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz.

git.kernel.org mirror at Google

Posted Apr 26, 2012 15:05 UTC (Thu) by spearce (guest, #61702)

>> Its only ~500 MiB to clone a kernel git. Clients can usually decrypt this faster than the network can transit the data.
>
> I would expect the concern to be that the server can't
> encrypt fast enough to saturate its pipe

We actually find we have sufficient server CPU to do the SSL encryption; what there is never enough of is bandwidth between the client endpoint and the remote server. Our servers won't permit this experiment (they require SSL), but a git clone over http:// runs at the same speed as one over https://, because the choke point isn't server CPU but limited bandwidth on the network links between client and server.

The Google side of the network is obviously shared with other services the company offers, and this bulk data transfer traffic may be prioritized lower than other data that users want immediately, such as web search results.
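The argument here is a bottleneck argument: an end-to-end transfer runs no faster than its slowest stage, so adding TLS only matters if encryption or decryption becomes that slowest stage. A toy model, with every rate a made-up number:

```python
# Toy model: an end-to-end transfer runs no faster than its slowest stage.
# All rates are hypothetical, in megabits per second.


def bottleneck(stage_rates_mbps: dict[str, float]) -> tuple[str, float]:
    """Return the (stage name, rate) pair with the lowest rate."""
    name = min(stage_rates_mbps, key=stage_rates_mbps.get)
    return name, stage_rates_mbps[name]


# Assumed numbers: TLS on each end is far faster than the WAN path.
https_clone = {"server TLS encrypt": 1000.0, "network path": 20.0, "client TLS decrypt": 800.0}
http_clone = {"network path": 20.0}

print(bottleneck(https_clone))  # -> ('network path', 20.0)
print(bottleneck(http_clone))   # -> ('network path', 20.0)
```

With these numbers http:// and https:// land at the same rate. Encryption only changes the answer when it becomes the minimum; apoelstra's sshd on a local gigabit link is that case, because there the network is no longer the slowest stage.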

git.kernel.org mirror at Google

Posted Apr 25, 2012 23:02 UTC (Wed) by jgg (subscriber, #55211)

No, I mean literally: 'git fetch' sat for about 30 seconds before it printed anything, then went (slowly) through more phases than normal for a git:// URL, and only then started downloading (fairly quickly).

https:// seems to require more work than git://.

E.g.:

remote: Counting objects: 2677, done
remote: Finding sources: 100% (2120/2120)
remote: Getting sizes: 100% (820/820)
remote: Compressing objects: 100% (263/263)
remote: Total 2120 (delta 1610), reused 1865 (delta 1525)
Receiving objects: 100% (2120/2120), 367.23 KiB, done.
Resolving deltas: 100% (1779/1779), completed with 549 local objects.
From https://kernel.googlesource.com/pub/scm/linux/kernel/git/...

(almost 1 min of run time)

remote: Counting objects: 6421, done.
remote: Compressing objects: 100% (1130/1130), done.
remote: Total 4935 (delta 4470), reused 4248 (delta 3790)
Receiving objects: 100% (4935/4935), 698.54 KiB | 504 KiB/s, done.
Resolving deltas: 100% (4470/4470), completed with 1068 local objects.
From git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable

(about 7 seconds run time)
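One plausible contributor to the extra start-up time, though nothing in this thread establishes it: over smart HTTP the client first issues a GET for info/refs?service=git-upload-pack to read the server's ref advertisement, then sends its wants and haves in one or more POST requests to git-upload-pack, whereas git:// does the whole exchange over a single connection. In both cases the advertisement (the references spearce mentions validating) is framed as pkt-lines. The sketch below parses such a body; the object IDs and capability strings in the sample are made up, and the parser ignores protocol v2 and its special packets.

```python
# Minimal pkt-line reader. Framing per Git's protocol documentation: each
# packet starts with 4 hex digits giving its total length (including those
# 4 bytes); "0000" is a flush-pkt that carries no payload.


def read_pkt_lines(data: bytes):
    """Yield each packet payload as bytes, or None for a flush-pkt."""
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)
        if length == 0:
            yield None  # flush-pkt
            pos += 4
            continue
        if length < 4 or pos + length > len(data):
            raise ValueError(f"bad pkt-line length {length:#06x} at offset {pos}")
        yield data[pos + 4:pos + length]
        pos += length


def parse_ref_advertisement(body: bytes) -> dict[str, str]:
    """Map ref name -> object ID from a smart-HTTP info/refs response body."""
    refs: dict[str, str] = {}
    for payload in read_pkt_lines(body):
        if payload is None or payload.startswith(b"# service="):
            continue  # skip flush-pkts and the service announcement
        line = payload.rstrip(b"\n").split(b"\0", 1)[0]  # drop capability list
        oid, name = line.split(b" ", 1)
        refs[name.decode()] = oid.decode()
    return refs


def pkt(payload: bytes) -> bytes:
    """Frame one payload as a pkt-line (helper for building sample input)."""
    return b"%04x" % (len(payload) + 4) + payload


# Usage with a hand-built (hypothetical) advertisement:
oid = b"1" * 40  # made-up object ID
sample = (pkt(b"# service=git-upload-pack\n") + b"0000"
          + pkt(oid + b" HEAD\0multi_ack side-band-64k\n")
          + pkt(oid + b" refs/heads/master\n")
          + b"0000")
print(parse_ref_advertisement(sample))
```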

git.kernel.org mirror at Google

Posted Apr 26, 2012 8:55 UTC (Thu) by juliank (guest, #45896)

They most likely do not use standard git on the server side, but store the repository in BigTable and have their own git implementation.

git.kernel.org mirror at Google

Posted Apr 26, 2012 15:19 UTC (Thu) by spearce (guest, #61702)

> They most likely do not use standard git on the server side,

We don't use git-core, no. :-)

> but store the repository in big table

Almost. We store some information in BigTable, and most data directly in Google's filesystem. The bulk of the data is actually in relatively normal Git pack files, but they are stored differently than git-core would do.

> and have their own git implementation.

Not really. We use JGit (http://www.eclipse.org/jgit) with the DFS storage package (org.eclipse.jgit.storage.dfs) and some glue to connect that code to BigTable and the Google filesystem. We have no custom patches to JGit; everything was upstreamed months ago.

We haven't yet figured out how to open source the glue code. It's non-trivial in size, which indicates the stock org.eclipse.jgit.storage.dfs package is not sufficient on its own to run a service like this. But most of the glue code calls Google-specific APIs, like the BigTable client library. We could port this code to another database, but that would just be code thrown over the wall. What is in JGit right now is at least exactly what we run, and thus something we maintain and use every day.
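The comment describes the split only loosely: "some information" in BigTable, the bulk as pack files in Google's filesystem. Here is a minimal sketch of why such a split is natural. It assumes (the comment doesn't say) that refs are among the table data: pack files are written once and then only read, while refs must be updated atomically so that two concurrent pushes can't both win. The classes are hypothetical in-memory stand-ins, not JGit or Google APIs. In Java, real glue would instead extend JGit's DFS abstractions such as DfsObjDatabase and DfsRefDatabase.

```python
# Hypothetical sketch: immutable packs in a blob store, refs in a table with
# compare-and-swap. These in-memory classes stand in for real backends;
# none of these names come from JGit or Google's code.
import threading
from typing import Optional


class BlobStore:
    """Stand-in for a distributed filesystem holding whole pack files."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def write_once(self, name: str, data: bytes) -> None:
        if name in self._blobs:
            raise FileExistsError(name)  # packs are never rewritten in place
        self._blobs[name] = data

    def read(self, name: str) -> bytes:
        return self._blobs[name]


class RefTable:
    """Stand-in for a table with one row per ref and atomic CAS updates."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}
        self._lock = threading.Lock()

    def get(self, ref: str) -> Optional[str]:
        return self._rows.get(ref)

    def compare_and_swap(self, ref: str, expected: Optional[str], new: str) -> bool:
        """Set ref to new only if it currently equals expected (None = absent)."""
        with self._lock:
            if self._rows.get(ref) != expected:
                return False  # another writer moved the ref first
            self._rows[ref] = new
            return True


# Usage: write a pack, then publish it by moving a ref.
packs, refs = BlobStore(), RefTable()
packs.write_once("pack-0001.pack", b"PACK")
print(refs.compare_and_swap("refs/heads/master", None, "a" * 40))  # -> True
print(refs.compare_and_swap("refs/heads/master", None, "b" * 40))  # -> False (stale)
```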