
Announcing GitTorrent: A Decentralized GitHub

At his blog, Chris Ball announces "GitTorrent," his new project designed to let developers host Git repositories on BitTorrent. The system takes advantage of Git's ability to run over arbitrary network protocols. "We ask for the commit we want and connect to a node with BitTorrent, but once connected we conduct this Smart Protocol negotiation in an overlay connection on top of the BitTorrent wire protocol, in what’s called a BitTorrent Extension. Then the remote node makes us a packfile and tells us the hash of that packfile, and then we start downloading that packfile from it and any other nodes who are seeding it using Standard BitTorrent. We can authenticate the packfile we receive, because after we uncompress it we know which Git commit our graph is supposed to end up at; if we don’t end up there, the other node lied to us, and we should try talking to someone else instead." The project is, obviously, a new one that still has important ground to cover—such as dealing with comments or pull requests—but there are interesting ideas to consider already.
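
The authentication step described in the quote rests on Git's content addressing: an object's ID is the SHA-1 of a short header plus its contents, so a peer cannot hand over different data under the same commit ID. Here is a minimal sketch in Python, assuming the pack has already been unpacked into (type, body) pairs. The function names are hypothetical and not taken from GitTorrent, and a real client would also need to confirm that everything reachable from the commit is present.

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    # Git hashes "<type> <size>" + NUL + body with SHA-1 to name an object.
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def pack_delivers_commit(objects, wanted_commit: str) -> bool:
    # Hypothetical check: does any unpacked commit hash to the ID we asked for?
    return any(
        obj_type == "commit" and git_object_id(obj_type, body) == wanted_commit
        for obj_type, body in objects
    )
```

If a peer substitutes different content, the computed ID no longer matches the requested one, and the client can move on to another node.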



Announcing GitTorrent: A Decentralized GitHub

Posted May 29, 2015 21:12 UTC (Fri) by rillian (subscriber, #11344) [Link]

Anyone else seeding? My clone gittorrent://github.com/cjb/gitorrent has been stalled for some hours now... :-)

Announcing GitTorrent: A Decentralized GitHub

Posted May 30, 2015 1:29 UTC (Sat) by pabs (subscriber, #43278) [Link]

It probably needs to query the central location at the same time as the DHT.

Also, when do we get git-remote-ipfs?

Announcing GitTorrent: A Decentralized GitHub

Posted May 29, 2015 22:35 UTC (Fri) by flewellyn (subscriber, #5047) [Link]

Great idea, actually. Distributed, decentralized repositories are the whole point of Git, so a decentralized, distributed means of accessing them via the network makes perfect sense.

Announcing GitTorrent: A Decentralized GitHub

Posted May 29, 2015 22:53 UTC (Fri) by dlang (✭ supporter ✭, #313) [Link]

it's only a decentralized github if you ignore everything except git hosting. This makes no effort to provide anything else.

Which isn't to say that it's not worthwhile, just that the headline is bad.

Announcing GitTorrent: A Decentralized GitHub

Posted May 30, 2015 21:40 UTC (Sat) by hirnbrot (subscriber, #89469) [Link]

Everything remotely related to git has been called "github", so this just continues the pattern.

Announcing GitTorrent: A Decentralized GitHub

Posted May 31, 2015 0:04 UTC (Sun) by dlang (✭ supporter ✭, #313) [Link]

> Everything remotely related to git has been called "github", so this just continues the pattern.

no, "github" is the set of services provided by a specific company. It's nowhere close to "everything remotely related to git"

Announcing GitTorrent: A Decentralized GitHub

Posted Jun 1, 2015 15:35 UTC (Mon) by flussence (subscriber, #85566) [Link]

Not to mention, "decentralized hub" is an oxymoron in itself.

And on closer inspection, all this does is distribute the bandwidth-heavy part of cloning a git repository - everything else is just as hub-like as before.

Announcing GitTorrent: A Decentralized GitHub

Posted Jun 1, 2015 22:01 UTC (Mon) by paulj (subscriber, #341) [Link]

If you can distribute everything but a minimal 'handle' needed to bootstrap, then it becomes much easier to switch.

Announcing GitTorrent: A Decentralized GitHub

Posted Jun 2, 2015 8:54 UTC (Tue) by fb (subscriber, #53265) [Link]

I don't mean to sound grumpy, but I really don't get how any of this will help motivate anyone to leave GitHub. FWIW, I'm still a happy GitHub user and don't feel any motivation to leave it.

[...]

Lack of ways to serve code over "git://" is not what makes GitHub popular.

The power of GitHub lies in the way its platform gives you a formal & standard way to do pull-requests. It has a well-defined way to send a PR to a given project owner, and a well-defined way to comment back and forth on the pull-request. Same for bugs.

The fact that it is the very same interface for all projects I interact with (and that such interface is "good enough") is its killer feature.

How do these folks expect a PR to take place? Over a mailing list?

Announcing GitTorrent: A Decentralized GitHub

Posted Jun 2, 2015 7:42 UTC (Tue) by fb (subscriber, #53265) [Link]

I think the "grand-parent post" point (which I find a pertinent one) is that _in practice_ many people conflate git (the scm tool) with github (the company), and that even this announcement is reinforcing this confusion.

Announcing GitTorrent: A Decentralized GitHub

Posted May 30, 2015 7:34 UTC (Sat) by graemes (subscriber, #3788) [Link]

I wonder if this is something the IA.BAK (http://iabak.archiveteam.org/) team could use?

No IPv6 support, no UDP holepunching

Posted May 30, 2015 15:42 UTC (Sat) by jch (guest, #51929) [Link]

The DHT implementation they're using doesn't support BEP-32 [1], and there's no support for the (undocumented) ut_holepunch extension, so you'd better set up port forwarding if you want to see any peers.

[1] http://www.bittorrent.org/beps/bep_0032.html

Announcing GitTorrent: A Decentralized GitHub

Posted May 31, 2015 15:13 UTC (Sun) by jond (subscriber, #37669) [Link]

This could be great for cloning large projects such as the kernel. Last time I used a github mirror, which was faster for me than kernel.org, but I did wonder whether a multi-repo fetch would have been faster.

Announcing GitTorrent: A Decentralized GitHub

Posted May 31, 2015 22:47 UTC (Sun) by njwhite (subscriber, #51848) [Link]

> Last time I used a github mirror, which was faster for me than kernel.org, but I did wonder whether a multi-repo fetch would have been faster.

As I read it, this just uses BitTorrent's DHT functionality to find hosts with the needed repo and then downloads a packfile from one of them, rather than parts of a packfile from multiple hosts at once. In that case, the speedup that 'normal' BitTorrent usage gets from downloading from many peers in parallel isn't present. But perhaps I just missed that part?

Announcing GitTorrent: A Decentralized GitHub

Posted Jun 1, 2015 4:32 UTC (Mon) by Otus (subscriber, #67685) [Link]

From the announcement post:
> Then the remote node makes us a packfile and tells us the hash of that packfile, and then we start downloading that packfile from it and any other nodes who are seeding it using Standard BitTorrent.

So it should download from multiple nodes using normal BitTorrent. However, will other nodes have *that* particular packfile? Will they know to also create it somehow or is parallelism dependent on someone else having requested that exact set of changes before?

I haven't looked at the code yet, so I don't know.
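
Why the exact packfile matters can be illustrated with zlib, which Git uses to compress objects inside a pack. The sketch below is only an analogy with made-up data, not real pack generation: the same content compressed with different settings yields different bytes, and therefore a different hash for peers to agree on.

```python
import hashlib
import zlib

content = b"int main(void) { return 0; }\n" * 50  # stand-in for repository data

fast = zlib.compress(content, 1)   # one node's compression setting
small = zlib.compress(content, 9)  # another node's compression setting

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

# Both streams decode to the same content...
same_content = zlib.decompress(fast) == zlib.decompress(small)
# ...but their bytes, and so their hashes, differ.
same_bytes = sha1_hex(fast) == sha1_hex(small)
```

Two nodes holding identical objects would therefore only be able to seed the same torrent if they also produced byte-identical packs.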

Announcing GitTorrent: A Decentralized GitHub

Posted Jun 1, 2015 5:43 UTC (Mon) by zenaan (subscriber, #3778) [Link]

Deterministic packfile creation is required for parallel git downloads.

Example git pack file parameter/ configuration variations:
- compression on X number of CPUs
- maximum packfile size
- more trees/ branches in this repo than on that repo

A way is needed to capture these packfile config variations and distribute them to other git servers (perhaps on a standardized branch name or ??).

In this way many "git servers" (or git torrent clients) can participate in a parallel/ multi server "git torrent". E.g. if "nrcpus" is set to say 8, a dual cpu git mirror needs to be able to reproduce the same packfile as though it too had 8 cpus. And if it's your "fork" of the "master repo", then any branches you add to your fork need to be separated into separate pack files, so that your primary pack files match the master repo's pack files (in order to participate in the git torrent swarm). This type of setup could potentially be very useful for deduplication on a site like github - though one might expect they already have some solution for this (git clone -s ?).

Unless a particular set of "pack file" parameters is standardized, participation in any particular git torrent would just require designation of the "master" repo - so the pack file params are obtained from it. And come to think of it, the "master"'s branches need to be designated anyway.

"Non master" repos could of course use their own pack file parameters, but would not be able to participate in the swarm.

It doesn't sound too hard to conceptualize, so one would hope it's possible to implement this.
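
The idea of capturing pack file parameters and sharing them can be sketched as follows. This is a hypothetical illustration, not anything Git or GitTorrent provides: the `swarm_key` function and the parameter names other than "nrcpus" are assumptions. It hashes a canonical encoding of the parameters so that mirrors with the same settings derive the same swarm identifier, while a fork with extra branches in the same pack set does not.

```python
import hashlib
import json

def swarm_key(repo_name: str, pack_params: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so key order does not matter.
    canonical = json.dumps(
        {"repo": repo_name, "params": pack_params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# "nrcpus" is the declared value from the master repo, not the local CPU count.
master = {"nrcpus": 8, "max_pack_bytes": 2**30, "branches": ["master"]}
mirror = {"branches": ["master"], "max_pack_bytes": 2**30, "nrcpus": 8}
fork = {"nrcpus": 8, "max_pack_bytes": 2**30, "branches": ["master", "my-feature"]}
```

Agreeing on the parameters is only the first half of the problem; the pack generation itself still has to produce identical bytes from them.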

Announcing GitTorrent: A Decentralized GitHub

Posted Jun 1, 2015 12:22 UTC (Mon) by pclouds (subscriber, #76590) [Link]

Related mail from 4 years ago: http://article.gmane.org/gmane.comp.version-control.git/1...

Conceptually it may not be hard, but implementation is hard. By forcing certain object layout rules, you may get a lower compression ratio or slower pack access, and may consume more power. Git tries to reuse deltas from existing packs to produce a new pack. This makes it quick to assemble a pack, but also nondeterministic. The above link also describes threads stealing jobs from one another. Resumable clone is a frequent request, and we still don't have it now.

Announcing GitTorrent: A Decentralized GitHub

Posted Jun 2, 2015 4:59 UTC (Tue) by zenaan (subscriber, #3778) [Link]

Deterministic pack file parameter sets and compression can always be tuned over time even though they change the format - that's just local policy for the authoritative git torrent server.

Also for scenarios which benefit from pack file torrents, the marginal reduction in compression (increase in pack file size) due to the need for determinism may very well be valuable (marginal increase in local storage in order to distribute downloads) - local policy strikes again.

As long as my local mirror wishes to keep participating in the repo torrent, when the authoritative server tells me it is choosing a new parameter set, I already have all the commits and the new compression parameter set, so I can re-pack. [Although it may make sense to have a new "torrent ID" (dunno what that's called sorry) - either way, participating servers can locally regenerate the torrent pack files when this is deterministic.]

It's up to the "authoritative git server" admin to make the policy decision of how long to keep with a current deterministic torrentable pack file parameter set, and when to update to a new/tuned set. This is always a local policy matter! "We can't do that because it's not the best policy for maximum compression" is not the right answer here...

As the "deterministic pack file parameter set" is tuned, each change is simply a new version of the deterministic pack file format. A git torrent server provides its current set of parameters to others who have configured this server to be authoritative for this repo.

The parameters (e.g. pack file size, compression version, branch set included by this server) are all server-local (or "authoritative server"-local, to be precise). So any torrent scenario implies an "authoritative server" for a particular repo. If I am a torrent repo mirror, the "authoritative torrent upstream" is merely a local config.

This not only sounds easy, it is easy - even in the face of compression technology changes and "tuning" over time - that's merely a "version" increase or new parameter set provided by the "authoritative" repo server, and is local policy to that server.
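
The versioning scheme described above might look like this in outline. It is a sketch under stated assumptions: `ParamSet`, `Mirror`, and `on_announce` are hypothetical names, and the actual re-pack is reduced to a counter because deterministic pack generation is not something Git offers today.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ParamSet:
    version: int
    max_pack_bytes: int
    compression_level: int

class Mirror:
    def __init__(self, params: ParamSet) -> None:
        self.params = params
        self.repack_count = 0

    def on_announce(self, announced: ParamSet) -> bool:
        # Adopt a newer parameter set from the configured upstream and re-pack.
        if announced.version <= self.params.version:
            return False  # nothing new; keep seeding the current pack files
        self.params = announced
        self.repack_count += 1  # stand-in for regenerating the pack files locally
        return True
```

A mirror that ignores the announcement keeps working as a plain Git repository; it simply drops out of the swarm for the new version.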


Copyright © 2015, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds