
Vetter: Why Github can't host the Linux Kernel Community

Daniel Vetter describes how the kernel community scales and why he feels that the GitHub model tends not to work for the largest projects. "Unfortunately github doesn’t support this workflow, at least not natively in the github UI. It can of course be done with just plain git tooling, but then you’re back to patches on mailing lists and pull requests over email, applied manually. In my opinion that’s the single one reason why the kernel community cannot benefit from moving to github. There’s also the minor issue of a few top maintainers being extremely outspoken against github in general, but that’s a not really a technical issue. And it’s not just the linux kernel, it’s all huge projects on github in general which struggle with scaling, because github doesn’t really give them the option to scale to multiple repositories, while sticking to with a monotree."


Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 8, 2017 17:14 UTC (Tue) by mm7323 (subscriber, #87386) [Link] (25 responses)

My experience of github is that it makes the cost of forking too low so that any vaguely interesting project spawns hundreds of abandoned copies with very little flowing back. That isn't real version control or community - it's more akin to duplicating sources into date-stamped directories on your own computer, like some of us used to do before version control became a thing.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 8, 2017 17:41 UTC (Tue) by micka (subscriber, #38720) [Link]

Maybe true, but you know I do the same thing without github all the time. Git is the "culprit", not github.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 8, 2017 19:03 UTC (Tue) by zdzichu (subscriber, #17118) [Link] (17 responses)

GitHub kinda requires forking a repo if you want to submit a patch. Without a fork, you cannot create a Pull Request, and some maintainers won't accept patches otherwise. IIRC there is no way to just open an issue and attach a patch. So even trivial one-liners result in a new fork of the whole repo.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 8, 2017 21:58 UTC (Tue) by ebassi (subscriber, #54855) [Link] (7 responses)

For trivial one-liners you can edit files directly from the web interface; it will create a pull request for you, and will not require a full fork.

Anything more complicated than that should go through the usual Git workflow: fork, push remotely, link the branch for merge.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 5:20 UTC (Wed) by bronson (subscriber, #4806) [Link]

That produces a fork too, it's just done quietly. If you hover over the edit icon, you'll see the help text: "Fork this project and edit the file".

I agree, it's unfortunate. Back in the early days I made an effort to keep my github presence clean and delete forks when my patches were merged. Now I just can't be bothered.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 12:56 UTC (Wed) by ballombe (subscriber, #9523) [Link] (5 responses)

> For trivial one-liners you can edit files directly from the web interface; it will create a pull request for you, and will not require a full fork.

How do you make sure your one-liners are correct?

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 13:21 UTC (Wed) by anselm (subscriber, #2796) [Link]

Doesn't matter. If they aren't, Github makes it so simple for somebody else to go in and fix them!

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 14:11 UTC (Wed) by ebassi (subscriber, #54855) [Link] (3 responses)

I assume you set up appropriate CI so that all pull requests get run through a build and test suite. You know, like you ought to do.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 16:39 UTC (Wed) by aggelos (subscriber, #41752) [Link] (2 responses)

Err, this feels circular. If the specific line being fixed was getting "appropriately" tested, how did it end up broken in the first place?

I suppose the intent of the 'edit this file' button is to lower the bar for drive-by contributions. Whether that is desirable probably varies per project (or, more likely, per file). Simultaneously, the feature appears to work against best practices for code (i.e. testing that your commit effects the intended behavior change or compiles, even) and bypasses the regular workflow (how does that work with e.g. commit hooks?). Perhaps ironically, the latter aspect can also be confusing to new users when the push of their local changes gets unexpectedly rejected even though they're the only person working on the project.

So, I'm having trouble seeing this UI as a good idea for code files (at least).

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 18:57 UTC (Wed) by edgewood (subscriber, #1123) [Link] (1 responses)

I find it convenient for fixing trivial bugs in packaged programs written in interpreted languages. That is, if I'm capable of fixing a bug by editing /usr/bin/foo or /usr/lib/.../foolib/..., I tend to do that instead of downloading the package source, fixing the bug, rebuilding the package, and installing the updated package.

If the bug still exists in the upstream version at Github, I can use the "Fork and edit the file" to contribute the fix as a pull request instead of a patch.

It doesn't work for compiled languages, or for changes that are big enough to be a pain to replicate in the Github editor.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 11, 2017 10:27 UTC (Fri) by aggelos (subscriber, #41752) [Link]

> If the bug still exists in the upstream version at Github, I can use the "Fork and edit the file" to contribute the fix as a pull request instead of a patch.

I find that "the bug still exists in the [tip of the development branch]" is not generally (perhaps not even usually) something one can gedanken-test reliably. Human fallibility aside, I'm sure a lot of us would convince ourselves that a merge request is obviously applicable without really having exhaustively checked the code paths. I guess your experience is different.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 21:29 UTC (Wed) by epa (subscriber, #39769) [Link] (8 responses)

Yes, the need to fork the repository to make a pull request always seems like a speed-bump in the otherwise smooth Github experience. Why can't you just 'git clone' the repository to look around, maybe commit some stuff locally, and then when you 'git push' it automatically creates a pull request from what you pushed? Or at least gives you an easy way to fork the repository and push to your fork at that point, rather than having to fork it in advance.

(Being git, I have no doubt it is possible to clone the repository, fork it after the fact, then repoint my local working copy to the new upstream. My point is that it could be easier and smoother for drive-by contributions or random fixes where you had cloned but hadn't intended to make any changes, but happened to spot something.)

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 23:53 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

I always clone upstream as origin and then add my fork later (usually as gh/mathstuf) if necessary. I also set up gh/mathstuf as the default remote for pushing which makes it just a "git push" to make things available on my fork. I have a script for adding remotes for GitHub forks which makes it easy to fetch PRs directly from other contributors as well.
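The setup described above could be sketched roughly as follows. This is a minimal illustration, not mathstuf's actual script: the helper names `fork_remote_commands` and `run_all`, the HTTPS URL layout, and the example user and repository names are all assumptions. `remote.pushDefault` is the standard Git setting that makes a bare `git push` go to the named remote.

```python
import subprocess


def fork_remote_commands(user, repo, remote_name=None):
    """Return the git commands that add a personal fork as the push remote.

    Assumes the clone already has upstream as "origin" and that the fork
    lives at https://github.com/<user>/<repo>.git (illustrative layout).
    """
    remote_name = remote_name or f"gh/{user}"
    fork_url = f"https://github.com/{user}/{repo}.git"
    return [
        # Register the fork next to the existing upstream "origin".
        ["git", "remote", "add", remote_name, fork_url],
        # Make a bare "git push" target the fork instead of origin.
        ["git", "config", "remote.pushDefault", remote_name],
    ]


def run_all(commands, cwd="."):
    """Run each command inside an existing clone, stopping on the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    # Only print the commands; call run_all() inside a real clone to apply them.
    for argv in fork_remote_commands("example-user", "example-repo"):
        print(" ".join(argv))
```

Keeping upstream as `origin` means `git fetch` and `git pull` still track the canonical repository, while pushes land on the fork.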

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 10, 2017 19:00 UTC (Thu) by kpfleming (subscriber, #23250) [Link] (6 responses)

The patch you want to contribute has to live somewhere; in Git-land, that means a ref (branch) needs to refer to it. There are really only two options: you have permission to create branches in the upstream repository, or you don't. If you do, then you can certainly push a dev branch to that repository and then send a PR from it (to the same repository). Some projects work this way.

However, most projects don't want Joe/Jane Random Developer to be able to create branches in their official repository, so that means the contributor has to make a place to put them: this is called a 'fork' :-) What would be a nice improvement, though, is if GitHub (and GitLab) would offer 'fork without branches/tags', so that the new repository doesn't look like it's an actual fork, but is instead just a place to hold in-flight branches which are being submitted to the source repository.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 11, 2017 21:20 UTC (Fri) by epa (subscriber, #39769) [Link] (3 responses)

Indeed, a fork has to exist somewhere, but it could be created on the fly to hold the merge request -- I think this is what you are also suggesting. Moreover, you shouldn't need to hit the Fork button on the github website before you start -- you could just clone, and when you push, Github will do the necessary. Of course, you could later upgrade the temporary repository holding your changes to a full fork.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 11, 2017 22:59 UTC (Fri) by kpfleming (subscriber, #23250) [Link] (2 responses)

In that scenario, what sort of URL would you push to? The original repository, even though you can't write to it? A 'virtual' fork URL in your personal namespace? Remember, on the client side it's just Git, which doesn't know anything about forks, GitHub API, or anything else which could be used to help the process... the only thing the client can do is upload a pack to a URL and set a ref.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 13, 2017 7:56 UTC (Sun) by epa (subscriber, #39769) [Link] (1 responses)

Yes I imagined you would 'git push' back to the original repository, but since you had cloned it without write permission, the changes would be made into a lightweight fork at Github.

The server would send back a message 'your changes have been pushed to a new fork at ... and you can make a merge request by ...' giving two URIs.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 14, 2017 22:20 UTC (Mon) by AdamW (subscriber, #48457) [Link]

That sounds like an awful lot of work for the server to do, and the question is...why? What's the actual benefit, or to put it another way, what actual harm is caused by there being a ton of forks that exist solely to back pull requests? Once you *know* that that's why there are zillions of forks on Github...what's the actual problem / harm?

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 14, 2017 17:31 UTC (Mon) by cesarb (subscriber, #6266) [Link] (1 responses)

> The patch you want to contribute has to live somewhere; in Git-land, that means a ref (branch) needs to refer to it. [...] However, most projects don't want Joe/Jane Random Developer to be able to create branches in their official repository

If you use github, you have no choice, since every pull request creates a ref directly in your repository. Try it: clone a github repository with --mirror and look at it with gitk --all, and you'll see all the pull requests as refs.
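The refs mentioned above can also be inspected without a full mirror clone by listing the remote's refs and picking out the pull request ones. The sketch below assumes GitHub's `refs/pull/<number>/head` naming and parses text in the `<sha><TAB><ref>` format printed by `git ls-remote`; the function name and the sample data are made up for illustration.

```python
import re

# GitHub publishes the tip of each pull request under refs/pull/<n>/head.
PULL_HEAD = re.compile(r"^refs/pull/(\d+)/head$")


def pull_request_heads(ls_remote_output):
    """Map pull request number -> commit id from `git ls-remote` style text."""
    heads = {}
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        # Each line is "<sha>\t<refname>".
        sha, _, ref = line.partition("\t")
        match = PULL_HEAD.match(ref.strip())
        if match:
            heads[int(match.group(1))] = sha
    return heads


if __name__ == "__main__":
    sample = (
        "1111111111111111111111111111111111111111\trefs/heads/master\n"
        "2222222222222222222222222222222222222222\trefs/pull/7/head\n"
        "3333333333333333333333333333333333333333\trefs/pull/7/merge\n"
        "4444444444444444444444444444444444444444\trefs/pull/12/head\n"
    )
    for number, sha in sorted(pull_request_heads(sample).items()):
        print(number, sha)
```

Branch refs and the `refs/pull/<number>/merge` entries are skipped, so only one commit per pull request is reported.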

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 14, 2017 17:51 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

Yes, but you can't push directly there, only by updating the merge request can you modify those refs.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 8, 2017 19:04 UTC (Tue) by smckay (guest, #103253) [Link]

You can make contributing back require arbitrarily small effort and still a lot of people just won't bother. They'll fork, tweak to their needs, and use what works for them. Of course open source works both ways: nothing prevents upstream from reviewing those changes and integrating them proactively if they're any good.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 8, 2017 21:31 UTC (Tue) by Frogging101 (guest, #113180) [Link]

This sounds like a non-issue. A hundred useless GitHub forks is no different from a hundred people downloading the code onto their local disk and never being heard from again (in fact, that describes almost all users of open source software who obtain it in source form).

I'd argue that GitHub forks are better because then at least any changes are publicly available for others (including upstream) to see and use.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 0:20 UTC (Wed) by ThinkRob (guest, #64513) [Link] (2 responses)

I may be missing something, but what's the harm in having a bunch of dead forks?

For large projects it's clear what the canonical project/repo is. And for smaller ones, they likely don't have enough fork traffic for it to be an issue.

I mean, yeah, it's messy when you go to view the graph, but... so what?

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 13:58 UTC (Wed) by adobriyan (subscriber, #30858) [Link] (1 responses)

> I may be missing something, but what's the harm in having a bunch of dead forks?

Kernel source code indexers were popular at some point so searching for some kernel function name (oopses or any other reason) gave several pages of useless links to outdated code. Dead forks may be positive to Github so that they can plot number of forks and show that user engagement is growing and so on but overall they have negative value because they are useless information making it more difficult to find real information for those looking for it.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 10, 2017 23:57 UTC (Thu) by ThinkRob (guest, #64513) [Link]

I would assume that any kernel-wide search tool would simply search master or the actual release tags/branches, not *all* branches... no?

(So basically, I'm saying the bug is in the tool(s), not Github.)

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 8:51 UTC (Wed) by daniels (subscriber, #16193) [Link]

On the other hand, it's not like the kernel's immune from having abandoned throwaway forks where changes don't flow back and there's no real version control or community ...

GitHub is also an archival nightmare

Posted Aug 8, 2017 22:36 UTC (Tue) by Frogging101 (guest, #113180) [Link] (2 responses)

E-mail is decentralized and self-contained. *All* the relevant data and metadata in a patch discussion over email is contained within the messages that get sent to all participants. There's no out-of-band tagging, editing, or comments that get locked away in some database. If you receive it, then you have it. All of it. Just like a Git repository.

Mailing list relays and archives can be replaced. GitHub's issue tracker, on the other hand, is a single point of failure. If and when GitHub blows away one day, it's all gone.

GitHub is also an archival nightmare

Posted Aug 9, 2017 13:17 UTC (Wed) by jamessan (subscriber, #12612) [Link] (1 responses)

That's why tools like github-backup (https://github-backup.branchable.com/) exist. You can regularly back up all the metadata (issues, PRs, etc.) in the git repo itself.

GitHub is also an archival nightmare

Posted Aug 22, 2017 14:41 UTC (Tue) by jani (subscriber, #74547) [Link]

The point here is not that you'd be unable to back up your github data. The point is that if github so decides, you'd have no place to restore your backups to. If your whole workflow is based on a proprietary single-point-of-failure system, you need to be prepared to process and restore the data to a different system altogether. If you care about continuity, that is.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 11:12 UTC (Wed) by error27 (subscriber, #8346) [Link] (11 responses)

I guess I just can't imagine a faster way to review code than from inside mutt. You trigger a macro and you can view it inline. You pipe it through a script and filter out the irrelevant white space changes. It's one button to apply a patch. If you have an issue, you just press "r" for "reply". Instead of "closing a patchset with EWONTFIX" because you don't care or whatever you just hit CTRL-R and it marks the thread as read.

How can it get faster than that?

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 11:31 UTC (Wed) by daniels (subscriber, #16193) [Link] (9 responses)

> How can it get faster than that?

By not having to spend a great deal of time setting up your MUA and a fleet of scripts in the first place?

Also by keeping the history of the iterations, so you have the commentary on previous revisions, and the changes between them, in one place. As opposed to having to try to dig through your mailbox for the previous threads where it was discussed, and do manual diffs to figure out what changed.
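For readers who have not done it, the "manual diff" between two revisions of a mailed patch amounts to something like the sketch below. It is a plain textual comparison using Python's standard `difflib`, not a description of any particular maintainer's tooling; the function name and the `v1`/`v2` labels are illustrative assumptions.

```python
import difflib


def revision_diff(old_patch, new_patch, old_label="v1", new_label="v2"):
    """Return a unified diff between two revisions of the same patch text."""
    return "".join(
        difflib.unified_diff(
            old_patch.splitlines(keepends=True),
            new_patch.splitlines(keepends=True),
            fromfile=old_label,
            tofile=new_label,
        )
    )


if __name__ == "__main__":
    # Two revisions of a tiny patch body; the lines already start with "+"
    # because they are themselves diff lines.
    v1 = "+int x = 0;\n+return x;\n"
    v2 = "+int x = 1;\n+return x;\n"
    print(revision_diff(v1, v2), end="")
```

Because the inputs are themselves diffs, the output has doubled markers such as `-+` and `++`, which is part of why reading these by hand is tedious.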

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 12:18 UTC (Wed) by pizza (subscriber, #46) [Link] (1 responses)

> Also by keeping the history of the iterations, so you have the commentary on previous revisions

...Because MUAs don't have search features?

(Seriously. 'notmuch' is amazing. I have it indexing more than two decades' worth of my email, text messages, and everything else that can archive into email..)

FWIW, I also consider github's UI rather awful to use in practice. Still a lot better than sourceforge, though.

But IME, by far the best UI for git-based workflows I've come across is gerrit.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 23:50 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

I think we've hashed this out before (I've really soured on Gerrit from previous iterations and recent interaction with Qt's Gerrit instance didn't inspire confidence either), but I do lots of code review on both Gitlab and GitHub. Personally, I find Gitlab to be better. The ability to manually mark comments as "resolved" rather than GitHub assuming things are fixed if the commented line is changed is way better for incremental review. Gitlab also keeps links available for "what changed in this push" rather than only via notifications or a one-time link on first view after an update. I also find "mark notification as solved" upon viewing very inconvenient. Gitlab has both manual dismissal, manual "add a to-do", and it removes it upon commenting in the MR.

One wish for Gitlab is the "make a bunch of comments at once" which I believe is in the pipeline. Once that is there, Github's review is strictly inferior IMO.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 12:24 UTC (Wed) by pizza (subscriber, #46) [Link] (6 responses)

> By not having to spend a great deal of time setting up your MUA and a fleet of scripts in the first place?

This is something that I still don't quite understand -- Folks simultaneously complain about complexity (eg in workflows) but at the same time, refuse to use tools that can far better manage that complexity.

But I suppose that's the difference between amateurs and professionals.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 12:34 UTC (Wed) by daniels (subscriber, #16193) [Link] (3 responses)

> This is something that I still don't quite understand -- Folks simultaneously complain about complexity (eg in workflows) but at the same time, refuse to use tools that can far better manage that complexity.

Wearing a crash helmet really cuts down on the pain caused by running head-first into a brick wall. But sometimes, not spending your days charging into brick walls can be better than investing in helmets.

> But I suppose that's the difference between amateurs and professionals.

Thanks for your honest appraisal of my skills and career; it's really a positive contribution to the conversation.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 13:46 UTC (Wed) by pizza (subscriber, #46) [Link] (2 responses)

> Wearing a crash helmet really cuts down on the pain caused by running head-first into a brick wall. But sometimes, not spending your days charging into brick walls can be better than investing in helmets.

...If your job involves a high probability of crashing into brick walls, you really don't get to complain about your head hurting when you don't invest in a suitable helmet.

As the saying goes, "It's a poor craftsman who blames their tools." -- Blaming your tools means either that you lack skill, or that you chose your tools poorly because you lack the experience and skills to choose correctly. [1] (Or that you're simply too lazy to learn, which is an even worse trait to have in this profession..)

> Thanks for your honest appraisal of my skills and career; it's really a positive contribution to the conversation.

Why are you taking what I said as an attack on you?

[1] https://news.ycombinator.com/item?id=2380679

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 10, 2017 0:17 UTC (Thu) by pbonzini (subscriber, #60935) [Link]

If you're working on construction you do wear a helmet but you don't use motorcycle helmets, despite them obviously providing more protection in case of accidents. Perhaps a better analogy.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 13, 2017 14:35 UTC (Sun) by Wol (subscriber, #4433) [Link]

> As the saying goes, "It's a poor craftsman who blames their tools." -- Blaming your tools means either that you lack skill, or that you chose your tools poorly because you lack the experience and skills to choose correctly. [1] (Or that you're simply too lazy to learn, which is an even worse trait to have in this profession..)

Actually, that saying comes from the days when a craftsman made their own tools - so blaming your tools was still admitting to being a crap craftsman. Like Linus moaning that linux doesn't work right, or rms blaming gcc for a problem. (Yes, I know they're both no longer all their own work ...)

Cheers,
Wol

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 14:23 UTC (Wed) by rrdharan (subscriber, #41452) [Link] (1 responses)

Yes, professionals know not to dismiss or ignore setup costs.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 14:36 UTC (Wed) by pizza (subscriber, #46) [Link]

> Yes, professionals know not to dismiss or ignore setup costs.

...And professionals also know to not dismiss or ignore the benefits that stem from those costs.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 15:01 UTC (Wed) by josh (subscriber, #17465) [Link]

> Instead of "closing a patchset with EWONTFIX" because you don't care or whatever you just hit CTRL-R and it marks the thread as read.

Hopefully you also reply to the sender with some feedback.

Hopefully, the many other people also reviewing such issues don't have to re-do the same evaluation you just did.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 9, 2017 11:19 UTC (Wed) by janc (guest, #95095) [Link] (1 responses)

Yeah... try doing review for a non-trivial PR that gets a few iterations before being merged.

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 10, 2017 10:35 UTC (Thu) by error27 (subscriber, #8346) [Link]

That's true. I don't often care about the diffs between two versions of patchsets in staging...

Vetter: Why Github can't host the Linux Kernel Community

Posted Aug 20, 2017 18:23 UTC (Sun) by kevynalexandre (guest, #68129) [Link]

There's an issue open on the GitLab side:

Make Monotree Merging Magnificent:
https://gitlab.com/gitlab-org/gitlab-ce/issues/36239


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds