New CPython workflow issues
As part of a discussion in 2014 about where to host some of the Python repositories, Brett Cannon was delegated the task of determining where they should end up. In early 2016, he decided that Python's code and other repositories (e.g. PEPs) should land at GitHub; at last year's language summit, he gave an overview of where things stood with a few repositories that had made the conversion. Since that time, the CPython repository has made the switch and he wanted to discuss some of the workflow issues surrounding that move at this year's summit.
He started by introducing himself as the "reason we are on GitHub"; "I'm sorry or I hope you're happy", he said with a chuckle. He wanted to focus the discussion on what's different in the workflow and what has been added (or will be) to try to make core developers more productive using the new workflow.
A bot to check whether a contributor has signed the Python contributor agreement (also known as the "CLA") was first up. Pull requests at GitHub are checked to ensure that the contributor has signed the form by "The Knights Who Say 'Ni!'", which labels the request based on the CLA status. It runs asynchronously and is not specifically tied to GitHub, Cannon said, so a switch to GitLab or elsewhere could be made some day if desired. There is also an "easter egg" in the bot regarding shrubbery, he noted.
It still takes one business day to process a newly signed CLA, so a contributor who has a pull request rejected because of the label is asked to wait that long and try again. Alternatively, any core developer can remove the CLA label, which will cause the bot to check again. There is an ongoing question about those who do not wish to agree to GitHub's terms of service and thus do not have a GitHub account. Some code contributed by a developer in that position was part of a pull request from another contributor, who credited the originator; the originator has not signed the CLA, however, so it is all in something of a "legal quagmire" at this point, Cannon said.
Another new bot is bedevere, which is meant to check pull requests for various kinds of problems. It currently checks to see if there is a reference to a bugs.python.org issue number in the title of the pull request and complains (with a reference to the Python Developer's Guide) if there is not. The issue numbers are now being placed into namespaces, in case the bug tracker changes down the road. So bedevere expects to find "bpo-NNNN" in the title (or "trivial").
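A minimal sketch of the kind of title check bedevere performs might look like the following; the exact rules the real bot applies (prefix placement, how "trivial" is marked) are assumptions here:

```python
import re

# A title must reference a bugs.python.org issue in the "bpo-NNNN"
# namespace, or be explicitly marked trivial (hypothetical sketch).
TITLE_RE = re.compile(r"\bbpo-\d+\b")

def title_ok(title: str) -> bool:
    """Return True if a pull-request title carries a bpo issue
    reference or is marked trivial."""
    return bool(TITLE_RE.search(title)) or "trivial" in title.lower()
```

The namespace prefix means a later tracker migration would not break old references, since "bpo-" unambiguously identifies the current tracker.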
The intent is to add more checks to bedevere to ensure the proper labels are present and that all of the status checks have passed for the pull request. Something that is not always being done for pull requests is to have reviewers approve changes made in response to their comments. Clicking to approve the changes is not a huge thing, he said, but it is handy and does get recorded by GitHub. In addition, references to previous pull requests should be done using the "GH-NNNN" namespace, which will still automatically link to the previous request, as the current "#NNNN" usage does. Checks for those kinds of things may be added as well.
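A check (or auto-fix) for the pull-request namespace could be as simple as rewriting bare "#NNNN" references; this is a sketch of the idea, not bedevere's actual implementation:

```python
import re

def normalize_pr_refs(text: str) -> str:
    """Rewrite bare "#NNNN" pull-request references into the
    "GH-NNNN" namespace (hypothetical sketch)."""
    # The lookbehind avoids touching things like "bpo#123".
    return re.sub(r"(?<!\w)#(\d+)", r"GH-\1", text)
```

GitHub auto-links both forms, so the rewrite loses nothing while keeping references unambiguous if the project ever moves.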
Cannon then moved into the process to backport a pull request to an earlier version of Python. There are two different ways to do it, he said, either manually or the new, easier way. The manual way involves doing a "git cherry-pick" and then creating a pull request. The title of that request should have the branch target in it (e.g. "[3.6]") and the pull request ID from the original should be left in place. The "needs backport to X.Y" label should be removed from the pull request that was cherry-picked from.
The better way, though, is to use the cherry_picker tool that was recently written by Mariatta Wijaya. It automates most of the cherry-picking process. Developers just need to click one green button to create the pull request, then go remove the backport label from the original. The hope is that the label removal will be automated soon as well.
Plans
The goal of the GitHub switch has always been to make the workflow better, Cannon said, not just comparable to what was there before. The switch is now just three months old, which "feels like forever for me", so there are still some things to work through to make the workflow better than it was.
One area that has been problematic for some time is the maintenance of the Misc/NEWS file that contains small blurbs about changes made for a release. It is often the source of trivial, but annoying, merge conflicts, especially for backports. Moving forward, Larry Hastings has developed a blurb tool that will walk developers through the process of creating NEWS entries. Those entries will be stored as standalone files, with subdirectories for different sections of the file, and blurb can create the NEWS file on demand from this hierarchy.
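Assembling the file on demand from such a hierarchy can be sketched as follows; the directory layout, `.rst` extension, and section formatting here are assumptions for illustration, not blurb's actual on-disk format:

```python
from pathlib import Path

def build_news(blurb_dir: str) -> str:
    """Assemble a NEWS-style document from per-change blurb files,
    one file per entry, grouped by section subdirectory (sketch)."""
    out = []
    root = Path(blurb_dir)
    for section in sorted(p for p in root.iterdir() if p.is_dir()):
        out.append(f"{section.name}\n{'-' * len(section.name)}")
        for entry in sorted(section.glob("*.rst")):
            out.append(f"- {entry.read_text().strip()}")
    return "\n\n".join(out) + "\n"
```

Because each entry lives in its own file, two pull requests that both add NEWS entries never touch the same lines, which eliminates the merge conflicts that plague a single shared file.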
There is a need for a bot to automate the backporting of pull requests based on the labels they contain. It could do the cherry-picking and create the pull request for each branch. Currently, he has four people who want to help work on the problem, so he needs to get them all working on the same code base. A more explicit link in pull requests to the bugs.python.org issue is desired as well. The current plan is to modify bedevere to add the link in the body of the pull request. He thinks that adding a message to each pull request for the link might get too noisy.
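The first step for such a bot would be mapping labels to target branches; a minimal sketch, assuming labels of the exact form "needs backport to X.Y" mentioned above:

```python
import re

LABEL_RE = re.compile(r"needs backport to (\d+\.\d+)")

def backport_targets(labels):
    """Extract maintenance-branch targets (e.g. "3.6") from a pull
    request's labels (hypothetical sketch of the bot's first step)."""
    return [m.group(1) for lab in labels if (m := LABEL_RE.fullmatch(lab))]
```

From there the bot could drive the same cherry-pick-and-open-a-PR sequence that cherry_picker performs for each returned branch.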
There has been some thought of automating the creation of the Misc/ACKS file, which lists all of the contributors to Python. He is not sure if anyone really cares about automating that, nor if it is particularly important, but it could perhaps be done. It could lead to the creation of something similar to thanks.rust-lang.org, he said. Grabbing the Git committer information from pull requests to go into ACKS would be a start.
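A starting point could be merging newly seen committer names into the existing list; this sketch simply deduplicates and sorts case-insensitively, whereas the real Misc/ACKS file follows its own rough surname ordering:

```python
def update_acks(existing, new_names):
    """Merge committer names from merged pull requests into the
    contributors list, deduplicated and sorted (sketch only; the
    actual Misc/ACKS ordering conventions differ)."""
    seen = set(existing)
    merged = list(existing)
    for name in new_names:
        if name not in seen:
            merged.append(name)
            seen.add(name)
    return sorted(merged, key=str.lower)
```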
Another thing that is needed is a way for reviewers to know when they should re-review a pull request. He suggested that bedevere could be changed to add a "[WIP]" (work in progress) tag to the request, which the contributor could remove when they are ready for review; they could also leave a comment that mentions the reviewer. But Guido van Rossum suggested doing it more like Phabricator, which he uses at Dropbox, does things: developers do not push their changes until they are ready for review. Others pointed out that may not work as code has to be pushed for the automated tests to run, so the developer may not know whether the code is ready until after they push. Cannon said that he had just begun to think about the problem, but it is clear that something is needed.
Another area that may need attention is a way to close stale pull requests. It is not a big problem yet, but may become one, he said. There could be a bot that measures the number of days since the last change and automatically closes those that pass some threshold; the same could be done for requests lacking a CLA signature. That code shouldn't really even be viewed until the CLA has been signed.
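The core of such a bot is a simple threshold check; the 90-day cutoff below is an assumption, since no threshold was named:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed threshold; the discussion did not settle on a number.
STALE_AFTER = timedelta(days=90)

def is_stale(last_activity: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if a pull request has seen no activity for longer
    than the threshold and is a candidate for automatic closing."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_activity > STALE_AFTER
```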
All are welcome to participate in the core-workflow project, Cannon said. There is an issue tracker on GitHub and a core-workflow mailing list that those interested should investigate.
Ned Deily asked if anyone had worked out a good strategy for dealing with GitHub email. As release manager, he would like to be able to see everything that is going on, which he used to be able to do, but which is rather daunting with GitHub. There is someone working on an email bot, Cannon said, which may help. Barry Warsaw suggested sending all of the email to a mailing list that could be archived and put on Gmane, which Deily said would work as long as the threading was handled correctly.
Cannon ended the discussion by saying that he hopes to get to a point where the workflow is no longer the reason that core developers don't have the time to review pull requests. If folks simply don't have the time, that's one thing, but it should not be the workflow that holds them back. The CPython project currently sees a net increase of four open pull requests every day and it would be great if the workflow improvements helped that number decline.
[I would like to thank the Linux Foundation for travel assistance to
Portland for the summit.]
| Index entries for this article | |
|---|---|
| Conference | Python Language Summit/2017 |
Posted May 24, 2017 19:14 UTC (Wed)
by mathstuf (subscriber, #69389)
[Link] (2 responses)
Topic-backport: for-3.8
Now, this requires a bot to also handle the merging of PRs (no button!) to do the backport itself. For topics where merge conflicts occur or the fix is slightly different, you make a branch off of each release it is intended to go into and merge them all together into the one headed for master (using `-s ours` for the backport branches) and submit that for a PR. Then your backport can look like:
Topic-backport: for-3.8:HEAD^2
using the `:refname` syntax to indicate which commit from the HEAD of the topic is meant to be backported. This keeps all of the commits for a single change limited to a single PR rather than spread out across numerous ones. GitHub review doesn't make it easy to see the diff for those backport commits, but posting URLs for viewing the diff should alleviate that (though comments there are sort of out in the aether :/ ).
As for the maintenance of the `NEWS` file, collecting from a lot of files has been way better than trying to keep a single file out of numerous conflicts (CMake and GitLab use this for their release notes).
For stale PRs, I like having an "expired" label that gets added when closing so that you know it was closed due to inactivity, not rejected. A comment noting that the PR can be reopened when work continues helps as well.
Posted May 25, 2017 15:22 UTC (Thu)
by jnareb (subscriber, #46500)
[Link] (1 responses)
Another way of solving the problem of conflicts in a NEWS or ChangeLog file is to use a special file-level merge driver that knows the structure of the file and can automatically merge NEWS entries without conflicts. There is one for the GNU ChangeLog format; I think there is one for the NEWS file as well, and if not it would be easy to write.
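Git's built-in `union` merge driver already approximates this for line-oriented files; the three-way logic it performs can be sketched roughly as follows (a simplification that glosses over ordering and context handling):

```python
def union_merge(base, ours, theirs):
    """Naive three-way union merge for a line-oriented NEWS-style
    file: keep every entry added on either side, drop duplicates
    (a rough sketch of what git's `union` merge driver does)."""
    base_set = set(base)
    merged = list(ours)
    for line in theirs:
        # Keep lines that theirs added (absent from base) and that
        # ours does not already contain.
        if line not in base_set and line not in merged:
            merged.append(line)
    return merged
```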
Posted May 25, 2017 17:55 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
Posted May 25, 2017 6:05 UTC (Thu)
by smurf (subscriber, #17840)
[Link] (1 responses)
So add a way to tell the checker to run on your feature branch in your forked repository, instead of relying on merge requests.
Posted May 25, 2017 10:20 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
I know GitLab CI can be configured to build on pushes to forks, but if your testing doesn't fit into that paradigm, you're kind of stuck.
Posted May 26, 2017 0:24 UTC (Fri)
by pdmccormick (subscriber, #69601)
[Link]
Posted May 29, 2017 8:08 UTC (Mon)
by dag- (guest, #30207)
[Link] (3 responses)
Posted May 29, 2017 19:15 UTC (Mon)
by darwish (guest, #102479)
[Link] (2 responses)
I understand the argument from a code freedom perspective, but if sacrificing some freedom will make the submission experience frictionless for contributions, I'm all for it.
This is even more important in a community like Python, which values ease of use and welcomes contributors from outside the traditional FOSS/Linux culture. It's ironic that GitHub, the proprietary service, has democratized open-source contributions beyond the isolated culture of submitting patches by clear-text mail or as attachments in Bugzilla.
These "contributor experience" things may seem small to us engineers who are experienced in contributing to classical projects like the kernel or the rest of the plumbing layer (minus systemd, which also uses GitHub in full). Nonetheless, I have seen first-hand how GitHub significantly lowers the contribution barrier for the younger generation.
Posted May 29, 2017 19:29 UTC (Mon)
by dag- (guest, #30207)
[Link] (1 responses)
Having used both extensively, GitHub in the community and GitLab at various companies, I see a clear benefit to working with an open-source project (which one can troubleshoot and contribute to) over a hosted service offering (with weird, incomplete workflows and suboptimal implementations).
I am not saying GitLab is perfect, but at least when something bothers you that much, you can dive in and fix it yourself (which I have done a few times). I would even say that some of the stuff that people now do in bots should move back into GitLab workflows when the time is right and there's general consensus. With GitHub, we have been waiting for years for something to be fixed or some feature to appear (like issue/PR templates). GitHub's review workflow is completely broken by design...
Posted Jul 6, 2017 3:24 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
