
Rethinking the OpenStack development process

By Nathan Willis
September 4, 2014

OpenStack is, by any measure, a successful open-source project, attracting dozens of companies and hundreds of individual contributors who implement a number of services and core features. But such success has its drawbacks. Recently, the OpenStack project began debating whether it needs to make major changes to its development process in the hope of reducing the backlog of contributions and bug fixes awaiting review. Everyone involved seems to agree that a more formal process is the answer, with some measure of priority-setting for each release cycle, but exactly what form the changes should take is less clear-cut.

The trouble with success

On August 5, Thierry Carrez raised the issue on the openstack-dev list. He noted several interrelated problems, including the fact that more and more contributors are putting time in on code that scratches their own itch (which he called "tactical" contributions), and fewer contributors are left to tackle the "strategic" issues. Those issues include somewhat contentious questions like which outside projects become "blessed" by the Technical Committee and make it into the OpenStack integrated release, and which are granted "incubator" status.

There is, naturally, a big advantage to being blessed and to receiving incubator status; both are perceived as validation of a project's value, and there is clear marketing clout that accompanies being designated part of OpenStack proper. But the desire of so many projects to make it to incubator or integrated status has caused considerable work to pile up for OpenStack's core team, and that endangers the project's ability to put out viable releases.

Carrez asked whether a moratorium on new feature work should be considered, and also whether it was time to tighten up the rules for officially blessing contributed projects.

We seem to be unable to address some key issues in the software we produce, and part of it is due to strategic contributors (and core reviewers) being overwhelmed just trying to stay afloat of what's happening. For such projects, is it time for a pause ? Is it time to define key cycle goals and defer everything else ?

On the integrated release side, "more projects" means stretching our limited strategic resources more. Is it time for the Technical Committee to more aggressively define what is "in" and what is "out" ? If we go through such a redefinition, shall we push currently-integrated projects that fail to match that definition out of the "integrated release" inner circle ?

By and large, the rest of the list subscribers seemed to agree with Carrez's formulation of the problem, though not necessarily on the best approach to fix it. Monty Taylor suggested it was time for the Technical Committee to start being more selective about what contributions it accepts: figuring out what exactly defines OpenStack itself and being prepared to say "thanks, but no thanks" to work that falls outside that definition. As an example, Devananda van der Veen cited the Heat and Ceilometer projects, both of which were blessed into the integrated OpenStack release while used in only a single (major) deployment, and which many in the community still seem to think have significant problems.

But what goes into the OpenStack core and what stays out is a difficult question. As Sean Dague succinctly put it:

Everytime I turn around everyone wants the TC to say No to things, just not to their particular thing. :) Which is human nature. But I think if we don't start saying No to more things we're going to end up with a pile of mud that no one is happy with.

In contrast, Eoghan Glynn argued that cutting back on the number of projects was less important than catching up on the backlog of QA and review work that has accumulated over the past few development cycles. Taking a full release cycle to focus on solving existing quality and performance problems, as well as implementing in-project functional testing, would pay the most dividends in the long run.

Ground control

Michael Still, among others who agreed with Glynn on the value of taking a "time out" cycle, reported that the OpenStack Nova team had recently discussed the same issue and had proposed a formal process to rate-limit the number of new features being considered for inclusion at any one time. The idea is that the number of feature "slots" (or "runways") available would be fixed, and in line with the number of reviewers who could oversee development during the cycle. The available runways could be divided up between new-feature work and efforts to pay down "technical debt," and that division could be adjusted in either direction for a given release cycle.

For each new cycle, developers interested in getting their project accepted would have to define a specification, present it to the Technical Committee for approval, then begin work. But the project team would wait in a holding pattern until the Technical Committee decided that there was a free runway, at which point the project could land its code. The theory, of course, is that the gatekeepers know how many runways they can manage simultaneously, and thus how many new features should land for each release. But an important side effect of the plan would be that (in theory) developers whose project was left in a holding pattern would be free to work on something else for a while, thus maximizing the efficiency of the overall OpenStack contributor pool.
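To make the mechanics concrete, here is a toy sketch in Python of how such a fixed-slot scheme might be modeled; the class name, blueprint names, and slot count are invented for illustration and are not part of any actual Nova tooling. Approved specifications wait in a holding pattern, occupy a runway only when one frees up, and can lose their slot if they stall:

    from collections import deque

    class RunwayBoard:
        """Toy model of a fixed-slot ("runway") review scheme.

        Approved blueprints queue up; only a fixed number may occupy a
        runway (be under active review) at once. The rest circle in a
        holding pattern until a slot frees up.
        """

        def __init__(self, slots):
            self.slots = slots      # number of runways for this cycle
            self.active = set()     # blueprints currently being reviewed
            self.holding = deque()  # approved specs waiting for a runway

        def approve(self, blueprint):
            """An approved specification joins the holding pattern."""
            self.holding.append(blueprint)
            self._fill_runways()

        def land(self, blueprint):
            """Review finished and code merged: the runway is freed."""
            self.active.discard(blueprint)
            self._fill_runways()

        def evict_stalled(self, blueprint):
            """A stalled occupant loses its slot so another spec can use it."""
            self.active.discard(blueprint)
            self.holding.append(blueprint)  # it can queue up again later
            self._fill_runways()

        def _fill_runways(self):
            while self.holding and len(self.active) < self.slots:
                self.active.add(self.holding.popleft())

    # Example: two runways, three approved blueprints.
    board = RunwayBoard(slots=2)
    for bp in ["feature-a", "feature-b", "tech-debt-cleanup"]:
        board.approve(bp)
    print(board.active)    # two blueprints under review
    print(board.holding)   # one left circling in the holding pattern
    board.land("feature-a")
    print(board.active)    # the waiting blueprint takes the free slot

Whether a stalled occupant is evicted by a gatekeeper or simply times out is exactly the sort of policy detail the thread went on to argue about.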

Quite a few list subscribers expressed support for the idea; Stefano Maffulli likened it to the kanban principle in manufacturing. But there were firm dissenters as well. Glynn argued that the plan hinged on developers voluntarily not pursuing their projects when those projects were in the holding pattern, which is not likely to happen often and which OpenStack cannot force them to do. Maffulli, though, countered that OpenStack actually can control quite a few contributors, in particular those who are assigned to the project by its corporate members and sponsors.

Others took issue with the idea at a more fundamental level. Daniel P. Berrange argued that treating stability work as an occasional goal for some release cycles is the wrong approach; rather, "I'd like to see us have a much more explicit push for regular stabilization work during the cycle, to really reinforce the idea that stabilization is an activity that should be taking place continuously." Others were concerned that the fixed number of runways was inflexible or would cause problems of its own. Kyle Mestery asked what would happen if one of the projects given a slot slowed down, either in development or in the review stage, and turned into a bottleneck. In reply, Joe Gordon said that the Technical Committee would just take the slot away and reassign it to someone else.

But the most often-cited objection to the plan was that it introduces a new bureaucratic process. The ultimate goal, many said, was just for the project to better communicate its development priorities for a given release cycle. Adding formal hoops to jump through might clarify the priorities, but it would do so in a decidedly inflexible way. As Berrange expressed that objection from the viewpoint of his own workflow:

I don't want to waste time waiting for a stalled blueprint to time out before we give the slot to another blueprint. On any given day when I have spare review time available I'll just review anything that is up and waiting for review. If we can set a priority for the things up for review that is great since I can look at those first, but the idea of having fixed slots for things we should review does not do anything to help my review efficiency IMHO.

Eventually, Russell Bryant suggested a less strict, alternative approach: focus on crafting a priorities list, expect reviewers to be guided by keeping an eye on that list, but do not allow the list to block the development of other things.

Hashing out a plan

Bryant's proposal garnered a lot of support and few objections, but the final approach that the project will take is certainly not settled. Gordon opened a review in OpenStack's Gerrit system to track the proposal. As of today, the proposal has been through seven revisions, and it more closely reflects Bryant's flexible approach than anything else. The document exists as a set of guidelines for blueprints in OpenStack's upcoming Kilo release cycle.

The plan, as outlined, requires the project to "pick several project priority themes, in the form of use cases, to help us prioritize work", then to generate a list of blueprints based on those themes. The list will be published, and volunteers will be encouraged, but not forced, to work on them. Similarly, reviewers "looking for direction of where to spend there blueprint review time" are expected to use the list, but, notably, if the community and the list diverge, "it means the core team is not aligned properly and should revisit the list of project priorities".

Compared to the fixed-number-of-slots approach, this plan is certainly more flexible. But, of course, it will also need to be put into practice at least once before any final judgment can be made. Perhaps, as some fear, a project as large and rapidly growing as OpenStack will require a heavier hand to steer in a predictable direction. Several people in the mailing-list discussion noted that the Linux kernel also revisited its procedures several times as its contributor base grew substantially, and today it uses a rather formal hierarchy of subsystem maintainers.

But perhaps OpenStack's architecture is too different from the kernel's, and OpenStack users view it as more of a toolkit than a tool; in that case, a looser structure might make sense. In either case, as Zane Bitter put it, "this is what success looks like". Having so many contributions that they are hard to wrangle is certainly a difficult problem to solve, but it is also one that few open-source projects would consider a bad sign.



Rethinking the OpenStack development process

Posted Sep 5, 2014 9:16 UTC (Fri) by rwmj (subscriber, #5474) [Link] (4 responses)

As an occasional OpenStack contributor, I find there are a couple of tools problems.

Firstly Gerrit is a horrible way to review code. There are multiple problems, from the user interface, to capricious failures in the automated tests that don't necessarily reflect real problems.

Secondly Launchpad is a horrible way to track blueprints and bugs. I can't work out how to find all new bugs/blueprints related to topic X (well actually, I did -- by using a big set of Google Alerts -- but that's working around the problem).

Rethinking the OpenStack development process

Posted Sep 5, 2014 22:59 UTC (Fri) by marcH (subscriber, #57642) [Link] (2 responses)

I don't find Gerrit's user interface great either but the workflow is good and is there any good free alternative anyway? (Email no thanks)

Rethinking the OpenStack development process

Posted Sep 6, 2014 0:15 UTC (Sat) by jeff_marshall (subscriber, #49255) [Link]

I haven't tried gerrit, but review board is pretty nice once you get it integrated into your workflow.

Some of the features cost money (pdf review, in particular), but the core code review is free.

Rethinking the OpenStack development process

Posted Sep 6, 2014 0:45 UTC (Sat) by mathstuf (subscriber, #69389) [Link]

I'm not really a fan of its workflow either. Anyways, to be more specific on my main issues with Gerrit: there's no "branch view", which makes reviewing branches with reverts a pain unless you're happy with rebases; a commit can only ever belong to a single branch (which means branch A must land before branch B can even start review); keeping track of comments on not-the-latest patchset is utter shit; there's no location for branch-level comments, such as a rationale for the branch in the first place or instructions for reviewers; and the emails are borderline useless.

Rethinking the OpenStack development process

Posted Sep 6, 2014 0:28 UTC (Sat) by angdraug (subscriber, #7487) [Link]

Gerrit's UI certainly takes some getting used to, but it remains the best free software code review tool out there. And you certainly can't blame OpenStack CI infrastructure problems on Gerrit, it honestly tracks passed and failed tests, and having a false positive test failure disrupt your own work by blocking a commit from getting merged is much better than disrupting everyone's work by merging a commit that really fails the test.

Rethinking the OpenStack development process

Posted Sep 5, 2014 14:12 UTC (Fri) by mjthayer (guest, #39183) [Link]

If new features were queued and reviewed on a first in first out basis this might encourage contributors to help with the review process in order to get their contribution moving faster. The quality of the reviews might affect the treatment of their own contributions of course.

Rethinking the OpenStack development process

Posted Sep 5, 2014 23:02 UTC (Fri) by marcH (subscriber, #57642) [Link]

Could a charitable person find Linus' email where he explains that there is not too little but rather too much software in the world, and why his job is to say "no"? I can't remember any specific enough search keyword in it...


Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds