
The costs of forks

By Jake Edge
June 17, 2015

The perils of code forks are well-known within free software communities: as the code diverges, fixes and new features are only applied to one branch or the other, which leads to further divergence—and bugs that persist far too long. The Fuel project, which is a graphical user interface (GUI) to help deploy, test, and manage OpenStack installations, has seemingly learned that lesson anew.

A request from Emilien Macchi for more collaboration between Fuel and the Puppet OpenStack project kicked off the discussion. Puppet is a configuration management utility that is used by Fuel to assist in deploying OpenStack. But Fuel has forked some of the Puppet modules it uses from the Puppet OpenStack project—which creates Puppet modules for OpenStack components—into its Fuel Library repository. Macchi noted a number of problems with how that has been handled over the last two years.

He listed three areas of concern. Bugs are reported against Fuel and fixed there, but the fixes fail to propagate back to the Puppet OpenStack modules. Sometimes fixes are submitted to Puppet OpenStack, but without tests and without any push to get them committed because they have already been fixed in Fuel. Finally, code is moving from the modules into Fuel but not using mechanisms that would maintain the history and author attribution of the patches (e.g. using Git merges). Macchi suggested that working together would be beneficial:

We have the same goals, having Puppet modules better. I think it can be win/win: you have less diff with upstream and we have more hands in our module maintenance.

Matthew Mosesohn explained that Fuel was acting similarly to other OpenStack projects with regard to forking and bundling other components into its repositories. It may not be "the most community-oriented approach", but it is part of what allowed Fuel to stabilize and mature, he said. The directory structure in the Fuel Library also makes it difficult to merge directly from the upstream modules, he added.

Fixes not flowing from Fuel to Puppet OpenStack is a problem, Mosesohn continued. The Fuel team has a policy adopted back in October that patches to the Puppet OpenStack modules will not be merged unless they have been submitted upstream or an upstream bug has been filed. Lastly, he suggested that bugs found in the modules that had already been reported or fixed in Fuel should result in a ping on the Fuel IRC channel "and we can try to figure out how to get this applied upstream correctly".

But Thomas Goirand (and others) found that to be a bit backward. Fuel is the downstream project and should pursue an "upstream first" strategy:

You shouldn't ask Emilien to track the work of the Fuel team, and ping them on IRC to contribute back. It should be up to them to directly fix upstream *first*, and *then* fix back in Fuel.

For his part, Macchi welcomed the discussion, but also felt that Mosesohn and the Fuel project were approaching the problems from the wrong direction.

Puppet OpenStack modules provide the original code. If there is a bug, it has to be fixed in the modules. Puppet OpenStack developers don't have time/bandwidth and moreover don't want to periodically have a look at Fuel git history. I'm not sure this is the best solution for the community.

Macchi and Dmitry Borodaenko (who picked up the discussion on the Fuel project's side) seem to have a fundamental disagreement about two things. Macchi is unhappy that the commit history is getting lost when Fuel does large copy-and-paste patches from the Puppet modules; he mentions possible license issues, but that is really a red herring, as Borodaenko pointed out. However, as James Bottomley described, forks of that nature build up a huge technical debt that can be enormously expensive to unwind:

The data that supports all of this came from Red Hat and SUSE. The end of the 2.4 kernel release cycle for them was a disaster with patch sets larger than the actual kernel itself. Sorting through the resulting rubble is where the "upstream first" policy actually came from.

The other area of contention is about what level of responsibility the Fuel project and its developers should have to not simply submit their patches upstream, but to respond to the comments, respin the patches, and shepherd them through the process of getting them into Puppet OpenStack. Borodaenko suggested that the Puppet OpenStack team take over that role as needed. He asked about finding a middle ground, perhaps:

Fuel team agrees to propose all our changes to upstream (i.e. do a better job at something we've already committed to unilaterally), and you help us land the patches we propose, and take over those that get stalled when the submitter from Fuel team has moved on to other tasks?

But Macchi would rather see Fuel developers working in the upstream project. The Puppet OpenStack team would be happy to help Fuel developers get involved with its community, as that was Macchi's goal in starting the thread. But the team does not just want to pick up whatever Fuel submits:

If I understand correctly, you're asking for Puppet OpenStack group to take over patches that are sent from Fuel team but have negative reviews (not passing unit tests, not compliant with Puppet best practices), just because they have to switch to another task and they can't take time to finish the upstream work?

This is definitely not how OpenStack works.

As it turns out, the divide is not as big as it might seem. Borodaenko is on-board with the overall goal of having Fuel use the upstream Puppet OpenStack modules directly; the difference is mostly about how to get there. In fact, he advocated working more closely with the Puppet OpenStack upstream back in March 2014. His concern is that overcommitting the Fuel project developers too quickly will not succeed:

This is exactly why I'm trying to be conservative about how much upstream integration overhead Fuel team can commit to: make it too burdensome and it will fizzle out. On the other hand, unintrusive and incremental process changes have a better chance of long-term adoption.

Other Fuel project members (including Andrew Woodward and Bogdan Dobrelya) were largely in agreement with Macchi, Goirand, and others that "upstream first" is really the only workable approach. Dobrelya has also taken some concrete steps to combat the problem, including changing the documentation to strongly discourage any additional forks of Puppet modules. It is not the kind of change that can happen overnight but, by the sounds, it is one that has lingered for too long—to the point where some momentum for a change has built up.

It is sometimes easy for development projects to get so wrapped up in solving their own problems that they forget to occasionally look up and evaluate their relationship with the rest of the ecosystem. That appears to be what happened here at some level. The problem has been known and discussed at various summits over the years, but little has changed. With luck, that should not be the case anymore.

For the most part, the "evils" of one-way forks were already known to the Fuel developers from the experiences of other projects over the years. But it can be easy to fall into the trap; thanks to Macchi's note, the process of extricating the project from it has begun in earnest.




The costs of forks

Posted Jun 18, 2015 2:58 UTC (Thu) by fandingo (guest, #67019) [Link] (13 responses)

While I'm an experienced Puppet (and MCollective) user, I can barely understand what's going on here. In typical OpenStack fashion, there's political fiefdoms all over the place with incomprehensible names and unclear goals. The biggest question is why multiple OpenStack projects are maintaining/developing Puppet modules. Isn't the entire idea of "Puppet OpenStack" (literally the only OpenStack project that has a clear name) to write the modules for all the projects? If so, why is Fuel maintaining their own modules? If not, why does everything in OpenStack come with major caveats and disorganization?

It seems to me like Fuel should strictly be a consumer of Puppet OpenStack modules, and if they have problems, file bugs and patches with Puppet OpenStack. Maintaining modules seems so far outside Fuel's scope that the only explanation why they began maintaining their own modules is infighting and dysfunction in one project, both projects, or between projects.

I like OpenStack and have deployed it a number of times. However, it amazes me that OpenStack hasn't totally collapsed under the lack of collaboration between projects and generally selfish behavior of developers and especially leaders of the various projects. You can see it bare right here. It seems like the most substantial disagreement is who will do the integration work. Do the Fuel developers throw the grenade over the fence for the OpenStack Puppet developers to defuse, or is it the responsibility of the Fuel developers not to chuck live ordnance at other projects?

The costs of forks

Posted Jun 18, 2015 14:56 UTC (Thu) by NAR (subscriber, #1313) [Link] (12 responses)

> if they have problems, file bugs and patches with Puppet OpenStack

I guess they need solutions and fixes, not filed bug reports...

> Maintaining modules seems so far outside Fuel's scope that the only explanation why they began maintaining their own modules is infighting and dysfunction

My experience with an open source 3rd party dependency is that I had to maintain (in a limited way) that piece of software even when I had a good working relationship with upstream. I (or rather our users) needed the fixes yesterday, so I couldn't wait for the next official release (possibly months away), I had to use the locally maintained version. There were also some patches which were useful only for us, so upstream refused to integrate them.

The costs of forks

Posted Jun 18, 2015 17:02 UTC (Thu) by fandingo (guest, #67019) [Link] (11 responses)

> My experience with an open source 3rd party dependency is that I had to maintain (in a limited way) that piece of software even when I had a good working relationship with upstream. I (or rather our users) needed the fixes yesterday, so I couldn't wait for the next official release (possibly months away), I had to use the locally maintained version. There were also some patches which were useful only for us, so upstream refused to integrate them.

This is what I was getting at with the confusing terminology. "Upstream" in this discussion is not really an upstream as we typically think of it. They're not talking about Puppet (as in what Puppet Labs develops) as upstream. "Upstream" is "Puppet OpenStack," (PO) which is a project within OpenStack. That project only exists to provide Puppet modules for configuring all the parts of OpenStack. There shouldn't be any need to fork any work. The reason why it does happen is politics and each project (and its leadership) trying to stake as large a claim to functionality as possible to satisfy their selfish desires.

OpenStack is a technological gold rush. It's new and feature scopes are expanding unbelievably fast. Everyone is trying to maximize their stake, and it creates a lot of friction and duplicate work, since these developers think of themselves as *either* Fuel or PO developers rather than OpenStack developers. I'm sure that the immaturity of PO created problems for Fuel due to missing/incomplete/nonfunctional features. Normally, what you would expect to see is developers "switching hats" and contributing directly to the lacking project, but with OpenStack, the normal behavior is to fork and include that duplicate work within one's own project due to inter-project friction. To make an imperfect analogy to the Linux Kernel, this would be like a new filesystem eschewing the VFS and writing their own. Fortunately, that's not tolerated in Kernel development.

The costs of forks

Posted Jun 18, 2015 21:56 UTC (Thu) by dlang (guest, #313) [Link] (10 responses)

The problem is that a Fuel developer needs a working fix now, they grab a copy of the code and "make it work"

now that change needs to get submitted back to Puppet Openstack, and that's not just a matter of sending a diff (which is easy enough to forget to do if you're busy), but to also fix the patch if needed, add tests, respin it against a new version, solve the larger problem instead of the particular pain point the Fuel developer ran into, etc.

In other words, "make it work" is only the first portion of what needs to be done, but it's all that the user (and all too much management) sees at the time.

This whole discussion is about how to make sure that the other things all get done, and as the article notes, once the issue was raised, both sides want to solve it. They are just working out the details of how.

The costs of forks

Posted Jun 19, 2015 15:19 UTC (Fri) by fandingo (guest, #67019) [Link] (9 responses)

> The problem is that a Fuel developer needs a working fix now, they grab a copy of the code and "make it work"

That sounds like an abysmal strategy for software engineering. Expediency should never be a rationalization for janky solutions. It's modus operandi in OpenStack, though.

> now that change needs to get submitted back to Puppet Openstack, and that's not just a matter of sending a diff (which is easy enough to forget to do if you're busy), but to also fix the patch if needed, add tests, respin it against a new version, solve the larger problem instead of the particular pain point the Fuel developer ran into, etc.

I entirely disagree with this notion. They should be doing this work upstream in PO in the first place. Even if that requires using an alternate PO branch while in development. We're not talking about disparate open source projects -- they're supposed to be tightly coupled.

> This whole discussion is about how to make sure that the other things all get done, and as the article notes, once the issue was raised, both sides want to solve it. They are just working out the details of how.

I got the feeling that the interest by the leaders on both sides was tepid. The discussion was dominated by each side trying to deflect any responsibilities and commitments.

The costs of forks

Posted Jun 19, 2015 17:51 UTC (Fri) by dlang (guest, #313) [Link]

>> The problem is that a Fuel developer needs a working fix now, they grab a copy of the code and "make it work"

> That sounds like an abysmal strategy for software engineering. Expediency should never be a rationalization for janky solutions. It's modus operandi in OpenStack, though.

don't blame this on OpenStack, "make it work for me" is almost always the first step in creating a fix (when it isn't, the first step is almost always "make it not work for me", and only as a last resort, "think about what may fail" :-)

> I entirely disagree with this notion. They should be doing this work upstream in PO in the first place.

No, people should not be working in the upstream repository, they need to work locally first to figure out what's wrong and what needs to change, then they can push the change upstream.

but even if you say that you pursue an "upstream first" strategy, that doesn't mean that you are willing to live with the day-to-day breakage that caused you to do the work. You are almost always going to be running your fixed code while you do the work of getting it upstream. The only question is how widely you distribute the fix.

If you refuse to give the fix to your customers until upstream has accepted it and made a new release, then you are a perfect "upstream first"

If you wait until the fix is submitted, but not released, you can still argue that you are "upstream first", but others could argue that since you haven't waited for upstream to fully test it, and what you ship doesn't match anything upstream will ever ship (they will combine your fix with other fixes), you aren't pure.

If you ship your fix while you are working to get it upstream, your customers are likely to be happier because they have working systems sooner, but you aren't "upstream first" in this case

even the most aggressive "upstream first" organizations aren't always perfect.

The costs of forks

Posted Jun 19, 2015 22:22 UTC (Fri) by angdraug (subscriber, #7487) [Link] (7 responses)

> Expediency should never be a rationalization for janky solutions. It's modus operandi in OpenStack, though.

Welcome to the real world. OpenStack is far from a shining example of technical perfection and open collaboration, but you're never gonna change anyone by demanding unconditional surrender to the absolute perfection. As I said, OpenStack community is dominated by commercial interests, and unlike unsponsored free software projects done by independent contributors on their own dime, commercial entities that miss deadlines eventually lose money and go out of business. And deadlines are the worst enemy of perfection. Because of that, ability to draw the "good enough" line at the right balance between quality and timeliness is the most important quality of any engineer, more so in software where the range of acceptable quality varies much wider than in, say, civil engineering.

> I got the feeling that the interest by the leaders on both sides was tepid. The discussion was dominated by each side trying to deflect any responsibilities and commitments.

No. There was some deflection early on, but in the end we've managed to get past that and agree on a constructive, and, most importantly, practically achievable way forward.

The interest from Emilien is genuine and not tepid: he was the one to start the thread, and in the face of good arguments he backed away from the hard line and agreed to help land stuck patches from Fuel contributors, with the understanding that cases where that becomes necessary will be exceptions and not the norm. Best of all, this interest is also very practical: in the space of 6 months Fuel has jumped from not being listed at all to #3 deployment tool used to install OpenStack, Fuel 6.1 contains 1995 bugfixes (compared to 232 in PO during Kilo cycle), it's easy to see Emilien's interest in getting more of that value contributed back to upstream.

The interest from Fuel team is also very practical and has been abundantly explained on the thread and in the article. We've lived with the pain of maintaining a fork for a few years now, our current process of keeping up is better than nothing but much more cumbersome than tracking upstream directly. What you call tepid I call cautious about making public commitments: this time, we want results and not just warm and fuzzy feeling of mending the fences with upstream.

The costs of forks

Posted Jun 20, 2015 2:54 UTC (Sat) by fandingo (guest, #67019) [Link] (6 responses)

> Welcome to the real world. OpenStack is far from a shining example of technical perfection and open collaboration, but you're never gonna change anyone by demanding unconditional surrender to the absolute perfection. As I said, OpenStack community is dominated by commercial interests, and unlike unsponsored free software projects done by independent contributors on their own dime, commercial entities that miss deadlines eventually lose money and go out of business. And deadlines are the worst enemy of perfection. Because of that, ability to draw the "good enough" line at the right balance between quality and timeliness is the most important quality of any engineer, more so in software where the range of acceptable quality varies much wider than in, say, civil engineering.

And, yet, the Linux kernel project, which is overwhelmingly dominated by corporate contributors, doesn't have this sort of rubbish, even in isolated incidents. The sprawling, isolated, self-aggrandizing, profiteering nature of OpenStack development is unique among all past and present open source projects. Projects spin up with unclear definitions, and they expand to consolidate as much political power as possible. Isolating source code repositories, and more importantly, contributor relations and permissions to individual projects was a terrible mistake by OpenStack with expediency, once again, being the prime culprit. In OpenStack, so-called "velocity" seems to be the real development goal.

Here's a few genuine questions that I've alluded to in several of my comments:

1) Why isn't Fuel a strict consumer of modules in PO? Why was the code ever forked?

2) Presumably the answer to #1 includes something about lacking functionality or bugs. Why weren't they just fixed in PO?

3) Why are OpenStack developers so siloed in their own projects? Why don't they contribute across projects instead of forking?

4) As a follow-up to #3, why do OpenStack projects have these sorts of incompatibilities and diverging goals? (In particular, it's one thing to have differences while in development, but OpenStack seems to always have them in released code.)

5) Fuel seems to fundamentally be an expansion of what PO set out to do. Why was a separate project started, and why is it trying to "drink PO's milkshake?" Was there any consideration to expanding the goals of PO, rebranding it, and essentially making Fuel without the need to duplicate so much work?

6) What's the corporate alignment behind Fuel and PO? Who's sponsoring major portions of development? Which corporation(s) started Fuel? Did they have a beef with the corporation(s) behind PO, or more generally, what's their "angle?"

I guess my fundamental opinion is that it's always improper to do this sort of forking within a project. Whether it's the Linux Kernel mm and netdev maintainers or OpenStack Fuel and PO developers, they're fundamentally colleagues and should work towards a unified purpose. If the project in charge of Puppet modules isn't providing the modules the way the consumer wants, then either the consumer is doing it wrong or the producer has a faulty design/implementation. The correct decision was always to do work directly within PO, and if the PO developers didn't want to follow your design, Fuel should've been modified to handle that. There shouldn't be Puppet module development in Fuel at all; it needs to be directly contributed to PO. That avoids all the fragmentation, all the stupid merging problems (changing folder structure? that's a paddlin'), and negotiation.

> Best of all, this interest is also very practical: in the space of 6 months Fuel has jumped from not being listed at all to #3 deployment tool used to install OpenStack, Fuel 6.1 contains 1995 bugfixes (compared to 232 in PO during Kilo cycle)

The very last thing I'm interested in seeing from any OpenStack project, or from OpenStack as a whole, at this point is, "look at how much our expediency is paying off in the short-term!" This gets back to the political tensions and infighting: Is the goal to entirely displace PO? It sure seems like it, and your bugfix numbers suggest so. This seems like exactly what happened with KHTML. It's forked by Apple and renamed WebKit, which over a much longer timespan, saw essentially all (and overwhelmingly new) development go into WebKit and nothing flowing back to KHTML. I guess, at least, it wasn't another KDE project that ate their lunch and kicked them out of the club.

But, hey, I'm sure there'll be some new project (probably called Gasconade) that will usurp Fuel come the next blue moon.

Of course those bugfixes tilt in Fuel's favor: They're not freaking contributing their changes of forked PO code to PO! How many of those bugfixes would be more correctly categorized as fixes to PO?

A project with a Torvalds, van Rossum, or Poettering would never tolerate this sort of behavior. A submodule implementing their own incompatible fork of something contemporaneously provided is flatly unthinkable. They'd step up and require people to contribute to the lacking submodule directly. The difference is that those projects (and the vast majority of all open source projects) either maintain some independence at the top (eg. Torvalds is employed by the Linux Foundation) or strong commitment to sound engineering principles. Unfortunately, OpenStack is led by a cabal of corporations with extremely selfish, inimical goals.

I do want to end this comment by saying that, while I do use direct language and descriptive words, I do like OpenStack. I just wish that they'd slow down a little bit and congeal the whole project together better. It's getting so big, and you can tell that it's just the corporate sponsors behind development trying to differentiate what their spin can do and outmaneuver their fellow corporate sponsors. Overall, it's just disappointing that the project seems to be turning into a cesspool. I can say pretty confidently where many open source projects will be in 10 years, but I don't know whether OpenStack will establish a more solid base between projects, be a phoenix reborn as something else entirely, or be the abandoned shopping mall where you grew up.

The costs of forks

Posted Jun 20, 2015 8:43 UTC (Sat) by dlang (guest, #313) [Link] (4 responses)

> And, yet, the Linux kernel project, which is overwhelmingly dominated by corporate contributors, doesn't have this sort of rubbish, even in isolated incidents.

umm, you haven't been watching linux development very long have you?

the kernel faces this sort of thing all the time. While the distros have cut down the number of patches they maintain from the linus kernel since the 2.4 days, they still maintain quite a few. And if you haven't noticed how some people dislike and distrust Ubuntu, RedHat and Oracle, go back and look again.

frankly, this is such a monstrous misstatement of things that I didn't bother reading the rest of your post.

The costs of forks

Posted Jun 20, 2015 11:52 UTC (Sat) by fandingo (guest, #67019) [Link] (3 responses)

That's an entirely different situation. I'm talking about forked, incompatible code as part of the official code release, not what some random ISV decides to do. (If I were talking about that, surely what Android ISVs do is a better example, but of course, all that lives outside the official release bundle.) The code you describe lives outside the release branch of the source code. Even those ISVs don't have patches that do what Fuel does.

What Fuel is doing is akin to a file system shipping its own VFS or a wifi driver with its own netdev stack. That simply doesn't happen in Linux upstream. We can all imagine the rage and colorful language Linus would use to describe such a pull request.

The costs of forks

Posted Jun 20, 2015 22:04 UTC (Sat) by angdraug (subscriber, #7487) [Link]

Like most analogies, yours is deeply flawed on many levels. For starters, analogies are good for explaining novel concepts to novices, and misleading when examining fine nuances of concepts that your audience is at least as familiar with as you are. In an argument, all an analogy does is increase emotional temperature of the conversation. If that's what you're going for, you're on the wrong web site.

Fuel project isn't doing anything related to Puppet OpenStack that could be likened to replacing Linux kernel VFS layer. Puppet OpenStack is a collection of modules such as puppet-nova, puppet-neutron etc., the only common layer underneath these modules is Puppet itself, and we absolutely are not patching Puppet, just like we're not patching Ruby or Python interpreters that we're using to run our code.

The costs of forks

Posted Jun 25, 2015 12:37 UTC (Thu) by Funcan (subscriber, #44209) [Link] (1 responses)

There absolutely *were* multiple wifi stacks in kernel for ages, with incompatible userspace tools. Took ages to clean up. Ditto sound interfaces - OSS, ALSA, some other

The costs of forks

Posted Jun 25, 2015 17:06 UTC (Thu) by flussence (guest, #85566) [Link]

I guess the "some other" there is the Firewire audio stack, which IIRC is only accessible via the JACK daemon. On a similar tangent, there used to be two *normal* FW stacks until recently. And then there's USB, where device drivers can be written in-kernel and/or userspace, or bits of both at the same time...

The costs of forks

Posted Jun 21, 2015 1:20 UTC (Sun) by angdraug (subscriber, #7487) [Link]

Please start doing your own research. Answers to all your questions are right there in the thread on openstack-dev, the article that has summarized it, the comments here, or at most a quick google search away. If you're not interested in answers, please explicitly state that your questions are rhetorical. It would also be a good form to confirm whether or not you have a personal stake in this discussion, like I did in my first comment here. All that would help reassure me that you're genuinely interested in a two-way conversation and I shouldn't follow dlang's example and stop reading.

> Why isn't Fuel a strict consumer of modules in PO? Why was the code ever forked?

The answer is right there in the comment you were replying to: deadlines.

> Presumably the answer to #1 includes something about lacking functionality or bugs. Why weren't they just fixed in PO?

The answer is in one of my emails to the thread that were linked from the article above: the additional effort that it would require from Fuel developers. An additional 5x to 10x worth of effort, as I have illustrated in another email on that thread.

> Why are OpenStack developers so siloed in their own projects? Why don't they contribute across projects instead of forking?

That is not true, and you would know that if you tried to confirm that accusation using Stackalytics, the online tool I linked in my previous comment. If you look at the top individual bug fixers in Kilo and check their commit statistics you'll find that the most active contributors have commits in dozens of projects outside of their main area of interest. OpenStack is complex, it takes time to learn even one component, so you can't blame less experienced contributors for sticking to the areas they know well.

Heck, if you bothered to RTFA you'd find the evidence of Fuel developers contributing to Puppet OpenStack in another linked email.

> Fuel seems to fundamentally be an expansion of what PO set out to do. Why was a separate project started, and why is it trying to "drink PO's milkshake?"

No, it isn't. I already tried to explain that in another comment here. Just as Fuel is not just a GUI front-end to Puppet OpenStack, it's also not a replacement for Puppet OpenStack. The latter is a swiss army knife style collection of individual modules that allow you to set up individual OpenStack components in any way you like, and leave all the decisions and all the integration work up to you. The former is an integrated system that combines a local fork of the latter, a GUI, a great deal of orchestration logic, and a whole reference architectures guide's worth of configuration decisions that, all together, offers a completely different balance between flexibility and the effort and expertise required to get OpenStack up and running.

> Was there any consideration to expanding the goals of PO, rebranding it, and essentially making Fuel without the need to duplicate so much work?

Wrong question. Turning Puppet OpenStack into what Fuel is now would have been a perversion of Puppet OpenStack purpose as it would severely narrow down the range of configurations it could support. Making Fuel out of just Puppet OpenStack without all the configuration choices, orchestration, UI, and operational tooling would have been impossible, see the code stats I've already posted here. A better question would have been, could the effort duplication specifically in the Puppet code of Fuel (fuel-library) be reduced? Maybe. Should the effort to reconcile fuel-library with upstream have been started earlier than 2014? I think so. Should "upstream first" have been the requirement from the very beginning? No, I don't believe that would have worked out at the time, due to the same expediency considerations that you chose to dismiss.

What's the corporate alignment behind Fuel and PO? Who's sponsoring major portions of development? Which corporation(s) started Fuel?

Are all those rhetorical questions? If you have time to write lengthy comments like this, surely you have 5 minutes to find the answers on Stackalytics?

Did they have a beef with the corporation(s) behind PO, or more generally, what's their "angle?"

For someone who claims to use direct language, you sure chose a roundabout way to build up to this accusation. Is that why you're ignoring all technical reasons for the current situation? You think there's some sinister corporate conspiracy that hundreds of OpenStack contributors have been covering up all these years? Sorry, but the answer is no, I can't think of any real or imagined "beef" or "angle" that could have influenced that decision, it's always been a matter of expediency.

This gets back to the political tensions and infighting: Is the goal to entirely displace PO? It sure seems like it, and you bugfix numbers suggest so.

So far, the infighting has only been in your imagination, and I seriously don't appreciate your efforts to make the "political tensions" a reality by trying to invent reasons for Fuel and Puppet OpenStack to compete, when I have abundantly demonstrated (using bug numbers among other data) that collaboration is in both projects' best interests, and key developers from both projects have publicly acknowledged the same and come up with process changes to improve that collaboration and eventually un-fork the projects.

Fuel is not just a GUI

Posted Jun 19, 2015 5:07 UTC (Fri) by angdraug (subscriber, #7487) [Link]

Thanks for an excellent summary of that thread! I expect great things to come out of this discussion in the medium and long term for both projects. Here are a few nitpicks that I hope clear up some confusion about Fuel and why maintaining Puppet modules is very much in its scope. Disclaimer: I'm one of the Fuel developers mentioned in the article.

The Fuel project, which is a graphical user interface (GUI) to help deploy, test, and manage OpenStack installations.

This makes Fuel seem like it's mostly a GUI, and may have contributed to fandingo's initial confusion. If one makes that mistake it becomes really hard to understand how that's related to Puppet and OpenStack (or even to Puppet OpenStack, once you've figured out that that is a thing of its own).

Fuel is much more than a GUI. Here's a CLOC breakdown by language:

  • 133k lines of Python + 31k YAML + 12k JSON + 20k XML, most in management backend, orchestration, integration tests, as well as configuration for many data-driven components.
  • 42k lines of Puppet + 113k Ruby, most of it in Puppet manifests, resource providers, and their unit tests.
  • 23k lines of JavaScript + 10k HTML + 1k CSS, that's pretty much it for the GUI, all the logic is in the backend and is also exposed via REST API and CLI.
  • 17k lines of shell scripts all over the place, including deployment scriptlets, HA glue, management utilities, and whatnot.

Counting the commits in the upcoming 6.1 release of Fuel, the breakdown between management layer, Puppet, and GUI looks like this:

  • Management: 593
  • Puppet: 656
  • GUI: 249

So while I like to think that Fuel's got an awesome GUI, it remains only the tip of the iceberg, while Puppet code remains the busiest part of the Fuel codebase.
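To put those numbers side by side, here is a small Python sketch that turns the figures quoted above into percentage shares. The grouping of languages into "management", "Puppet", "GUI", and "shell" buckets is my own simplification of the list (the original only says "most" of each language group lives in the named area), so treat the line-count shares as rough.

```python
# Line counts in thousands, copied from the CLOC list above.
# ASSUMPTION: each language group is attributed wholly to one area,
# although the list only says "most" of it belongs there.
lines_k = {
    "management (Python, YAML, JSON, XML)": 133 + 31 + 12 + 20,
    "Puppet (Puppet, Ruby)": 42 + 113,
    "GUI (JavaScript, HTML, CSS)": 23 + 10 + 1,
    "shell": 17,
}

# Commit counts for the 6.1 release, copied from the list above.
commits_6_1 = {"Management": 593, "Puppet": 656, "GUI": 249}


def shares(counts):
    """Return each entry's percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


line_shares = shares(lines_k)
commit_shares = shares(commits_6_1)

for name, pct in line_shares.items():
    print(f"{name}: {lines_k[name]}k lines, {pct}% of lines")
for name, pct in commit_shares.items():
    print(f"{name}: {commits_6_1[name]} commits, {pct}% of commits")
```

Under that grouping the GUI comes to roughly 8.5% of the lines and 16.6% of the 6.1 commits, while Puppet accounts for roughly 38.6% of the lines and 43.8% of the commits.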

It is sometimes easy for development projects to get so wrapped up in solving their own problems that they forget to occasionally look up and evaluate their relationship with the rest of the ecosystem. That appears to be what happened here at some level. The problem has been known and discussed at various summits over the years, but little has changed.

I'm inclined to think that Emilien's email has reinvigorated Fuel's efforts to move towards upstream, which Andrew and I helped start in March 2014 (and which bore first fruit in October 2014), rather than started something that wasn't already happening. In that light, I feel that it's too harsh to say that it has all remained just talk until now.

Pushing the open source agenda with OpenStack vendors remains an uphill battle that is far from won, and not only because we're too busy to look up. As fandingo has noted, the OpenStack community is viciously competitive and dominated by contributors with strong commercial interests, some with big egos, some with a very superficial understanding of how free software communities work (and don't work). Downplaying even feeble attempts at coopetition isn't helping with any of these challenges.

History and licensing isn't really a red-herring

Posted Jun 20, 2015 9:36 UTC (Sat) by paulj (subscriber, #341) [Link] (12 responses)

If "SCM with history" is now the preferred form for modifications, then under the GPL that is the form that should be distributed.

That RedHat got away with deliberately withholding history for the source of their Linux kernel RPMs is wholly irrelevant to what is appropriate for the licence of another GPL project.

History and licensing isn't really a red-herring

Posted Jun 20, 2015 20:28 UTC (Sat) by angdraug (subscriber, #7487) [Link] (11 responses)

While I agree that it's reasonable to consider "SCM with history" the preferred form of modification, I don't see how you can hold different parties to different standards when it comes to GPL compliance. And even if you could, of all people Red Hat, with its virtually unchallenged dominance of the Linux server market, should be held to a higher standard than anybody else, not a lower one.

Not that any of that is relevant in the context of OpenStack, where all projects use Apache License v2, which is not copyleft.

History and licensing isn't really a red-herring

Posted Jun 20, 2015 22:05 UTC (Sat) by paulj (subscriber, #341) [Link] (10 responses)

Why can't the preferred form of modification be different for different projects? I don't see why the entire free software world must intrinsically evolve their preferences in lock-step anyway.

On OpenStack, I was assuming the Puppet OpenStack component being discussed would have the same licence as Puppet generally, which seems to be GPLv2. However, I'll admit I didn't go look at the OpenStack Puppet code concerned! If it's Apache, never mind! :)

History and licensing isn't really a red-herring

Posted Jun 20, 2015 22:09 UTC (Sat) by angdraug (subscriber, #7487) [Link] (9 responses)

Well, they're not evolving their preferences at all. I have yet to see a project that would explicitly declare a git repository as the preferred form, which is why I assumed you were trying to make that declaration for them. My mistake :)

History and licensing isn't really a red-herring

Posted Jun 21, 2015 12:36 UTC (Sun) by paulj (subscriber, #341) [Link] (8 responses)

Well, Emilien Macchi in the mail that started off the discussion this story is about does seem to suggest that he'd prefer the Fuel people did their work in a way that preserved the history in the SCM.

The code concerned does seem to be under the Apache licence. The Apache Licence does have "preferred form of modification" type language in its definition of "Source" and does seem to consider that if works are distributed that this can only be in either "Source" or "Object" form. Whether that means it is /required/ that distributions of modifiable code meet the "Source" definition, I don't know.

However, what is important here is that Borodaenko's claim that:

"Besides, there's a historic precedent that stripping commit history is
acceptable even with GPL: https://lwn.net/Articles/432012/"

is refuted.

It is *not* the case that because of RedHat and their deliberate collapsing of patches in their Linux kernel SRPM that therefore deliberate stripping of commit history is generally acceptable under the GPL. The RedHat case was quite specific to the preparation of Linux kernel RPMs from a set of patches (kept in a git repo, but that's almost incidental) to a base kernel version and publishing SRPMs that didn't (don't?) reflect that. That no one of consequence objected doesn't really generalise out to the GPL for all projects, in all situations.

(Note: Objecting to what RedHat did does not mean you think Linux, or any GPL project, can only be distributed as a git repo with full history either).

History and licensing isn't really a red-herring

Posted Jun 22, 2015 3:57 UTC (Mon) by angdraug (subscriber, #7487) [Link] (7 responses)

It seems to me that the way this discussion is going fits the proverbial textbook definition of red herring: a seemingly plausible, though ultimately irrelevant, diversionary tactic. What's the point of reintroducing the question of licensing into the discussion of collaboration between two free software projects? Which part of the outcome described in the article do you find unsatisfactory?

"seems to suggest that he'd prefer" is nowhere near "explicitly declare a git repository as the preferred form".

Fuel people did their work in a way that preserved the history in the SCM

Which we do, in a way that not only establishes code provenance, but also allows creating a branch in upstream git from Fuel-specific changes. Try to declare that level of history preservation not legally satisfactory and you end up with a license that forbids all kinds of useful behaviours such as converting to a different SCM.

what is important here is that Borodaenko's claim (...) is refuted.

If you want to refute it, you have to do better than simply declare it refuted; give a proof that isn't full of fallacies:

deliberate stripping of commit history is generally acceptable under the GPL

Strawman. The claim is that there is a precedent, not that it's generally acceptable. Precedent proves that you can't summarily declare all cases of stripping commit history unacceptable, which is not the same as declaring that all cases of stripping commit history are acceptable. I think you're misled by use of "is acceptable" in the claim in question; a more logically precise form would have been "can be acceptable". (Which does not invalidate its relevance for the case of Fuel.)

The RedHat case was quite specific to the preparation of Linux kernel RPMs

Special pleading. You failed to demonstrate how the specifics of the Red Hat kernel SRPMs case are relevant to the compatibility of stripping commit history with GPL.

That no one of consequence objected doesn't really generalise out to the GPL for all projects, in all situations.

Appeal to authority combined with cum hoc ergo propter hoc, followed by the same precedent vs generalization strawman. "That no one of consequence objected" makes a false implication that if they did, it wouldn't have been found legally acceptable anyway.

Can we just agree that the claim about the Red Hat precedent is instrumental to removing the license compliance question from the scope of the Fuel and Puppet OpenStack discussion, but that it shouldn't be taken as a generalization of this precedent to all GPL projects?

History and licensing isn't really a red-herring

Posted Jun 22, 2015 10:45 UTC (Mon) by paulj (subscriber, #341) [Link] (5 responses)

Again, here is what Borodaenko wrote:

"Besides, there's a historic precedent that stripping commit history *is acceptable*" (emphasis mine).

You're attacking my comment with all kinds of claims about logical fallacies, except your attribution of logical fallacies is predicated on your claim that I should have read Borodaenko's comment in a different way:

“I think you're misled by use of "is acceptable" in the claim in question; a more logically precise form would have been "can be acceptable"”

Basically, you're making the bizarre argument that my claim about the generality of Borodaenko's comment is incorrect because I should have interpreted in some way *other* than what the words he wrote indicated. If you're going to start attacking people with nit-picking deconstructions of the logical consistency of their arguments, then you should perhaps not base your claims of inconsistencies on your own modifications to the facts being argued.

As for refuting the claim: Noting that the RedHat case related to downstream preparation of SRPMs from an upstream tarball + patches, stored somewhere (in a git repo - but that actually was *NOT* of any relevance - it was *not* the git or CVS history that they stopped providing), and that that case is very different from the normal case of software development repositories of upstreams, should be enough to make any reasonable person realise you can't automatically generalise the RedHat case to the other. When two situations are quite different from each other, simply highlighting that fact is sufficient to refute any trivial attempt to generalise from one to the other that doesn't account for the differences.

As for your last paragraph, yes, I have no opinion about the licensing history question wrt Fuel and OpenStack Puppet. However, the RedHat SRPM thing has little to do with it, I'd agree, and nor does the RedHat SRPM thing generalise to other GPL projects, I'd agree.

History and licensing isn't really a red-herring

Posted Jun 22, 2015 16:01 UTC (Mon) by angdraug (subscriber, #7487) [Link] (4 responses)

Basically, you're making the bizarre argument that my claim about the generality of Borodaenko's comment is incorrect because I should have interpreted in some way *other* than what the words he wrote indicated.

I hope you agree that it looks less bizarre after you realize that I am Dmitry Borodaenko (as I have indicated in my first comment here), and I know that I didn't mean to generalize this precedent, and I have already conceded that the choice of words I used was imprecise.

Noting that the RedHat case related to downstream preparation of SRPMs from an upstream tarball + patches, stored somewhere (in a git repo - but that actually was *NOT* of any relevance - it was *not* the git or CVS history that they stopped providing)

No, git history is something they weren't publishing in the first place, but having that modification history reflected in patches in SRPM was good enough for practical purposes. Removing patches from SRPM removed the last public instance of that modification history, which is why it was functionally the same as taking away SCM history.

History and licensing isn't really a red-herring

Posted Jun 23, 2015 14:01 UTC (Tue) by paulj (subscriber, #341) [Link] (3 responses)

Well, I don't know how I could have known that! You never identified yourself to me or others in this thread before I posted!

You posted before me, but said only "Disclaimer: I'm one of the Fuel developers mentioned in the article." Neither that nor your nick is likely to make it immediately obvious to others that you are Dmitry Borodaenko. And I wasn't replying to that comment of yours, so I probably didn't even read it fully. In another comment on another thread you did say "The answer is in one of my emails" which was linked to an email with your name - but that was the day *after* I posted to this thread. Further, I read only the comments made in reply to me, in this thread, of which I was notified - not the others.

So yes, my comment was bizarre, as I should have been able to make the non-obvious connection between your nick and your real name, or read your comments from the future. :) I responded to text linked to directly in the article, bizarrely ignoring clarifications to that text from its author in comments made to me in the future. :)

The RedHat SRPM issue is that the "source" of that SRPM is a base upstream kernel + a set of patches to it. That is what the RedHat kernel RPM maintainers work on. However, the SRPM they release to satisfy the GPL source requirements on the SRPM goes through a process that deliberately collapses the patches and folds them together (or folds them into the tarball - I don't remember). The SRPM released does not reflect the source input files that the RedHat people work on and, clearly, prefer to maintain (useful enough that RedHat considers it a commercial advantage).

Where those patches are stored, whether it's an SCM or just a dumb directory, is by the by and not really relevant. This isn't about SCM history. You could copy those files away from the SCM and run the same process (after the SCM checkout anyway), and get the same result: The SRPM does not contain the inputs in their preferred format (certainly not RedHat's preferred format).

The SCM has nothing to do with it, other than that RedHat's SRPM build process happens to have "do a checkout from an SCM" at an early, irrelevant stage (AIUI).

History and licensing isn't really a red-herring

Posted Jun 23, 2015 18:03 UTC (Tue) by angdraug (subscriber, #7487) [Link] (2 responses)

So yes, my comment was bizarre

You've completely misread my previous comment. You said "you're making the bizarre argument", to which I responded "it looks less bizarre", by "it" referring to the same argument you were referring to. So I did not call your comment bizarre, you did that to mine. By saying that now that I confirmed my identity it should look less bizarre I implicitly confirmed that I agree that it can look bizarre if you don't know who I am. I hope this helps close this bizarre meta-conversation ;-)

The SCM has nothing to do with it

What's important here, and what makes this story relevant to the topic, is the history of modifications. It doesn't matter how this history is represented, what matters is whether that history is publicly available or not.

History and licensing isn't really a red-herring

Posted Jun 23, 2015 22:15 UTC (Tue) by paulj (subscriber, #341) [Link] (1 responses)

Fair enough, and yes it's been a funny meta-conversation. :)

On the history thing: it's an interesting question. Normally, it is pretty obvious what the norms and preferred modifications are: what a project distributes or puts together in "releases" is the practice others should follow. So if a project's releases are tarballs, then a tarball of the source from others making releases of that project's code in a similar vein should be enough.

Things possibly get fuzzier though as you move away from the original project.

So if someone adds a whole extra layer of meta-data and associated tooling around the original's release file, allowing the original upstream's release file to be combined with yet other people's patch files and built and packaged: Is it sufficient to release just the tarball with those patches collapsed in? (The RedHat case).

Lots of interesting questions here. I suspect norms are in flux somewhat now. With older SCMs you did not distribute the SCM with the code. With git, however, you do. Things could develop somewhat. Certainly, since git has added an "Author" field to commits, distinct from the committer, some take that field very, very seriously.
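For readers who haven't looked at how git records this: every commit object carries separate `author` and `committer` header lines, which is what lets attribution survive when someone else applies a patch. The Python sketch below parses a raw commit in the layout that `git cat-file -p <commit>` prints. The sample commit, the names, and the `parse_commit` helper are made up for illustration; this is not code from either project.

```python
import re

# HYPOTHETICAL raw commit object, laid out the way `git cat-file -p` shows it.
# The people and timestamps are invented for this example.
RAW_COMMIT = """\
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
author Alice Example <alice@example.org> 1434990000 +0000
committer Bob Example <bob@example.org> 1435080000 +0000

Fix service restart ordering
"""

# An identity line value looks like: Name <email> unix-timestamp timezone
IDENT = re.compile(
    r"^(?P<name>.*) <(?P<email>[^>]*)> (?P<when>\d+) (?P<tz>[+-]\d{4})$"
)


def parse_commit(raw):
    """Split a raw commit into its author/committer identities and message."""
    header_text, _, message = raw.partition("\n\n")
    identities = {}
    for line in header_text.splitlines():
        key, _, value = line.partition(" ")
        if key in ("author", "committer"):
            match = IDENT.match(value)
            if match is None:
                raise ValueError(f"unparseable {key} line: {line!r}")
            identities[key] = match.groupdict()
    return identities, message


identities, message = parse_commit(RAW_COMMIT)
print("written by:", identities["author"]["name"])
print("applied by:", identities["committer"]["name"])
```

In this made-up commit the patch was written by one person and committed by another; copying files between repositories without the commits discards exactly that pair of records.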

History and licensing isn't really a red-herring

Posted Jun 23, 2015 22:50 UTC (Tue) by angdraug (subscriber, #7487) [Link]

Interestingly enough, I tried to raise similar questions as far back as 12 years ago, when git didn't yet exist and distributed SCMs were a subject of research, not a standard part of software engineering process:

Define information source as a full history of published modifications to the licensed piece of information in preferred form for making modifications.
Distribution of modified versions of the piece should require that such source is preserved by respective authors of individual modifications, and its availability is guaranteed by (a) providing a valid reference to how the source can be obtained for a charge no more than the cost of physically performing source distribution, (b) updating the reference on request in case it becomes invalid, and (c) providing the source on request in case no other valid reference can be provided.

Now that git is something every software engineer is expected to be fluent in, and distributing source in the form of a cloneable git repository is trivial, some of the challenges that came up in the conclusion of that discussion might have disappeared :)

History and licensing isn't really a red-herring

Posted Jun 22, 2015 10:56 UTC (Mon) by paulj (subscriber, #341) [Link]

Oh, on:

"Try to declare that level of history preservation not legally satisfactory and you end up with a license that forbids all kinds of useful behaviours such as converting to a different SCM."

Agreed. Making the SCM history be part of the "source" would probably make life difficult in many common cases. The place for any legally significant attribution, history, etc., should be in the non-SCM source itself.

That said, it is *possible* that the community could one day consider the full SCM data to be the source wrt "preferred form of modification" for the purposes of the GPL. Whether the community around a specific project could set that expectation, or whether it'd need to be set by wider community and/or industry practices, I don't know.

Advisable as things stand? Probably not, as you say. Possible, yes.

(Again, to be 100% clear, the RedHat case was *NOT* about SCM history!)


Copyright © 2015, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds