Estimating the costs of open-source development
At the 2015 Embedded Linux Conference Europe in Dublin, Paul Sherwood from Codethink presented an intriguing take on the common problem of estimating the length of time that a development project will take. In particular, he called out the widespread Constructive Cost Model (COCOMO) as being demonstrably unscientific and, thus, useless at predicting the future, then presented a far simpler alternative metric based on Git activity. The proposed new metric, he said, can be shown to be useful by examining well-known open-source software projects, including the Linux kernel.
At the beginning, Sherwood acknowledged that he was "biting the hand that feeds him" by challenging some statistics frequently cited by, among others, the Linux Foundation (LF). He has seen it claimed [note: signup required at link], he said, that it costs an average of $250,000 per year to maintain an out-of-tree kernel patch, and that the total amount of developer time [PDF] invested in creating core Linux projects is the equivalent of 1,356 developers working for 30 years. But the $250,000 number sounds dubious and the 1,356 sounds far too specific to be an "estimate"—and neither value comes with any supporting documentation.
Similarly, estimates of the value of Linux or even the number of lines of code (LOC) in the kernel are often quoted without hard evidence to back them up, a practice that clearly ruffles Sherwood's feathers. In fact, the LF said in 2013 that the kernel contained about 17 million lines of code, Sherwood said. But when he ran David Wheeler's SLOCCount against it, he got a significantly different number: 12 million lines of code (a discrepancy of roughly 30%). The numbers reported by OpenHub are larger still, and no one has been able to explain the difference when he asks.
In the absence of real statistics, he said, the software industry relies on vague and suspicious estimates when assessing the size and complexity of projects. And the problem of precision is not limited to pull quotes in promotional material. Companies regularly rely on estimation tools to predict the time and budget that a new project will require, and those estimates, by and large, seem to be founded on unsupported numbers. Sometimes such corporate estimates are based on past internal projects, but it is increasingly common for companies to compare their projects to an open-source project of similar complexity and scope. For example, a company embarking on an effort to write a new Linux graphics driver may look at several recent graphics-driver projects to approximate the time and engineering resources it will take. Finding a valid metric to serve in such situations is a hard problem, Sherwood said, but he has at least worked out a plausible alternative by analyzing several open-source software projects.
The prediction problem
To predict project costs, people have attempted to count all manner of indicators: LOC, number of commits, time sheet hours, even "WTFs per week." LOC is a naive measurement, although it is an easy one to understand. Commits and time sheet hours are also simple, but they are difficult to compare across projects and companies, and can be gamed (such as by breaking changes up into an excessive number of commits). The most common estimation tool in the industry, though, is the complicated COCOMO formula, he said—a method that routinely generates "enormous numbers" that no one can support with data. COCOMO dates back to the 1980s, but it has continued to be discussed, written about, and used in business environments up through the present day. It is the source of OpenHub's estimates about the value of open-source software, for example.
COCOMO has obvious flaws, he said, such as counting only positive increases in LOC as its measurement of progress. Blank lines and comments thus get taken into account, while removing dead code or rewriting existing code does not count at all. But the trouble is worse than that: COCOMO is inherently problematic because it is a function of multiple unreliable input variables. Many of these variables take predictive powers to fill (such as knowing the eventual hardware attributes that will be available on the deployed system) or are subjective (such as developer efficiency).
In reality, he said, project managers can tweak all of these inputs enough to fully control the output of the COCOMO function, so it is meaningless. Engineers and managers have learned to over-estimate their COCOMO numbers so that they appear more productive when they beat the estimate. Just as importantly, he said, looking at COCOMO estimates from past projects invariably shows them to be unreliable. But the method still gets used, in large part because there is no proven alternative. One can complain about COCOMO, and its users will shrug and say "what else do we have?"
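To make the sensitivity to inputs concrete, here is a minimal sketch of the Basic COCOMO effort formula, effort = a × KLOC^b, using the coefficients published for the model's three project "modes." The talk did not say which COCOMO variant or which parameters any particular estimate used, so the choice of Basic COCOMO here is an assumption made purely for illustration; the function name is hypothetical, and the two line counts are the ones quoted earlier in the article.

```python
# Basic COCOMO: effort (person-months) = a * KLOC ** b
# The (a, b) pairs are the published Basic COCOMO coefficients per mode.
# Which mode fits a given project is a judgment call made by the estimator.
BASIC_COCOMO = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def basic_cocomo_effort(kloc, mode):
    """Return estimated effort in person-months (illustrative helper)."""
    a, b = BASIC_COCOMO[mode]
    return a * kloc ** b


# Compare the two kernel line counts mentioned in the talk (in thousands of LOC)
# across all three modes to see how far the estimate moves.
for kloc in (12_000, 17_000):
    for mode in BASIC_COCOMO:
        effort = basic_cocomo_effort(kloc, mode)
        print(f"{kloc:>6} KLOC, {mode:<13}: {effort:,.0f} person-months")
```

Both the line count and the mode are chosen by whoever runs the estimate, and the result swings widely with either choice. The more elaborate COCOMO variants add further multipliers for factors such as team capability, which gives the estimator still more room to steer the result.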
Moreover, even those organizations that do not subscribe to COCOMO estimates often fall victim to them because the method is so widespread. A widely cited statistic, for example, is that the "Agile" development methodology reduces many of the costs of fixing bugs the "old-fashioned way," in which, supposedly, fixing a bug in the planning phase is ten times cheaper than fixing it in the development phase. But that 10x factor originates in COCOMO estimates, he said, so comparing a new method to it has no value. "I think there was no scientific basis for COCOMO," Sherwood said in conclusion, "and there is no proof that it has worked historically. Until someone can prove otherwise, that's what I'm calling it."
Sherwood pointed the audience to several external resources on the unreliability of business prediction models, such as Nassim Taleb's The Black Swan: The Impact of the Highly Improbable and Dan Gardner's Future Babble. But, he continued, decrying bad numbers from COCOMO is not enough: the industry clearly needs to find a way to measure development costs. Without an alternative, "guesswork" like COCOMO will persist.
By GAD
In response to the shortage of hard evidence, Sherwood has been researching alternative methods of counting the effort required to complete software projects. His research looked at internal projects at Codethink as well as at well-known open-source software like the kernel, QEMU, systemd, GStreamer, OpenSSH, and various GENIVI projects (to which Codethink is a contributor).
Ultimately, he said, the one metric that he has found to most accurately reflect the time it takes to complete a software project is what he calls 2GAD. "GAD" stands for "Git active days" and is simply the number of days on which a developer makes a Git commit—of any kind. Summing up all of the GADs for every project participant and multiplying by two produces a relatively accurate count of how long it takes to move a project to completion.
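The definition above can be sketched in a few lines of Python. This is an illustration rather than Sherwood's own tooling: the function name, the `author|date` input format, and the sample data are all assumptions. Reading the result as person-days is also an interpretation; the talk described it only as a measure of how long a project takes.

```python
from collections import defaultdict


def two_gad(log_lines):
    """Return (GAD, 2GAD) from lines of the form 'author|YYYY-MM-DD'.

    A Git active day is one (author, calendar day) pair with at least one
    commit, so several commits by one author on one day count once.
    """
    active_days = defaultdict(set)
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        author, day = line.rsplit("|", 1)
        active_days[author].add(day)
    gad = sum(len(days) for days in active_days.values())
    return gad, 2 * gad


# Hypothetical sample: alice commits twice on one day and once on the next,
# bob commits on a single day.
sample = [
    "alice@example.com|2015-10-05",
    "alice@example.com|2015-10-05",
    "alice@example.com|2015-10-06",
    "bob@example.com|2015-10-05",
]
gad, estimate = two_gad(sample)
print(gad, estimate)  # prints "3 6"
```

Input in this shape can be produced from a real repository with `git log --pretty=format:'%ae|%ad' --date=short`, which prints one author email and commit date per commit.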
He admitted that there are factors that could make 2GAD numbers too high—for instance, when some developers commit to multiple repositories or branches in one day, they may be counted twice. And it would be possible for developers to "game" the system to a degree by making meaningless commits. But the historical numbers he surveyed were surely not being so gamed, and they hold up well across the projects examined. Furthermore, there are also factors that could make 2GAD numbers too low, such as a project culture that favors large patches over small commits, or tools that squash many commits together. In the end, he said, those factors seem to balance each other out.
Sherwood's calculations showed that 2GAD was within 10% of the time sheet hours for Codethink's internal projects. When compared to COCOMO numbers for open-source projects, 2GAD produces somewhat smaller numbers for the largest software projects (like the kernel) and somewhat larger numbers for the smallest projects. But at least it does not rely on fudge-able input factors and it is based on hard data. Furthermore, he said, 2GAD counts activity independent of the language and type of content involved (e.g., commits to project documentation are counted). And it can easily be calculated on any subset of a project: a certain time frame, a particular branch, or a subset of contributors. It can also be used to measure the effort required to maintain an existing code base.
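Restricting the count to a time frame or a group of contributors amounts to filtering the commit records before counting active days. The following is a minimal sketch under assumed names and data (`gad_subset` and the sample commits are hypothetical); limiting the count to one branch would be done when extracting the commit list from Git, not in this function.

```python
from datetime import date


def gad_subset(commits, since=None, until=None, authors=None):
    """Count Git active days over (author, date) pairs, optionally filtered.

    since/until are inclusive date bounds; authors is a set of names to keep.
    Any filter left as None is not applied.
    """
    seen = set()
    for author, day in commits:
        if since is not None and day < since:
            continue
        if until is not None and day > until:
            continue
        if authors is not None and author not in authors:
            continue
        seen.add((author, day))  # one entry per author per day
    return len(seen)


# Hypothetical commit records.
commits = [
    ("alice", date(2015, 9, 30)),
    ("alice", date(2015, 10, 1)),
    ("alice", date(2015, 10, 1)),
    ("bob", date(2015, 10, 1)),
    ("bob", date(2015, 10, 2)),
]

print(2 * gad_subset(commits))                             # whole history
print(2 * gad_subset(commits, since=date(2015, 10, 1)))    # October onward
print(2 * gad_subset(commits, authors={"bob"}))            # one contributor
```

The same filtering applied to a period after a project's initial release would give a figure for maintenance effort rather than initial development.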
None of those qualities are true of COCOMO, Sherwood said. 2GAD may not be perfect, but he invited everyone present to give it a try against their own projects and see how it compares to other estimation tools. All one needs to calculate Git active days is access to simple tools like GitStats or git summary.
Where to from here
The session ended with a spirited round of questions from the audience. Most asked about factors that do not seem to be captured by the 2GAD metric—such as development styles. For example, one commenter noted that when working on a kernel project, he tends to make lots of small commits along the way in his own private branch, then rebase and merge his commits into a few larger patches. Just looking at the contributions he makes to the mainline kernel would necessarily overlook his "Git Active Days" spent on the private branch.
Sherwood replied that he was aware of the effect and that it perhaps explains why 2GAD numbers are lower than expected for the kernel. But he emphasized again that, flawed though it may be, it is a far more defensible measurement of project effort than COCOMO. There may be plenty of room for improvement, but at least the method can provide a solid baseline. In an industry where budgets and timelines are often established by comparing a proposal against some similar-sounding open-source project, Sherwood argued, any improvement over the unreliable "state of the art" is a welcome change.
[The author would like to thank the Linux Foundation for travel assistance to attend Embedded Linux Conference Europe.]
