Estimating the costs of open-source development

By Nathan Willis
October 7, 2015

At the 2015 Embedded Linux Conference Europe in Dublin, Paul Sherwood from Codethink presented an intriguing take on the common problem of estimating the length of time that a development project will take. In particular, he called out the widespread Constructive Cost Model (COCOMO) as being demonstrably unscientific and, thus, useless at predicting the future, then presented a far simpler alternative metric based on Git activity. The proposed new metric, he said, can be shown as useful by examining well-known open source software projects, including the Linux kernel.

At the beginning, Sherwood acknowledged that he was "biting the hand that feeds him" by challenging some statistics frequently cited by, among others, the Linux Foundation (LF). He has seen it claimed [note: signup required at link], he said, that it costs an average of $250,000 per year to maintain an out-of-tree kernel patch, and that the total amount of developer time [PDF] invested in creating core Linux projects is the equivalent of 1,356 developers working for 30 years. But the $250,000 number sounds dubious and the 1,356 sounds far too specific to be an "estimate"—and neither value comes with any supporting documentation.

[Paul Sherwood]

Similarly, estimates of the value of Linux or even the number of lines of code (LOC) in the kernel are often quoted without hard evidence to back them up, a practice that clearly ruffles Sherwood's feathers. In fact, the LF said in 2013 that the kernel contained about 17 million lines of code, Sherwood said. But when he ran David Wheeler's SLOCCount against it, he got a significantly different number: 12 million lines of code (a 30% discrepancy). The numbers reported by OpenHub are larger still, and no one has been able to explain the difference when he asks.

In the absence of real statistics, he said, the software industry relies on vague and suspicious estimates when assessing the size and complexity of projects. And the problem of precision is not limited to pull quotes in promotional material. Companies regularly rely on estimation tools to predict the time and budget that a new project will require, and those estimates, by and large, seem to be founded on unsupported numbers. Sometimes such corporate estimates are based on past internal projects, but it is increasingly common for companies to compare their projects to an open-source project of similar complexity and scope. For example, a company embarking on an effort to write a new Linux graphics driver may look at several recent graphics-driver projects to approximate the time and engineering resources it will take. Finding a valid metric to serve in such situations is a hard problem, Sherwood said, but he has at least worked out a plausible alternative by analyzing several open-source software projects.

The prediction problem

To predict project costs, people have attempted to count all manner of indicators: LOC, number of commits, time sheet hours, even "WTFs per week." LOC is a naive measurement, although it is an easy one to understand. Commits and time sheet hours are also simple, but they are difficult to compare across projects and companies, and can be gamed (such as by breaking changes up into an excessive number of commits). The most common estimation tool in the industry, though, is the complicated COCOMO formula, he said—a method that routinely generates "enormous numbers" that no one can support with data. COCOMO dates back to the 1980s, but it has continued to be discussed, written about, and used in business environments up through the present day. It is the source of OpenHub's estimates about the value of open-source software, for example.

COCOMO has obvious flaws, he said, such as counting only positive increases in LOC as its measurement of progress. Blank lines and comments thus get taken into account, while removing dead code or rewriting existing code does not count at all. But the trouble is worse than that: COCOMO is inherently problematic because it is a function of multiple unreliable input variables. Many of these variables take predictive powers to fill (such as knowing the eventual hardware attributes that will be available on the deployed system) or are subjective (such as developer efficiency).
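The LOC dependence can be seen in a few lines of code. The sketch below uses the commonly published basic COCOMO formula for "organic" projects (effort in person-months = 2.4 × KLOC^1.05); the function name and the line counts are invented for illustration and do not come from Sherwood's talk.

```python
def basic_cocomo_effort(sloc, a=2.4, b=1.05):
    """Estimated effort in person-months for `sloc` source lines.

    Defaults are the published basic COCOMO coefficients for "organic"
    projects; they are uncalibrated and used here only for illustration.
    """
    kloc = sloc / 1000.0
    return a * kloc ** b


# Hypothetical project: a clean-up removes 20,000 lines of dead code.
before = basic_cocomo_effort(120_000)
after = basic_cocomo_effort(100_000)

print(f"before clean-up: {before:.0f} person-months")
print(f"after clean-up:  {after:.0f} person-months")
# The estimate shrinks even though the clean-up itself took real work.
```

Because line count is the only size input, any change that deletes code lowers the estimate, which is the behavior Sherwood objected to.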

In reality, he said, project managers can tweak all of these inputs enough to fully control the output of the COCOMO function, so it is meaningless. Engineers and managers have learned to over-estimate their COCOMO numbers so that they appear more productive when they beat the estimate. Just as importantly, he said, looking at COCOMO estimates from past projects invariably shows them to be unreliable. But the method still gets used, in large part because there is no proven alternative. One can complain about COCOMO, and its users will shrug and say "what else do we have?"

Moreover, even those organizations that do not subscribe to COCOMO estimates often fall victim to them because the method is so widespread. A widely cited statistic, for example, is that the "Agile" development methodology reduces many of the costs of fixing bugs the "old-fashioned way" (in which, supposedly, fixing a bug in the planning phase is ten times cheaper than fixing it in the development phase). But that 10x factor originates in COCOMO estimates, he said, so comparing a new method to it has no value. "I think there was no scientific basis for COCOMO," Sherwood said in conclusion, "and there is no proof that it has worked historically. Until someone can prove otherwise, that's what I'm calling it."

Sherwood pointed the audience to several external resources on the unreliability of business prediction models, such as Nassim Taleb's The Black Swan: The Impact of the Highly Improbable and Dan Gardner's Future Babble. But, he continued, decrying bad numbers from COCOMO is not enough: the industry clearly needs to find a way to measure development costs. Without an alternative, "guesswork" like COCOMO will persist.

By GAD

In response to the shortage of hard evidence, Sherwood has been researching alternative methods of counting the effort required to complete software projects. His research looked at internal projects at Codethink as well as at well-known open-source software like the kernel, QEMU, systemd, GStreamer, OpenSSH, and various GENIVI projects (to which Codethink is a contributor).

Ultimately, he said, the one metric that he has found to most accurately reflect the time it takes to complete a software project is what he calls 2GAD. "GAD" stands for "Git active days" and is simply the number of days on which a developer makes a Git commit—of any kind. Summing up all of the GADs for every project participant and multiplying by two produces a relatively accurate count of how long it takes to move a project to completion.
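A minimal sketch of the calculation as described, assuming the commit history has already been reduced to (author, date) pairs; the function names and sample data are hypothetical, and Codethink's own scripts may count differently.

```python
from collections import defaultdict


def git_active_days(commits):
    """Map each author to the number of distinct days on which they committed."""
    days_by_author = defaultdict(set)
    for author, day in commits:
        days_by_author[author].add(day)
    return {author: len(days) for author, days in days_by_author.items()}


def two_gad(commits):
    """Sum every author's Git active days and double the total."""
    return 2 * sum(git_active_days(commits).values())


# Invented sample history: (author email, commit date).
sample = [
    ("alice@example.org", "2015-10-05"),
    ("alice@example.org", "2015-10-05"),  # same day again: still one active day
    ("alice@example.org", "2015-10-06"),
    ("bob@example.org", "2015-10-05"),
]

print(git_active_days(sample))  # alice has 2 active days, bob has 1
print(two_gad(sample))          # (2 + 1) * 2 = 6
```

Using a set per author is what makes several commits on one day count once, so the number of commits alone does not inflate the result.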

[Paul Sherwood's 2GAD results]

He admitted that there are factors that could make 2GAD numbers too high—for instance, when some developers commit to multiple repositories or branches in one day, they may be counted twice. And it would be possible for developers to "game" the system to a degree by making meaningless commits. But the historical numbers he surveyed were surely not being so gamed, and they hold up well across the projects examined. Furthermore, there are also factors that could make 2GAD numbers too low, such as a project culture that favors large patches over small commits, or tools that squash many commits together. In the end, he said, those factors seem to balance each other out.

Sherwood's calculations showed that 2GAD was within 10% of the time sheet hours for Codethink's internal projects. When compared to COCOMO numbers for open-source projects, 2GAD produces somewhat smaller numbers for the largest software projects (like the kernel) and somewhat larger numbers for the smallest projects. But at least it does not rely on fudge-able input factors and it is based on hard data. Furthermore, he said, 2GAD counts activity independent of the language and type of content involved (e.g., commits to project documentation are counted). And it can easily be calculated on any subset of a project: a certain time frame, a particular branch, or a subset of contributors. It can also be used to measure the effort required to maintain an existing code base.
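Restricting the count to a time frame, branch, or contributor could be done with standard `git log` filters. The sketch below is one possible approach, not the method used in the talk: the helper names are hypothetical, and counting by author date (`%ad`) and author email (`%ae`) is an assumption.

```python
import subprocess


def git_log_command(since=None, until=None, author=None, branch="HEAD"):
    """Build a `git log` call printing "email<TAB>YYYY-MM-DD" for each commit."""
    cmd = ["git", "log", "--date=short", "--format=%ae%x09%ad"]
    if since:
        cmd.append(f"--since={since}")
    if until:
        cmd.append(f"--until={until}")
    if author:
        cmd.append(f"--author={author}")
    cmd.append(branch)
    return cmd


def parse_log(output):
    """Reduce the log output to a set of distinct (author, day) pairs."""
    pairs = set()
    for line in output.splitlines():
        if not line.strip():
            continue
        author, day = line.split("\t", 1)
        pairs.add((author, day))
    return pairs


def two_gad_for(repo_path, **filters):
    """Run the log in `repo_path` and return twice the number of active days."""
    result = subprocess.run(
        git_log_command(**filters),
        cwd=repo_path, check=True, capture_output=True, text=True,
    )
    return 2 * len(parse_log(result.stdout))


# Example call (needs a local clone; the path and dates are placeholders):
# two_gad_for("/path/to/repo", since="2015-01-01", until="2015-06-30")
```

Each distinct (author, day) pair is one Git active day, so the size of the set is already the sum over contributors.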

None of those qualities are true of COCOMO, Sherwood said. 2GAD may not be perfect, but he invited everyone present to give it a try against their own projects and see how it compares to other estimation tools. All one needs to calculate Git active days is access to simple tools like GitStats or git summary.

Where to from here

The session ended with a spirited round of questions from the audience. Most asked about factors that do not seem to be captured by the 2GAD metric—such as development styles. For example, one commenter noted that when working on a kernel project, he tends to make lots of small commits along the way in his own private branch, then rebase and merge his commits into a few larger patches. Just looking at the contributions he makes to the mainline kernel would necessarily overlook his "Git Active Days" spent on the private branch.

Sherwood replied that he was aware of the effect and that it perhaps explains why 2GAD numbers are lower than expected for the kernel. But he emphasized again that, flawed though it may be, it is a far more defensible measurement of project effort than COCOMO. There may be plenty of room for improvement, but at least the method can provide a solid baseline. In an industry where budgets and timelines are often established by comparing a proposal against some similar-sounding open-source project, Sherwood argued, any improvement over the unreliable "state of the art" is a welcome change.

[The author would like to thank the Linux Foundation for travel assistance to attend Embedded Linux Conference Europe.]


Estimating the costs of open-source development

Posted Oct 8, 2015 7:10 UTC (Thu) by hickinbottoms (subscriber, #14798) [Link] (3 responses)

While I'm not trying to defend COCOMO I think suggesting it doesn't count lines changed or removed may be missing its intended purpose, as I understand it.

When COCOMO was developed I think we were in the mindset of developing complete, working and stable systems. Thus, COCOMO predicts the number of person-hours and number of people required to get to a point of completion for a project of a particular size from a standing start. Necessarily, to get to a point of completion the project will have been through its internal defect fixing, rewriting, pruning failed experimental or replaced code etc, and therefore that was in the originally gathered metrics and therefore factored into the overall COCOMO calculation.

It's clear to me, though, that COCOMO just fails to match modern open source development methods of projects that are, for practical purposes, never 'finished', and almost never start from nothing. While it may (or may not) be able to usefully give some idea (however inaccurate that may be) of how much time/effort was required to get from zero to any point in time, it's very poor at predicting remaining effort for future work because it's not a method that, as the article makes clear, is driven by change metrics.

Perhaps it's a matter of choosing the right tool for the right job, which is surely something we all understand!

Estimating the costs of open-source development

Posted Oct 8, 2015 9:45 UTC (Thu) by devcurmudgeon (guest, #84099) [Link] (1 responses)

It's not just open source projects that 'are, for practical purposes, never 'finished', and almost never start from nothing'. I think that describes almost all significant software projects today?

Estimating the costs of open-source development

Posted Oct 8, 2015 20:50 UTC (Thu) by rriggs (guest, #11598) [Link]

> I think that describes almost all significant software projects today?

"Today" and, if the claimed number of active lines of code written in COBOL is accurate, the entirety of the computing era.

Pie

Posted Oct 8, 2015 12:24 UTC (Thu) by ncm (guest, #165) [Link]

Every time I have encountered the expression "choosing the right tool for the right job" it has been used in support of a specious, misleading, or actively wrong thesis. (Most commonly it seems to appear in support of Java as a development platform.) That might seem odd, because there is nothing wrong with the principle. There is also nothing wrong with apple pie. When apple pie is your best argument, it implies you don't really have one, and generally don't have anything to say, however many column-inches you needed not to say it.

When you find yourself tempted to conclude with apple pie, please delete your whole posting immediately. If you started out with something to say, it has been reduced to mush. Posting mush advances neither the discussion nor your reputation. If you think it is not mush, it means you need to work harder to learn to recognize mush.

How does this help?

Posted Oct 8, 2015 20:57 UTC (Thu) by rriggs (guest, #11598) [Link] (6 responses)

Unless I am missing something, this has no predictive value. It is purely backwards-looking. Interesting to be sure, but nothing that will help me more accurately *predict* the cost or completion date of software.

How does this help?

Posted Oct 9, 2015 8:13 UTC (Fri) by fishface60 (subscriber, #88700) [Link] (5 responses)

You make the prediction by comparing the project you want to do against a similar project (i.e. a graphics driver if you're making a graphics driver from scratch for a new chipset), either an internal project or an existing Open Source project.

What is 2GIT for?

Posted Oct 11, 2015 3:26 UTC (Sun) by giraffedata (guest, #1954) [Link] (4 responses)

Thanks; I had the same confusion. But I'm still not completely edified.

What are the units of the 2GIT number?

It seems to me that the obvious way to predict the completion date of a project based on knowledge of a previous similar one would be to subtract the start date of the previous project from its completion date and add that to the start date of the new project. How does one use the GIT2 number to do better than that?

What is 2GIT for?

Posted Oct 11, 2015 8:16 UTC (Sun) by dlang (guest, #313) [Link]

> It seems to me that the obvious way to predict the completion date of a project based on knowledge of a previous similar one would be to subtract the start date of the previous project from its completion date and add that to the start date of the new project. How does one use the GIT2 number to do better than that?

It takes into account the number of people who worked on the prior project and the current one (which your estimate doesn't do)

What is 2GIT for?

Posted Oct 11, 2015 11:11 UTC (Sun) by devcurmudgeon (guest, #84099) [Link] (2 responses)

"What are the units of the 2GIT number?"

Assuming you mean 2GAD, it's an estimate of person-days.

What is 2GIT for?

Posted Oct 11, 2015 12:42 UTC (Sun) by micka (subscriber, #38720) [Link] (1 responses)

So, if you get an estimate of 1000 'person-days', you just need to get 1000 persons to have it ready by tomorrow?

What is 2GIT for?

Posted Oct 11, 2015 16:21 UTC (Sun) by jospoortvliet (guest, #33164) [Link]

... We all know how that would play out ...

Estimating the costs of open-source development

Posted Oct 10, 2015 16:55 UTC (Sat) by david.a.wheeler (subscriber, #72896) [Link] (4 responses)

As the author of SLOCCount, I feel I should respond.

Actually, I agree that better metrics would be a great thing. COCOMO has been around for a long time, and really, it's time to replace it with something better. Its problems are well-known. For example, COCOMO is based on older project data, and doesn't take advantage of the data from sources such as version control repositories. COCOMO is just one estimation approach; let the alternatives bloom, and if there's a better one, then let's use it.

I'm skeptical of the "2GAD" measure on first glance; I'd like to see more information.

However, I think there are a lot of misunderstandings here. COCOMO (including COCOMO II) is a well-known estimation approach; its justification, including the project data it's based on and estimation accuracy, are all published. Also, not everyone uses GitHub; Free Software developers will generally shun it, since it's not Free Software nor open source software (*git* is, but that's different). Indeed, in many cases (especially with older projects) the version control information isn't available at all; I think it's criminal today to *not* have a version control program, but it's hard to use data you don't have. COCOMO is intended to only be used to measure how much effort to get from "0" to "project of this size" - so expecting it to measure what it wasn't intended to measure seems odd. Also, there are multiple ways to measure lines of code (LOC); you have to identify which one you're using (and if you don't, that would explain the variance).

Estimating the costs of open-source development

Posted Oct 10, 2015 18:14 UTC (Sat) by devcurmudgeon (guest, #84099) [Link] (3 responses)

David, thanks for your input. You may be right that I'm misunderstanding, but let's clear up some things:

'I'm skeptical of the "2GAD" measure on first glance; I'd like to see more information.'

Fair enough. I'm hoping some bright young things will get around to doing formal research. For now, there's code at https://github.com/CodethinkLabs/research-git-active-days.

'[COCOMO's]... justification, including the project data it's based on and estimation accuracy, are all published'

I'd be pleased to see the data, but I'm not going to spend a load of time trying to chase it down. 'The Leprechauns of Software Engineering' reinforced my natural skepticism. In any case, as I said in my slides - if code is re-written to make it smaller, COCOMO assumes the project took less effort than it did before the re-write, which is clearly broken.

'...not everyone uses GitHub'

True, but so what? The approach I'm describing doesn't need github. In practice one of the scripts my colleague wrote takes data from a set of mirrors at git.baserock.org (which is entirely open source software), including conversions to git of many projects whose upstreams use other VCS. In fact the same information could be extracted from any VCS, but it's more convenient (and faster) to operate on git repos.

'Indeed, in many cases (especially with older projects) the version control information isn't available at all'

True. But there are many many more cases where the data is available, in one version control system or another.

'COCOMO is intended to only be used to measure how much effort to get from "0" to "project of this size" - so expecting it to measure what it wasn't intended to measure seems odd.'

In the absence of anything else, after several decades, folk seem to be using COCOMO for all kinds of things, including justification of 'A $5 Billion Value' for example.

'there are multiple ways to measure lines of code (LOC); you have to identify which one you're using (and if you don't, that would explain the variance).'

I'm only ever using SLOCCount. The odd thing is that Open Hub and LF claim to be using it too, but it seems to give smaller numbers for me.

Estimating the costs of open-source development

Posted Oct 10, 2015 20:35 UTC (Sat) by david.a.wheeler (subscriber, #72896) [Link] (2 responses)

The original COCOMO model and dataset were published in Barry Boehm's book "Software Engineering Economics". The updated COCOMO II, with model and justifying dataset, were published in the later "Software Cost Estimation with COCOMO II". These books describe in detail the model, the datasets, and the statistics used to create the parameter values from the model and the original datasets. The books also provide information on how to re-calibrate the values for different scenarios; sadly, few people have enough data to do a re-calibration, so everyone keeps using the uncalibrated values from Boehm's original (old) dataset. It's important to not oversell COCOMO; it's just a simple mathematical model that uses past project data to predict effort and time for other projects.

SLOCCount uses the original COCOMO model because its input (physical SLOC) was much easier to calculate over a large set of programming languages. COCOMO II uses logical SLOC, which is arguably more accurate but also takes much more code to implement (patches welcome). I originally created SLOCCount to do the analysis for my paper More than a Gigabuck, which provides more background and discussion about it. Both versions of COCOMO have a set of parameters called cost drivers to help refine estimation, however, my original purpose was to create estimates over a very large number of OSS programs, so I had no way to apply those refinements in general. Besides, I was just trying to find a rough estimate of "how much code is there?" in a form that people who don't develop software can understand; dollars communicate more easily than "lines of code". Uncalibrated parameters were all I could do for that paper, and it's arguable that averaging values over many projects reduces the error, so that's what I used.

Even if you're going to use COCOMO, I would expect it to work better if cost drivers were used, and MUCH better if you calibrated it. Boehm often recommended calibration himself. In any case, software development has changed over the years... you would EXPECT that things would be different enough that re-calibration is needed anyway.

That said, again, it would be great to have better models to estimate effort. I'm skeptical of 2GAD specifically, but version control information certainly could be a real improvement over SLOC models like COCOMO. I wish you success!

Estimating the costs of open-source development

Posted Oct 14, 2015 2:41 UTC (Wed) by glandium (guest, #46059) [Link] (1 responses)

SLOCCount also uses 15-year-old figures for the default annual salary, which was $56,286 as per a 2000 ComputerWorld salary survey. The 2014 ComputerWorld salary survey gives a programmer/analyst base salary of $73,463, which is 30% more.

Estimating the costs of open-source development

Posted Oct 17, 2015 18:45 UTC (Sat) by magsilva (guest, #378) [Link]

That is not really an issue of SLOCCount, as it can be configured for different values. Users can configure the salary (--personcost) and the factors for effort (--effort F E) and timing (--schedule) estimations. The default values consider the project under estimation as organic, but users should set them considering the proper project classification (organic, semi-detached, or embedded). Actually, the best solution would be to recalculate the factors based upon historical data, reflecting specific characteristics of the organization and type of software under development.
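To show how much the classification and salary inputs matter, here is a rough sketch using the commonly published basic COCOMO coefficients for the three project classes. It is not SLOCCount's code: the function name and the 50,000-line project are invented, and the cost is simply effort times salary with no overhead multiplier.

```python
# (a, b, c, d): effort = a * KLOC**b person-months, schedule = c * effort**d months.
MODES = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


def cocomo_estimate(sloc, mode="organic", annual_salary=56286):
    """Basic COCOMO effort, schedule, and a simplified salary-only cost."""
    a, b, c, d = MODES[mode]
    person_months = a * (sloc / 1000.0) ** b
    schedule_months = c * person_months ** d
    cost = person_months / 12.0 * annual_salary
    return {
        "person_months": person_months,
        "schedule_months": schedule_months,
        "cost": cost,
    }


# Hypothetical 50,000-line project under each classification,
# at the two salary figures quoted in this thread.
for mode in MODES:
    old = cocomo_estimate(50_000, mode)
    new = cocomo_estimate(50_000, mode, annual_salary=73463)
    print(f"{mode:14s} {old['person_months']:7.1f} person-months, "
          f"${old['cost']:,.0f} (2000 salary) vs ${new['cost']:,.0f} (2014 salary)")
```

The salary only scales the final figure, while the classification changes the exponent, so a wrong classification distorts large projects far more than an outdated salary does.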


Copyright © 2015, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds