Ten simple rules for the open development of scientific software

Posted Dec 30, 2012 0:46 UTC (Sun) by paulj (subscriber, #341)
In reply to: Ten simple rules for the open development of scientific software by dskoll
Parent article: Ten simple rules for the open development of scientific software

And this kind of thing illustrates that what is *really* needed is to fully describe, in the most natural, concise but precise language the author can manage, the essential methodology of the experiment in the paper. Releasing the software does NOT substitute for that, in terms of increasing the reproducibility of the experiment.

Even seasoned software engineers find it difficult to distribute software that will just run on a wide variety of machines, unless they ship it as an image that boots on something close to a universal machine (e.g. x86 VMs). Even then, it's far from guaranteed.



Ten simple rules for the open development of scientific software

Posted Jan 4, 2013 0:40 UTC (Fri) by JoeBuck (subscriber, #2330)

I was privileged to do my graduate research in a culture (UC Berkeley EECS department) that did rock-solid open source development and released a whole lot of software that was built upon by other groups. I agree that research software should be released, ideally open source, and if the university legal department sets up roadblocks, at least it should be made available on a restricted-use basis. However, it's a mistake to over-emphasize the software, and there may be advantages in having other groups re-implement the algorithms rather than just use the same code.

If Research Group A publishes a paper and releases software, Research Group B can run the software and observe the same result. But this doesn't mean that the result is correct; the software might be wrong. Similarly, a claim that algorithm A is superior to algorithm B can be confounded by the implementations: implementation A may simply be better than implementation B, for instance because a bug in B's implementation led to worse performance than the algorithm could have achieved.

Ten simple rules for the open development of scientific software

Posted Jan 4, 2013 10:44 UTC (Fri) by dark (subscriber, #8483)

I'd find this argument more convincing if it didn't also apply to publishing the data.

It's enough if scientific papers just describe the experimental protocol and their conclusions. It's a mistake to over-emphasize publishing the data; after all, research groups who are interested in verifying the result should run their own experiment instead of re-analysing the same data.

The flaw in the argument here is that if there are mistakes in the original group's analysis then they are exposed by publishing the data along with the conclusions, just like mistakes in software implementation would be exposed by publishing it. Forcing other groups to re-do the work and then guess why their results are different will instead hide these problems.

Publishing experimental data along with the conclusions drawn from it is considered essential; publishing the software used should be considered essential for the same reasons. In both cases, it makes sense to provide only a summary if there's no space for all of it (as in a print article); in that case, showing the implementation of the crucial parts of the algorithm would suffice. We can take the command-line parsing on faith :)

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.