Ten simple rules for the open development of scientific software

Posted Jan 4, 2013 0:40 UTC (Fri) by JoeBuck (subscriber, #2330)
In reply to: Ten simple rules for the open development of scientific software by paulj
Parent article: Ten simple rules for the open development of scientific software

I was privileged to do my graduate research in a culture (UC Berkeley EECS department) that did rock-solid open source development and released a whole lot of software that was built upon by other groups. I agree that research software should be released, ideally open source, and if the university legal department sets up roadblocks, at least it should be made available on a restricted-use basis. However, it's a mistake to over-emphasize the software, and there may be advantages in having other groups re-implement the algorithms rather than just use the same code.

If Research Group A publishes a paper and releases software, Research Group B can run the software and observe the same result. But this doesn't mean that the result is correct; the software might be wrong. Similarly, a claim that algorithm A is superior to algorithm B can be confounded with the quality of the implementations: implementation A may simply be better than implementation B, for instance because a bug in B's implementation led to worse performance than the algorithm could have achieved.
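To make the point concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `group_a_mean` stands in for a group's released code (with a deliberately planted off-by-one bug), and `group_b_mean` stands in for an independent re-implementation of the same algorithm from the paper's description. It is only an illustration of the argument, not anyone's actual research code.

```python
def group_a_mean(samples):
    """Hypothetical released code from Group A, with a deliberate off-by-one bug."""
    total = 0.0
    for i in range(len(samples) - 1):  # bug: the last sample is never added
        total += samples[i]
    return total / len(samples)


def group_b_mean(samples):
    """Hypothetical independent re-implementation of the same algorithm by Group B."""
    return sum(samples) / len(samples)


samples = [2.0, 4.0, 6.0, 8.0]  # made-up data, for illustration only

# Re-running the released code reproduces the published number exactly ...
published = group_a_mean(samples)
rerun = group_a_mean(samples)
reproduced = (published == rerun)

# ... but only the independent implementation shows that the number is suspect.
independent = group_b_mean(samples)
agrees = (published == independent)

print("re-run reproduces published result:", reproduced)
print("independent implementation agrees: ", agrees)
```

Running the same code twice can only confirm that the code is deterministic; the disagreement with the second implementation is what signals that one of the two is wrong.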



Ten simple rules for the open development of scientific software

Posted Jan 4, 2013 10:44 UTC (Fri) by dark (subscriber, #8483)

I'd find this argument more convincing if it didn't also apply to publishing the data.

It's enough if scientific papers just describe the experimental protocol and their conclusions. It's a mistake to over-emphasize publishing the data; after all, research groups who are interested in verifying the result should run their own experiment instead of re-analysing the same data.

The flaw in the argument here is that if there are mistakes in the original group's analysis, then they are exposed by publishing the data along with the conclusions, just as mistakes in the software implementation would be exposed by publishing it. Forcing other groups to redo the work and then guess why their results are different will instead hide these problems.

Publishing experimental data along with the conclusions drawn from it is considered essential; publishing the software used should be considered essential for the same reasons. In both cases, it makes sense to provide only a summary if there's no space for all of it (as in a print article); in that case, showing the implementation of the crucial parts of the algorithm would suffice. We can take the command-line parsing on faith :)

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds