Ten simple rules for the open development of scientific software

Posted Dec 29, 2012 21:45 UTC (Sat) by oever (subscriber, #987)
In reply to: Ten simple rules for the open development of scientific software by JMB
Parent article: Ten simple rules for the open development of scientific software

The whole point of publishing software along with articles is so that others may easily check the published results. The PLOS article notes that "few papers are accompanied by open software". From my experience as a researcher and developer in physical chemistry ('98-'03), bioinformatics ('04-'07) and X-ray crystallography ('07-'09), I can say that this is accurate.

Scientists love to use FOSS stacks, but mostly do not publish their own code. This is justified by saying that others could implement the described algorithms and achieve the same results that way. This is true when the algorithms have been documented completely and resources are infinite. Making a second implementation would indeed be a good check, but it also takes a lot of work. The incentive to recreate software that would yield no new publishable material is low.

Journals should and sometimes do require that source code is published with articles.

Of course there are exceptions to the rule. Tartini is a nice desktop application for analyzing musical performance. The NSGT toolbox is a library for transforming audio from the time domain to the logarithmic frequency domain (an FFT transforms to the linear frequency domain).
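To make the parenthetical concrete, here is a minimal sketch of the difference between a linear and a logarithmic frequency axis. It does not use or reproduce the NSGT toolbox; the function names and the parameter values (sample rate, FFT size, bins per octave) are assumptions chosen only for illustration.

```python
def fft_bin_frequencies(sample_rate, fft_size):
    """Center frequencies (Hz) of the non-negative FFT bins: evenly spaced."""
    return [k * sample_rate / fft_size for k in range(fft_size // 2 + 1)]


def log_bin_frequencies(f_min, bins_per_octave, n_bins):
    """Geometrically spaced center frequencies (Hz): constant ratio between bins."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


# Illustrative values only (assumptions, not taken from either project).
linear = fft_bin_frequencies(sample_rate=44100, fft_size=1024)
logarithmic = log_bin_frequencies(f_min=55.0, bins_per_octave=12, n_bins=25)

# Linear axis: neighbouring bins differ by a constant number of Hz.
linear_step = linear[1] - linear[0]
# Logarithmic axis: neighbouring bins differ by a constant ratio
# (one semitone when there are 12 bins per octave).
log_ratio = logarithmic[1] / logarithmic[0]
```

With 12 bins per octave, every bin on the logarithmic axis is one semitone wide, which is why such an axis is convenient for music. On the linear axis, half of all bins fall into the topmost octave.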


Ten simple rules for the open development of scientific software

Posted Dec 29, 2012 22:17 UTC (Sat) by dskoll (subscriber, #1630) [Link]

Your link to Tartini looked interesting, but when I tried compiling it on Debian Squeeze, it failed miserably. It looks like Tartini illustrates the author's original point: A lot of academic software is not easily portable to any machine other than the author's workstation, it doesn't use standard tools like autoconf, and it bit-rots.

Too bad, because Tartini looks really cool...

Ten simple rules for the open development of scientific software

Posted Dec 30, 2012 0:07 UTC (Sun) by oever (subscriber, #987) [Link]

The fact that Tartini does not compile under current Debian shows that an application that worked fine with earlier versions of its libraries has stopped being compilable, let alone usable, because Debian has changed so much in just a few years. Is that Tartini's fault or Debian's?

Should we expect researchers to keep software up to date with changes in compilers and available libraries? Tartini uses Qt4, a perfectly fine library, as are Qt3 and Qt2. Yet software that relies on Qt2 has a hard time working on a current Linux system.

Ten simple rules for the open development of scientific software

Posted Dec 30, 2012 0:30 UTC (Sun) by dskoll (subscriber, #1630) [Link]

> The fact that Tartini does not compile under current Debian shows that an application that worked fine with earlier versions of its libraries has stopped being compilable, let alone usable, because Debian has changed so much in just a few years. Is that Tartini's fault or Debian's?

Clearly, Tartini's. Debian is about the least bleeding-edge you can get with Linux. The build files for Tartini contain hard-coded paths to specific directories like /home/inferno/research/pitch/lib.
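This kind of problem is easy to check for before a release. The following is a minimal sketch of a scanner that flags home-directory paths in build files. The function name and the regular expression are assumptions, and the qmake-style fragment is a made-up example that reuses the path quoted above; it is not copied from Tartini's actual build files.

```python
import re

# Matches absolute paths under a user's home directory, e.g. /home/alice/src/lib.
HOME_PATH = re.compile(r"/home/[A-Za-z0-9._-]+(?:/[^\s\"']*)?")


def find_hardcoded_home_paths(text):
    """Return (line_number, path) pairs for every home-directory path in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in HOME_PATH.finditer(line):
            hits.append((number, match.group(0)))
    return hits


# Hypothetical build-file fragment, for illustration only.
example = """\
TEMPLATE = app
INCLUDEPATH += /home/inferno/research/pitch/lib
LIBS += -lfftw3f
"""

hits = find_hardcoded_home_paths(example)
for number, path in hits:
    print(f"line {number}: hard-coded path {path}")
```

Running something like this over the build files would catch the most obvious portability problem before anyone else tries to compile the code.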

I've worked in both academia and industry and know that academic software is not often built with the thought of actually distributing it or maintaining it in mind. It's just an unfortunate fact.

Ten simple rules for the open development of scientific software

Posted Dec 30, 2012 2:49 UTC (Sun) by yarikoptic (subscriber, #36795) [Link]

> ..., Debian, the least bleeding-edge you can get with Linux

I beg your pardon... Debian is not only Debian stable -- there is also testing, unstable and even experimental. With unstable+experimental you can be as close to the bleeding edge as possible while still maintaining a usable and relatively stable system.

But this example is indeed a very nice one for pointing out that source code itself, although a huge step forward, is not all that is needed for proper dissemination of scientific methods, since building and deploying the "code" might be quite involved at times. Had the authors created proper Debian packages and uploaded them to Debian unstable (the entry point for new packages into Debian), it would have delivered many of the benefits others have mentioned:

- the "code" could immediately be used by users of Debian (and thus of its >130 derivatives),
- its hardware platform agnosticism would be verified by building it on the >10 architectures that Debian supports,
- if unit tests were run at build time, at least some aspects of hardware platform "reproducibility" would also come "for free",
- the longevity of such "code" would be measured in years, thanks to its later inclusion and maintenance in Debian stable.
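As a rough illustration of the build-time unit tests mentioned in the list above, here is a minimal sketch. The `rms` routine, the test class and `run_build_time_tests` are hypothetical names standing in for a real analysis routine and its test suite; how a package build would invoke them is left out.

```python
import math
import unittest


def rms(samples):
    """Root mean square of a sequence of numbers (hypothetical analysis routine)."""
    return math.sqrt(math.fsum(x * x for x in samples) / len(samples))


class RmsReproducibilityTest(unittest.TestCase):
    def test_known_signal(self):
        # One full period of a unit sine wave has an RMS of 1/sqrt(2).
        n = 1000
        signal = [math.sin(2 * math.pi * k / n) for k in range(n)]
        # A tolerance is used because floating-point results may differ
        # slightly between architectures and math libraries.
        self.assertAlmostEqual(rms(signal), 1 / math.sqrt(2), places=9)


def run_build_time_tests():
    """Run the suite and report success, as a build step might."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RmsReproducibilityTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

If a build step fails whenever `run_build_time_tests()` returns False, every architecture the package is built on also checks that the numerical result stays within the stated tolerance.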

If you want to read more about our (neuro.debian.net) position and experience, you are welcome to read
http://www.frontiersin.org/Neuroinformatics/10.3389/fninf...
Open is not enough. Let’s take the next step: an integrated, community-driven computing platform for neuroscience

Ten simple rules for the open development of scientific software

Posted Jan 4, 2013 20:16 UTC (Fri) by pboddie (subscriber, #50784) [Link]

What you and others are saying is that what's missing is the software engineering. People can write code to consume and produce data in order to demonstrate something, get published, and so on, but if others are to benefit from that code in any convenient way, there's the usual amount of software engineering required to achieve this.

Some might dispute whether sharing the code is necessary, but if any algorithm is going to be described in detail - and I doubt that they are described in sufficient detail, especially in disciplines other than pure computer science - then it would be better if the code were available, better still if it could be conveniently used in order to rule out coincidental hardware- or infrastructure-related effects, and even better still if it were well-structured and well-documented. Once again, software engineering is the missing ingredient.

Unfortunately, the funding in many environments probably doesn't cover anything beyond getting something working and getting a paper out the door (and thus attracting more funding). After all, there's always another Web service to use or another bundle of Java class files to stuff into the JVM to massage one's data and produce a "result", and nobody's asking for money, so what's the problem? Right? That's probably the prevailing attitude that needs changing.

Ten simple rules for the open development of scientific software

Posted Dec 30, 2012 0:46 UTC (Sun) by paulj (subscriber, #341) [Link]

And this kind of thing illustrates that what is *really* needed is to fully describe, in the most natural, concise but precise language the author can manage, the essential methodology of the experiment in the paper. Releasing the software does NOT substitute for that, in terms of increasing the reproducibility of the experiment.

Even seasoned software engineers will find it difficult to distribute software that will just run on a wide variety of machines - unless they do so as something that will boot on something that is close to a universal machine (e.g. x86 VMs). Even then, it's far from guaranteed.

Ten simple rules for the open development of scientific software

Posted Jan 4, 2013 0:40 UTC (Fri) by JoeBuck (subscriber, #2330) [Link]

I was privileged to do my graduate research in a culture (UC Berkeley EECS department) that did rock-solid open source development and released a whole lot of software that was built upon by other groups. I agree that research software should be released, ideally open source, and if the university legal department sets up roadblocks, at least it should be made available on a restricted-use basis. However, it's a mistake to over-emphasize the software, and there may be advantages in having other groups re-implement the algorithms rather than just use the same code.

If Research Group A publishes a paper and releases software, Research Group B can run the software and observe the same result. But this doesn't mean that the result is correct; the software might be wrong. Similarly, a claim that algorithm A is superior to algorithm B can be confounded by implementation quality: implementation A may simply be better than implementation B, or a bug in B's implementation may have led to worse performance than could have been achieved.

Ten simple rules for the open development of scientific software

Posted Jan 4, 2013 10:44 UTC (Fri) by dark (subscriber, #8483) [Link]

I'd find this argument more convincing if it didn't also apply to publishing the data.

It's enough if scientific papers just describe the experimental protocol and their conclusions. It's a mistake to over-emphasize publishing the data; after all, research groups who are interested in verifying the result should run their own experiment instead of re-analysing the same data.

The flaw in the argument here is that if there are mistakes in the original group's analysis then they are exposed by publishing the data along with the conclusions, just like mistakes in software implementation would be exposed by publishing it. Forcing other groups to re-do the work and then guess why their results are different will instead hide these problems.

Publishing experimental data along with the conclusions drawn from it is considered essential; publishing the software used should be considered essential for the same reasons. In both cases, it makes sense to provide only a summary if there's no space for all of it (as in a print article); in that case, showing the implementation of the crucial parts of the algorithm would suffice. We can take the command-line parsing on faith :)

Ten simple rules for the open development of scientific software

Posted Jan 24, 2013 20:41 UTC (Thu) by raalkml (guest, #72852) [Link]

FWIW, it wasn't particularly hard to fix for Debian testing.
If someone is still interested, I could send the fixes I had to make your way.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds