
From lab to libre software: how can academic software research become open source?


Posted Oct 26, 2017 18:44 UTC (Thu) by sfeam (subscriber, #2841)
In reply to: From lab to libre software: how can academic software research become open source? by nim-nim
Parent article: From lab to libre software: how can academic software research become open source?

There seem to be two discussions working at cross-purposes here, perhaps because of a subtle distinction between "academic software research" and "academic research software". I have decades of history and publication as an academic researcher with a focus on the development of software for structural biology. Of course we have an interest and incentive in sharing the object of our research. But the software is not by itself the object of our research; it is a tool which allows us to conduct that research.

Toolmakers are sadly under-appreciated by government funding agencies, but good tools are recognized and appreciated by the target research community, and I have never found that publication and consequent academic recognition was hindered by developing the code as open source. The GPL is not a good match for many academic software projects, but that's a separate discussion.

So no, for the case of "academic research software" I think even your reformulated statement misses the mark. I'll go further and say that in my field shared software has a much better track record and success rate than equivalent competing projects that are kept purely in-house by the originating research group. Because of this, research proposals that do not incorporate plans for making the software tools public are dinged both by reviewers and by funding agencies.



Posted Nov 2, 2017 6:29 UTC (Thu) by einar (guest, #98134)

> Of course we have an interest and incentive in sharing the object of our research.

Is this true for smaller institutions? In my field (bioinformatics), unless you're in a large institution that can handle these things (and often will cripple them with ridiculous licensing, but that's another story), I've seen tools kept as extremely guarded secrets. Because you *might* (but often never) publish them one day.
(That said, I actively engage in communities that understand that collaboration, even on software, is important).

Also, you briefly touched on an important point for fields where software itself is not the object of research: you will never (or very rarely) get funding for maintaining software. So it's up to the Ph.D. students and the post-docs to create software, if needed, only to abandon it after publication or because they move elsewhere. A sad state of affairs.

Posted Nov 3, 2017 16:22 UTC (Fri) by pboddie (guest, #50784)

> Is this true for smaller institutions? In my field (bioinformatics), unless you're in a large institution that can handle these things (and often will cripple them with ridiculous licensing, but that's another story), I've seen tools kept as extremely guarded secrets. Because you *might* (but often never) publish them one day.

It disappoints me slightly that the article focused largely on what might be regarded as computer science research or artefacts thereof, whereas there are huge challenges in other disciplines in the delivery of "sustainable science". Interestingly, Jupyter (silly name, I think) attempts to tackle this challenge.

As to whether software is publishable, I think that especially in disciplines like bioinformatics (where I have some experience), software is frequently and disappointingly seen as being a disposable means to an end. It seemed to me that people were quite happy to surf around looking for a Web service that would tell them what they wanted without any curiosity about how that particular tool was made; if it was backed by a publication they'd be reassured, but then that publication might not have any software or data attached to it, perhaps inviting inquirers to make contact to "collaborate".

I did see some interest in the tools I was writing for the group in which I worked. People certainly saw the need to take advantage of things that process existing datasets, and the different public databases seemed to realise that people wanted software to work with the data. "Web services" were also touted as an option, but those really aren't viable for anything more than ad-hoc queries unless the database supplier likes being hammered with HTTP requests (another idiotic habit we had to deal with, in one case involving a US research group whose IP addresses I eventually ended up banning).

Fortunately, my boss kept the rights to the software we developed and released it as Free Software, thus undermining the stupid and greedy "commercialisation" doctrine and organs of the university that were busy inhibiting other people's non-software work. In theory it lives on, but then the matter of sustaining development once everyone has moved elsewhere becomes a problem.

Posted Nov 3, 2017 17:58 UTC (Fri) by sfeam (subscriber, #2841)

The situation is much better than you make it appear. Sure, there's junk bioinformatics software, just as there's junk wherever Sturgeon's Law applies, which is nearly everywhere. And if you rely on randomly found web services you should indeed be concerned about their quality. But contrary to what you imply, the key software tools are published, reviewed, shared, and often accompanied by test data sets. Decent documentation is admittedly often a sore point. With specific regard to bioinformatics software provided as a web service, I suggest you look at the special issues devoted annually to exactly this class of software in the journal Nucleic Acids Research [*]. Their requirements for publication, documentation, validation suites, demonstrated community use, etc. are all very solid.

[*] Yeah the name of the journal does not make it the obvious place to look for such a focus, but it has become a first rank quality touchstone for web-based bioinformatics software.

Posted Nov 5, 2017 16:45 UTC (Sun) by pboddie (guest, #50784)

I'm not a true bioinformatician, merely a software engineer who got involved with people doing bioinformatics. While I agree that many of the tools and services I had to work with were largely robust in terms of methods and transparency of operation, I would have reservations about whether the audience for those tools and services has the means, or even the inclination, to review what they do.

Maybe I should clarify my remarks about the audience for such things, though. While I might expect bioinformaticians to be equipped to evaluate software and services, largely because such people should be familiar with the computational and engineering aspects of such work, there are plenty of other people who use things like Web services to get "answers". Admittedly, this doesn't affect the "bread and butter" services like databases, but is more of a concern with services performing some level of analysis. (I remember a newly-pitched literature-mining service using "what causes cancer" as its Google-style front page example.)

Maybe things are actually great and I never realised it, or maybe I was working more in niches where people were more cavalier about what constitutes the validation of a particular tool. Instead of just publishing their software and data and allowing people to get their hands dirty, there had to be a back-and-forth to get the code, which was not necessarily constructed using great engineering practices.

I will say that I was encouraged by the software tools in use at some institutions, even though some of their choices weren't always to my liking. But then I must also say that even though our group made all our code and data available, there really wasn't much interest in reproducing what we did. For the most part people wanted us to do everything for them, and I certainly got more confirmation that software development is hardly valued in the field: people will drop big bucks on sequencing equipment but expect the analysis to be done for free by random people elsewhere.

Posted Nov 5, 2017 19:18 UTC (Sun) by sfeam (subscriber, #2841)

You have shifted to highlighting a separate problem that is indeed serious. A very real problem when developing tools for sophisticated data analysis is that the more you make them easy to use, e.g. providing a web interface, the more likely it is that you attract users who do not understand when the tool is or is not appropriate. This is true regardless of how well engineered, documented, or reviewed the software behind that easy-to-use web interface may be.

To the extent it is even possible to address this problem, I think it must come through better education of the target user group. That is partly what I had in mind when I said that good documentation is a recurring sore point. It is too often hard for non-experts, even in the same field, to understand under which set of conditions their data is best analysed by method A rather than method B, or tool C rather than tool D. The wrong choice may lead to an erroneous scientific conclusion even when both A and B are perfectly valid methods and tools C and D are correctly implemented, each in their own respective domain of applicability.

Hiring more programmers during development is not going to solve this at all. The open/closed status of the source code is also irrelevant to this end of the problem. An example of something that does help, though it only works if you can attract outside funding to set it up, is to hold well-publicized and "fun" contests that pit competing tools against each other. For example, the CASP competitions pit competing approaches to predicting what 3D shape is formed by a protein produced from a particular DNA sequence. The larger community tends to remember, and use, the winning tools even if they don't understand the details of why they performed better. It is notable that if you look at the CASP winners, many of them use open source toolkits and libraries. And those shared code bases are improved by feedback from competition between contributing groups. We need more of this.

Posted Nov 6, 2017 0:25 UTC (Mon) by pboddie (guest, #50784)

> You have shifted to highlighting a separate problem that is indeed serious.

Don't worry: I'm willing to discuss all problems here!

> A very real problem when developing tools for sophisticated data analysis is that the more you make them easy to use, e.g. providing a web interface, the more likely it is that you attract users who do not understand when the tool is or is not appropriate. This is true regardless of how well engineered, documented, or reviewed the software behind that easy-to-use web interface may be.

Yes, but there is arguably more of a demand for attractive, "easy to use" Web services than for tools. Experiences may vary with regard to what is publishable or not and what the expectations of the reviewers are.

On the former topic, I have my name on a publication about a database that I doubt my previous boss would have regarded as interesting enough for publication, but it passed the threshold for my then boss's publication venue. That is the difference between the more bioinformatics-related journals, where the computational techniques are emphasised, and biology-related journals, which probably want a greater emphasis on, say, experimental techniques or theory. (For all I know. What I did perceive, however, was that in the evaluation of research, if the evaluators don't "rate" bioinformatics journals because those aren't amongst the ones they know, the research achievements don't get properly recognised.)

On the second topic, the publication in question got remarks about the user interface from the reviewers. It was clear that they wanted something slick and attractive, although decades after the introduction of usability research, people still don't understand that this is largely an iterative process that you really don't want to do in the confines of an article review. Fortunately for everyone concerned, being a relatively simple database, there wasn't much of a trade-off between "looks great" and "obscures what the tool does". We also worked with a group who put quite a bit of emphasis on the look and feel of their Web front-end to my colleagues' work. Again, for certain audiences (and potentially the ones you need to educate), it seems that good-looking things can be seen as more publishable, sometimes deservedly so (they introduce useful visualisations), other times arguably not.

I agree that audience education is essential; indeed, I wonder whether I didn't already state or imply that in what I wrote. I also had experience of competitions between tools, which were useful to the extent that you could see what other people's tools were supposedly capable of. But I might also suggest that they were distractions in various respects: you can end up focusing on limited datasets, tuning for potentially atypical data, and still not really learning what people were doing.

I remember one participant in a meeting around one of these competitions saying that he rather doubted that various people employing certain machine learning approaches really understood what they were doing. Another doubted that by making opaque tools we were gaining any insight into the problems to be solved (which is also a hot topic with regard to "AI" these days). To an extent, I got the impression that some of these competitions were profile-sustaining activities for certain research groups, and if the code was freely available then people would get many of the benefits anyway.

My remarks about paying for development weren't made in the context of improving the application of the scientific method, but rather an observation about the status of developers in certain parts of academia. I also have to dispute your assertions about code availability somewhat, not to be contrary, but I had actual experiences of methods and code differing when I was able to review them both. Of course, if no-one looks at the code, and my impression was that the audience was under-resourced and unlikely to look at it, then making everything available doesn't solve all the problems.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds