
From lab to libre software: how can academic software research become open source?

Posted Nov 5, 2017 19:18 UTC (Sun) by sfeam (subscriber, #2841)
In reply to: From lab to libre software: how can academic software research become open source? by pboddie
Parent article: From lab to libre software: how can academic software research become open source?

You have shifted to highlighting a separate problem that is indeed serious. A very real problem when developing tools for sophisticated data analysis is that the more you make them easy to use, e.g. providing a web interface, the more likely it is that you attract users who do not understand when the tool is or is not appropriate. This is true regardless of how well engineered, documented, or reviewed the software behind that easy-to-use web interface may be.

To the extent it is even possible to address this problem, I think it must come through better education of the target user group. That is partly what I had in mind when I said that good documentation is a recurring sore point. It is too often hard for non-experts, even in the same field, to understand under which set of conditions their data is best analysed by method A rather than method B, or by tool C rather than tool D. The wrong choice may lead to an erroneous scientific conclusion even when A and B are both perfectly valid methods and tools C and D are correctly implemented, each within its own domain of applicability.

Hiring more programmers during development is not going to solve this at all. The open/closed status of the source code is also irrelevant to this end of the problem. An example of something that does help, though it only works if you can attract outside funding to set it up, is to hold well-publicized and "fun" contests that pit competing tools against each other. For example, the CASP competitions pit competing approaches to predicting the 3D shape formed by a protein produced from a particular DNA sequence. The larger community tends to remember, and use, the winning tools even if they don't understand the details of why those tools performed better. It is notable that if you look at the CASP winners, many of them use open source toolkits and libraries. And those shared code bases are improved by feedback from competition between contributing groups. We need more of this.



From lab to libre software: how can academic software research become open source?

Posted Nov 6, 2017 0:25 UTC (Mon) by pboddie (guest, #50784)

You have shifted to highlighting a separate problem that is indeed serious.

Don't worry: I'm willing to discuss all problems here!

A very real problem when developing tools for sophisticated data analysis is that the more you make them easy to use, e.g. providing a web interface, the more likely it is that you attract users who do not understand when the tool is or is not appropriate. This is true regardless of how well engineered, documented, or reviewed the software behind that easy-to-use web interface may be.

Yes, but there is arguably more of a demand for attractive, "easy to use" Web services than for plain tools. Experiences may vary with regard to what is considered publishable and what the expectations of the reviewers are.

On the former topic, I have my name on a publication about a database that I doubt my previous boss would have regarded as interesting enough for publication, but it passed the threshold for the venue my boss at the time chose. That is the difference between bioinformatics-related journals, where the computational techniques are emphasised, and biology-related journals, which probably want a greater emphasis on, say, experimental techniques or theory. (For all I know. What I did perceive, however, was that in the evaluation of research, if the evaluators don't "rate" bioinformatics journals because those aren't amongst the ones they know, the research achievements don't get properly recognised.)

On the latter topic, the publication in question drew remarks about the user interface from the reviewers. It was clear that they wanted something slick and attractive, although decades after the introduction of usability research, people still don't understand that interface design is largely an iterative process that you really don't want to conduct within the confines of an article review. Fortunately for everyone concerned, the database being relatively simple, there wasn't much of a trade-off between "looks great" and "obscures what the tool does". We also worked with a group who put quite a bit of emphasis on the look and feel of the Web front-end to my colleagues' work. Again, for certain audiences (potentially the very ones you need to educate), it seems that good-looking things can be seen as more publishable, sometimes deservedly so (they introduce useful visualisations), other times arguably not.

I agree that audience education is essential; indeed, I wonder whether I didn't state or imply that in what I wrote. I also have experience of competitions between tools, which were useful to the extent that you could see what other people's tools were supposedly capable of, but I might also suggest that they were distractions in various respects: you can end up focusing on limited datasets, tuning for potentially atypical data, and still not really learning what other people are doing.

I remember one participant in a meeting around one of these competitions saying that he rather doubted that various people employing certain machine learning approaches really understood what they were doing. Another doubted that, by making opaque tools, we were gaining any insight into the problems to be solved (which is also a hot topic with regard to "AI" these days). To an extent, I got the impression that some of these competitions were profile-sustaining activities for certain research groups, and that if the code were freely available, people would get many of the benefits anyway.

My remarks about paying for development weren't made in the context of improving the application of the scientific method, but rather as an observation about the status of developers in certain parts of academia. I also have to dispute your assertions about code availability somewhat, not to be contrary, but because I have had actual experience of the described methods and the code differing when I was able to review both. Of course, if no-one looks at the code, and my impression was that the audience was under-resourced and unlikely to look at it, then making everything available doesn't solve all the problems.

