
From lab to libre software: how can academic software research become open source?

Posted Oct 26, 2017 13:16 UTC (Thu) by deater (subscriber, #11746)
In reply to: From lab to libre software: how can academic software research become open source? by nim-nim
Parent article: From lab to libre software: how can academic software research become open source?

> Academics and researchers have no interest in sharing
> software if it is the object of their research.

That is an inaccurate and, frankly, pretty insulting generalization.

There are a lot of researchers and academics who share their software, even in the face of the huge incentives not to. They might be in the minority, but the story is exactly the same outside of academia. It's not like the majority of corporations are sharing their software.



From lab to libre software: how can academic software research become open source?

Posted Oct 26, 2017 13:29 UTC (Thu) by deater (subscriber, #11746) [Link]

And a bit of a followup: there is a lot of misunderstanding between the groups. A lot of kernel/Linux developers seem to think academics are locked up in ivory towers, but, believe it or not, a lot of academics think the kernel people are locked up in their own isolated tower.

The area I am most familiar with is the perf subsystem. The academic/supercomputer researchers are *still* bitter about the kernel politics involved in getting perf merged and really want nothing to do with it. They feel that the kernel perf developers only care about debugging kernel issues and neither understand nor care about supercomputing needs.

And now, since perf typically requires root to run anyway for security reasons, the academic researchers see no reason to bother with it. They have built entire ecosystems around tools that program the performance counters directly (via MSR writes) and completely bypass the perf_event subsystem, because they feel the perf developers are unresponsive to their needs.
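(To illustrate what is being bypassed, here is a rough, minimal sketch of the perf_event_open() path, not any particular group's code; it assumes a Linux system where /proc/sys/kernel/perf_event_paranoid is set low enough to allow unprivileged counting.)

    /* Minimal sketch: count retired instructions around a region of interest
     * for the calling thread, using the perf_event_open() interface that the
     * MSR-based tools avoid. */
    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <string.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_INSTRUCTIONS;
        attr.disabled = 1;
        attr.exclude_kernel = 1;   /* count user-space only */

        /* There is no glibc wrapper, so invoke the syscall directly:
         * pid = 0 (this thread), cpu = -1 (any CPU). */
        int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0) {
            perror("perf_event_open");
            return 1;
        }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        /* ... region of interest ... */
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        uint64_t count;
        if (read(fd, &count, sizeof(count)) == sizeof(count))
            printf("instructions: %llu\n", (unsigned long long)count);
        close(fd);
        return 0;
    }

The MSR-based ecosystems instead poke the counters through interfaces such as /dev/cpu/*/msr, which is exactly why they end up needing root themselves.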

A huge, unnecessary duplication of effort. Both sides are free software, though, and that hasn't really helped the issue.

And this is not a case of an issue that throwing some grant money around or making up a few new obscure journals is going to help.

From lab to libre software: how can academic software research become open source?

Posted Oct 26, 2017 17:13 UTC (Thu) by nim-nim (subscriber, #34454) [Link] (10 responses)

Ok, I'll reformulate. They have no interest in sharing the object of their research from a research point of view. That has the risk of others finishing up before them and reaping awards and funding, or wandering into application aspects.

As human beings, they have the same incentives as every one else: leave their mark, make the world a little better for everyone, and so on.

From lab to libre software: how can academic software research become open source?

Posted Oct 26, 2017 18:44 UTC (Thu) by sfeam (subscriber, #2841) [Link] (6 responses)

There seem to be two discussions working at cross-purposes here, perhaps because of a subtle distinction between "academic software research" and "academic research software". I have decades of history and publication as an academic researcher with a focus on the development of software for structural biology. Of course we have an interest and incentive in sharing the object of our research. But the software is not by itself the object of our research; it is a tool which allows us to conduct that research.

Toolmakers are sadly under-appreciated by government funding agencies, but good tools are recognized and appreciated by the target research community, and I have never found that publication and consequent academic recognition was hindered by developing the code as open source. The GPL is not a good match for many academic software projects, but that's a separate discussion.

So no, for the case of "academic research software" I think even your reformulated statement misses the mark. I'll go further and say that in my field shared software has a much better track record and success rate than equivalent competing projects that are kept purely in-house by the originating research group. Because of this, research proposals that do not incorporate plans for making the software tools public are dinged both by reviewers and by funding agencies.

From lab to libre software: how can academic software research become open source?

Posted Nov 2, 2017 6:29 UTC (Thu) by einar (guest, #98134) [Link] (5 responses)

> Of course we have an interest and incentive in sharing the object of our research.

Is this true for smaller institutions? In my field (bioinformatics), unless you're in a large institution that can handle these things (and often will cripple them with ridiculous licensing, but that's another story), I've seen tools kept as extremely guarded secrets, because you *might* (but often never) publish them one day.
(That said, I actively engage in communities that understand that collaboration, even on software, is important).

Also, you briefly touched on an important point for fields where software itself is not the object of research: you will never (or very rarely) get funding for maintaining software. So it's up to the Ph.D. students and the post-docs to create software, if needed, only to abandon it after publication or because they move elsewhere. A sad state of affairs.

From lab to libre software: how can academic software research become open source?

Posted Nov 3, 2017 16:22 UTC (Fri) by pboddie (guest, #50784) [Link] (4 responses)

> Is this true for smaller institutions? In my field (bioinformatics), unless you're in a large institution that can handle these things (and often will cripple them with ridiculous licensing, but that's another story), I've seen tools kept as extremely guarded secrets, because you *might* (but often never) publish them one day.

It disappoints me slightly that the article focused largely on what might be regarded as computer science research or artefacts thereof, whereas there are huge challenges in other disciplines in the delivery of "sustainable science". Interestingly, Jupyter (silly name, I think) attempts to tackle this challenge.

As to whether software is publishable, I think that especially in disciplines like bioinformatics (where I have some experience), software is frequently and disappointingly seen as being a disposable means to an end. It seemed to me that people were quite happy to surf around looking for a Web service that would tell them what they wanted without any curiosity about how that particular tool was made; if it was backed by a publication they'd be reassured, but then that publication might not have any software or data attached to it, perhaps inviting inquirers to make contact to "collaborate".

I did see some interest in the tools I was writing for the group in which I worked. People certainly saw the need to take advantage of things that process existing datasets, and the different public databases seemed to realise that people wanted software to work with the data. "Web services" were also touted as an option, but they really aren't viable for anything more than ad-hoc queries unless the database supplier likes being hammered with HTTP requests (another idiotic habit we had to deal with, in one case involving a US research group whose IP addresses I eventually ended up banning).

Fortunately, my boss kept the rights to the software we developed and released it as Free Software, thus undermining the stupid and greedy "commercialisation" doctrine and organs of the university that were busy inhibiting other people's non-software work. In theory it lives on, but then the matter of sustaining development once everyone has moved elsewhere becomes a problem.

From lab to libre software: how can academic software research become open source?

Posted Nov 3, 2017 17:58 UTC (Fri) by sfeam (subscriber, #2841) [Link] (3 responses)

The situation is much better than you make it appear. Sure there's junk bioinformatics software, just as there's junk wherever Sturgeon's Law applies, which is nearly everywhere. And if you rely on randomly found web services you should indeed be concerned about their quality. But contrary to what you imply, the key software tools are published, reviewed, shared, and often accompanied by test data sets. Decent documentation is admittedly often a sore point. With specific regard to bioinformatics software provided as a web service, I suggest you look at the special issues devoted annually to exactly this class of software in the journal Nucleic Acids Research [*]. Their requirements for publication, documentation, validation suites, demonstrated community use, etc are all very solid.

[*] Yeah the name of the journal does not make it the obvious place to look for such a focus, but it has become a first rank quality touchstone for web-based bioinformatics software.

From lab to libre software: how can academic software research become open source?

Posted Nov 5, 2017 16:45 UTC (Sun) by pboddie (guest, #50784) [Link] (2 responses)

I'm not a true bioinformatician, merely a software engineer who got involved with people doing bioinformatics, but while I agree that many of the tools and services I had to work with were largely robust in terms of methods and transparency of operation, I would have reservations about whether the audience of those tools and services have the means or even the inclination to review what those tools and services do.

Maybe I should clarify my remarks about the audience for such things, though. While I might expect bioinformaticians to be equipped to evaluate software and services, largely because such people should be familiar with the computational and engineering aspects of such work, there are plenty of other people who use things like Web services to get "answers". Admittedly, this doesn't affect the "bread and butter" services like databases, but is more of a concern with services performing some level of analysis. (I remember a newly-pitched literature-mining service using "what causes cancer" as its Google-style front page example.)

Maybe things are actually great and I never realised it, or maybe I was working in niches where people were more cavalier about what constitutes the validation of a particular tool. Instead of just publishing their software and data and letting people get their hands dirty, there had to be a back-and-forth to get the code, which was not necessarily constructed using great engineering practices.

I will say that I was encouraged by the software tool use by some institutions, even though some of their choices weren't always to my liking. But then I must also say that even though our group made all our code and data available, there really wasn't much interest in reproducing what we did. For the most part people wanted us to do everything for them, and I certainly got more confirmation that software development is hardly valued in the field: people will drop big bucks for sequencing equipment but expect the analysis to happen for free by random people elsewhere.

From lab to libre software: how can academic software research become open source?

Posted Nov 5, 2017 19:18 UTC (Sun) by sfeam (subscriber, #2841) [Link] (1 responses)

You have shifted to highlighting a separate problem that is indeed serious. A very real problem when developing tools for sophisticated data analysis is that the more you make them easy to use, e.g. providing a web interface, the more likely it is that you attract users who do not understand when the tool is or is not appropriate. This is true regardless of how well engineered, documented, or reviewed the software behind that easy-to-use web interface may be.

To the extent it is even possible to address this problem, I think it must come through better education of the target user group. That is partly what I had in mind when I said that good documentation is a recurring sore point. It is too often hard for non-experts, even in the same field, to understand under which set of conditions their data is best analysed by method A rather than method B, or tool C rather than tool D. The wrong choice may lead to an erroneous scientific conclusion even when both A and B are perfectly valid methods and tools C and D are correctly implemented, each in their own respective domain of applicability.

Hiring more programmers during development is not going to solve this at all. The open/closed status of the source code is also irrelevant to this end of the problem. An example of something that does help, though it only works if you can attract outside funding to set it up, is to hold well-publicized and "fun" contests that pit competing tools against each other: for example, the CASP competitions, which pit competing approaches to predicting what 3D shape is formed by the protein produced from a particular DNA sequence. The larger community tends to remember, and use, the winning tools even if they don't understand the details of why they performed better. It is notable that if you look at the CASP winners, many of them use open source toolkits and libraries. And those shared code bases are improved by feedback from competition between contributing groups. We need more of this.

From lab to libre software: how can academic software research become open source?

Posted Nov 6, 2017 0:25 UTC (Mon) by pboddie (guest, #50784) [Link]

> You have shifted to highlighting a separate problem that is indeed serious.

Don't worry: I'm willing to discuss all problems here!

> A very real problem when developing tools for sophisticated data analysis is that the more you make them easy to use, e.g. providing a web interface, the more likely it is that you attract users who do not understand when the tool is or is not appropriate. This is true regardless of how well engineered, documented, or reviewed the software behind that easy-to-use web interface may be.

Yes, but there is arguably more of a demand for attractive, "easy to use" Web services rather than tools. Experiences may vary with regard to what is publishable or not and what the expectations of the reviewers are.

On the former topic, I have my name on a publication about a database that I doubt my previous boss would have regarded as interesting enough for publication, but in a publication venue favoured by my then boss it passed the threshold. That is the difference between bioinformatics-related journals, where the computational techniques are emphasised, and biology-related journals, which probably want a greater emphasis on, say, experimental techniques or theory (for all I know). What I did perceive, however, was that in the evaluation of research, if you have people who don't "rate" bioinformatics journals because they aren't amongst the ones they know, the research achievements don't get properly recognised.

On the second topic, the publication in question got remarks about the user interface from the reviewers. It was clear that they wanted something slick and attractive, although decades after the introduction of usability research, people still don't understand that this is largely an iterative process that you really don't want to do in the confines of an article review. Fortunately for everyone concerned, being a relatively simple database, there wasn't much of a trade-off between "looks great" and "obscures what the tool does". We also worked with a group who put quite a bit of emphasis on the look and feel of their Web front-end to my colleagues' work. Again, for certain audiences (and potentially the ones you need to educate), it seems that good-looking things can be seen as more publishable, sometimes deservedly so (they introduce useful visualisations), other times arguably not.

I agree that audience education is essential; I wonder whether I didn't state or imply that in what I wrote. I also had experience of competitions between tools, which were useful to the extent that you could see what other people's tools were supposedly capable of, but I might also suggest that they were distractions in various respects: you can end up focusing on limited datasets, tuning for potentially atypical data, and still not really learning what people were doing.

I remember one participant in a meeting around one of these competitions saying that he rather doubted that various people employing certain machine learning approaches really understood what they were doing. Another doubted that by making opaque tools we were gaining any insight into the problems to be solved (which is also a hot topic with regard to "AI" these days). To an extent, I got the impression that some of these competitions were profile-sustaining activities for certain research groups, and if the code was freely available then people would get many of the benefits anyway.

My remarks about paying for development weren't made in the context of improving the application of the scientific method, but rather an observation about the status of developers in certain parts of academia. I also have to dispute your assertions about code availability somewhat, not to be contrary, but I had actual experiences of methods and code differing when I was able to review them both. Of course, if no-one looks at the code, and my impression was that the audience was under-resourced and unlikely to look at it, then making everything available doesn't solve all the problems.

From lab to libre software: how can academic software research become open source?

Posted Oct 27, 2017 14:54 UTC (Fri) by deater (subscriber, #11746) [Link] (2 responses)

> Ok, I'll reformulate. They have no interest in sharing
> the object of their research from a research point of
> view. That has the risk of others finishing up before
> them and reaping awards and funding, or wandering
> into application aspects.

Citation needed? Who is this "they" you are referring to?

There is certainly a subset of academics who do this, but there is also a large number who release their work immediately.

And there has been a big push by funding agencies to force the release of all data and code within a reasonable time window to allow for first publication.

From lab to libre software: how can academic software research become open source?

Posted Nov 7, 2017 10:51 UTC (Tue) by aggelos (subscriber, #41752) [Link] (1 responses)

>> Ok, I'll reformulate. They have no interest in sharing the object of their research from a research point of view. That has the risk of others finishing up before them and reaping awards and funding, or wandering into application aspects.
> Citation needed? Who is this "they" you are referring to?

Oh wow. I guess that's a fair question. This survey comes to mind. I don't think there's any shortage of systems or security papers not supported by source code; a point with which you seem to agree below. Just adding the citation you requested.

Keep in mind that the above survey does not avoid publication bias. Namely, if reviewers are more likely to reject a paper because they can spot obvious bugs, unstated limitations, etc. in the source, whereas they would accept a similar paper that gives a ponies-and-rainbows description of its implementation, then there are submissions (though not publications) with source that are not accounted for. In my experience, this sort of bias against papers with code is a significant worry for researchers.

That said, I have heard of reviewers requesting code the last couple of years (though this mainly works when the paper is rejected and subsequently resubmitted to the same conference). I'm not at all sure that this perceived bias exists. For all we know, the bias could be in the other direction (i.e. in favor of papers that do publish source).

> There is certainly a subset of academics who do this, but there is also a large number who release their work immediately.
> And there has been a big push by funding agencies to force the release of all data and code within a reasonable time window to allow for first publication.

Glad to hear that. My turn to ask for a citation now, as I'd like to learn more about those incentives to release code and data.

The existence of academics who release code despite significant disincentives need not draw attention away from the existence of said disincentives. Nor shift the focus to individual failings, of course.

From lab to libre software: how can academic software research become open source?

Posted Nov 7, 2017 18:10 UTC (Tue) by sfeam (subscriber, #2841) [Link]

>> And there has been a big push by funding agencies to force the release of all data and code within a reasonable time window to allow for first publication.
> Glad to hear that. My turn to ask for a citation now, as I'd like to learn more about those incentives to release code and data.

Again I will respond specifically with regard to "academic research software", i.e. software tools developed for use in research as opposed to software that is itself the subject of the research. I am mostly familiar with research funded by the US National Institutes of Health. Here is text from the over-arching guideline published in the 1999 Federal Register. Note that software falls under the umbrella categories "research tool", "material", or "unique research resource", terms used throughout the document. One relevant section of the text reads:

  • Recipients are expected to ensure that unique research resources arising from NIH-funded research are made available to the scientific research community. The majority of transfers to not-for-profit entities should be implemented under terms no more restrictive than the UBMTA. In particular, Recipients are expected to use the Simple Letter Agreement provided below, or another document with no more restrictive terms, to readily transfer unpatented tools developed with NIH funds to other Recipients for use in NIH-funded projects. If the materials are patented or licensed to an exclusive provider, other arrangements may be used, but commercialization option rights, royalty reach-through, or product reach-through rights back to the provider are inappropriate.
That particular text only mandates access by other NIH-funded researchers, but in practice that means everyone from your closest collaborators to your fiercest competitors. Furthermore my experience with the NIH peer-review system both as an applicant and as a reviewer is that in software-heavy proposals, failure to state an intention to share your software tools counts as a black mark, while a well-documented plan and previous history in disseminating your software can boost the "impact" score, which is critical for funding.

I am less familiar with the parallel requirements for funding by other US federal agencies, but here is text from a guideline by the NSF (National Science Foundation): Dissemination and Sharing of Research Results.

  • c. Investigators and grantees are encouraged to share software and inventions created under the grant or otherwise make them or their products widely available and usable.

From lab to libre software: how can academic software research become open source?

Posted Nov 5, 2017 12:12 UTC (Sun) by CycoJ (guest, #70454) [Link] (1 responses)

I have to agree that this is somewhat insulting. I would in fact argue that academia shares a much higher proportion of its software than other industries where producing free software is not the main aim.

While I agree with many of the points brought forward, I do see problems with the proposal (the principles proposed in the conclusions) that is being put forward here. One of the main points seems to be to bring in "professionals", take the project out of academia, and have it led by people with experience in "community building". This sounds to me somewhat like "push the people who started the work out and let the professionals handle it". If Red Hat proposed something like this for other volunteer-led free-software projects, there would be an outcry. Just because the academics don't have the time and funding to fully develop and maintain a project does not mean that they don't feel strongly about "their baby".

Instead of trying to wrestle control away from the academics who started the work, maybe tackle the biggest problem: funding for professional software developers. Red Hat (or a similar initiative) could give grants (money, developer time, other advice and support) to academic projects to develop them into full-blown free-software projects, while leaving the academic in control. This would also foster building bridges (and a community) between the free-software and academic communities, and academics who have been part of such a program could clearly see and advocate for the benefits of creating open source projects. Contrary to what is stated in the article, in my experience most academics have an interest in sharing their work; it's part of the scientific process (on which the free software movement was arguably modelled). They just don't have the resources to do it.

From lab to libre software: how can academic software research become open source?

Posted Nov 6, 2017 19:16 UTC (Mon) by raven667 (subscriber, #5198) [Link]

> One of the main points seems to be to bring in "professionals", take the project out of academia, and have it led by people with experience in "community building". This sounds to me somewhat like "push the people who started the work out and let the professionals handle it".

This focus on "control" when it comes to open source projects seems negative and foolish, leading to distracting petty squabbles rather than focused team building. If someone forks your project and takes it in a different direction than you want, you still own your own code and can do with it whatever you like; you haven't lost any control, you just don't get to dictate your wishes to others.

> Instead of trying to wrestle control away from the academics who started the work, maybe tackle the biggest problem: funding for professional software developers.

Why would this necessarily be a fight about control between software professionals and academics involving "wrestling", rather than a team effort where the academic brings their domain-specific knowledge and a general idea of how to break a problem down into steps small enough for a computer to help, and the software engineer/scientist brings project management, change control, deployment, testing, team building, and algorithm/data-structure engineering, so that the end result is fast, efficient, and useful to other people?

> Leaving the academic in control. This would also foster building bridges

I don't think so. Bridges are two-way and require trust; an insistence on control and hierarchy does not demonstrate trust or respect, it treats the professional developer as a tool rather than a partner or community member. If you had funding to hire a professional, then you could give them orders as your direct employee, but if you are expecting a community of volunteers, then you have to treat them as peers, not as minions. If you don't have the resources to pay professional developers directly, then you have to give them a reason to care and a reason to follow your lead, because they don't have to do either.

