
Statistics from the 4.7 development cycle

Posted Aug 3, 2016 14:03 UTC (Wed) by corbet (editor, #1)
In reply to: Statistics from the 4.7 development cycle by johannbg
Parent article: Statistics from the 4.7 development cycle

The lack of contributions from academia has been an interesting problem for years. There are guesses as to why (once the work has gone far enough to be published it stops and there's no incentive to polish it for inclusion, for example), but nobody really seems to know what the roots of it are.

Nobody has ever "asked to be taken out of the list."

Most of the unknowns are small contributors, often cleanups. When we see unknowns making significant contributions, we try harder to figure out who they work for.

Gender ratio is hard; there is no gender tag attached to patches. People often ask for country-based statistics as well. It would all be interesting to know, but somehow I don't want to be the one sending "gender and location?" emails to developers...



Statistics from the 4.7 development cycle

Posted Aug 3, 2016 14:16 UTC (Wed) by patrick_g (subscriber, #44470) [Link] (1 responses)

> People often ask for country-based statistics as well. It would all be interesting to know

There are some statistics here: http://www.remword.com/kps_result (look at NT:Nation by Patch).
But apparently the site has not been updated since November 2015 :-(

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 14:22 UTC (Wed) by corbet (editor, #1) [Link]

And those numbers show just the sort of hazard you can run into; it seems to be based mostly on domain names. I'm sure Neil Brown would be surprised to learn that he's German..:)
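To make the hazard concrete, here is a minimal sketch of how a domain-based guess could go wrong. It is not how that site actually works (its method is only guessed at above); the lookup table, the function name `guess_country`, and the example addresses are all hypothetical.

```python
# Hypothetical, deliberately tiny table mapping top-level domains to countries.
TLD_TO_COUNTRY = {"de": "Germany", "fr": "France", "au": "Australia"}


def guess_country(email):
    """Guess a developer's country from the last label of the email domain.

    This is the naive heuristic being illustrated, not a recommendation:
    it reports where the domain is registered, not where the person lives.
    """
    domain = email.rsplit("@", 1)[-1].lower()
    tld = domain.rsplit(".", 1)[-1]
    return TLD_TO_COUNTRY.get(tld, "unknown")


# A developer living anywhere in the world but posting from a .de
# employer address is counted as German.
print(guess_country("dev@example.de"))
# Generic domains carry no country information at all.
print(guess_country("dev@example.com"))
```

Anyone working remotely for a company with a country-specific domain ends up in the wrong bucket, and everyone on a generic domain ends up in no bucket at all.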

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 14:37 UTC (Wed) by fratti (guest, #105722) [Link] (3 responses)

>The lack of contributions from academia has been an interesting problem for years. There are guesses as to why (once the work has gone far enough to be published it stops and there's no incentive to polish it for inclusion, for example), but nobody really seems to know what the roots of it are.

Perhaps academia is also focusing on solely academic kernels, since a kernel that does not have to deal with all the pitfalls of real world hardware is a lot easier to work on when you're trying to implement a proof-of-concept feature, though that's just a guess of mine. Someone (with access to comp sci publications) would have to actually dig through all the papers to find out where the work ended up.

It is also quite possible that some company or individual then re-implements the work in an upstreamable shape after reading the paper, which would mean academic contributions are still very much real, just not as direct. Searching the kernel git log for the word "paper" brings up some commit messages where people mention work to be published in a paper and such.
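A search like that can be sketched as follows. This is only an illustration: the helper names `build_git_log_command` and `search_commit_messages` and the `linux` checkout path are assumptions, while `git -C`, `git log --oneline`, `-i` and `--grep` are standard git options.

```python
import subprocess


def build_git_log_command(repo_path, word):
    """Return the argv for a case-insensitive search of commit messages."""
    return [
        "git", "-C", repo_path,  # run git inside the given checkout
        "log", "--oneline",      # one line per commit: short hash and subject
        "-i",                    # ignore case when matching
        f"--grep={word}",        # match the pattern against commit messages
    ]


def search_commit_messages(repo_path, word):
    """Run the search and return one output line per matching commit."""
    result = subprocess.run(
        build_git_log_command(repo_path, word),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # "linux" is a hypothetical path to a local kernel checkout.
    for line in search_commit_messages("linux", "paper"):
        print(line)
```

The pattern matches anywhere in the message, so hits such as "paperwork" show up too; the results still need reading by hand.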

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 22:24 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (2 responses)

Academic work makes its way into GCC (e.g., Concepts Lite by Andrew Sutton has papers behind it) and LLVM/Clang fairly regularly. Is the kernel that much more impenetrable than a compiler for one of the most complicated languages around?

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 22:41 UTC (Wed) by anselm (subscriber, #2796) [Link]

GCC and LLVM are probably closer to the cutting edge of research into compiler technology than the Linux kernel is to the cutting edge of research into operating systems, so that's not a huge surprise.

Statistics from the 4.7 development cycle

Posted Aug 6, 2016 15:41 UTC (Sat) by anton (subscriber, #25547) [Link]

Given the direction that GCC and LLVM/Clang are taking, I am happy that the Linux kernel accepts fewer academic contributions. "Optimizations" based on unrealistic assumptions are an interesting academic curiosity, but should never become the default in production compilers.

As for the Linux kernel, other postings have given the reasons; in other words, there is a gap between where a research project ends and where a piece of code is good enough for inclusion in the kernel. How big is that gap? Philipp Reisner finished his Diplomarbeit (~master's thesis) on DRBD in 2000, then continued working on it commercially (forming a company along the way), and DRBD was finally accepted into the Linux kernel in 2009; I am sure this did not count as an academic contribution at that time, and given that many more years had been spent on it commercially than academically, counting it as academic would have been wrong.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 15:14 UTC (Wed) by johannbg (guest, #65743) [Link]

There was a fairly recent study [1] of pull requests and their acceptance in open-source projects on GitHub. It showed that women's contributions tend to be accepted more often than men's, but that women's acceptance rates were higher only when they were not identifiable as women.

This raises the question of whether the same thing might apply to the kernel community.

1. https://peerj.com/preprints/1733/

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 19:53 UTC (Wed) by Fats (guest, #14882) [Link]

> The lack of contributions from academia has been an interesting problem for years.

As said in the article, the kernel is boring and academia needs hot and sexy. Also, the kernel is likely old OS technology, so there is nothing really novel in it that is fit for research.

Statistics from the 4.7 development cycle

Posted Aug 4, 2016 8:46 UTC (Thu) by paulj (subscriber, #341) [Link] (3 responses)

The roots should be obvious to anyone who's been in academia: There is no reward to an academic in getting stuff upstream. Indeed, resources spent cleaning and polishing code for upstream - past the point of having gotten it working enough to get the results for one's papers - are resources that are *diverted* away from working on the next academic paper, and hence cleaning up and polishing code for upstream inclusion can *damage* one's academic career.

Academics cannot fix this alone. One would need to go to the governments and government agencies funding CS work and make the case to have factors other than paper output considered as success criteria in funding applications, in departmental assessments, in career progression, etc. Now, the relevance of "Impact" (i.e. real-world effects of research) in academia has slowly become more important to funding agencies - academics do often now have to pay some attention to this in funding applications - however it seems generally still to be a side-line performance metric compared to the traditional measure of papers (weighted by venue).

Statistics from the 4.7 development cycle

Posted Aug 4, 2016 11:06 UTC (Thu) by jamesmorris (subscriber, #82698) [Link] (2 responses)

Actually, SELinux was a really good example of government-funded academic research evolving into a major open source project.

Statistics from the 4.7 development cycle

Posted Aug 4, 2016 12:48 UTC (Thu) by paulj (subscriber, #341) [Link]

That was "research" that was funded specifically to get a mandatory labelled security system into the kernel though.

Most academics in universities (certainly in the UK) have their career progression measured on the success of their papers, not polishing code.

Statistics from the 4.7 development cycle

Posted Aug 4, 2016 17:33 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

It's also a sterling example of why academia shouldn't be allowed within 10 miles of a Linux kernel.

Statistics from the 4.7 development cycle

Posted Aug 4, 2016 14:20 UTC (Thu) by deater (subscriber, #11746) [Link] (1 responses)

> The lack of contributions from academia has been an interesting problem
> for years. There are guesses as to why (once the work has gone far enough
> to be published it stops and there's no incentive to polish it for inclusion, for
> example), but nobody really seems to know what the roots of it are.

How is "academia" tabulated on the list? I try to contribute regularly (but possibly not during the 4.7 timeframe) using my .edu address. Would all people with .edu be tabulated under academia, or would we be individually broken out by our University?

The main problems with academia are threefold:
1. Most academic code contributions are *awful*, generally one-off hacks made during a mad rush to get a paper/thesis out the door
2. There are no incentives to merge your results back in (i.e. federal grants and such don't stipulate this, and really outside of Google I'm not sure if there's anyone who is sponsoring linux-kernel related research grants), and also open-source contributions don't count for anything in tenure packages.
3. There's a perception (probably rightly so) that trying to get code merged in is going to be a long, frustrating process. Often by the time the student has finished the work and it's time to contribute back, the student has graduated, moved on to a new job, and has no incentive or time to deal with the hassle.

So most of the people I know from academia who contribute back are ones who (like me) were open-source developers first, academics second. And the fact we bother trying to get things contributed back probably hurts our career both financially and timewise.

Statistics from the 4.7 development cycle

Posted Aug 4, 2016 21:34 UTC (Thu) by Lekensteyn (guest, #99903) [Link]

Greg seems to send every contributor an email, asking them to identify themselves. Four options are given, including Academia: "this category is for people working for Universities and doing kernel work as part of their research or other responsibilities related to school work."

For the full description, see https://github.com/gregkh/kernel-history/blob/master/emai...

Statistics from the 4.7 development cycle

Posted Aug 15, 2016 13:29 UTC (Mon) by broonie (subscriber, #7078) [Link] (2 responses)

It's possible that some of the academic contributions are showing up as industrial ones even when done by the academics - at Linaro we're currently working with Paolo Valente on BFQ which was work he originally did in an academic context and is now upstreaming with support from us. Due to the way it's being funded he is contributing from a Linaro account and shows up that way but the core of the work is academic.

Statistics from the 4.7 development cycle

Posted Aug 15, 2016 14:10 UTC (Mon) by Jonno (subscriber, #49613) [Link] (1 responses)

If anything this proves the point that academia doesn't contribute directly to OSS. Obviously they do generate new algorithms and other good ideas, but unless someone else picks up the slack it won't turn into something useful. The fact that Linaro chose to hire someone who previously worked in academia to do the work doesn't change that, it is still someone outside of academia who picks up the academic idea and turns it into something OSS can use.

Statistics from the 4.7 development cycle

Posted Aug 15, 2016 14:18 UTC (Mon) by broonie (subscriber, #7078) [Link]

No, it's still the same people doing the same work with some extra collaborators - it's not a case of the work being thrown over the wall and picked up by industry but rather a partnership.

Statistics from the 4.7 development cycle

Posted Aug 18, 2016 15:50 UTC (Thu) by ortalo (guest, #4654) [Link]

Thanks for never forgetting the privacy issue. However, the key point is not the data, but what you do with it.
So, what would we do with this information and the associated statistics? I am not sure we can do anything useful (for either gender) with the result.
The same goes for countries.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds