Statistics from the 4.7 development cycle
Posted Aug 2, 2016 21:23 UTC (Tue) by fratti (guest, #105722)
Parent article: Statistics from the 4.7 development cycle
Things I find notable:
- AMD employs almost three times as many Linux kernel developers as NVIDIA.
- Canonical employs a meagre 10 developers, compared to 37 by SUSE and 91 by Red Hat.
- A lot of the companies involved appear to be ones selling ARM-based devices (or are ARM), which goes to show how much the kernel benefits from the embedded and mobile markets.
- There is a surprising lack of contributions from consultants; I'd have thought there would be a bigger market for Linux kernel consulting work.
Posted Aug 3, 2016 0:55 UTC (Wed)
by johannbg (guest, #65743)
[Link] (25 responses)
Then there is the question of what happens to those who ask to be taken out of the list.
It would actually be quite interesting to see stats on which areas all those unknowns are contributing to, whether any unknowns rank among the most active developers, and the ratio of men to women among the unknowns (does one gender prefer to remain unknown more than the other?).
It would also be interesting to see who the women behind the Linux kernel are, their history, and the statistics associated with that: who was the first woman ever to contribute to the kernel? Is she still contributing? What was/is her experience? Who are the most active ones each cycle? Are there more women contributing? Fewer? The same? Break the repetitive pattern and bring a new perspective to the story instead of fixating on the overall story, which, as the writer mentions, has not changed a whole lot.
Posted Aug 3, 2016 1:50 UTC (Wed)
by Indelible (guest, #72815)
[Link] (2 responses)
I agree that diversity and gender balance are great things, but I also firmly believe that singling out women simply because they are women isn't the right strategy. Stopping the cycle of self-selection by perpetuating the stereotype of the socially inept, white male geek as the only type of people who suit a programming career/hobby is a much more practical use of time.
Please don't shine a spotlight on women developers for being women, but make the kernel a place where it doesn't matter what gender you are, because no one uses their boobs to program, including the males who are blessed with them.
Posted Aug 3, 2016 2:37 UTC (Wed)
by johannbg (guest, #65743)
[Link]
Even Linus's Linux X.X-rcX announcements have a repeated pattern in them: yada yada small/big, yada yada driver updates, yada yada go test <shortlog>, with the occasional yada yada vacation in them. He should have his wife or kids (or someone else) write the announcements to break up that pattern for a bit.
Posted Aug 3, 2016 2:54 UTC (Wed)
by mjg59 (subscriber, #23239)
[Link]
It shouldn't, but for many it does. Ignoring that reality doesn't solve it.
> Stopping the cycle of self-selection by perpetuating the stereotype of the socially inept, white male geek as the only type of people who suit a programming career/hobby is a much more practical use of time.
Evidence doesn't really suggest that the stereotype is the problem here - there are far more women in almost every avenue of professional computing than there are in the kernel. While it is a problem that women are outnumbered by men in the field at every stage of the education and career ladder, those numbers alone don't explain why our community is so disproportionately bad. Very few women enter Linux development, and retention of those that do is abysmal. One demonstrated way of increasing representation in communities is to have more role models, and outreach programs are an excellent way of achieving that.
But you're right that focusing on women isn't the only part of this, which is why the focus of projects like Outreachy is now on minorities in general. We should recognise all minorities who are involved despite social pressure making that more difficult, but we should also look at individual groups to determine whether specific strategies are working more effectively or are unintentionally excluding others.
Posted Aug 3, 2016 2:11 UTC (Wed)
by zuki (subscriber, #41808)
[Link]
Posted Aug 3, 2016 14:03 UTC (Wed)
by corbet (editor, #1)
[Link] (18 responses)
Nobody has ever "asked to be taken out of the list."
Most of the unknowns are small contributors, often cleanups. When we see unknowns making significant contributions, we try harder to figure out who they work for.
Gender ratio is hard; there is no gender tag attached to patches. People often ask for country-based statistics as well. It would all be interesting to know, but somehow I don't want to be the one sending "gender and location?" emails to developers...
Posted Aug 3, 2016 14:16 UTC (Wed)
by patrick_g (subscriber, #44470)
[Link] (1 responses)
There are some statistics here: http://www.remword.com/kps_result (look at NT:Nation by Patch).
Posted Aug 3, 2016 14:22 UTC (Wed)
by corbet (editor, #1)
[Link]
Posted Aug 3, 2016 14:37 UTC (Wed)
by fratti (guest, #105722)
[Link] (3 responses)
Perhaps academia is also focusing solely on academic kernels, since a kernel that does not have to deal with all the pitfalls of real-world hardware is a lot easier to work on when you're trying to implement a proof-of-concept feature, though that's just a guess of mine. Someone (with access to comp sci publications) would have to actually dig through all the papers to find out where the work ended up.
It is also quite possible that some company or individual then re-implements the work in an upstreamable shape after reading the paper, which would mean academic contributions are still very much real, just not as direct. Searching the kernel git log for the word "paper" brings up some commit messages where people mention work to be published in a paper and such.
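The git log search mentioned above could be reproduced roughly as follows. This is only a sketch, assuming a local clone of the kernel tree; the function names and the `linux` path are hypothetical, and the only git features relied on are `git -C`, `git log --oneline`, `-i` and `--grep`.

```python
import subprocess


def build_git_log_command(word, repo_path="."):
    """Build a `git log` command listing commits whose message mentions `word`."""
    return [
        "git", "-C", repo_path, "log",
        "--oneline",        # one line per commit: abbreviated hash + subject
        "-i",               # match case-insensitively
        f"--grep={word}",   # filter on the commit message text
    ]


def commits_mentioning(word, repo_path="."):
    """Run the search and return the matching one-line commit summaries."""
    result = subprocess.run(
        build_git_log_command(word, repo_path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


# Example use ("linux" is a hypothetical path to a kernel checkout):
#     for line in commits_mentioning("paper", repo_path="linux"):
#         print(line)
```

Note that a plain substring match like this also catches words such as "paperwork", so the hits still need to be read by hand to see which ones actually refer to academic publications.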
Posted Aug 3, 2016 22:24 UTC (Wed)
by mathstuf (subscriber, #69389)
[Link] (2 responses)
Posted Aug 3, 2016 22:41 UTC (Wed)
by anselm (subscriber, #2796)
[Link]
GCC and LLVM are probably closer to the cutting edge of research into compiler technology than the Linux kernel is to the cutting edge of research into operating systems, so that's not a huge surprise.
Posted Aug 6, 2016 15:41 UTC (Sat)
by anton (subscriber, #25547)
[Link]
As for the Linux kernel, other postings have given the reasons; in other words, there is a gap between where a research project ends and where a piece of code is good enough for inclusion in the kernel. How big is that gap? Philipp Reisner finished his Diplomarbeit (~master's thesis) on DRBD in 2000, then continued working on it commercially (forming a company along the way), and DRBD was finally accepted into the Linux kernel in 2009; I am sure this did not count as an academic contribution at that time, and given that many more years had been spent on it commercially than academically, counting it as academic would have been wrong.
Posted Aug 3, 2016 15:14 UTC (Wed)
by johannbg (guest, #65743)
[Link]
This raises the question of whether the same thing might apply to the kernel community.
Posted Aug 3, 2016 19:53 UTC (Wed)
by Fats (guest, #14882)
[Link]
As said in the article, the kernel is boring and academia needs hot and sexy. Also, the kernel is likely old OS technology, so there is nothing really novel in it that is fit for research.
Posted Aug 4, 2016 8:46 UTC (Thu)
by paulj (subscriber, #341)
[Link] (3 responses)
Academics cannot fix this alone. One would need to go to the governments and government agencies funding CS work and make the case to have factors other than paper output considered as success criteria in funding applications, in departmental assessments, in career progression, etc. Now, the relevance of "Impact" (i.e. real-world effects of research) in academia has slowly become more important to funding agencies - academics do often now have to pay some attention to this in funding applications - however, it still generally seems to be a side-line performance metric compared to the traditional measure of papers (weighted by venue).
Posted Aug 4, 2016 11:06 UTC (Thu)
by jamesmorris (subscriber, #82698)
[Link] (2 responses)
Posted Aug 4, 2016 12:48 UTC (Thu)
by paulj (subscriber, #341)
[Link]
Most academics in universities (certainly in the UK) have their career progression measured on the success of their papers, not on polishing code.
Posted Aug 4, 2016 17:33 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Aug 4, 2016 14:20 UTC (Thu)
by deater (subscriber, #11746)
[Link] (1 responses)
How is "academia" tabulated on the list? I try to contribute regularly (but possibly not during the 4.7 timeframe) using my .edu address. Would all people with .edu be tabulated under academia, or would we be individually broken out by our University?
The main problems with academia are threefold:
So most of the people I know from academia who contribute back are ones who (like me) were open-source developers first, academics second. And the fact we bother trying to get things contributed back probably hurts our career both financially and timewise.
Posted Aug 4, 2016 21:34 UTC (Thu)
by Lekensteyn (guest, #99903)
[Link]
For the full description, see https://github.com/gregkh/kernel-history/blob/master/emai...
Posted Aug 15, 2016 13:29 UTC (Mon)
by broonie (subscriber, #7078)
[Link] (2 responses)
Posted Aug 15, 2016 14:10 UTC (Mon)
by Jonno (subscriber, #49613)
[Link] (1 responses)
Posted Aug 15, 2016 14:18 UTC (Mon)
by broonie (subscriber, #7078)
[Link]
Posted Aug 18, 2016 15:50 UTC (Thu)
by ortalo (guest, #4654)
[Link]
Posted Aug 3, 2016 22:35 UTC (Wed)
by linusw (subscriber, #40300)
[Link] (1 responses)
Academia today, in what is known as "the new production of knowledge", is pretty much guided by the science citation index: what is important is to publish and to get those publications cited by other publications that appear in the citation index.
In many cases your research grants will be controlled by these metrics so it becomes a closed loop.
What is needed is to guide academia metrics to include de facto standardization as OSS code in their metrics. I have no clue how that can be made to happen. Right now, if you tell the management at an academic institution that you write OSS code you will be met with a mixture of yawns and shrugs.
Posted Aug 14, 2016 3:07 UTC (Sun)
by torquay (guest, #92428)
[Link]
There is a hack of sorts to address this very problem: the Journal of Open Source Software. It aims for short peer-reviewed journal articles that accompany open source code. The articles have a corresponding DOI and are fully citeable, just like "regular" academic articles.
See also the announcement about the journal in LWN.
Posted Aug 3, 2016 6:21 UTC (Wed)
by blackwood (guest, #44174)
[Link]
It's a lot more than what just the "(consultant)" line would indicate.
Posted Aug 3, 2016 9:57 UTC (Wed)
by broonie (subscriber, #7078)
[Link] (1 responses)
Posted Aug 5, 2016 12:52 UTC (Fri)
by armijn (subscriber, #3653)
[Link]
Posted Aug 16, 2016 15:06 UTC (Tue)
by marcH (subscriber, #57642)
[Link] (1 responses)
The kernel is a central but still very small part of any Linux distribution. You may have another, useful point but this is not the data that proves it.
Posted Aug 17, 2016 23:35 UTC (Wed)
by lsl (subscriber, #86508)
[Link]
Even excluding the kernel, do you expect the numbers for glibc, gcc, virtualization tools (qemu, libvirt, …) or storage stuff to be vastly different?
Do they fall into the unknown category or are they taken out of the stats altogether?
The lack of contributions from academia has been an interesting problem for years. There are guesses as to why (once the work has gone far enough to be published it stops and there's no incentive to polish it for inclusion, for example), but nobody really seems to know what the roots of it are.
But apparently the site has not been updated since November 2015 :-(
And those numbers show just the sort of hazard you can run into; it seems to be based mostly on domain names. I'm sure Neil Brown would be surprised to learn that he's German..:)
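The hazard is easy to reproduce. Here is a minimal sketch, assuming (as guessed above) that a nation is derived from the country-code part of the author's email domain; the function name, the tiny lookup table and the example addresses are hypothetical illustrations, not the site's actual method or data.

```python
# Hypothetical, deliberately tiny country-code table, for illustration only.
CCTLD_TO_COUNTRY = {
    "de": "Germany",
    "fr": "France",
    "jp": "Japan",
    "uk": "United Kingdom",
}


def guess_country(email):
    """Guess a country from the last label of the email's domain, or None."""
    domain = email.rsplit("@", 1)[-1].lower()
    tld = domain.rsplit(".", 1)[-1]
    return CCTLD_TO_COUNTRY.get(tld)


# Anyone posting from an employer's .de address is counted as German,
# wherever they actually live; generic domains give no answer at all.
print(guess_country("developer@example.de"))   # Germany
print(guess_country("developer@example.com"))  # None
```

The guess says where the mail domain is registered, not where the developer is, which is exactly the kind of misattribution described above.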
Given the direction that GCC and LLVM/Clang are taking, I am happy that the Linux kernel accepts fewer academic contributions. "Optimizations" based on unrealistic assumptions are an interesting academic curiosity, but should never become the default in production compilers.
> The lack of contributions from academia has been an interesting problem
> for years. There are guesses as to why (once the work has gone far enough
> to be published it stops and there's no incentive to polish it for inclusion, for
> example), but nobody really seems to know what the roots of it are.
1. Most academic code contributions are *awful*, generally one-off hacks made during a mad rush to get a paper/thesis out the door.
2. There are no incentives to merge your results back in (i.e. federal grants and such don't stipulate this, and really, outside of Google, I'm not sure if there's anyone who is sponsoring Linux-kernel-related research grants), and open-source contributions also don't count for anything in tenure packages.
3. There's a perception (probably rightly so) that trying to get code merged in is going to be a long, frustrating process. Often by the time the student has finished the work and it's time to contribute back, the student has graduated, moved on to a new job, and has no incentive or time to deal with the hassle.
So, what would we do with this information and the associated statistics? I am not sure we can do something useful (to either gender) with the result.
Similar for countries.
> What is needed is to guide academia metrics to include de facto standardization as OSS code in their metrics.
- Many consulting shops contribute through their customers' email addresses, so they show up under the customer's name instead of their own.
- "Free Electrons" and "Pengutronix" are both consulting shops.