Statistics from the 4.7 development cycle
The 4.7 development cycle saw the merging of 12,283 changesets from 1,582 developers; 232 of those developers appeared in the kernel changelog for the first time. Those changes added just under 300,000 lines to the kernel source and 740 new files to the kernel tree. Of those developers, the most active were:
Most active 4.7 developers
By changesets | Count | Percent |
---|---|---|
H Hartley Sweeten | 208 | 1.7% |
Boris Brezillon | 132 | 1.1% |
Al Viro | 127 | 1.0% |
Linus Walleij | 121 | 1.0% |
Geert Uytterhoeven | 120 | 1.0% |
Arnaldo Carvalho de Melo | 110 | 0.9% |
Ville Syrjälä | 105 | 0.9% |
Laxman Dewangan | 101 | 0.8% |
Arnd Bergmann | 97 | 0.8% |
Jes Sorensen | 97 | 0.8% |
Eric Dumazet | 91 | 0.7% |
Dan Carpenter | 88 | 0.7% |
Aneesh Kumar K.V | 79 | 0.6% |
Michal Hocko | 74 | 0.6% |
Chris Wilson | 71 | 0.6% |
Wolfram Sang | 68 | 0.6% |
Florian Westphal | 66 | 0.5% |
James Hogan | 66 | 0.5% |
Daniel Vetter | 64 | 0.5% |
Imre Deak | 62 | 0.5% |
By changed lines | Count | Percent |
---|---|---|
Alex Deucher | 37185 | 6.4% |
Rex Zhu | 19912 | 3.4% |
Paul E. McKenney | 14004 | 2.4% |
Thierry Reding | 9170 | 1.6% |
Jinshan Xiong | 8828 | 1.5% |
Yuval Mintz | 8419 | 1.4% |
Jes Sorensen | 6982 | 1.2% |
Chanwoo Choi | 5742 | 1.0% |
H Hartley Sweeten | 5705 | 1.0% |
Varun Prakash | 5703 | 1.0% |
Boris Brezillon | 5347 | 0.9% |
Aneesh Kumar K.V | 5230 | 0.9% |
Tom Zanussi | 5116 | 0.9% |
CK Hu | 5072 | 0.9% |
Ilya Dryomov | 4764 | 0.8% |
Linus Walleij | 4738 | 0.8% |
Maxime Ripard | 4631 | 0.8% |
Mathieu Poirier | 4559 | 0.8% |
Christoph Hellwig | 4232 | 0.7% |
Finn Thain | 4024 | 0.7% |
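The percentage column for changesets can be reproduced from the raw counts. The following is a minimal sketch, assuming each share is simply the developer's count divided by the 12,283 changesets merged in this cycle and rounded to one decimal place; the function and variable names are made up for illustration.

```python
TOTAL_CHANGESETS = 12283  # cycle total stated at the top of the article


def share(count, total=TOTAL_CHANGESETS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Top three entries from the "by changesets" table.
top_changesets = {
    "H Hartley Sweeten": 208,
    "Boris Brezillon": 132,
    "Al Viro": 127,
}

for name, count in top_changesets.items():
    print(f"{name}: {count} ({share(count)}%)")
```

The same calculation would apply to the "by changed lines" column, but the article does not state the total number of changed lines, so it is not attempted here.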
By this point it should come as no surprise that H Hartley Sweeten made it to the top of the "by changesets" list with continued work on the Comedi drivers in the staging tree; nearly 8,400 patches have gone into that subsystem since it was merged. Boris Brezillon's work was mostly focused on the memory-technology devices subsystem (and NAND controllers in particular), Al Viro made a number of fundamental changes (including parallel lookups) to the virtual filesystem layer and followed the implications of those changes through many filesystems, Linus Walleij has been reworking the GPIO subsystem, and Geert Uytterhoeven worked all over the tree, with an emphasis on various ARM-related subsystems.
In the "lines changed" column, Alex Deucher continues to work on the massive amdgpu graphics driver; Rex Zhu is also working primarily on that driver. Paul McKenney works with the read-copy-update subsystem, of course; the elevated line count this time around results from some large documentation changes. Thierry Reding works with the NVIDIA Tegra ARM subarchitecture, and Jinshan Xiong made some extensive changes to the Lustre filesystem in the staging tree.
Work in the staging tree often overshadows everything else when it comes to these lists but, this time around, only two developers who appear in the top ten on either side were working on staging code.
There were 222 companies (that we know about) that supported work merged in the 4.7 development cycle — a fairly average figure for recent years. The most active companies this time around were:
Most active 4.7 employers
By changesets | Count | Percent |
---|---|---|
Intel | 1786 | 14.5% |
(None) | 968 | 7.9% |
Red Hat | 967 | 7.9% |
(Unknown) | 861 | 7.0% |
Linaro | 633 | 5.2% |
SUSE | 470 | 3.8% |
IBM | 378 | 3.1% |
AMD | 302 | 2.5% |
Samsung | 276 | 2.2% |
[name missing] | 244 | 2.0% |
Renesas Electronics | 244 | 2.0% |
NVIDIA | 231 | 1.9% |
Mellanox | 227 | 1.8% |
Free Electrons | 222 | 1.8% |
ARM | 217 | 1.8% |
Vision Engraving Systems | 208 | 1.7% |
Oracle | 200 | 1.6% |
Imagination Technologies | 193 | 1.6% |
Texas Instruments | 185 | 1.5% |
Broadcom | 141 | 1.1% |
By lines changed | Count | Percent |
---|---|---|
Intel | 86056 | 14.8% |
AMD | 69065 | 11.8% |
(None) | 35035 | 6.0% |
Red Hat | 33887 | 5.8% |
IBM | 28102 | 4.8% |
Linaro | 23396 | 4.0% |
(Unknown) | 23287 | 4.0% |
NVIDIA | 18023 | 3.1% |
Mellanox | 14011 | 2.4% |
Samsung | 12918 | 2.2% |
SUSE | 12810 | 2.2% |
Free Electrons | 12637 | 2.2% |
QLogic | 11731 | 2.0% |
ARM | 9000 | 1.5% |
Rockchip | 8938 | 1.5% |
Renesas Electronics | 8734 | 1.5% |
Texas Instruments | 7462 | 1.3% |
(Consultant) | 6964 | 1.2% |
Chelsio | 6868 | 1.2% |
Broadcom | 6564 | 1.1% |
This table looks as it has for some time; there are no real surprises here. The percentage of changes from developers working on their own time, at 7.9%, is up from 4.6, but still remains low by historical standards. Once upon a time, volunteer developers were our primary source of new contributors to the kernel. In 4.7, of the 232 first-time contributors, 132 were known to be employed at the time, 38 were known to be working on their own time, and 62 are in the "unknown" column. Even if all the unknowns are volunteers (most of them probably are), we still have more new contributors arriving via companies.
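The arithmetic behind that last claim can be checked in a few lines. This is only a sketch using the three counts given above; the dictionary keys and variable names are illustrative.

```python
# First-time contributors in 4.7, as broken down in the text.
first_timers = {"employed": 132, "volunteer": 38, "unknown": 62}

total = sum(first_timers.values())

# Worst case for the claim: treat every unknown as a volunteer.
max_volunteers = first_timers["volunteer"] + first_timers["unknown"]

print(total)                                      # 232
print(max_volunteers)                             # 100
print(first_timers["employed"] > max_volunteers)  # True
```

Even under the most volunteer-friendly assumption, the 132 employed newcomers outnumber the 100 possible volunteers.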
Contributing to the kernel used to be a fairly reliable way to get a job, and it probably still is. But, in 2016, it seems that many of our new developers get the job first, and it is the job that brings them to the kernel community.
The table above shows the changes contributed by the most active companies. One last question one might ask is: how many developers does each company have working on Linux? For the 4.7 development cycle, the answer looks like this:
Number of developers per company
Company | Count | Percent |
---|---|---|
(Unknown) | 238 | 14.5% |
Intel | 198 | 12.1% |
(None) | 172 | 10.5% |
Red Hat | 91 | 5.6% |
IBM | 64 | 3.9% |
[name missing] | 48 | 2.9% |
Linaro | 43 | 2.6% |
Mellanox | 38 | 2.3% |
SUSE | 37 | 2.3% |
AMD | 30 | 1.8% |
Samsung | 27 | 1.6% |
Huawei Technologies | 27 | 1.6% |
ARM | 25 | 1.5% |
Texas Instruments | 23 | 1.4% |
Broadcom | 22 | 1.3% |
Oracle | 21 | 1.3% |
NXP | 20 | 1.2% |
Qualcomm | 17 | 1.0% |
MediaTek | 13 | 0.8% |
Imagination Technologies | 12 | 0.7% |
Renesas Electronics | 12 | 0.7% |
[name missing] | 11 | 0.7% |
NVIDIA | 11 | 0.7% |
Code Aurora Forum | 10 | 0.6% |
(Consultant) | 10 | 0.6% |
Rockchip | 10 | 0.6% |
Canonical | 10 | 0.6% |
Free Electrons | 9 | 0.5% |
Pengutronix | 9 | 0.5% |
Synopsys | 8 | 0.5% |
Intel, it seems, has far more developers working on the kernel than any other company, about 12% of the total in 4.7. Volunteer developers may not contribute a lot of code, but there are quite a few of them; given that many (if not most) of the unknown developers probably fall into this category, developers working on their own time are still the biggest group.
The kernel community as a whole is a big group indeed, and it continues to
produce kernels in a disciplined and predictable way. The relative lack of
surprises may make for relatively boring statistics articles, but it is
certainly welcome to users of the kernel.
Index entries for this article | |
---|---|
Kernel | Releases/4.7 |
Posted Aug 2, 2016 21:23 UTC (Tue)
by fratti (guest, #105722)
Posted Aug 3, 2016 0:55 UTC (Wed)
by johannbg (guest, #65743)
Then there is the question of what happens to those who ask to be taken out of the list.
It would actually be quite interesting to see stats on which areas all those unknowns are contributing to, whether any unknowns fall among the most active developers, and the ratio of men to women among the unknowns (is one gender more likely than the other to remain unknown?).
It would also be interesting to see who the women behind the Linux kernel are, their history, and the statistics associated with that: who was the first woman ever to contribute to the kernel? Is she still contributing? What was/is her experience? Who are the most active ones each cycle? Are there more women contributing? Are there fewer? Is it the same? Etc. Break the repetitive pattern and bring in a new perspective on the story instead of fixating on the overall story which, as the writer mentions, has not changed a whole lot.
Posted Aug 3, 2016 1:50 UTC (Wed)
by Indelible (guest, #72815)
I agree that diversity and gender balance are great things, but I also firmly believe that singling out women simply because they are women isn't the right strategy. Stopping the cycle of self-selection by permeating the stereotype of the socially inept, white male geek as the only type of people who suit a programming career/hobby is a much more practical use of time.
Please don't shine a spotlight on women developers for being women, but make the Kernel a place where it doesn't matter what gender you are, because no one uses their boobs to program, including the males who are blessed with them.
Posted Aug 3, 2016 2:37 UTC (Wed)
by johannbg (guest, #65743)
Even Linus's Linux X.X-rcX announcements have a repeated pattern in them: yada yada small/big, yada yada driver updates, yada yada go test <shortlog>, with the occasional yada yada vacation in them. He should have his wife or kids (or someone else) write the announcements to break up that pattern for a bit.
Posted Aug 3, 2016 2:54 UTC (Wed)
by mjg59 (subscriber, #23239)
It shouldn't, but for many it does. Ignoring that reality doesn't solve it.
> Stopping the cycle of self-selection by permeating the stereo type of the socially inept, white male geek as the only type of people who suit a programming career/hobby is a much more practical use of time.
Evidence doesn't really suggest that the stereotype is the problem here - there are far more women in almost every avenue of professional computing than there are in the kernel. While it is a problem that women are outnumbered by men in the field at every stage of the education and career ladder, those numbers alone don't explain why our community is so disproportionately bad. Very few women enter Linux development, and retention of those that do is abysmal. One demonstrated way of increasing representation in communities is to have more role models, and outreach programs are an excellent way of achieving that.
But you're right that focusing on women isn't the only part of this, which is why the focus of projects like Outreachy is now on minorities in general. We should recognise all minorities who are involved despite social pressure making that more difficult, but we should also look at individual groups to determine whether specific strategies are working more effectively or are unintentionally excluding others.
Posted Aug 3, 2016 2:11 UTC (Wed)
by zuki (subscriber, #41808)
Posted Aug 3, 2016 14:03 UTC (Wed)
by corbet (editor, #1)
Nobody has ever "asked to be taken out of the list."
Most of the unknowns are small contributors, often cleanups. When we see unknowns making significant contributions, we try harder to figure out who they work for.
Gender ratio is hard; there is no gender tag attached to patches. People often ask for country-based statistics as well. It would all be interesting to know, but somehow I don't want to be the one sending "gender and location?" emails to developers...
Posted Aug 3, 2016 14:16 UTC (Wed)
by patrick_g (subscriber, #44470)
There are some statistics here: http://www.remword.com/kps_result (look at NT:Nation by Patch).
Posted Aug 3, 2016 14:22 UTC (Wed)
by corbet (editor, #1)
Posted Aug 3, 2016 14:37 UTC (Wed)
by fratti (guest, #105722)
Perhaps academia is also focusing on solely academic kernels, since a kernel that does not have to deal with all the pitfalls of real world hardware is a lot easier to work on when you're trying to implement a proof-of-concept feature, though that's just a guess of mine. Someone (with access to comp sci publications) would have to actually dig through all the papers to find out where the work ended up.
It could also be very possible that some company or individual then re-implements the work in an upstreamable shape after reading the paper, which would mean academic contributions are still very much real, just not as direct. Searching the kernel git log for the word "paper" brings up some commit messages where people mention work to be published in a paper and such.
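That kind of search can be run with `git log --grep`. What follows is a minimal sketch, assuming a local clone of the kernel tree at a path you supply; the helper names `grep_log_command` and `commits_mentioning` are invented for this example.

```python
import subprocess


def grep_log_command(word):
    """Build a git command listing commits whose message mentions word."""
    # -i makes the match case-insensitive; --oneline prints one commit per line.
    return ["git", "log", "-i", "--oneline", f"--grep={word}"]


def commits_mentioning(word, repo_path):
    """Run the search inside repo_path and return the matching summary lines."""
    result = subprocess.run(
        grep_log_command(word),
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Calling `commits_mentioning("paper", "/path/to/linux")` would return one line per matching commit. Note that a plain substring search also matches words such as "whitepaper" or "paperwork", so the results still need reading by hand.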
Posted Aug 3, 2016 22:24 UTC (Wed)
by mathstuf (subscriber, #69389)
Posted Aug 3, 2016 22:41 UTC (Wed)
by anselm (subscriber, #2796)
GCC and LLVM are probably closer to the cutting edge of research into compiler technology than the Linux kernel is to the cutting edge of research into operating systems, so that's not a huge surprise.
Posted Aug 6, 2016 15:41 UTC (Sat)
by anton (subscriber, #25547)
As for the Linux kernel, other postings have given the reasons; in other words, there is a gap between where a research project ends and where a piece of code is good enough for inclusion in the kernel. How big is that gap? Philipp Reisner finished his Diplomarbeit (~master's thesis) on DRBD in 2000, then continued working on it commercially (forming a company along the way), and DRBD was finally accepted into the Linux kernel in 2009; I am sure this did not count as an academic contribution at that time, and given that many more years had been spent on it commercially than academically, counting it as academic would have been wrong.
Posted Aug 3, 2016 15:14 UTC (Wed)
by johannbg (guest, #65743)
This raises the question if the same thing might apply to the kernel community.
Posted Aug 3, 2016 19:53 UTC (Wed)
by Fats (guest, #14882)
As said in the article, the kernel is boring, and academia needs hot and sexy. Also, the kernel is likely old OS technology, so there is nothing really novel in it that is fit for research.
Posted Aug 4, 2016 8:46 UTC (Thu)
by paulj (subscriber, #341)
Academics cannot fix this alone. One would need to go to the governments and government agencies funding CS work and make the case to have factors other than paper output considered as success criteria in funding applications, in departmental assessments, in career progression, etc. Now, the relevance of "Impact" (i.e. real-world effects of research) in academia has slowly become more important to funding agencies, and academics do often now have to pay some attention to this in funding applications; however, it generally still seems to be a side-line performance metric compared to the traditional measure of papers (weighted by venue).
Posted Aug 4, 2016 11:06 UTC (Thu)
by jamesmorris (subscriber, #82698)
Posted Aug 4, 2016 12:48 UTC (Thu)
by paulj (subscriber, #341)
Most academics in universities (certainly in the UK) have their career progression measured on the success of their papers, not polishing code.
Posted Aug 4, 2016 17:33 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
Posted Aug 4, 2016 14:20 UTC (Thu)
by deater (subscriber, #11746)
How is "academia" tabulated on the list? I try to contribute regularly (but possibly not during the 4.7 timeframe) using my .edu address. Would all people with .edu be tabulated under academia, or would we be individually broken out by our University?
The main problems with academia are threefold:
So most of the people I know from academia who contribute back are ones who (like me) were open-source developers first, academics second. And the fact we bother trying to get things contributed back probably hurts our career both financially and timewise.
Posted Aug 4, 2016 21:34 UTC (Thu)
by Lekensteyn (guest, #99903)
For the full description, see https://github.com/gregkh/kernel-history/blob/master/emai...
Posted Aug 15, 2016 13:29 UTC (Mon)
by broonie (subscriber, #7078)
Posted Aug 15, 2016 14:10 UTC (Mon)
by Jonno (subscriber, #49613)
Posted Aug 15, 2016 14:18 UTC (Mon)
by broonie (subscriber, #7078)
Posted Aug 18, 2016 15:50 UTC (Thu)
by ortalo (guest, #4654)
Posted Aug 3, 2016 22:35 UTC (Wed)
by linusw (subscriber, #40300)
Academia today in what is known as "the new production of knowledge" is pretty much guided by the science citation index: what is important is to make publications and get them quoted by other publications, that appear in the citation index.
In many cases your research grants will be controlled by these metrics so it becomes a closed loop.
What is needed is to guide academia metrics to include de facto standardization as OSS code in their metrics. I have no clue how that can be made to happen. Right now, if you tell the management at an academic institution that you write OSS code you will be met with a mixture of yawns and shrugs.
Posted Aug 14, 2016 3:07 UTC (Sun)
by torquay (guest, #92428)
There is a hack of sorts to address this very problem: the Journal of Open Source Software. It aims for short peer-reviewed journal articles that accompany open source code. The articles have a corresponding DOI and are fully citeable, just like "regular" academic articles.
See also the announcement about the journal in LWN.
Posted Aug 3, 2016 6:21 UTC (Wed)
by blackwood (guest, #44174)
It's a lot more than what just the "(consultant)" line would indicate.
Posted Aug 3, 2016 9:57 UTC (Wed)
by broonie (subscriber, #7078)
Posted Aug 5, 2016 12:52 UTC (Fri)
by armijn (subscriber, #3653)
Posted Aug 16, 2016 15:06 UTC (Tue)
by marcH (subscriber, #57642)
The kernel is a central but still very small part of any Linux distribution. You may have another, useful point but this is not the data that proves it.
Posted Aug 17, 2016 23:35 UTC (Wed)
by lsl (subscriber, #86508)
Even excluding the kernel, do you expect the numbers for glibc, gcc, virtualization tools (qemu, libvirt, …) or storage stuff to be vastly different?
Things I find notable:
Also, kudos to Intel for hiring that many kernel devs.
Do they fall into the unknown category or are they taken out of the stats altogether?
The lack of contributions from academia has been an interesting problem for years. There are guesses as to why (once the work has gone far enough to be published it stops and there's no incentive to polish it for inclusion, for example), but nobody really seems to know what the roots of it are.
But apparently the site has not been updated since November 2015 :-(
And those numbers show just the sort of hazard you can run into; it seems to be based mostly on domain names. I'm sure Neil Brown would be surprised to learn that he's German..:)
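To illustrate the hazard, here is a deliberately naive sketch of domain-based attribution. The lookup table and the addresses are hypothetical, and this is not a claim about how that site actually computes its numbers.

```python
# Hypothetical mapping from country-code top-level domain to country.
TLD_TO_COUNTRY = {"de": "Germany", "uk": "United Kingdom", "jp": "Japan"}


def guess_country(email):
    """Guess a country from the last label of the address's domain."""
    domain = email.rsplit("@", 1)[-1].lower()
    tld = domain.rsplit(".", 1)[-1]
    return TLD_TO_COUNTRY.get(tld, "unknown")


# A developer living anywhere in the world but posting from an employer's
# .de address is counted as German; a .com address says nothing at all.
print(guess_country("developer@suse.de"))      # Germany
print(guess_country("developer@example.com"))  # unknown
```

The guess reflects where the mail domain is registered, not where the person lives, which is exactly the kind of misattribution described above.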
Given the direction that GCC and LLVM/Clang are taking, I am happy that the Linux kernel accepts fewer academic contributions. "Optimizations" based on unrealistic assumptions are an interesting academic curiosity, but should never become the default in production compilers.
> for years. There are guesses as to why (once the work has gone far enough
> to be published it stops and there's no incentive to polish it for inclusion, for
> example), but nobody really seems to know what the roots of it are.
1. Most academic code contributions are *awful*, generally one-off hacks made during a mad rush to get a paper/thesis out the door
2. There are no incentives to merge your results back in (i.e. federal grants and such don't stipulate this, and really outside of Google I'm not sure if there's anyone who is sponsoring linux-kernel related research grants), and also open-source contributions don't matter for anything on tenure packages.
3. There's a perception (probably rightly so) that trying to get code merged in is going to be a long, frustrating process. Often by the time the student has finished the work and it's time to contribute back, the student has graduated, moved on to a new job, and has no incentive or time to deal with the hassle.
So, what would we do with this information and the associated statistics? I am not sure we can do something useful (to either gender) with the result.
Similar for countries.
What is needed is to guide academia metrics to include de facto standardization as OSS code in their metrics.
- Many consulting shops contribute through their customers' email addresses, showing up under those customers instead of under their own names.
- "Free Electrons" and "Pengutronix" are both consulting shops.