
Statistics from the 4.7 development cycle

By Jonathan Corbet
August 2, 2016
The 4.7 kernel was released on July 24, so longtime readers might be wondering where the usual development statistics are. We're running a little late this time around, but for good reason — Greg Kroah-Hartman obtained information from a large number of developers on who they work for, and we're now able to use that information to produce better numbers. Of course, the overall story hasn't changed a whole lot — kernel development is relatively boring and predictable these days — but each cycle still has a few noteworthy points.

The 4.7 development cycle saw the merging of 12,283 changesets from 1,582 developers; 232 of those developers appeared in the kernel changelog for the first time. Those changes added just under 300,000 lines to the kernel source and 740 new files to the kernel tree. Of those developers, the most active were:

Most active 4.7 developers

By changesets
Developer                 Changesets  Percent
H Hartley Sweeten                208     1.7%
Boris Brezillon                  132     1.1%
Al Viro                          127     1.0%
Linus Walleij                    121     1.0%
Geert Uytterhoeven               120     1.0%
Arnaldo Carvalho de Melo         110     0.9%
Ville Syrjälä                    105     0.9%
Laxman Dewangan                  101     0.8%
Arnd Bergmann                     97     0.8%
Jes Sorensen                      97     0.8%
Eric Dumazet                      91     0.7%
Dan Carpenter                     88     0.7%
Aneesh Kumar K.V                  79     0.6%
Michal Hocko                      74     0.6%
Chris Wilson                      71     0.6%
Wolfram Sang                      68     0.6%
Florian Westphal                  66     0.5%
James Hogan                       66     0.5%
Daniel Vetter                     64     0.5%
Imre Deak                         62     0.5%

By changed lines
Developer                      Lines  Percent
Alex Deucher                   37185     6.4%
Rex Zhu                        19912     3.4%
Paul E. McKenney               14004     2.4%
Thierry Reding                  9170     1.6%
Jinshan Xiong                   8828     1.5%
Yuval Mintz                     8419     1.4%
Jes Sorensen                    6982     1.2%
Chanwoo Choi                    5742     1.0%
H Hartley Sweeten               5705     1.0%
Varun Prakash                   5703     1.0%
Boris Brezillon                 5347     0.9%
Aneesh Kumar K.V                5230     0.9%
Tom Zanussi                     5116     0.9%
CK Hu                           5072     0.9%
Ilya Dryomov                    4764     0.8%
Linus Walleij                   4738     0.8%
Maxime Ripard                   4631     0.8%
Mathieu Poirier                 4559     0.8%
Christoph Hellwig               4232     0.7%
Finn Thain                      4024     0.7%

By this point it should come as no surprise that H Hartley Sweeten made it to the top of the "by changesets" list with continued work on the Comedi drivers in the staging tree; nearly 8,400 patches have gone into that subsystem since it was merged. Boris Brezillon's work was mostly focused on the memory-technology devices subsystem (and NAND controllers in particular), Al Viro made a number of fundamental changes (including parallel lookups) to the virtual filesystem layer and followed the implications of those changes through many filesystems, Linus Walleij has been reworking the GPIO subsystem, and Geert Uytterhoeven worked all over the tree, with an emphasis on various ARM-related subsystems.

In the "lines changed" column, Alex Deucher continues to work on the massive amdgpu graphics driver; Rex Zhu is also working primarily on that driver. Paul McKenney works with the read-copy-update subsystem, of course; the elevated line count this time around results from some large documentation changes. Thierry Reding works with the NVIDIA Tegra ARM subarchitecture, and Jinshan Xiong made some extensive changes to the Lustre filesystem in the staging tree.

Work in the staging tree often overshadows everything else in these lists but, this time around, only two of the developers who appear in the top ten on either side were working on staging code.
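For readers who want to approximate the "by changesets" column themselves, a kernel git checkout is enough. The sketch below is not the tooling used for this article. It assumes the output format of `git shortlog -s -n --no-merges v4.6..v4.7` (a count, a tab, then the author name) and reuses three counts from the table above as sample input; the helper names `parse_shortlog` and `shares` are made up for illustration.

```python
def parse_shortlog(text):
    """Parse `git shortlog -s -n` style output into (author, changesets) pairs."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        count, author = line.strip().split("\t", 1)
        rows.append((author, int(count)))
    return rows


def shares(rows, total):
    """Attach each author's percentage of `total` changesets, to one decimal."""
    return [(author, count, round(100 * count / total, 1)) for author, count in rows]


# Sample input shaped like shortlog output, using counts from the table above.
sample = "   208\tH Hartley Sweeten\n   132\tBoris Brezillon\n   127\tAl Viro\n"

# 12,283 is the total changeset count quoted in the article.
for author, count, percent in shares(parse_shortlog(sample), total=12283):
    print(f"{author:<20} {count:>5} {percent:>4}%")
```

Counts obtained from a real checkout may differ somewhat from the table, since the same developer can appear under more than one name or address.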

There were 222 companies (that we know about) that supported work merged in the 4.7 development cycle — a fairly average figure for recent years. The most active companies this time around were:

Most active 4.7 employers

By changesets
Employer                  Changesets  Percent
Intel                           1786    14.5%
(None)                           968     7.9%
Red Hat                          967     7.9%
(Unknown)                        861     7.0%
Linaro                           633     5.2%
SUSE                             470     3.8%
IBM                              378     3.1%
AMD                              302     2.5%
Samsung                          276     2.2%
Google                           244     2.0%
Renesas Electronics              244     2.0%
NVIDIA                           231     1.9%
Mellanox                         227     1.8%
Free Electrons                   222     1.8%
ARM                              217     1.8%
Vision Engraving Systems         208     1.7%
Oracle                           200     1.6%
Imagination Technologies         193     1.6%
Texas Instruments                185     1.5%
Broadcom                         141     1.1%

By lines changed
Employer                       Lines  Percent
Intel                          86056    14.8%
AMD                            69065    11.8%
(None)                         35035     6.0%
Red Hat                        33887     5.8%
IBM                            28102     4.8%
Linaro                         23396     4.0%
(Unknown)                      23287     4.0%
NVIDIA                         18023     3.1%
Mellanox                       14011     2.4%
Samsung                        12918     2.2%
SUSE                           12810     2.2%
Free Electrons                 12637     2.2%
QLogic                         11731     2.0%
ARM                             9000     1.5%
Rockchip                        8938     1.5%
Renesas Electronics             8734     1.5%
Texas Instruments               7462     1.3%
(Consultant)                    6964     1.2%
Chelsio                         6868     1.2%
Broadcom                        6564     1.1%

This table looks as it has for some time; there are no real surprises here. The percentage of changes from developers working on their own time, at 7.9%, is up from 4.6, but still remains low by historical standards. Once upon a time, volunteer developers were our primary source of new contributors to the kernel. In 4.7, of the 232 first-time contributors, 132 were known to be employed at the time, 38 were known to be working on their own time, and 62 are in the "unknown" column. Even if all the unknowns are volunteers (most of them probably are), we still have more new contributors arriving via companies.
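The arithmetic behind that last claim is easy to check. Here is a minimal sketch using only the three counts quoted above; the variable names are illustrative and not taken from any real tooling.

```python
# First-time contributors in 4.7, as quoted in the article.
employed = 132
volunteers = 38
unknown = 62

total = employed + volunteers + unknown  # should match the 232 newcomers

# Worst case for the "companies" side: treat every unknown as a volunteer.
max_volunteers = volunteers + unknown

print(f"total newcomers: {total}")
print(f"employed: {employed} ({employed / total:.1%})")
print(f"volunteers, upper bound: {max_volunteers} ({max_volunteers / total:.1%})")
print("more newcomers via companies:", employed > max_volunteers)
```

Even in that worst case, the employed newcomers outnumber the volunteers by 132 to 100.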

Contributing to the kernel used to be a fairly reliable way to get a job, and it probably still is. But, in 2016, it seems that many of our new developers get the job first, and it is the job that brings them to the kernel community.

The table above shows the changes contributed by the most active companies. One last question one might ask is: how many developers does each company have working on Linux? For the 4.7 development cycle, the answer looks like this:

# of developers/company
Company                        Count  Percent
(Unknown)                        238    14.5%
Intel                            198    12.1%
(None)                           172    10.5%
Red Hat                           91     5.6%
IBM                               64     3.9%
Google                            48     2.9%
Linaro                            43     2.6%
Mellanox                          38     2.3%
SUSE                              37     2.3%
AMD                               30     1.8%
Samsung                           27     1.6%
Huawei Technologies               27     1.6%
ARM                               25     1.5%
Texas Instruments                 23     1.4%
Broadcom                          22     1.3%
Oracle                            21     1.3%
NXP                               20     1.2%
Qualcomm                          17     1.0%
MediaTek                          13     0.8%
Imagination Technologies          12     0.7%
Renesas Electronics               12     0.7%
Facebook                          11     0.7%
NVIDIA                            11     0.7%
Code Aurora Forum                 10     0.6%
(Consultant)                      10     0.6%
Rockchip                          10     0.6%
Canonical                         10     0.6%
Free Electrons                     9     0.5%
Pengutronix                        9     0.5%
Synopsys                           8     0.5%

Intel, it seems, has far more developers working on the kernel than any other company — nearly 12% of the total in 4.7. Volunteer developers may not contribute a lot of code, but there are quite a few of them; given that many (if not most) of the unknown developers probably fall into this category, developers working on their own time are still the biggest group.
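How much that conclusion leans on the unknowns can be made explicit. The sketch below takes three rows from the table above and assumes, as the article does, that unknown developers may be working on their own time; the variable names are illustrative only.

```python
# Developer counts taken from the "# of developers/company" table.
developers = {"(Unknown)": 238, "Intel": 198, "(None)": 172}

# Lower bound: only developers known to be working on their own time.
own_time_low = developers["(None)"]

# Upper bound: assume every unknown developer is also a volunteer.
own_time_high = own_time_low + developers["(Unknown)"]

# Number of unknowns who must be volunteers for that group to exceed Intel.
needed = developers["Intel"] - own_time_low + 1

print("own time, lower bound:", own_time_low)
print("own time, upper bound:", own_time_high)
print("Intel:", developers["Intel"])
print("unknowns needed to pass Intel:", needed)
```

On the known numbers alone, Intel (198) is ahead of the volunteers (172); the volunteers become the biggest group once at least 27 of the 238 unknowns are counted with them.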

The kernel community as a whole is a big group indeed, and it continues to produce kernels in a disciplined and predictable way. The relative lack of surprises may make for relatively boring statistics articles, but it is certainly welcome to users of the kernel.


Statistics from the 4.7 development cycle

Posted Aug 2, 2016 21:23 UTC (Tue) by fratti (guest, #105722) [Link] (31 responses)

Things I find notable:
  • AMD has almost three times the employed Linux kernel developers compared to NVIDIA.
  • Canonical employs a meagre 10 developers, compared to 37 by SUSE and 91 by Red Hat.
  • A lot of the companies involved appear to be ones selling ARM-based devices (or are ARM), which goes to show how much the kernel benefits from the embedded and the mobile market.
  • There is a surprising lack of contributions from consultants, I'd have thought there would be a bigger market for Linux kernel consulting work.
Also, kudos to Intel for hiring that many kernel devs.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 0:55 UTC (Wed) by johannbg (guest, #65743) [Link] (25 responses)

It strikes me as a bit odd not to see *any* "Academia" contribution on that list, since it's one of the things he's asking for (or at least did when I did a drive-by patch to test the efficiency and response of the kernel community back in 2013). And 238 unknowns is quite a number of people who want to remain in the "unknown" category, which could mean self-employed people, consultants, contractors, or even Canonical employees for that matter.

Then there is the question what happens to those that ask to be taken out of the list.
Do they fall into the unknown category or are they taken out of the stats altogether?

It would actually be quite interesting to see stats on which areas all those unknowns are contributing to, whether any unknowns are among the most active developers, and the ratio of men to women among the unknowns (does one gender prefer to remain unknown more than the other?).

It would also be interesting to see who the women behind the Linux kernel are, their history, and the statistics associated with that: who was the first woman ever to contribute to the kernel? Is she still contributing? What was/is her experience? Who are the most active ones each cycle? Are there more women contributing? Fewer? The same? Break the repetitive pattern and bring a new perspective to the story instead of fixating on the overall story which, as the writer mentions, has not changed a whole lot.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 1:50 UTC (Wed) by Indelible (guest, #72815) [Link] (2 responses)

One of the things that's completely unremarkable about the female developers I know, is that they wish to be known simply as "developers", _not_ "female developers". A quote from one being "I don't use my boobs to program, so it shouldn't matter if I have them".

I agree that diversity and gender balance are great things, but I also firmly believe that singling out women simply because they are women isn't the right strategy. Stopping the cycle of self-selection caused by perpetuating the stereotype of the socially inept, white male geek as the only type of person who suits a programming career/hobby is a much more practical use of time.

Please don't shine a spotlight on women developers for being women, but make the Kernel a place where it doesn't matter what gender you are, because no one uses their boobs to program, including the males who are blessed with them.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 2:37 UTC (Wed) by johannbg (guest, #65743) [Link]

Interesting perspective, given that there exist(ed?) a specific outreach program for women (not everyone; if it's still ongoing but no longer mentions women specifically, those behind that great idea have changed it to "minority" because that's "better"). Anyway, knowing the history of women in the kernel would still be interesting to me (at least), since it has gotten quite daunting seeing more or less the same stats, listening to the same people ask the same people the same questions, which get answered the same way, or the same people giving the same talk based on the same material year after year.

Even Linus's Linux X.X-rcX announcements have a repeated pattern in them: yada yada small/big, yada yada driver updates, yada yada go test <shortlog>, with the occasional yada yada vacation in them. He should have his wife or kids (or someone else) write the announcements to break up that pattern for a bit.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 2:54 UTC (Wed) by mjg59 (subscriber, #23239) [Link]

> "I don't use my boobs to program, so it shouldn't matter if I have them"

It shouldn't, but for many it does. Ignoring that reality doesn't solve it.

> Stopping the cycle of self-selection caused by perpetuating the stereotype of the socially inept, white male geek as the only type of person who suits a programming career/hobby is a much more practical use of time.

Evidence doesn't really suggest that the stereotype is the problem here - there are far more women in almost every avenue of professional computing than there are in the kernel. While it is a problem that women are outnumbered by men in the field at every stage of the education and career ladder, those numbers alone don't explain why our community is so disproportionately bad. Very few women enter Linux development, and retention of those that do is abysmal. One demonstrated way of increasing representation in communities is to have more role models, and outreach programs are an excellent way of achieving that.

But you're right that focusing on women isn't the only part of this, which is why the focus of projects like Outreachy is now on minorities in general. We should recognise all minorities who are involved despite social pressure making that more difficult, but we should also look at individual groups to determine whether specific strategies are working more effectively or are unintentionally excluding others.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 2:11 UTC (Wed) by zuki (subscriber, #41808) [Link]

I think the statistics about women participating in kernel development would be interesting. We have some rough estimates about the percentages at various conferences, and it would be great to see if there are more/less/same ratios among developers.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 14:03 UTC (Wed) by corbet (editor, #1) [Link] (18 responses)

The lack of contributions from academia has been an interesting problem for years. There are guesses as to why (once the work has gone far enough to be published it stops and there's no incentive to polish it for inclusion, for example), but nobody really seems to know what the roots of it are.

Nobody has ever "asked to be taken out of the list."

Most of the unknowns are small contributors, often cleanups. When we see unknowns making significant contributions, we try harder to figure out who they work for.

Gender ratio is hard; there is no gender tag attached to patches. People often ask for country-based statistics as well. It would all be interesting to know, but somehow I don't want to be the one sending "gender and location?" emails to developers...

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 14:16 UTC (Wed) by patrick_g (subscriber, #44470) [Link] (1 responses)

> People often ask for country-based statistics as well. It would all be interesting to know

There are some statistics here: http://www.remword.com/kps_result (look at NT:Nation by Patch).
But apparently the site has not been updated since November 2015 :-(

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 14:22 UTC (Wed) by corbet (editor, #1) [Link]

And those numbers show just the sort of hazard you can run into; it seems to be based mostly on domain names. I'm sure Neil Brown would be surprised to learn that he's German..:)

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 14:37 UTC (Wed) by fratti (guest, #105722) [Link] (3 responses)

>The lack of contributions from academia has been an interesting problem for years. There are guesses as to why (once the work has gone far enough to be published it stops and there's no incentive to polish it for inclusion, for example), but nobody really seems to know what the roots of it are.

Perhaps academia is also focusing on solely academic kernels, since a kernel that does not have to deal with all the pitfalls of real world hardware is a lot easier to work on when you're trying to implement a proof-of-concept feature, though that's just a guess of mine. Someone (with access to comp sci publications) would have to actually dig through all the papers to find out where the work ended up.

It could also be very possible that some company or individual then re-implements the work in an upstreamable shape after reading the paper, which would mean academic contributions are still very much real, just not as direct. Searching the kernel git log for the word "paper" brings up some commit messages where people mention work to be published in a paper and such.
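A search like the one described can be scripted. The sketch below is one assumed way of doing it, not the exact command used for the comment above: it shells out to `git log` with the standard `--grep`, `-i`, and `--oneline` options, and the `repo` argument is a hypothetical path to a local kernel checkout. The function names are invented for illustration.

```python
import subprocess


def grep_log_command(word):
    """Build a git command listing commits whose message mentions `word`."""
    # -i makes the match case-insensitive; --oneline prints one commit per line.
    return ["git", "log", "--oneline", "-i", f"--grep={word}"]


def search_log(repo, word):
    """Run the search inside `repo` and return one line per matching commit."""
    result = subprocess.run(
        grep_log_command(word),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


# Example use, assuming a kernel tree checked out at ./linux:
#     for line in search_log("linux", "paper")[:10]:
#         print(line)
```

A plain substring match is noisy: it will also return commits mentioning words such as "newspaper" or "e-paper", so the results still need to be read by hand.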

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 22:24 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (2 responses)

Academic work makes its way into GCC (e.g., Concepts Lite by Andrew Sutton has papers behind it) and LLVM/Clang fairly regularly. Is the kernel that much more impenetrable than a compiler for one of the most complicated languages around?

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 22:41 UTC (Wed) by anselm (subscriber, #2796) [Link]

GCC and LLVM are probably closer to the cutting edge of research into compiler technology than the Linux kernel is to the cutting edge of research into operating systems, so that's not a huge surprise.

Statistics from the 4.7 development cycle

Posted Aug 6, 2016 15:41 UTC (Sat) by anton (subscriber, #25547) [Link]

Given the direction that GCC and LLVM/Clang are taking, I am happy that the Linux kernel accepts fewer academic contributions. "Optimizations" based on unrealistic assumptions are an interesting academic curiosity, but should never become the default in production compilers.

As for the Linux kernel, other postings have given the reasons; in other words, there is a gap between where a research project ends and where a piece of code is good enough for inclusion in the kernel. How big is that gap? Philipp Reisner finished his Diplomarbeit (~master's thesis) on DRBD in 2000, then continued working on it commercially (forming a company along the way), and DRBD was finally accepted into the Linux kernel in 2009; I am sure this did not count as an academic contribution at that time, and given that many more years had been spent on it commercially than academically, counting it as academic would have been wrong.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 15:14 UTC (Wed) by johannbg (guest, #65743) [Link]

A fairly recent study [1] of pull requests and their acceptance in open-source projects on GitHub showed that women's contributions tend to be accepted more often than men's, but that women's acceptance rates were higher only when they were not identifiable as women.

This raises the question of whether the same thing might apply to the kernel community.

1. https://peerj.com/preprints/1733/

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 19:53 UTC (Wed) by Fats (guest, #14882) [Link]

> The lack of contributions from academia has been an interesting problem for years.

As said in the article, the kernel is boring and academia needs hot and sexy. Also, the kernel is likely old OS technology, so there is nothing really novel in it fit for research.

Statistics from the 4.7 development cycle

Posted Aug 4, 2016 8:46 UTC (Thu) by paulj (subscriber, #341) [Link] (3 responses)

The roots should be obvious to anyone who's been in academia: There is no reward to an academic in getting stuff upstream. Indeed, resources spent cleaning and polishing code for upstream - past the point of having gotten it working enough to get the results for one's papers - are resources that are *diverted* away from working on the next academic paper, and hence cleaning up and polishing code for upstream inclusion can *damage* one's academic career.

Academics cannot fix this alone. One would need to go to the governments and government agencies funding CS work and make the case to have factors other than paper output considered as success criteria in funding applications, in departmental assessments, in career progression, etc. Now, the relevance of "Impact" (i.e. real-world effects of research) in academia has slowly become more important to funding agencies - academics do often now have to pay some attention to this in funding applications - however it seems generally still to be a side-line performance metric compared to the traditional measure of papers (weighted by venue).

Statistics from the 4.7 development cycle

Posted Aug 4, 2016 11:06 UTC (Thu) by jamesmorris (subscriber, #82698) [Link] (2 responses)

Actually, SELinux was a really good example of govt funded academic research evolving into a major open source project.

Statistics from the 4.7 development cycle

Posted Aug 4, 2016 12:48 UTC (Thu) by paulj (subscriber, #341) [Link]

That was "research" that was funded specifically to get a mandatory labelled security system into the kernel though.

Most academics in universities (certainly in the UK) have their career progression measured on the success of their papers, not polishing code.

Statistics from the 4.7 development cycle

Posted Aug 4, 2016 17:33 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

It's also a sterling example why academia shouldn't be allowed within 10 miles of a Linux kernel.

Statistics from the 4.7 development cycle

Posted Aug 4, 2016 14:20 UTC (Thu) by deater (subscriber, #11746) [Link] (1 responses)

> The lack of contributions from academia has been an interesting problem
> for years. There are guesses as to why (once the work has gone far enough
> to be published it stops and there's no incentive to polish it for inclusion, for
> example), but nobody really seems to know what the roots of it are.

How is "academia" tabulated on the list? I try to contribute regularly (but possibly not during the 4.7 timeframe) using my .edu address. Would all people with .edu be tabulated under academia, or would we be individually broken out by our University?

The main problems with academia are threefold:
1. Most academic code contributions are *awful*, generally one-off hacks made during a mad rush to get a paper/thesis out the door
2. There are no incentives to merge your results back in (i.e. federal grants and such don't stipulate this, and really outside of Google I'm not sure if there's anyone who is sponsoring linux-kernel related research grants), and also open-source contributions don't matter for anything on tenure packages.
3. There's a perception (probably rightly so) that trying to get code merged in is going to be a long, frustrating process. Often by the time the student has finished the work and it's time to contribute back, the student has graduated, moved on to a new job, and has no incentive or time to deal with the hassle.

So most of the people I know from academia who contribute back are ones who (like me) were open-source developers first, academics second. And the fact we bother trying to get things contributed back probably hurts our career both financially and timewise.

Statistics from the 4.7 development cycle

Posted Aug 4, 2016 21:34 UTC (Thu) by Lekensteyn (guest, #99903) [Link]

Greg seems to send every contributor an email, asking them to identify themselves. Four options are given, including Academia: "this category is for people working for Universities and doing kernel work as part of their research or other responsibilities related to school work."

For the full description, see https://github.com/gregkh/kernel-history/blob/master/emai...

Statistics from the 4.7 development cycle

Posted Aug 15, 2016 13:29 UTC (Mon) by broonie (subscriber, #7078) [Link] (2 responses)

It's possible that some of the academic contributions are showing up as industrial ones even when done by the academics - at Linaro we're currently working with Paolo Valente on BFQ which was work he originally did in an academic context and is now upstreaming with support from us. Due to the way it's being funded he is contributing from a Linaro account and shows up that way but the core of the work is academic.

Statistics from the 4.7 development cycle

Posted Aug 15, 2016 14:10 UTC (Mon) by Jonno (subscriber, #49613) [Link] (1 responses)

If anything this proves the point that academia doesn't contribute directly to OSS. Obviously they do generate new algorithms and other good ideas, but unless someone else picks up the slack it won't turn into something useful. The fact that Linaro chose to hire someone who previously worked in academia to do the work doesn't change that, it is still someone outside of academia who picks up the academic idea and turns it into something OSS can use.

Statistics from the 4.7 development cycle

Posted Aug 15, 2016 14:18 UTC (Mon) by broonie (subscriber, #7078) [Link]

No, it's still the same people doing the same work with some extra collaborators - it's not a case of the work being thrown over the wall and picked up by industry but rather a partnership.

Statistics from the 4.7 development cycle

Posted Aug 18, 2016 15:50 UTC (Thu) by ortalo (guest, #4654) [Link]

Thanks for never forgetting the privacy issue. However, the key point is not the data, but the treatment you do with it.
So, what would we do with this information and the associated statistics? I am not sure we can do something useful (to either gender) with the result.
Similar for countries.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 22:35 UTC (Wed) by linusw (subscriber, #40300) [Link] (1 responses)

The academia story is pretty straightforward I think: everyone will work toward the quantitative key performance indicator that their management uses. The larger and more bureaucratic the organization, the more it will focus on quantitative measures over qualitative factors.

Academia today in what is known as "the new production of knowledge" is pretty much guided by the science citation index: what is important is to make publications and get them quoted by other publications, that appear in the citation index.

In many cases your research grants will be controlled by these metrics so it becomes a closed loop.

What is needed is to guide academia metrics to include de facto standardization as OSS code in their metrics. I have no clue how that can be made to happen. Right now, if you tell the management at an academic institution that you write OSS code you will be met with a mixture of yawns and shrugs.

Statistics from the 4.7 development cycle

Posted Aug 14, 2016 3:07 UTC (Sun) by torquay (guest, #92428) [Link]

    What is needed is to guide academia metrics to include de facto standardization as OSS code in their metrics.

There is a hack of sorts to address this very problem: the Journal of Open Source Software. It aims for short peer-reviewed journal articles that accompany open source code. The articles have a corresponding DOI and are fully citeable, just like "regular" academic articles.

See also the announcement about the journal in LWN.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 6:21 UTC (Wed) by blackwood (guest, #44174) [Link]

Re the lack of contributions from consultants:
- Many consulting shops contribute through email addresses of their customer, showing up under them instead of their own.
- "Free Electrons" and "Pengutronix" are both consulting shops.

It's a lot more than what just the "(consultant)" line would indicate.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 9:57 UTC (Wed) by broonie (subscriber, #7078) [Link] (1 responses)

Companies like nVidia will have many more developers than show up upstream - I'd imagine that most of their developers are focused on their product kernels.

Statistics from the 4.7 development cycle

Posted Aug 5, 2016 12:52 UTC (Fri) by armijn (subscriber, #3653) [Link]

Also, some companies might use a "gateway" person who will commit all the changes, but the changes themselves might have been written by a team of people.

Statistics from the 4.7 development cycle

Posted Aug 16, 2016 15:06 UTC (Tue) by marcH (subscriber, #57642) [Link] (1 responses)

> Canonical employs a meagre 10 developers, compared to 37 by SUSE and 91 by Red Hat.

The kernel is a central but still very small part of any Linux distribution. You may have another, useful point but this is not the data that proves it.

Statistics from the 4.7 development cycle

Posted Aug 17, 2016 23:35 UTC (Wed) by lsl (subscriber, #86508) [Link]

It's a pretty important part if what you're selling is expert support on things like file system issues and system performance.

Even excluding the kernel, do you expect the numbers for glibc, gcc, virtualization tools (qemu, libvirt, …) or storage stuff to be vastly different?


Copyright © 2016, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds