
Statistics from the 4.7 development cycle


Posted Aug 2, 2016 21:23 UTC (Tue) by fratti (guest, #105722)
Parent article: Statistics from the 4.7 development cycle

Things I find notable:

  • AMD employs almost three times as many Linux kernel developers as NVIDIA.
  • Canonical employs a meagre 10 developers, compared to 37 by SUSE and 91 by Red Hat.
  • A lot of the companies involved appear to be ones selling ARM-based devices (or are ARM), which goes to show how much the kernel benefits from the embedded and mobile markets.
  • There is a surprising lack of contributions from consultants; I'd have thought there would be a bigger market for Linux kernel consulting work.
Also, kudos to Intel for hiring that many kernel devs.



Statistics from the 4.7 development cycle

Posted Aug 3, 2016 0:55 UTC (Wed) by johannbg (guest, #65743)

It strikes me as a bit odd not to see *any* "Academia" contributions on that list, since that is one of the categories he asks about (or at least did when I did some drive-by patching back in 2013 to test the efficiency and responsiveness of the kernel community). And 238 unknowns is quite a number of people who want to remain in the "unknown" category; that could mean self-employed developers, consultants, contractors, or even Canonical employees, for that matter.

Then there is the question of what happens to those who ask to be taken off the list.
Do they fall into the unknown category, or are they taken out of the stats altogether?

It would actually be quite interesting to see statistics on which areas all those unknowns contribute to, whether any unknowns are among the most active developers, and the ratio of unknowns between men and women (does one gender prefer to remain unknown more than the other?).

It would also be interesting to see who the women behind the Linux kernel are, along with their history and the associated statistics: who was the first woman ever to contribute to the kernel? Is she still contributing? What was (or is) her experience? Who are the most active ones each cycle? Are more women contributing, fewer, or the same number? Break the repetitive pattern and bring a new perspective to the story instead of fixating on the overall story, which, as the writer mentions, has not changed a whole lot.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 1:50 UTC (Wed) by Indelible (guest, #72815)

One thing that's completely unremarkable about the female developers I know is that they wish to be known simply as "developers", _not_ "female developers". As one of them put it: "I don't use my boobs to program, so it shouldn't matter if I have them".

I agree that diversity and gender balance are great things, but I also firmly believe that singling out women simply because they are women isn't the right strategy. Stopping the cycle of self-selection driven by the pervasive stereotype of the socially inept white male geek as the only type of person suited to a programming career or hobby is a much more practical use of time.

Please don't shine a spotlight on women developers for being women; instead, make the kernel a place where it doesn't matter what gender you are, because no one uses their boobs to program, including the males who are blessed with them.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 2:37 UTC (Wed) by johannbg (guest, #65743)

Interesting perspective, given that there exist(ed?) specific outreach programs for women (I'm not sure whether that's still ongoing; if it no longer mentions women specifically, it's because those behind that great idea changed it to "minorities", since that's "better"). Anyway, knowing the history of women in the kernel would still be interesting to me (at least), since it has become quite tedious seeing more or less the same stats, listening to the same people ask the same people the same questions, which get answered the same way, or watching the same people give the same talk based on the same material year after year.

Even Linus's Linux X.X-rcX announcements follow a repeated pattern: yada yada small/big, yada yada driver updates, yada yada go test, <shortlog>, with the occasional yada yada vacation. He should have his wife or kids (or someone else) write the announcements for a while to break up that pattern.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 2:54 UTC (Wed) by mjg59 (subscriber, #23239)

> "I don't use my boobs to program, so it shouldn't matter if I have them"

It shouldn't, but for many it does. Ignoring that reality doesn't solve it.

> Stopping the cycle of self-selection driven by the pervasive stereotype of the socially inept white male geek as the only type of person suited to a programming career or hobby is a much more practical use of time.

Evidence doesn't really suggest that the stereotype is the problem here - there are far more women in almost every avenue of professional computing than there are in the kernel. While it is a problem that women are outnumbered by men in the field at every stage of the education and career ladder, those numbers alone don't explain why our community is so disproportionately bad. Very few women enter Linux development, and retention of those that do is abysmal. One demonstrated way of increasing representation in communities is to have more role models, and outreach programs are an excellent way of achieving that.

But you're right that focusing on women isn't the only part of this, which is why the focus of projects like Outreachy is now on minorities in general. We should recognise all minorities who are involved despite social pressure making that more difficult, but we should also look at individual groups to determine whether specific strategies are working more effectively or are unintentionally excluding others.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 2:11 UTC (Wed) by zuki (subscriber, #41808)

I think the statistics about women participating in kernel development would be interesting. We have some rough estimates about the percentages at various conferences, and it would be great to see if there are more/less/same ratios among developers.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 14:03 UTC (Wed) by corbet (editor, #1)

The lack of contributions from academia has been an interesting problem for years. There are guesses as to why (once the work has gone far enough to be published it stops and there's no incentive to polish it for inclusion, for example), but nobody really seems to know what the roots of it are.

Nobody has ever "asked to be taken out of the list."

Most of the unknowns are small contributors, often cleanups. When we see unknowns making significant contributions, we try harder to figure out who they work for.

Gender ratio is hard; there is no gender tag attached to patches. People often ask for country-based statistics as well. It would all be interesting to know, but somehow I don't want to be the one sending "gender and location?" emails to developers...

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 14:16 UTC (Wed) by patrick_g (subscriber, #44470)

> People often ask for country-based statistics as well. It would all be interesting to know

There are some statistics here: http://www.remword.com/kps_result (look at NT:Nation by Patch).
But apparently the site has not been updated since November 2015 :-(

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 14:22 UTC (Wed) by corbet (editor, #1)

And those numbers show just the sort of hazard you can run into; the site seems to be based mostly on domain names. I'm sure Neil Brown would be surprised to learn that he's German. :)
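To see why domain-based attribution misleads, here is a minimal sketch of such a heuristic. It assumes (hypothetically) that a tool simply maps the country-code top-level domain of a developer's email address to a nation; this is not the remword.com site's actual code, and the mapping table and addresses are made up for illustration:

```python
# Hypothetical heuristic: guess a developer's nation from the country-code
# TLD of their email address. Illustration only, not remword.com's code.
CCTLD_TO_NATION = {
    "de": "Germany",
    "fr": "France",
    "au": "Australia",
    "cn": "China",
    "jp": "Japan",
    "uk": "United Kingdom",
}


def guess_nation(email: str) -> str:
    """Return the nation implied by the email's top-level domain, or 'unknown'."""
    domain = email.rsplit("@", 1)[-1].lower()
    tld = domain.rsplit(".", 1)[-1]
    return CCTLD_TO_NATION.get(tld, "unknown")


if __name__ == "__main__":
    # Anyone using a German-registered company domain is reported as German,
    # wherever they actually live; generic TLDs such as .com say nothing.
    print(guess_nation("developer@suse.de"))     # Germany
    print(guess_nation("developer@redhat.com"))  # unknown
```

The heuristic measures where an employer registered its domain, not where the developer lives, which is exactly the kind of error described above.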

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 14:37 UTC (Wed) by fratti (guest, #105722)

>The lack of contributions from academia has been an interesting problem for years. There are guesses as to why (once the work has gone far enough to be published it stops and there's no incentive to polish it for inclusion, for example), but nobody really seems to know what the roots of it are.

Perhaps academia is also focusing on purely academic kernels, since a kernel that does not have to deal with all the pitfalls of real-world hardware is a lot easier to work on when you're trying to implement a proof-of-concept feature, though that's just a guess on my part. Someone (with access to computer science publications) would have to actually dig through all the papers to find out where the work ended up.

It is also quite possible that some company or individual re-implements the work in an upstreamable form after reading the paper, which would mean academic contributions are still very real, just less direct. Searching the kernel git log for the word "paper" brings up some commit messages where people mention work to be published in a paper and such.
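For anyone who wants to repeat that search, a minimal Python sketch that shells out to git. It assumes git is installed and that `repo` points at a local clone of the kernel tree; the function names are made up for illustration:

```python
import subprocess


def build_grep_command(word: str, repo: str = ".") -> list[str]:
    """Build a case-insensitive `git log` search of commit messages for `word`."""
    # --grep filters on the commit message (a regex, so "paper" also matches
    # "papers"); -i makes the match case-insensitive; --oneline prints an
    # abbreviated hash plus the subject line for each matching commit.
    return ["git", "-C", repo, "log", "--oneline", "-i", f"--grep={word}"]


def search_commit_messages(word: str, repo: str = ".") -> list[str]:
    """Run the search and return one line per matching commit."""
    result = subprocess.run(
        build_grep_command(word, repo),
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. when repo is not a repository
    )
    return result.stdout.splitlines()
```

Calling `search_commit_messages("paper", repo="linux")` against a kernel checkout in `./linux` returns the matching commits; each hit still has to be read by hand to tell an academic origin apart from an incidental mention.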

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 22:24 UTC (Wed) by mathstuf (subscriber, #69389)

Academic work makes its way into GCC (e.g., Concepts Lite by Andrew Sutton has papers behind it) and LLVM/Clang fairly regularly. Is the kernel that much more impenetrable than a compiler for one of the most complicated languages around?

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 22:41 UTC (Wed) by anselm (subscriber, #2796)

GCC and LLVM are probably closer to the cutting edge of research into compiler technology than the Linux kernel is to the cutting edge of research into operating systems, so that's not a huge surprise.

Statistics from the 4.7 development cycle

Posted Aug 6, 2016 15:41 UTC (Sat) by anton (subscriber, #25547)

Given the direction that GCC and LLVM/Clang are taking, I am happy that the Linux kernel accepts fewer academic contributions. "Optimizations" based on unrealistic assumptions are an interesting academic curiosity, but should never become the default in production compilers.

As for the Linux kernel, other postings have given the reasons; in other words, there is a gap between where a research project ends and where a piece of code is good enough for inclusion in the kernel. How big is that gap? Philipp Reisner finished his Diplomarbeit (~master's thesis) on DRBD in 2000, then continued working on it commercially (forming a company along the way), and DRBD was finally accepted into the Linux kernel in 2009. I am sure this did not count as an academic contribution at that time, and given that many more years had been spent on it commercially than academically, counting it as academic would have been wrong.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 15:14 UTC (Wed) by johannbg (guest, #65743)

A fairly recent study [1] of pull requests and their acceptance in open source projects on GitHub showed that women's contributions tend to be accepted more often than men's, but that women's acceptance rates were higher only when they were not identifiable as women.

This raises the question of whether the same thing applies to the kernel community.

1. https://peerj.com/preprints/1733/

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 19:53 UTC (Wed) by Fats (guest, #14882)

> The lack of contributions from academia has been an interesting problem for years.

As said in the article, the kernel is boring and academia needs hot and sexy. Also, the kernel is likely old OS technology, so there is nothing really novel fit for research.

Statistics from the 4.7 development cycle

Posted Aug 4, 2016 8:46 UTC (Thu) by paulj (subscriber, #341)

The roots should be obvious to anyone who's been in academia: there is no reward to an academic for getting stuff upstream. Indeed, resources spent cleaning and polishing code for upstream (past the point of having it working well enough to get the results for one's papers) are resources *diverted* away from working on the next academic paper, and hence cleaning up and polishing code for upstream inclusion can *damage* one's academic career.

Academics cannot fix this alone. One would need to go to the governments and government agencies funding CS work and make the case for having factors other than paper output considered as success criteria in funding applications, departmental assessments, career progression, etc. Now, "Impact" (i.e. the real-world effects of research) has slowly become more important to funding agencies, and academics often now have to pay some attention to it in funding applications; however, it still generally seems to be a side-line performance metric compared to the traditional measure of papers (weighted by venue).

Statistics from the 4.7 development cycle

Posted Aug 4, 2016 11:06 UTC (Thu) by jamesmorris (subscriber, #82698)

Actually, SELinux was a really good example of govt funded academic research evolving into a major open source project.

Statistics from the 4.7 development cycle

Posted Aug 4, 2016 12:48 UTC (Thu) by paulj (subscriber, #341)

That was "research" that was funded specifically to get a mandatory labelled security system into the kernel, though.

Most academics in universities (certainly in the UK) have their career progression measured on the success of their papers, not polishing code.

Statistics from the 4.7 development cycle

Posted Aug 4, 2016 17:33 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)

It's also a sterling example why academia shouldn't be allowed within 10 miles of a Linux kernel.

Statistics from the 4.7 development cycle

Posted Aug 4, 2016 14:20 UTC (Thu) by deater (subscriber, #11746)

> The lack of contributions from academia has been an interesting problem
> for years. There are guesses as to why (once the work has gone far enough
> to be published it stops and there's no incentive to polish it for inclusion, for
> example), but nobody really seems to know what the roots of it are.

How is "academia" tabulated on the list? I try to contribute regularly (but possibly not during the 4.7 timeframe) using my .edu address. Would everyone with a .edu address be tabulated under academia, or would we be broken out individually by university?

The main problems with academia are threefold:
1. Most academic code contributions are *awful*, generally one-off hacks made during a mad rush to get a paper/thesis out the door
2. There are no incentives to merge your results back in (i.e. federal grants and such don't stipulate this, and really, outside of Google, I'm not sure anyone is sponsoring Linux-kernel-related research grants), and open-source contributions don't count for anything in tenure packages.
3. There's a perception (probably rightly so) that trying to get code merged in is going to be a long, frustrating process. Often by the time the student has finished the work and it's time to contribute back, the student has graduated, moved on to a new job, and has no incentive or time to deal with the hassle.

So most of the people I know from academia who contribute back are ones who (like me) were open-source developers first, academics second. And the fact we bother trying to get things contributed back probably hurts our career both financially and timewise.

Statistics from the 4.7 development cycle

Posted Aug 4, 2016 21:34 UTC (Thu) by Lekensteyn (guest, #99903)

Greg seems to send every contributor an email asking them to identify themselves. Four options are given, including Academia: "this category is for people working for Universities and doing kernel work as part of their research or other responsibilities related to school work."

For the full description, see https://github.com/gregkh/kernel-history/blob/master/emai...

Statistics from the 4.7 development cycle

Posted Aug 15, 2016 13:29 UTC (Mon) by broonie (subscriber, #7078)

It's possible that some of the academic contributions are showing up as industrial ones even when done by the academics: at Linaro we're currently working with Paolo Valente on BFQ, which was work he originally did in an academic context and is now upstreaming with support from us. Due to the way it's being funded, he is contributing from a Linaro account and shows up that way, but the core of the work is academic.

Statistics from the 4.7 development cycle

Posted Aug 15, 2016 14:10 UTC (Mon) by Jonno (subscriber, #49613)

If anything this proves the point that academia doesn't contribute directly to OSS. Obviously they do generate new algorithms and other good ideas, but unless someone else picks up the slack it won't turn into something useful. The fact that Linaro chose to hire someone who previously worked in academia to do the work doesn't change that, it is still someone outside of academia who picks up the academic idea and turns it into something OSS can use.

Statistics from the 4.7 development cycle

Posted Aug 15, 2016 14:18 UTC (Mon) by broonie (subscriber, #7078)

No, it's still the same people doing the same work with some extra collaborators - it's not a case of the work being thrown over the wall and picked up by industry but rather a partnership.

Statistics from the 4.7 development cycle

Posted Aug 18, 2016 15:50 UTC (Thu) by ortalo (guest, #4654)

Thanks for never forgetting the privacy issue. However, the key point is not the data, but what you do with it.
So, what would we do with this information and the associated statistics? I am not sure we can do anything useful (for either gender) with the result.
The same goes for countries.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 22:35 UTC (Wed) by linusw (subscriber, #40300)

The academia story is pretty straightforward, I think: everyone will work toward the quantitative key performance indicators that their management uses. The larger and more bureaucratic the organization, the more it will focus on quantitative measures over qualitative factors.

Academia today, in what is known as "the new production of knowledge", is pretty much guided by the Science Citation Index: what matters is to publish and to get cited by other publications that appear in the citation index.

In many cases your research grants will be controlled by these metrics so it becomes a closed loop.

What is needed is to guide academia metrics to include de facto standardization as OSS code in their metrics. I have no clue how that can be made to happen. Right now, if you tell the management at an academic institution that you write OSS code you will be met with a mixture of yawns and shrugs.

Statistics from the 4.7 development cycle

Posted Aug 14, 2016 3:07 UTC (Sun) by torquay (guest, #92428)

> What is needed is to guide academia metrics to include de facto standardization as OSS code in their metrics.

There is a hack of sorts to address this very problem: the Journal of Open Source Software. It aims for short peer-reviewed journal articles that accompany open source code. The articles have a corresponding DOI and are fully citeable, just like "regular" academic articles.

See also the announcement about the journal in LWN.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 6:21 UTC (Wed) by blackwood (guest, #44174)

Re the lack of contributions from consultants:
- Many consulting shops contribute through their customers' email addresses, showing up under the customer instead of under their own name.
- "Free Electrons" and "Pengutronix" are both consulting shops.

It's a lot more than the "(consultant)" line alone would indicate.

Statistics from the 4.7 development cycle

Posted Aug 3, 2016 9:57 UTC (Wed) by broonie (subscriber, #7078)

Companies like nVidia will have many more developers than show up upstream - I'd imagine that most of their developers are focused on their product kernels.

Statistics from the 4.7 development cycle

Posted Aug 5, 2016 12:52 UTC (Fri) by armijn (subscriber, #3653)

Also, some companies might use a "gateway" person who will commit all the changes, but the changes themselves might have been written by a team of people.

Statistics from the 4.7 development cycle

Posted Aug 16, 2016 15:06 UTC (Tue) by marcH (subscriber, #57642)

> Canonical employs a meagre 10 developers, compared to 37 by SUSE and 91 by Red Hat.

The kernel is a central but still very small part of any Linux distribution. You may have a different, valid point, but this is not the data that proves it.

Statistics from the 4.7 development cycle

Posted Aug 17, 2016 23:35 UTC (Wed) by lsl (subscriber, #86508)

It's a pretty important part if what you're selling is expert support on things like file system issues and system performance.

Even excluding the kernel, do you expect the numbers for glibc, gcc, virtualization tools (qemu, libvirt, …) or storage stuff to be vastly different?


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds