
Forking the ARM kernel?

June 2, 2011

This article was contributed by Thomas Gleixner

In the last few months, many people have suggested forking the ARM kernel and maintaining it as a separate project. While the reasons for forking ARM may seem attractive to some, it turns out that it really doesn't make very much sense for either the ARM community or the kernel as a whole.

Here are the most common reasons given for this suggestion:

  • Time to market

  • It matches the one-off nature of consumer electronics

  • It better suits the diversity of the system-on-chip (SoC) world

  • It avoids the bottleneck of maintainers and useless extra work in response to reviews

Let's have a look at these reasons.

Time to market

The time-to-market advocates reason that it takes less time to hack up a new driver than to get it accepted upstream. I know I'm old school and do not understand the rapidly changing world and new challenges of the semiconductor industry anymore, but I still have enough engineering knowledge and common sense to know that there is no real concept of totally disconnected "new" things in that industry. Most of an SoC's IP blocks (functionality licensed for inclusion in an SoC) will not be new. So what happens to time to market the second time an IP block is used? If the driver is upstream, it can simply be reused. But all too often, a new driver gets written from scratch, complete with a new set of bugs that must be fixed. Are vendors really getting better time to market by rewriting new drivers for old IP blocks on every SoC they sell?

In addition, the real time to market for a completely new generation of chips is not measured in weeks. The usual time frame for a new chip, from the marketing announcement to real silicon usable in devices, is close to a year. This should be ample time to get a new driver upstream. Furthermore, the marketing announcement certainly does not happen the day after the engineering department met for beers after work and some genius engineer sketched the new design on a napkin at the bar, so most projects have even more time for upstreaming.

The one-off nature of embedded

It's undisputed that embedded projects, especially in the consumer market, tend to be one-off, but it is also a fact that the variations of a given SoC family have a lot in common and differ only in small details. In addition, variations of a given hardware type - e.g. the smartphone family which differs in certain aspects of functionality - share most of the infrastructure. Even next-generation SoCs often embed substantial pieces of the previous generation, since there is no compelling reason to replace already proven-to-work building blocks when their functionality is sufficient. Reuse of known-to-work parts is not a new concept and is, unsurprisingly, an essential part of meeting time-to-market goals.

We recently discovered the following gem. An SoC with nearly complete peripheral support already in the mainline kernel underwent a major overhaul. The overhaul replaced the CPU core, while the peripheral IP blocks remained the same - except for a slightly different VHDL glue layer, which was necessary to hook them up to the new CPU core. Now the engineer in me would have expected that Linux support for the resulting SoC generation B would have just been a matter of adjusting the existing drivers. The reality taught us that the vendor assigned a team to create the support for the "new" SoC and all the drivers got rewritten from scratch. While we caught some of the drivers in review, others went into the mainline, so now we have two drivers for the same piece of silicon that are neither bug- nor feature-compatible.

I have a really hard time accepting that rewriting a dozen drivers from scratch is faster than sitting down and identifying the existing - proven to work - drivers and modifying them for the new SoC design. The embedded industry often reuses hardware. Why not reuse software, too?

The SoC diversity

Of course, every SoC vendor will claim that its chip is unique in all aspects and so different that sharing more than a few essential lines of code is impossible. That's understandable from the marketing side, but if you look at the SoC data sheets, the number of unique peripheral building blocks is not excitingly large. Given the fact that the SoC vendors target the same markets and the same customer base, that's more or less expected. A closer look reveals that different vendors often end up using the same or very similar IP blocks for a given functionality. There are only a limited number of functional ways to implement a given requirement in hardware and there are only a few relevant IP block vendors that ship their "unique" building blocks to all of the SoC vendors. The diversity is often limited to a different arrangement of registers or the fact that one vendor chooses a different subset of functionality than the other.

We have recently seen consolidation work in the kernel which proves this point. When cleaning up the interrupt subsystem I noticed that there are only two widely used types of interrupt controllers. Without much effort it was possible to replace the code for more than thirty implementations of "so different" chips with a generic implementation. A similar effort is underway to replace the ever-repeating pattern of GPIO chip support code. These are the low-hanging fruit, and there is far more potential for consolidation.
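
To give a concrete flavor of that consolidation, here is a minimal sketch of the resulting pattern, assuming the generic irq chip helpers that came out of this cleanup work. Everything SoC-specific shrinks to a handful of register offsets; the "foo" controller, its register layout, and its names are hypothetical:

    /*
     * Hypothetical example: hooking a simple 32-line mask/ack interrupt
     * controller up to the generic irq chip infrastructure.  The "foo"
     * names and register offsets are invented for illustration; the
     * helpers (irq_alloc_generic_chip() and friends) are the real
     * consolidation code referred to above.
     */
    #include <linux/init.h>
    #include <linux/irq.h>
    #include <linux/io.h>

    #define FOO_INTC_ENABLE   0x00    /* write 1-bits to enable lines */
    #define FOO_INTC_DISABLE  0x04    /* write 1-bits to disable lines */
    #define FOO_INTC_ACK      0x08    /* write 1-bits to acknowledge */

    static void __init foo_init_irq(void __iomem *base, unsigned int irq_base)
    {
        struct irq_chip_generic *gc;
        struct irq_chip_type *ct;

        /* One generic chip covering 32 interrupts with level flow handling */
        gc = irq_alloc_generic_chip("foo-intc", 1, irq_base, base,
                                    handle_level_irq);
        if (!gc)
            return;

        /* Map the standard callbacks onto this chip's register layout */
        ct = gc->chip_types;
        ct->chip.irq_mask   = irq_gc_mask_disable_reg;
        ct->chip.irq_unmask = irq_gc_unmask_enable_reg;
        ct->chip.irq_ack    = irq_gc_ack_set_bit;
        ct->regs.enable     = FOO_INTC_ENABLE;
        ct->regs.disable    = FOO_INTC_DISABLE;
        ct->regs.ack        = FOO_INTC_ACK;

        /* Make all 32 lines requestable by drivers */
        irq_setup_generic_chip(gc, IRQ_MSK(32), 0, IRQ_NOREQUEST, 0);
    }

Everything that used to be copied from SoC to SoC - the mask/unmask/ack plumbing, the locking, the flow handling - lives exactly once in common code.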

Avoiding the useless work

Dealing with maintainers and their often limited time for review is seen as a bottleneck. The extra work that results from addressing review comments is also seen as a waste of time. The number of maintainers and the time available for review is indeed a limiting factor that needs to be addressed. The ability to review is not limited to those who maintain a certain subsystem of the kernel, and we would like to see more people participating in the review process. Spending a bit of time reviewing other people's code is a very beneficial undertaking, as it opens one's mind to different approaches and helps to better understand the overall picture.

On the other side, getting code reviewed by others is beneficial as well and, in general, leads to better and more maintainable code. It also helps in avoiding mistakes in the next project. In a recent review, which went through many rounds, a 1200-line driver boiled down to 250 lines and at least a handful of bugs and major mistakes got fixed.

When I have the chance to talk to developers after a lengthy review process, most of them concede that the chance to learn and understand more about the Linux way of development by far outweighs the pain of the review process and the necessary rework. When looking at later patches I've often observed that these developers have improved and avoided the mistakes they made in their first attempts. So review is beneficial for the developer and for their company, as it helps them write better code in a more efficient way. I call out those who still claim that review and the resulting work are a major obstacle as hypocrites who are trying to shift the blame for other deficiencies in their companies to the kernel community.

One deficiency is assigning proprietary RTOS developers to write Linux kernel code without teaching them how to work with the Linux community. There is no dishonor in not knowing how to work with the Linux community; after all, every kernel developer, myself included, started at some point without knowing it. But it took me time to learn how to work with the community, and it will take time for proprietary RTOS developers to learn it as well. It is well worth the time and effort.

What would be solved by forking the ARM kernel?

Suppose there was an ARM-specific Git repository which acted as a dumping ground for all of the vendor trees. It would pull in the enhancements of the real mainline kernel from time to time so that the embedded crowd gets the new filesystems, networking features, etc. Extrapolating the recent flow of SoC support patches into Linux and removing all the sanity checks on them would result in a growth rate of that ARM tree which would exceed the growth rate of the overall mainline kernel in no time. And what if the enhancements from mainline require changes to every driver in the ARM tree, as was required for some of my recent work? Who makes those changes? If the drivers are in mainline, the drivers are changed as part of the enhancement. If there is a separate ARM fork, some ARM-fork maintainer will have to make these changes.

And who is going to maintain such a tree? I have serious doubts that a sufficient number of qualified maintainers with the bandwidth and the experience to deal with such a flood of changes would surface out of the blue. So, to avoid the bottleneck that is one of the complaints about working with mainline, the maintainers would probably just have the role of integrators who merely aggregate the various vendor trees in a central place.

What's the gain of such an exercise? Nothing, as far as I can see; it would just allow everyone to claim that all of their code is part of a mysterious ARM Git tree and, of course, it would fulfill the ultimate "time to market" requirements - in the short term, anyway.

How long would an ARM fork be sustainable?

I seriously doubt that an ARM fork would last longer than a few kernel cycles, simply because the changes to core code would result in a completely unmaintainable #ifdef mess with incompatible per-SoC APIs that would drive anyone who has to "maintain" such a beast completely nuts in no time. I'm quite sure that none of the ARM-fork proponents has ever tried to pull five full-flavored vendor trees into a single kernel tree and deal with the conflicting changes to DMA APIs, driver subsystems, and infrastructure. I know that maintainers of embedded distribution kernels became desperate in no time for exactly those reasons, and I doubt that any reasonable kernel developer is insane enough to cope with such a horror for more than a couple of months.

Aside from its dubious usefulness, such an ARM fork would cut off the ARM world entirely from influencing the overall direction of the Linux kernel. ARM would become a zero-interest issue for most of the experienced mainline maintainers and developers, as has happened with other out-of-tree projects. I doubt that the ARM industry can afford to disconnect itself in such a way, especially as the complexity of operating-system-level software is increasing steadily.

Is there a better answer?

There is never an ultimate answer which will resolve all problems magically, but there are a lot of small answers which can effectively address the main problem spots.

One of the root causes of the situation today is historical. For over twenty years the industry dealt with closed-source operating systems where changes to the core code were impossible and collaboration with competitors was unthinkable and unworkable. Now, after moving to Linux, large parts of the industry still think in this well-known model and let their engineers - who have often worked on other operating systems before working on Linux - just continue the way they enabled their systems in the past. This is a perfect solution for management as well, because the existing structures and the idea of top-down software development and management still apply.

That "works" as long as the resulting code does not have to be integrated into the mainline kernel and each vendor maintains its own specialized fork. There are reasonable requests from customers for mainline integration, however, as it makes the adoption of new features easier, there is less dependence on the frozen vendor kernels, and, as seen lately, it allows for consolidation toward multi-platform kernels. The latter is important for enabling sensible distribution work targeted at netbooks, tablets, and similar devices. This applies even more for the long-rumored ARM servers which are expected to materialize in the near future. Such consolidation requires cooperation not only across the ARM vendors, it requires a collaborative effort across many parts of the mainline kernel along with the input of maintainers and developers who are not necessarily part of the ARM universe.

So we need to help management understand that holding on to the known models is not the most efficient way to deal with the growing complexity of SoC hardware and the challenges of efficient and sustainable operating system development in the open source space. At the same time, we need to explain at the engineering level that treating Linux the same way as other OS platforms makes life harder, and is at least partially responsible for the grief observed when code hits the mailing lists for review.

Another area that we need to work on is massive collaborative consolidation, which has the prerequisite that silicon vendors accept that, at least at the OS engineering level, their SoCs are not as unique as their marketing departments want them to believe. As I explained above, there are only a limited number of ways to implement a given functionality in hardware, which is especially true for hardware with a low complexity level. So we need to encourage developers to first look at whether existing code might be refactored to fit the new device instead of blindly copying the closest matching driver - or, in the worst case, a random driver - and hacking it into shape somehow.

In addition, the kernel maintainers need to be more alert to that fact as well and help avoid the reinvention of the wheel. If driver reuse cannot be achieved, we can often pull out common functionality into core code and avoid duplication that way; one common pattern for this is sketched below. There is a lot of low-hanging fruit here, and the Linux kernel community as a whole needs to get better and spend more brain cycles on avoiding duplication.
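
To make the reuse argument concrete, here is a minimal sketch of one well-worn kernel pattern: describing what differs between SoC variants as data, keyed off the platform device id, rather than forking the driver. All names, offsets, and quirks here are hypothetical:

    /*
     * Hypothetical example: one driver covering two SoC variants of the
     * same licensed IP block.  Only the data below differs per variant;
     * the driver logic is shared.  All names and offsets are invented.
     */
    #include <linux/module.h>
    #include <linux/platform_device.h>

    /* Describe only what actually differs between the variants */
    struct foo_variant {
        unsigned int ctrl_reg;  /* control register offset */
        bool has_dma;           /* generation B added a DMA interface */
    };

    static const struct foo_variant foo_soc_a = {
        .ctrl_reg = 0x04,
        .has_dma  = false,
    };

    static const struct foo_variant foo_soc_b = {
        .ctrl_reg = 0x10,       /* moved by the new glue layer */
        .has_dma  = true,
    };

    static const struct platform_device_id foo_ids[] = {
        { "foo-soc-a", (kernel_ulong_t)&foo_soc_a },
        { "foo-soc-b", (kernel_ulong_t)&foo_soc_b },
        { }
    };
    MODULE_DEVICE_TABLE(platform, foo_ids);

    static int __devinit foo_probe(struct platform_device *pdev)
    {
        const struct foo_variant *v = (const struct foo_variant *)
                platform_get_device_id(pdev)->driver_data;

        /*
         * The shared bulk of the driver lives here, once; the variant
         * descriptor steers the few places that really differ.
         */
        dev_info(&pdev->dev, "ctrl reg at %#x, dma %s\n",
                 v->ctrl_reg, v->has_dma ? "yes" : "no");
        return 0;
    }

    static struct platform_driver foo_driver = {
        .driver   = { .name = "foo", .owner = THIS_MODULE },
        .id_table = foo_ids,
        .probe    = foo_probe,
    };

    static int __init foo_init(void)
    {
        return platform_driver_register(&foo_driver);
    }
    module_init(foo_init);
    MODULE_LICENSE("GPL");

The same idea works with device tree match data; either way, supporting "generation B" reduces to adding an entry to a table instead of rewriting a dozen drivers from scratch.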

One step forward was taken recently with the ARM SoC consolidation efforts initiated by Linaro. But this will only succeed if we are aware of the conflicts with the existing corporate culture and address those conflicts at the non-technical level as well.

Secrecy

Aside from the above issues, the secrecy barrier is going to be the next major challenge. Of course a silicon vendor is secretive about the details of its next-generation SoC design, but the information revealed in marketing announcements allows us to predict at least parts of the design pretty precisely.

The most obvious recent example is the next-generation ARM SoCs from various vendors targeted for the end of 2011. Many of them will come with USB 3.0 support. Going through the IP block vendor offerings tells us that there are fewer USB 3.0 IP blocks available for integration than there are SoC vendors who have announced new chips with USB 3.0 support. That means that there are duplicate drivers in the works, and I'm sure that, while the engineers are aware of this, no team is allowed to talk to the competitor's team. Even if they were allowed to do so, it's impossible to figure out who is going to use which particular IP block. So we will see several engineering teams fighting over the "correct" implementation to be merged as the mainline driver in a couple of months, once the secrecy barrier has been lifted.

Competing implementations are not a bad thing per se, but the inability to exchange information and discuss design variants is not helping anyone in the "time to market" race. I seriously doubt that any of the to-be-released drivers will have a relevant competitive advantage, and even if one does, it will not last very long once the code becomes public. It's sad that opportunities to collaborate and save precious engineering resources for all involved parties are sacrificed in favor of historical, competition-oriented behavior patterns. These patterns have not been overhauled since they were invented twenty or more years ago, and they have never been subjected to scrutiny in the context of competition on the open source operating system playground.

A way forward

The ever-increasing complexity of hardware, which drives ever more complex operating-system-level software, long ago caused a shortage of high-profile OS-level software developers - a shortage that cannot be counterbalanced either by money or by assigning a large enough number of the cheapest available developer resources to the problem. The ARM universe is diversified enough that there is no chance for any of the vendors to get hold of a significant enough number of outstanding kernel developers to cause any serious damage to their competitors. That is particularly true given that those outstanding developers generally prefer to work in the open rather than in a semi-closed environment.

It's about time for managers to rethink their competition models and start to comprehend and utilize the massive advantage of collaborative models over a historic and no-longer-working model that assumes the infinite availability of up-to-the-task resources as long as there is enough money thrown into the ring. Competent developers certainly won't dismiss the chance to get some extra salary lightly, but the ability to get a far more enjoyable working environment for a slightly smaller income is, for many of them, a strong incentive to refuse the temptation. It's an appealing thought for me that there is no "time to market" howto, no "shareholder value" handbook, and no "human resources management" course which ever took into consideration that people might be less bribable than generally expected, especially when those people are some of the scarcest resources in the industry.

I hope that managers working for embedded vendors will start to understand how open source works and why there is a huge benefit in working with the community. After all, the stodgy old server vendors were able to figure out how to work with the Linux community, so it cannot be that hard for the fast-moving embedded vendors. However, the realist in me - sometimes called "the grumpy old man" - who has worked in the embedded industry for more than twenty-five years does not believe that at all. For all too many SoC vendors, the decision to "work" with the community was made due to outside pressure and an obsession with following the hype.

Outside pressure is not what the open source enthusiasts might hope for - the influence of the community itself. No, it's simply the pressure exerted by (prospective) customers who request that the chip be - at least basically - supported by the mainline kernel. Following the hype is omnipresent, and it always seems to be a valid argument for avoiding common-sense-driven decisions based on long-term considerations.

The eternal optimist in me still has hope that the embedded world will become a first-class citizen of the Linux community sooner rather than later. The realist in me somehow doubts that it will happen before the "grumpy old man" retires.



Forking the ARM kernel?

Posted Jun 3, 2011 5:54 UTC (Fri) by Lumag (subscriber, #22579)

I remember several so-called forks of the Linux kernel. Hackndev (Linux for Palm PDAs) is one of the first that comes to my mind. Hackndev has now been dead for a long time. It probably still contains some drivers/features not present in mainline, but it's dead. The same will happen to an "ARM fork" kernel.

In the end, everything comes down to fear of change. In the end we have a merged x86 arch. We had m68knommu committed separately at first and then merged back into the main m68k arch. We have survived a major PPC rewrite. I hope that, in the end, the people interested in having ARM in mainline will have what they want.

Forking the ARM kernel?

Posted Jun 3, 2011 12:07 UTC (Fri) by broonie (subscriber, #7078)

I don't think anyone's seriously talking about doing this; it's more the vendor BSPs that are being talked about than anything else.

IP block owners

Posted Jun 3, 2011 9:50 UTC (Fri) by Bogerr (subscriber, #36700)

As I understand it, IP blocks are developed by third parties, so the party most interested in mainline drivers should be the IP block creator. It's a real competitive advantage to have mainline drivers for your own IP blocks. Or am I missing something?

IP block owners

Posted Jun 3, 2011 11:52 UTC (Fri) by tdwebste (guest, #18154)

You are missing the fact that IP block creators/owners are one or two steps removed from the end customers of their IP blocks.

Perhaps in time many IP block creators/owners will contribute mainline drivers.

IP block owners

Posted Jun 3, 2011 14:17 UTC (Fri) by linusw (subscriber, #40300)

One type of IP block I have a lot of experience with comes from ARM Ltd. itself: the "PrimeCells". These are given in source code (VHDL) form to vendors licensing ARM CPUs, more or less as "no strings attached" examples. Numerous vendors have since modified the silicon in several ways, fixing hardware bugs and so on. But all in all, they could mostly all use the same driver, even though things have been extended and moved around.

It would actually have been a good thing if the VHDL for the IP blocks were Open Source, because many vendors seem to be fixing the same bugs in hardware too. But the world seems not to be ready for OpenCores just yet. :-(

IP block owners

Posted Jun 3, 2011 21:42 UTC (Fri) by iabervon (subscriber, #722)

The IP block creator is interested in not doing any more work than necessary. They also don't necessarily have a platform for developing and testing software that uses their IP block. Their IP block is set up with the core exchanging data on the signal lines, and they may develop with a test framework instead of an actual core on the other end of those lines. It would obviously be useful if they wrote and provided drivers for their IP blocks, but if their competition isn't doing so either, they can make sales without doing it. And, of course, this is yet another group of people whose technical expertise is not in writing maintainable code in good style (or even software at all). Really, there should be a single driver that goes with each IP block, and the IP block creator should be providing it to the SoC vendor. But they're not going to do the work without demand from the people who pay them.

IP block owners

Posted Jun 9, 2011 12:02 UTC (Thu) by slashdot (guest, #22014)

Why would there be no demand?

Given two IP blocks doing the same thing, where one just works and the other requires assembling a team to write a driver, surely the first is more attractive, no?

In addition to saving the cost of developing the driver, it would be possible to test the IP block immediately with no extra investment, and thus gain immediate confidence that it works and that the final product can actually be shipped with it.

IP block owners

Posted Jun 9, 2011 20:02 UTC (Thu) by iabervon (subscriber, #722)

Ah, but there isn't an alternative IP block with a driver to pick. As long as nobody making IP blocks supplies drivers, everybody making them gets away with not supplying drivers. Once somebody starts providing competent drivers, everybody else will have to follow suit, as you suggest. But it's hard enough to do, and enough of a limited competitive advantage (since everybody else would copy you), that market pressures won't provide the activation energy.

Also, my impression is that you can't do much with an IP block until you fabricate an ASIC that includes it, which is a substantial investment anyway.

Forking the ARM kernel?

Posted Jun 3, 2011 14:21 UTC (Fri) by linusw (subscriber, #40300)

One thing that seems hard to counter is getting already-forked drivers out of the kernel (cf. drivers/mmc/msm_sdcc.*, which is a clone of mmci.*), and feeling helpless about that whole business. I have toyed with the idea of proposing patches deleting the inferior forks, but that feels hostile; at the same time, asking kindly to merge support into the existing driver doesn't seem to work either.

Forking the ARM kernel?

Posted Jun 4, 2011 10:16 UTC (Sat) by arnd (subscriber, #8866)

Outright removal would be rather hostile indeed, but maybe moving these drivers to staging is an acceptable approach. That way, a driver can stay around until we're sure that there are no direct regressions, and we can remove the staging driver once the main driver is known to be working correctly.

Forking the ARM kernel?

Posted Jun 5, 2011 8:44 UTC (Sun) by linusw (subscriber, #40300)

That's a pretty good idea actually, I'll try that whenever I find some time!

Forking the ARM kernel?

Posted Jun 9, 2011 21:36 UTC (Thu) by Wol (guest, #4433)

And if, having found a vendor who actually works with the kernel devs, you leave their driver in mainline and move competing drivers to staging regardless of quality (okay, if they work with the kernel devs, chances are their drivers are the best anyway), you send a clear message to all the other companies that it pays to play nice ...

Cheers,
Wol

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds