June 2, 2011
This article was contributed by Thomas Gleixner
In the last few months, many people have suggested forking
the ARM kernel and maintaining it as a separate project. While the reasons for
forking ARM may seem attractive to some, it turns out that it really
doesn't make very much sense for either the ARM community or the
kernel as a whole.
Here are the most common reasons given for this suggestion:
- Time to market
- It matches the one-off nature of consumer electronics
- It better suits the diversity of the system-on-chip (SoC) world
- It avoids the bottleneck of maintainers and useless extra work in
response to reviews
Let's have a look at these reasons.
Time to market
The time-to-market advocates reason that it takes less time to
hack up a new driver than to get it accepted upstream.
I know I'm old school and do not understand the rapidly changing world
and new challenges of the semiconductor industry anymore, but I
still have enough engineering knowledge and common sense to know that
there is no real concept of totally disconnected "new" things in that
industry.
Most of an SoC's IP blocks (functionality licensed for inclusion in an SoC)
will not be new.
So what happens to time to market the second time an IP block is used?
If the driver is upstream, it can simply be reused.
But all too often, a new driver gets written from scratch, complete with a new
set of bugs that must be fixed.
Are vendors really getting better time to market by rewriting drivers
for old IP blocks on every SoC they sell?
In addition, the real time to market for a completely new generation of
chips is not measured in weeks.
The usual time frame for a new chip, from the marketing
announcement to real silicon usable in devices, is close to a year.
This should be ample time to get a new driver upstream.
Moreover, the marketing announcement certainly does not happen the day
after the engineering department met for beers
after work and some genius engineer sketched the new design on a
napkin at the bar, so most projects have even more time for upstreaming.
The one-off nature of embedded
It's undisputed that embedded projects, especially in the consumer
market, tend to be one-off, but it is also a fact that the variations of
a given SoC family have a lot in common and differ only in small
details. In addition, variations of a given hardware type -
e.g. a smartphone family whose members differ in certain aspects of
functionality - share most of the infrastructure. Even next-generation
SoCs often have enough pieces of the previous generation
embedded in them, since there is no compelling reason to replace already
proven-to-work building blocks when the functionality is
sufficient. Reuse of known-to-work parts is not a new concept and
is, unsurprisingly, an essential part of meeting time-to-market goals.
We recently discovered the following gem. An SoC for which almost all
peripheral components were already well supported in the mainline kernel
underwent a major overhaul. The overhaul replaced the CPU core, while the
peripheral IP blocks remained the same - except for a slightly different
VHDL glue layer, which was necessary to hook them up to the new CPU core.
Now the
engineer in me would have expected that the Linux support for the
resulting SoC generation B would have just been a matter of adjusting
the existing drivers. The reality taught us that the vendor assigned
a team to create the support for the "new" SoC and all the drivers got
rewritten from scratch. While we caught some of the drivers in review,
others made it into the mainline, so now we have two drivers for the
same piece of silicon that are neither bug- nor feature-compatible.
I have a really hard time accepting that rewriting a dozen drivers from
scratch is faster than sitting down and identifying the existing -
proven to work - drivers and modifying them for the new SoC design.
The embedded industry often reuses hardware.
Why not reuse software, too?
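
To make the reuse argument concrete, here is a minimal sketch of how a
single platform driver can cover two generations of an SoC that share the
same peripheral IP block. The device names, the variant structure, and the
notion that only the FIFO depth and DMA capability differ are all invented
for the illustration:

    #include <linux/module.h>
    #include <linux/init.h>
    #include <linux/platform_device.h>

    /* Hypothetical per-variant data: the only things that differ
     * between SoC generation A and generation B in this sketch. */
    struct fooip_variant {
        unsigned int fifo_depth;
        bool has_dma;
    };

    static const struct fooip_variant fooip_soc_a = {
        .fifo_depth = 16,
        .has_dma = false,
    };

    static const struct fooip_variant fooip_soc_b = {
        .fifo_depth = 64,
        .has_dma = true,
    };

    static int fooip_probe(struct platform_device *pdev)
    {
        /* One probe path, parameterized by the variant data */
        const struct fooip_variant *v = (const struct fooip_variant *)
            platform_get_device_id(pdev)->driver_data;

        dev_info(&pdev->dev, "FIFO depth %u, DMA %s\n",
                 v->fifo_depth, v->has_dma ? "available" : "not available");
        return 0;
    }

    /* Both SoC generations bind to the same driver */
    static const struct platform_device_id fooip_ids[] = {
        { "fooip-soc-a", (kernel_ulong_t)&fooip_soc_a },
        { "fooip-soc-b", (kernel_ulong_t)&fooip_soc_b },
        { }
    };
    MODULE_DEVICE_TABLE(platform, fooip_ids);

    static struct platform_driver fooip_driver = {
        .driver = { .name = "fooip", .owner = THIS_MODULE },
        .probe = fooip_probe,
        .id_table = fooip_ids,
    };

    static int __init fooip_init(void)
    {
        return platform_driver_register(&fooip_driver);
    }
    module_init(fooip_init);

    static void __exit fooip_exit(void)
    {
        platform_driver_unregister(&fooip_driver);
    }
    module_exit(fooip_exit);

    MODULE_LICENSE("GPL");

The point is not the specific mechanism - platform data or device tree
matching would serve equally well - but that the per-variant differences
are expressed as data, while the driver logic is written, reviewed, and
debugged only once.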
The SoC diversity
Of course, every SoC vendor will claim that its chip is unique in all
aspects and so different that sharing more than a few essential lines
of code is impossible. That's understandable from the marketing side,
but if you look at the SoC data sheets,
the number of unique peripheral building blocks is not excitingly
large. Given the fact that the SoC vendors target the same markets and
the same customer base, that's more or less expected. A closer look
reveals that different vendors often end up using the same or
very similar IP blocks for a given functionality. There are only a
limited number of functional ways to implement a given requirement in
hardware and there are only a few relevant IP block vendors that ship
their "unique" building blocks to all of the SoC vendors. The
diversity is often limited to a different arrangement of
registers or the fact that one vendor chooses a different subset of
functionality than the other.
We have recently seen consolidation work in the kernel which proves
this to be correct. When cleaning up the interrupt subsystem I noticed
that there are only two widely used types of interrupt controllers.
Without much effort it was possible to replace the code for
more than thirty implementations of "so different" chips with a
generic implementation. A similar effort is under way to replace the
ever-repeating pattern of GPIO chip support code. These are the
low-hanging fruit, and there is far more potential for consolidation.
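
For the curious, the following sketch shows roughly what using those
generic interrupt chip helpers looks like from a platform's point of view.
The controller name, register offsets, and interrupt base are invented for
the example and do not correspond to any particular SoC:

    #include <linux/init.h>
    #include <linux/irq.h>
    #include <linux/io.h>

    /* Hypothetical register layout of a simple 32-line mask/ack controller */
    #define MYSOC_INTC_ENABLE   0x04    /* write 1 bits to unmask sources */
    #define MYSOC_INTC_DISABLE  0x08    /* write 1 bits to mask sources   */
    #define MYSOC_INTC_ACK      0x0c    /* write 1 bits to ack sources    */

    static void __init mysoc_init_irq(void __iomem *base, unsigned int irq_base)
    {
        struct irq_chip_generic *gc;
        struct irq_chip_type *ct;

        /* One generic chip instance covering 32 interrupts, one chip type */
        gc = irq_alloc_generic_chip("mysoc-intc", 1, irq_base, base,
                                    handle_level_irq);
        if (!gc)
            return;

        /* Describe the registers and pick the stock callbacks; no
         * per-SoC mask/unmask/ack code needs to be written at all. */
        ct = gc->chip_types;
        ct->regs.enable = MYSOC_INTC_ENABLE;
        ct->regs.disable = MYSOC_INTC_DISABLE;
        ct->regs.ack = MYSOC_INTC_ACK;
        ct->chip.irq_unmask = irq_gc_unmask_enable_reg;
        ct->chip.irq_mask = irq_gc_mask_disable_reg;
        ct->chip.irq_ack = irq_gc_ack_set_bit;

        /* Install the chip for all 32 lines: clear IRQ_NOREQUEST so the
         * interrupts can be requested, mark them as level triggered. */
        irq_setup_generic_chip(gc, 0xffffffff, 0, IRQ_NOREQUEST, IRQ_LEVEL);
    }

All of the mask, unmask, and ack boilerplate that used to be copied from
platform to platform is reduced to a handful of register descriptions plus
stock callbacks provided by the generic code.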
Avoiding the useless work
Dealing with maintainers and their often limited time for review is
seen as a bottleneck. The extra work that results from addressing
review comments is also seen as a waste of time. The number of
maintainers and the time available for review are indeed limiting
factors that need to be addressed. The ability to review is not
limited to those who maintain a certain subsystem of the kernel and we
would like to see more people participating in the review process. Spending
a bit of time reviewing other people's code is a very beneficial
undertaking as it opens one's mind to different approaches and helps to
better understand the overall picture.
On the other hand, getting code reviewed by others is beneficial as
well and, in general, leads to better and more maintainable code. It
also helps in avoiding mistakes in the next project. In a recent review,
which went through many rounds, a 1200-line driver boiled down to
250 lines and at least a handful of bugs and major mistakes got
fixed.
When I have the chance to talk to developers after a lengthy
review process, most of them concede that the chance to learn and
understand more about the Linux way of development by far outweighs
the pain of the review process and the necessary rework. When looking
at later patches I've often observed that these developers
have improved and avoided the mistakes they made in their first
attempts. So review is beneficial for developers and for their companies,
as it helps them write better code more efficiently. I call
out those who still claim that review and the resulting work is a
major obstacle as hypocrites who are trying to shift the blame for other
deficiencies in their companies to the kernel community.
One deficiency is assigning proprietary RTOS developers to
write Linux kernel code without teaching them how to work with
the Linux community.
There is no dishonor in not knowing how to work with the Linux
community; after all, every kernel developer, including myself, started
at some point without knowing it.
But it took me time to learn how to work with the community, and it will
take time for proprietary RTOS developers to learn it as well.
It is well worth the time and effort.
What would be solved by forking the ARM kernel?
Suppose there were an ARM-specific Git repository that acted as a
dumping ground for all of the vendor trees. It would pull in enhancements
from the real mainline kernel from time to time so that the embedded
crowd gets the new filesystems, networking features, and so on.
Extrapolating the recent flow of SoC support patches into Linux, and
removing all of the sanity checks on them, would give that ARM tree a
growth rate exceeding that of the overall mainline kernel in no time.
And what if the enhancements from mainline require changes to every driver in
the ARM tree, as was required for some of my recent work?
Who makes those changes?
If the drivers are in mainline, the drivers are changed as part of the
enhancement.
If there is a separate ARM fork, some ARM-fork maintainer will have to
make these changes.
And who is going to maintain such a tree? I have serious doubts that
a sufficient number of qualified maintainers would surface out of the
blue with the bandwidth and the experience to deal with such a flood of
changes. So, to avoid the bottleneck that is one of the
complaints when working with mainline, the maintainers would probably
just have the role of integrators who merely aggregate the various vendor
trees in a central place.
What's the gain of such an exercise? Nothing, as far as I can see; it
would just allow everyone to claim that all of their code is part of a
mysterious ARM Git tree and, of course, it would fulfill the ultimate
"time to market" requirements - in the short term, anyway.
How long would an ARM fork be sustainable?
I seriously doubt that an ARM fork would survive more than a few kernel
cycles, simply because the changes to core code would result in a completely
unmaintainable #ifdef mess with incompatible per-SoC APIs that would
drive anyone who had to "maintain" such a beast completely nuts in no
time. I'm quite sure that none of the ARM-fork proponents has ever
tried to pull five full-flavored vendor trees into a single kernel
tree and deal with the conflicting changes to DMA APIs, driver
subsystems, and infrastructure. I know that maintainers of embedded
distribution kernels became desperate in no time for exactly those
reasons, and I doubt that any reasonable kernel developer is insane
enough to cope with such a horror for more than a couple of months.
Aside from the dubious usefulness, such an ARM fork would cut off
the ARM world from influencing the overall direction of the Linux
kernel entirely. ARM would become a zero-interest issue for most
of the experienced mainline maintainers and developers, as has happened
with other out-of-tree projects. I doubt that the ARM industry can afford to
disconnect itself in such a way, especially as the complexity of
operating-system-level software is increasing steadily.
Is there a better answer?
There is never an ultimate answer which will resolve all problems
magically, but there are a lot of small answers which can effectively
address the main problem spots.
One of the root causes for the situation today is of a historical
nature. For over twenty years the industry dealt with closed source
operating systems where changes to the core code were impossible and
collaboration with competitors was unthinkable and unworkable. Now,
after moving to Linux, large parts of the industry still think in this
well-known model and let their engineers - who have often worked on other
operating systems before working on Linux - just continue the way
they enabled their systems in the past. This is a perfect solution
for management as well, because the existing structures and the idea of
top-down software development and management still apply.
That "works" as long as the resulting code does not have to be integrated
into the mainline kernel and each vendor maintains its own specialized
fork. There are reasonable requests from customers for mainline
integration, however: it makes the adoption of new features easier,
reduces dependence on frozen vendor kernels, and, as seen lately,
allows for consolidation toward multi-platform kernels. The latter is
important for enabling sensible distribution work targeted at netbooks,
tablets, and similar devices. This applies even more to the long-rumored
ARM servers, which are expected to materialize in the near
future. Such consolidation requires cooperation not only across the ARM
vendors; it requires a collaborative effort across many parts of the
mainline kernel along with the input of maintainers and developers who are
not necessarily part of the ARM universe.
So we need to help management understand that holding on to the known
models is not the most efficient way to deal with the growing
complexity of SoC hardware and the challenges of efficient and
sustainable operating system development in the open source space. At
the same time, we need to explain at the engineering level that
treating Linux in the same way as other OS platforms makes
life harder and is at least partially responsible for the grief
observed when code hits the mailing lists for review.
Another area that we need to work on is massive collaborative
consolidation, which presupposes that silicon vendors accept
that, at least at the OS engineering level, their SoCs are not as unique
as their marketing departments would have us believe. As I explained
above, there are only a limited number of ways to implement a given
functionality in hardware, which is especially true for hardware with a
low complexity level. So we need to encourage developers to first look to see
whether existing code might be refactored to fit the new device
instead of blindly copying the closest matching driver - or in the worst case a
random driver - and hacking it into shape somehow.
In addition, the kernel
maintainers need to be more alert to that fact and help to
avoid reinvention of the wheel. If driver reuse cannot be
achieved, we can often pull common functionality out into
core code and avoid duplication that way. There is a lot of
low-hanging fruit here, and the Linux kernel community as a whole needs to
get better and spend more brain cycles on avoiding duplication.
One step forward was taken recently with the ARM SoC consolidation
efforts that were initiated by Linaro. But this will only succeed if we
are aware of the conflicts with the existing corporate culture and
address those conflicts at the non-technical level as well.
Secrecy
Aside from the above issues, the secrecy barrier is going to be the next
major challenge. Of course a silicon vendor is secretive about the
details of its next-generation SoC design, but the information revealed
in marketing announcements allows us to predict at least
parts of the design pretty precisely.
The most obvious recent example
is the next-generation ARM SoCs from various vendors targeted for the end
of 2011. Many of them will come with USB 3.0 support. Going through
the IP block vendor offerings tells us that there are fewer USB 3.0 IP
blocks available for integration than there are SoC vendors who have
announced new chips with USB 3.0 support. That means that there are duplicate
drivers in the works and I'm sure that, while the engineers are aware
of this, no team is allowed to talk to the competitor's team. Even if
they were allowed to do so, it would be impossible to
figure out who is going to use which particular IP block. So we will
see several engineering teams fighting over the "correct" implementation to
be merged as the mainline driver in a couple of months, when the secrecy
barrier has been lifted.
Competing implementations are not a bad thing per se, but
the inability to exchange information and discuss design variants is
not helping anyone in the "time to market" race. I seriously doubt
that any of the to-be-released drivers will have a relevant
competitive advantage and, even if one does, it will not last
very long once the code becomes public. It's sad that opportunities to
collaborate and save precious engineering resources for all involved
parties are sacrificed in favor of historical competition-oriented
behaviour patterns. These patterns have not been overhauled since they were
invented twenty or more years ago and they have never been subject to
scrutiny in the context of competition on the open source operating
system playground.
A way forward
The ever-increasing complexity of hardware, which drives ever more
complex operating-system-level software, created a shortage of
high-profile OS-level software developers long ago - a shortage that
cannot be counterbalanced by money or by assigning a large enough number
of the cheapest available developers to the problem.
The ARM universe is diversified enough that there is no
chance for any of the vendors to get hold of a significant enough number
of outstanding kernel developers to cause any serious damage to their
competitors. That is particularly true given that those outstanding
developers generally prefer to work in the open rather than in a
semi-closed environment.
It's about time for managers to rethink their competition models
and start to comprehend and utilize the massive advantage of
collaborative models over a historical and no-longer-working model
that assumes the infinite availability of up-to-the-task resources
as long as there is enough money thrown into the ring. Competent developers
certainly won't dismiss the chance to get some extra salary lightly,
but, for many of them, a far more enjoyable working environment at a
slightly smaller income is a strong incentive to
refuse the temptation. It's an appealing thought for me that there is
no "time to market" howto, no "shareholder value" handbook, and no
"human resources management" course which ever took into consideration
that people might be less bribable than generally expected, especially
if this applies to those people who are some of the scarcest resources
in the industry.
I hope that managers working for embedded vendors will start to
understand how Open Source works and why there is a huge benefit in
working with the community.
After all, the stodgy old server vendors were able to figure out how
to work with the Linux community, so it cannot be that hard for
the fast-moving embedded vendors.
However, the realist in me - sometimes called "the grumpy old man" -
who has worked in the embedded industry for more than twenty-five years
does not believe that
at all. For all too many SoC vendors, the decision to "work" with the
community was made due to outside pressure and an obsession with
following the hype.
Outside pressure is not what the open source enthusiasts might hope
for: the influence of the community itself. No, it's simply the
pressure exerted by (prospective) customers who request that the chip
be - at least basically - supported by the mainline kernel.
Following the hype is omnipresent, and it always seems to be a valid
argument for avoiding common-sense-driven decisions based on long-term
considerations.
The eternal optimist in me still has hope that the embedded world will
become a first-class citizen in the Linux community
sooner rather than later. The realist in me somehow doubts that it
will happen before the "grumpy old man" retires.