
The platform problem

By Jonathan Corbet
May 18, 2011
Your editor first heard the "platform problem" described by Thomas Gleixner. In short, the platform problem comes about when developers view the platform they are developing for as fixed and immutable. These developers feel that the component they are working on specifically (a device driver, say) is the only part that they have any control over. If the kernel somehow makes their job harder, the only alternatives are to avoid or work around it. It is easy to see how such an attitude may come about, but the costs can be high.

Here is a close-to-home example. Your editor has recently had cause to tear into the cafe_ccic Video4Linux2 driver in order to make it work in settings beyond its original target (which was the OLPC XO 1 laptop). This driver has a fair amount of code for the management of buffers containing image frames: queuing them for data, delivering them to the user, implementing mmap(), implementing the various buffer-oriented V4L2 calls, etc. Looking at this code, it is quite clear that it duplicates the functionality provided by the videobuf layer. It is hard to imagine what inspired the idiotic cafe_ccic developer to reinvent that particular wheel.

Or, at least, it would be hard to imagine except for the inconvenient fact that said idiotic developer is, yes, your editor. The reasoning at the time was simple: videobuf assumed that the underlying device was able to perform scatter/gather DMA operations; the Cafe device was nowhere near so enlightened. The obvious right thing to do was to extend videobuf to handle devices that were limited to contiguous DMA operations; this job was eventually done by Magnus Damm a couple of years later. But, for the purposes of getting the cafe_ccic driver going, it simply seemed quicker and easier to implement the needed functionality inside the driver itself.
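The "obvious right thing" can be made more concrete. Below is a minimal userspace sketch in C. It assumes a buffer layer that hands memory handling off to a small table of operations. Every name in it (struct buf_mem_ops, queue_init(), sg_ops, contig_ops and the rest) is hypothetical and is not the real videobuf interface. Allocation is simulated with calloc() instead of real DMA mapping. The sketch shows only one thing: the queue logic is written once, so supporting a contiguous-only device means adding a backend, not duplicating the queue inside a driver.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SZ  4096u  /* illustrative page size */
#define MAX_BUFS 8      /* illustrative limit on queued frames */

/* Hypothetical per-backend memory operations (not the videobuf API). */
struct buf_mem_ops {
    const char *name;
    void *(*alloc)(size_t size);
    void (*release)(void *mem);
    /* How many DMA segments the device must be given for one frame. */
    size_t (*nr_segments)(size_t size);
};

/* Userspace stand-ins; a kernel backend would use DMA-capable memory. */
static void *plain_alloc(size_t size) { return calloc(1, size); }
static void plain_release(void *mem) { free(mem); }

/* Scatter/gather backend: one segment per page, rounded up. */
static size_t sg_segments(size_t size) { return (size + PAGE_SZ - 1) / PAGE_SZ; }

/* Contiguous backend: the whole frame is a single segment, which is
 * all a device limited to contiguous DMA can handle. */
static size_t contig_segments(size_t size) { (void)size; return 1; }

static const struct buf_mem_ops sg_ops = {
    .name = "dma-sg", .alloc = plain_alloc,
    .release = plain_release, .nr_segments = sg_segments,
};
static const struct buf_mem_ops contig_ops = {
    .name = "dma-contig", .alloc = plain_alloc,
    .release = plain_release, .nr_segments = contig_segments,
};

struct frame_buf { void *mem; size_t size; size_t segments; };

/* Common queue code: it never needs to know which backend is in use. */
struct buf_queue {
    const struct buf_mem_ops *ops;
    struct frame_buf bufs[MAX_BUFS];
    size_t count;
};

/* Allocate nbufs frames via the chosen backend; returns 0 or -1.
 * On failure, frames already allocated are freed by queue_release(). */
static int queue_init(struct buf_queue *q, const struct buf_mem_ops *ops,
                      size_t nbufs, size_t frame_size)
{
    assert(ops != NULL && frame_size > 0);
    q->ops = ops;
    q->count = 0;
    if (nbufs > MAX_BUFS)
        return -1;
    for (size_t i = 0; i < nbufs; i++) {
        void *mem = ops->alloc(frame_size);
        if (!mem)
            return -1;
        q->bufs[i] = (struct frame_buf){ mem, frame_size,
                                         ops->nr_segments(frame_size) };
        q->count++;
    }
    return 0;
}

/* Free every allocated frame through the same backend. */
static void queue_release(struct buf_queue *q)
{
    for (size_t i = 0; i < q->count; i++)
        q->ops->release(q->bufs[i].mem);
    q->count = 0;
}
```

Consider a 640x480 frame at two bytes per pixel, which is 614,400 bytes, with 4096-byte pages. A queue set up with sg_ops records 150 segments per frame. The same queue set up with contig_ops records a single segment. queue_init() and queue_release() are identical in both cases.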

That decision had a cost beyond the bloating of the driver and the kernel as a whole. Who knows how many other drivers might have benefited from the missing capability in the years before it was finally implemented? An opportunity to better understand (and improve) an important support layer was passed up. As videobuf has improved over the years, the cafe_ccic driver has been stuck with its own, internal implementation which has seen no improvements at all. We ended up with a dead-end, one-off solution instead of a feature that would have been more widely useful.

Clearly, with hindsight, the decision not to improve videobuf was a mistake. In truth, it wasn't even a proper decision; that option was never really considered as a way to solve the problem. Videobuf could not solve the problem at hand, so it was simply eliminated from consideration. The sad fact is that this kind of thinking is rampant in the kernel community, and well beyond it. The platform for which a piece of code is being written appears fixed and not amenable to change.

It is not all that hard to see how this kind of mindset can come about. When one develops for a proprietary operating system, the platform is indeed fixed. Many developers have gone through periods of their career where the only alternative was to work around whatever obnoxiousness the target platform might present. It doesn't help that certain layers of the free software stack also seem frustratingly unfixable to those who have to deal with them. Much of the time, there appears to be no alternative to coping with whatever has been provided.

But the truth of the matter is that we have, over the course of many years, managed to create a free operating system for ourselves. That freedom brings many advantages, including the ability to reach across arbitrary module boundaries and fix problems encountered in other parts of the system. We don't have to put up with bugs or inadequate features in the code we use; we can make it work properly instead. That is a valuable freedom that we do not exploit to its fullest.

This is a hard lesson to teach to developers, though. A driver developer with limited time does not want to be told that a bunch of duplicated or workaround code should be deleted and common code improved instead. Indeed, at a kernel summit a few years ago, it was generally agreed that, while such fixes could be requested of developers, to require them as a condition for the merging of a patch was not reasonable. While we can encourage developers to think outside of their specific project, we cannot normally require them to do so.

Beyond that, working on common code can be challenging and intimidating. It may force a developer to move out of his or her comfort zone. Changes to common code tend to attract more attention and are often held to higher standards. There is always the potential of breaking other users of that code. There may simply be a lack of time for, or interest in, developing the wider view of the system that successful development of common code requires.

There are no simple solutions to the platform problem. A lot of it comes down to oversight and mentoring; see, for example, the ongoing effort to improve the ARM tree, which has a severe case of this problem. Developers who have supported the idea of bringing more projects together in the same repository also have the platform problem in mind; their goal is to make the lines between projects softer and easier to cross. But, given how often this problem shows up just within the kernel, it's clear that separate repositories are not really the problem. What's really needed is for developers to understand at a deep level that platforms are amenable to change and that one does not have to live with second-rate support.



Idiotic developer?

Posted May 19, 2011 6:34 UTC (Thu) by cuboci (subscriber, #9641)

> It is hard to imagine what inspired the idiotic cafe_ccic developer to reinvent that particular wheel.

That doesn't sound like our nice (though sometimes grumpy) editor. Does an arguably less-than-intelligent idea in a piece of code warrant calling its developer idiotic? I'm kind of disappointed to read something like that in an LWN article.

Idiotic developer?

Posted May 19, 2011 6:37 UTC (Thu) by cuboci (subscriber, #9641)

lol... I should have read further before posting such a comment. Oh well ;)

Idiotic developer?

Posted May 19, 2011 14:44 UTC (Thu) by rfunk (subscriber, #4054)

Heh, I read the same line, and before going any further I could guess exactly who that "idiotic developer" was.
(It helps to be familiar with both our editor's sense of humor and the fact that his kernel coding work tends to revolve around video input.)

Idiotic developer?

Posted May 19, 2011 18:36 UTC (Thu) by felixfix (subscriber, #242)

I had the momentary eyebrow raising at that sentence, followed almost immediately by the thought "I see where this is going" and there it went. No wonder our editor is grumpy, not only being insulted, but finding he has stooped low enough to insult someone!

Idiotic developer?

Posted May 20, 2011 12:17 UTC (Fri) by rvfh (subscriber, #31018)

Indeed, one should always read the article in full before posting, lest one appear an idiot... which has happened to me quite a number of times ;-)

As a French guy likes to say: "Better to stay silent and look stupid than to open your mouth and leave no doubt about it." But I have to say I entirely disagree with him. Sometimes stupid remarks or questions lead to clever and instructive answers for all.

The platform problem

Posted May 19, 2011 7:31 UTC (Thu) by michaeljt (subscriber, #39183)

In this particular situation, it might have helped to a) keep the interface of the re-implemented functionality as close as possible to the not-quite-suitable original and b) put pointers in the form of comments or whatever into the original code warning about the duplicate, so that anyone extending the original with the required functionality would know to get in touch with our editor with a view to merging the two. (In case there is any doubt, this is a musing for the future, and definitely not criticism of Jon - I have the benefit of his hindsight and his nice article.)

The platform problem

Posted May 19, 2011 9:45 UTC (Thu) by ezyang (subscriber, #62208)

I have come to the conclusion that you simply *must* be actively working with upstream; it's not just a matter of "the common good", it can also confer critical time savings. Rather than spend a few hours doing experiments on something that seems hilariously broken, just go check the source. Yes, sometimes actually getting the fix into upstream can be hard, but sometimes they are quite receptive (and if they think it's important enough, they will write the patch for you).

The platform problem

Posted May 19, 2011 18:07 UTC (Thu) by samroberts (subscriber, #46749)

To work with upstream requires upstream to work with you. That takes time and energy from them, time and energy they donate for free to help you implement features they might not need.

Whether they are willing to spend time on integrating code for you depends on a lot of factors, but if they have a paid job, and not much spare time, the effort of pushing upstream might not go well. Not all devs are hobbyists who would like nothing more than to work their weekends doing things purely for the intellectual challenge and long-term good of the platform.

The double-edged sword of code reuse

Posted May 19, 2011 13:58 UTC (Thu) by mcoleman (guest, #70990)

This seems related to the general problem of deciding when to reuse existing code (as opposed to rewriting from scratch). Although we learned in school that we should reuse whenever possible, long experience suggests that this is very much a mixed blessing. It can be a serious problem if the reused code isn't entirely under your control, may mutate, may be encountered in multiple incompatible versions, may be missing, has IP issues, etc.

If you discover a bug in your own code, you can usually correct it in a timely and coherent way. If the bug is in code that's been reused from elsewhere, it may take years for fixes to become widely available, if you are even able to convince upstream to accept them at all.

In this particular case, it sounds like you personally had the option of making changes "upstream" instead of "downstream", which greatly simplifies the decision. For ordinary coders, though, often the least-bad alternative is to go ahead and implement locally, then dash off a note to the upstream, and watch (possibly over a period of years) to see whether it's incorporated in a stable and usable way at some point.

The platform problem

Posted May 19, 2011 17:11 UTC (Thu) by fuhchee (guest, #40059)

"the ability to reach across arbitrary module boundaries and fix problems encountered in other parts of the system. [...] That is a valuable freedom that we do not exploit to its fullest."

But maybe that is just as well - meaning that the platform "problem" is an honest trade-off, with nothing to be ashamed of.

The module boundaries represent different pools of expertise. Only a few superstars can make meaningful contributions across many layers & subsystems. Many of those same superstars are strong personalities who would likely clash if they were encouraged to cross-contribute without restraint.

They also represent an opportunity to version-control the system incrementally. It allows release schedules that suit the respective groups. It encourages more vigorous attention to interface stability & compatibility.

In the cafe_ccic case, if Jon had had the wisdom to undertake a videobuf reorganization, is there reason to believe that this could have been done within the various constraints? And even if so, the harm done by writing cafe_ccic this way is limited to throwing away some code, plus the opportunity cost of not helping a few other drivers. Since the videobuf rework was done only "a couple of years later", neither cost can have been very high.

The platform problem

Posted May 20, 2011 0:13 UTC (Fri) by neilbrown (subscriber, #359)

I think this has been touched on indirectly and with different words by other commenters, but I think it is still worth saying...

I think it is important to remember that you cannot reliably generalise from a single example. When you try to write 'generic' code to meet a particular need you may well find out that the code isn't actually useful for anyone else.

I think the best approach is to write your code like a library but don't push it into the 'common area'. Then if someone else has the same need they can copy/paste your library and make it meet their needs. Then a third person can see both proto-libraries, decide that there really is a lot of commonality there, and create a real library in a 'common area' and modify both old drivers and their new driver to use this new library - which is now generalised from three examples, not one.

Of course this assumes two important preconditions:

1/ developers look at other people's code for ideas before writing their own.
2/ The subsystem is structured to allow common functionality to be provided by optional libraries rather than a mandatory midlayer.

The platform problem

Posted May 26, 2011 13:02 UTC (Thu) by Wol (guest, #4433)

Or, as I try to do when writing code: address MY problem, but put in copious comments and hooks if I think it's generic, so that the next person to come along (often me on a revisit :-) can extend the original code easily.

Cheers,
Wol

Abstracting commonality is a different skill

Posted Jun 7, 2011 13:13 UTC (Tue) by prl (guest, #44893)

I'll go further: the third person in Neil's example needs a different skill set, namely taking in the commonality, working out what's needed, and abstracting an API at the right level.

The first developer is often the wrong person to do this, and not only because she's working from just one example. Someone new to the code can analyse it better, without the baggage of whatever the original developer might have had in mind.

As in our editor's case the new eye is not infrequently that of the original developer a few years down the line.

The platform problem

Posted May 20, 2011 2:50 UTC (Fri) by TRS-80 (subscriber, #1804)

Related: Matthew Garrett, "On platforms".

The platform problem

Posted May 20, 2011 7:25 UTC (Fri) by jezuch (subscriber, #52988)

I guess the platform problem is so prevalent that it's rarely seen as a separate phenomenon. It's just "the way things are". Everyone has to work with third-party libraries, and it's a rare happy developer who doesn't curse those libraries for brain damage, etc. Fortunately FOSS is a blessing here. I was recently working on a module based on a well-known caching framework, and I was pushing so many fixes for [quite obvious, really] bugs that I was offered direct write access to the repository. Fixing upstream is so much more joyful than piling up workarounds in frustration ;)

downstream considerations

Posted May 20, 2011 17:45 UTC (Fri) by dmarti (subscriber, #11625)

Evan Martin from the Chromium project brought up another good point: "I now see the pain of finding cases where there's been a bug or missing API in a system library or software, we've fixed the bug (and even contributed it upstream), but because distros are so slow to push out updates our users will be stuck with the old version."

The platform problem

Posted Jan 6, 2012 20:04 UTC (Fri) by jrn (subscriber, #64214)

A fun characterization of this:
> "we have ugly mistakes in the kernel, SO LET'S ADD ANOTHER ONE"
- Linus, http://thread.gmane.org/gmane.linux.kernel/1235602/focus=...

:)

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds