
KS2009: End-user panels

By Jonathan Corbet
October 19, 2009
LWN's 2009 Kernel Summit coverage
Kernel developers have a certain tendency to be insulated from many of the people who actually make use of their code. As a way of helping bridge this gap, the kernel summit often invites specific users to address the group and talk about the problems they are having with the kernel. In Tokyo, the summit heard from a panel of five representatives from Japanese companies in both the enterprise and embedded areas. The panel, which was really a series of five independent presentations, shone an interesting light on how Linux is being used in Japan.

First up was Norihiro Kumagai, representing Sharp. Sharp decided a while back to move from its proprietary, in-house realtime operating system to Linux. Its products are generally based on system-on-chip hardware, with drivers coming from the component vendors. That tends to force the use of binary-only drivers. Why does Sharp not just reject components which do not come with free drivers? The answer is that it is still hard to find competitively-priced parts with open drivers.

Sharp's systems run a Linux kernel, but they don't look like traditional Linux systems. To begin with, Sharp has dispensed with shells altogether, choosing instead to boot directly into application code. The main purpose there is to achieve a faster startup. A lot of work has been done on getting the kernel to load quickly; it turns out that there is a tradeoff between the time required to load a kernel (arguing for strong compression) and the time required to extract it (arguing against that compression).
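The load-versus-extract tradeoff can be made concrete with a little arithmetic. The following Python sketch is only an illustration: the image size, flash bandwidths, compression ratios, and decompression speeds are all invented numbers, not measurements from Sharp's systems, and the names (Scheme, boot_seconds, best_scheme) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    name: str
    ratio: float            # compressed size / original size (hypothetical)
    unpack_mb_per_s: float  # MB of output produced per second (hypothetical)


# Illustrative values only; real figures depend on the CPU and the flash part.
SCHEMES = [
    Scheme("uncompressed", 1.0, math.inf),
    Scheme("light", 0.6, 40.0),
    Scheme("strong", 0.4, 5.0),
]


def boot_seconds(image_mb: float, flash_mb_per_s: float, scheme: Scheme) -> float:
    """Time to read the stored image from flash plus the time to unpack it."""
    load = image_mb * scheme.ratio / flash_mb_per_s
    extract = image_mb / scheme.unpack_mb_per_s  # 0.0 when there is nothing to unpack
    return load + extract


def best_scheme(image_mb: float, flash_mb_per_s: float, schemes=SCHEMES) -> Scheme:
    """Pick the scheme with the smallest combined load and extract time."""
    return min(schemes, key=lambda s: boot_seconds(image_mb, flash_mb_per_s, s))


for flash in (0.5, 2.0, 50.0):
    winner = best_scheme(4.0, flash)
    print(f"flash at {flash} MB/s: {winner.name}")
```

With these made-up numbers, slow flash favors strong compression, mid-speed flash favors light compression, and fast flash favors no compression at all; where the crossover points fall on real hardware has to be measured.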

Has Sharp been working with the community to improve boot time? Regardless of any other concerns, there is a real impediment to doing that: Sharp's systems are running a 2.4.20 kernel. That, evidently, is what their supplier (MontaVista) is giving to them.

These systems do a lot with user-space drivers. Much of the code was ported directly from the in-house RTOS, and it was easier to make it work in user space. There is a special char driver which is used to dispatch interrupts to user space. All of Sharp's code runs within the context of a single process; evidently it just works.

Use of filesystems is also minimized. JFFS2 was evaluated, but found to be too slow to mount. The JFFS2 garbage collection thread is too heavyweight, and they also experienced some stability problems with (the 2.4.20 version of) JFFS2. So, instead, the root comes from a read-only cramfs image, and persistent data is managed through direct, user-space access to the MTD device.

Why did Sharp choose Linux? They liked the quality of the core kernel code. The driver support is good, and the quality of the compiler toolchain was unmatched anywhere else. The availability of source code is crucially important; Sharp needs to be able to fix problems on short notice and without relying on anybody else. And Sharp appreciates the ongoing work being done by the development community - even though it is making little use of the community's recent work. In response to a question on whether Sharp had tried a more recent kernel, the answer was that it was usually not possible; the board support is usually not in the mainline so they have to go with what the embedded system provider is offering.

Next up was Ryoichi Sugimura of Panasonic, who is also a director of the LiMo Foundation. Panasonic, he says, started looking at Linux for digital TV applications around 1996, and for mobile use around 2000. Initially Panasonic looked at sharing its work through the Consumer Electronics Linux Foundation (CELF), but, more recently, it has been contributing code into LiMo instead.

Like others, Panasonic has faced a number of challenges in making Linux work for its products. Reducing memory use was high on the list; it was addressed through the use of execute-in-place technology and more. Startup time is important; it was improved through heavy use of prelinking. Realtime performance was required; to make that happen, Panasonic has employed the realtime preemption patch set. And power consumption is always an issue; one thing that Panasonic did here was to get rid of the periodic timer tick when the processor goes idle.

As mentioned before, Panasonic is now working with LiMo as the outlet for its code contributions. They are having some trouble, though, figuring out how to get code upstream. There is also concern about multicore systems, which present some serious development and debugging challenges.

Tim Bird presented as a representative of both Sony and CELF. According to Tim, Linux has achieved world domination everywhere except on desktop systems. It is, he says, the new monopoly, but it's a benevolent monopoly which is less worrisome than many others.

Tim raised a number of "pain points" for Sony. One of those is the "version gap" between what embedded vendors are shipping and the mainline. It is, he says, getting better; just a few years ago, the 2.4 kernel was still being used in new products. Now most companies have moved up to at least 2.6.11, which is still not great. Sony is currently looking at 2.6.29 for products being designed now.

A question was asked: how does the decision on kernel versions get made? It seems there tend to be a lot of internal battles around that decision. Often, though, it is made by default: system-on-chip vendors make a new product, then get one of the embedded Linux vendors to create a kernel for them. That will be the kernel which is available for manufacturers to use. The presence of binary-only drivers can further constrain options in this area.

Patch maintenance is another source of pain. According to Tim, Sony is currently carrying 1029 patches against the 2.6.29 kernel; as was observed by the audience, this is worse than many of the enterprise kernels. The patches break down this way:

637  External features not currently in the mainline, including the realtime tree and the LTTng trace toolkit
164  Board support
 93  Realtime fixes and tuning patches
 68  Local features and fixes
 34  Internal build system patches
 28  Fixes backported from later kernels

It was observed that backports are a relatively small part of the total. What, it was asked, does Sony do about security? The response was relatively vague; in essence, we were told that the security needs were reduced because Sony's devices are closed systems.
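To put numbers on how small the backport share is, the breakdown can be turned into percentages. This is a small Python sketch using the counts from the table above; note that the six listed categories add up to 1024, slightly short of the 1029 quoted, so the shares here are computed against the listed sum. The names PATCHES and shares are just for illustration.

```python
# Patch counts as listed in the breakdown above.
PATCHES = {
    "External features": 637,
    "Board support": 164,
    "Realtime fixes and tuning": 93,
    "Local features and fixes": 68,
    "Internal build system": 34,
    "Backported fixes": 28,
}


def shares(counts: dict) -> dict:
    """Return each category's share of the listed total, in percent."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


for name, pct in shares(PATCHES).items():
    print(f"{pct:5.1f}%  {name}")
```

By this count, backported fixes are under 3% of the listed patches, while out-of-tree external features account for more than 60%.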

Sony would like to get more patches into the mainline, but that proves to be a challenging thing to do. Developers who submit patches are often rewarded by requests for an expanded scope and other work which is only partially related. For example, a patch adding memory notifications to control groups drew a request that the author create a generic event mechanism for the control group subsystem. But embedded developers are often not full-time kernel developers; they have neither the time nor the skills to respond to this kind of request.

So what happens instead is that embedded developers suffer an ongoing barrage of criticism about their lack of contribution. They would like to contribute, but their code tends not to be good enough. The complaints are not fun, but Tim notes that there are fewer complaints than there once were.

Sony wishes that there were fewer barriers to switching versions. One thing that would help a lot would be to merge some of the significant out-of-tree projects, starting with the realtime tree.

Other issues include the growing size of the kernel, though, as Tim notes, "Moore's law saves us." He also acknowledged that the biggest bloat problems are in user space. Boot time is always important to embedded developers, but it has been improving quickly recently. Filesystems need work; UBIFS still must scan the media at mount time, so it is not suitable for a fast-booting system. It seems that the flash filesystem developers know how to solve this problem, but nobody has actually done it yet. Power management is an issue; embedded vendors want their systems to be "mostly asleep" and consuming as little power as possible. Memory management can be a struggle; better ways for notification of and recovery from low-memory situations would be helpful. Video and audio drivers can be problematic, especially in conjunction with the realtime patches. And security is always on the radar - even SMACK is too big for systems like this; SELinux was not even mentioned.

Moving away from embedded systems, Takahiro Itagaki of NTT, who is also a PostgreSQL developer, talked about what PostgreSQL would like to see from the kernel. Unlike some other database systems, PostgreSQL does not do any direct I/O. The project's philosophy is that it would like to take advantage of the buffer management done by the kernel and not attempt to rewrite the block layer in user space. As will be seen, this approach has some costs, but it also enables PostgreSQL to be supported on a number of platforms by a relatively small team of developers.

One thing PostgreSQL would like to see is support for low-priority I/O by background tasks. Things like vacuuming the database should run in a way which does not interfere with production work. The biggest problem seems to be that calls to fsync() take the inode mutex, thus blocking most other operations on the file. Even lseek() will block while this is happening. Much of the problem could be solved just by avoiding lseek() and using pread() and pwrite() instead. That, however, is a hard sell; evidently pread() disabled readahead on certain other operating systems. An alternative is to fix lseek(); evidently that has been attempted, but the patch was not accepted.
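The difference between the two access styles is easy to show from user space. The sketch below, in Python for brevity, uses the os module's thin wrappers around lseek(), read(), pread(), and pwrite(); the 8192-byte block size and the helper names are assumptions made for the example, and this is not PostgreSQL's actual storage code.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed block size for this sketch


def read_block_seek(fd: int, block_no: int) -> bytes:
    """Two system calls; the first one moves the shared file offset."""
    os.lseek(fd, block_no * BLOCK_SIZE, os.SEEK_SET)
    return os.read(fd, BLOCK_SIZE)


def read_block_pread(fd: int, block_no: int) -> bytes:
    """One system call; the file offset is neither consulted nor changed."""
    return os.pread(fd, BLOCK_SIZE, block_no * BLOCK_SIZE)


def write_block_pwrite(fd: int, block_no: int, data: bytes) -> int:
    """Positional write, again without touching the file offset."""
    return os.pwrite(fd, data, block_no * BLOCK_SIZE)


def demo() -> tuple:
    """Write two blocks to a scratch file and read the second back both ways."""
    fd, path = tempfile.mkstemp()
    try:
        write_block_pwrite(fd, 0, b"a" * BLOCK_SIZE)
        write_block_pwrite(fd, 1, b"b" * BLOCK_SIZE)
        via_pread = read_block_pread(fd, 1)
        offset_after_pread = os.lseek(fd, 0, os.SEEK_CUR)
        via_seek = read_block_seek(fd, 1)
        offset_after_seek = os.lseek(fd, 0, os.SEEK_CUR)
        return via_pread == via_seek, offset_after_pread, offset_after_seek
    finally:
        os.close(fd)
        os.unlink(path)
```

Both helpers return the same data, but only the lseek() variant depends on the file offset, which is the path that ends up waiting behind fsync() in the scenario described above.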

The other issue for PostgreSQL is duplicated caching between the database and the kernel. Since buffered I/O is being used, any cache kept by PostgreSQL itself risks duplicating data already stored in memory by the kernel. Much of this could be avoided if PostgreSQL were to use mmap() to access its files. But that creates problems in situations where blocks must be written in a specific order - PostgreSQL's write-ahead logging in particular. To avoid this problem, the PostgreSQL developers would like to have a special madvise() operation to tell the kernel not to flush specific blocks to disk. As it turns out, though, this would be an expensive option to implement, so enthusiasm was low.
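A short sketch may help show why mmap() removes the duplicate copy but gives up control over write ordering. It uses Python's standard mmap module on a scratch file; the function names are hypothetical, and the madvise() operation requested by the PostgreSQL developers does not exist, so it is not shown.

```python
import mmap
import os
import tempfile


def update_page(fd: int, page_no: int, data: bytes) -> None:
    """Modify one page of a file through a shared mapping, then force it out."""
    size = os.fstat(fd).st_size
    with mmap.mmap(fd, size) as mm:          # shared, read/write by default on Unix
        start = page_no * mmap.PAGESIZE
        mm[start:start + len(data)] = data   # dirties the page cache directly
        # The kernel may write the dirty page back at any moment from here on;
        # flush() (msync) can only force it out sooner, not hold it back.
        mm.flush(start, mmap.PAGESIZE)


def demo() -> bytes:
    """Update the second page of a scratch file and read it back normally."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, 2 * mmap.PAGESIZE)
        update_page(fd, 1, b"tuple")
        # An ordinary read sees the same page-cache data; there is no second copy.
        return os.pread(fd, 5, mmap.PAGESIZE)
    finally:
        os.close(fd)
        os.unlink(path)
```

The store into the mapping is the point of no return: nothing in this interface lets the application say "not yet" for a given page, which is the gap the proposed madvise() operation was meant to fill.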

Linus suggested that the use of mmap() was not the way to go, that it would always be more painful. Chris Mason said that one option could be to avoid writing the commit block to mapped memory entirely until the rest of the data had made it to disk; that would avoid the problems that result if the commit block is written too soon. But Alan Cox warned that disk drives will reorder operations, so there is a need for barriers in any case, and those cannot be done through mapped memory. So a true solution seems elusive.
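With ordinary buffered I/O, the ordering Chris Mason described (commit record last) is normally expressed with fsync() as the ordering point. The following is a minimal Python sketch of that pattern under assumed names (durable_commit, a text "COMMIT" record); it is not PostgreSQL's write-ahead log format, and whether fsync() also flushes the drive's own write cache depends on the filesystem and its barrier configuration.

```python
import os
import tempfile


def durable_commit(data_fd: int, log_fd: int, offset: int, payload: bytes) -> None:
    """Write the data, wait for it to reach storage, then write the commit record."""
    os.pwrite(data_fd, payload, offset)
    os.fsync(data_fd)   # do not continue until the data has been written out
    record = b"COMMIT %d %d\n" % (offset, len(payload))
    os.write(log_fd, record)
    os.fsync(log_fd)    # the transaction counts as committed only after this returns


def demo() -> bytes:
    """Run one commit against two scratch files and return the log contents."""
    data_fd, data_path = tempfile.mkstemp()
    log_fd, log_path = tempfile.mkstemp()
    try:
        durable_commit(data_fd, log_fd, 0, b"hello")
        return os.pread(log_fd, 64, 0)
    finally:
        for fd, path in ((data_fd, data_path), (log_fd, log_path)):
            os.close(fd)
            os.unlink(path)
```

The explicit fsync() between the two writes is exactly the step that has no equivalent when the data is modified through a mapping, since the kernel is free to write mapped pages back in any order.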

The final presenter was Kazuhiro Itakura of the Bank of Tokyo-Mitsubishi UFJ, Ltd. His main request was for better support for mirroring in LVM. Current mirroring suffers from the problem that the mirror log goes to one destination only. If that device goes down, the other will go into read-only mode, effectively stopping the system. Mirroring also does not detect a device which simply goes out to lunch without returning an I/O error status. Proprietary Unix systems have evidently done a better job of mirroring for a long time; it would be nice to see this in Linux as well.

As it happens, more robust mirroring can be done now with network-attached storage devices. For locally-attached systems, loopback can be used. It is non-trivial to set up, though; it was acknowledged that a better solution is needed.

The panel ended there. It was seen by most as a useful exercise; opportunities for either side to talk to the other in this way are relatively rare. There is always value for kernel developers in hearing where their users are suffering; with any luck at all, the result will be a better kernel for everybody.


KS2009: End-user panels

Posted Oct 20, 2009 14:58 UTC (Tue) by johnflux (subscriber, #58833)

I didn't get who LiMo are, and why Linux kernel patches are being submitted to them...

KS2009: End-user panels

Posted Oct 20, 2009 23:25 UTC (Tue) by tbird20d (subscriber, #1901)

LiMo manages a Linux distribution for mobile phones. In this regard they could be viewed as similar to other Linux distributors. Panasonic (and others) contribute there, because that's essentially their "upstream". For the patches to work into mainline, it would be good, IMHO, for LiMo itself to then subsequently push patches to kernel.org.

KS2009: End-user panels

Posted Oct 22, 2009 7:01 UTC (Thu) by ErikA (guest, #53177)

Oh boy.
Pondering over all human CPU cycles spent solving the same problem and backporting patches over and over makes my head spin.

Surely there must be a better workflow, solving this problem of not contributing upstream. Or am I just being young and naïve?

KS2009: End-user panels

Posted Oct 24, 2009 17:03 UTC (Sat) by NAR (subscriber, #1313)

Why did Sharp choose Linux? [...] The availability of source code is crucially important; Sharp needs to be able to fix problems on short notice and without relying on anybody else. [...] whether Sharp had tried a more recent kernel, [...] they have to go with what the embedded system provider is offering.

Is there a contradiction there or am I missing something? Are they hacking on the code on their own or are they using (and relying on) whatever MontaVista delivers to them?

KS2009: End-user panels

Posted Oct 26, 2009 3:40 UTC (Mon) by vMeson (subscriber, #45212)

It's different people making the decisions. They're required to get their source from MV but when there are critical problems, they need to be able to fix the kernel in a shorter time frame than MV will turn issues around. Or at least that's the way things work at my office.

Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds