Kernel maintenance, Brillo style
Brillo, Cook said, is a software stack for the Internet of things based on the Android system. These deployments bring a number of challenges, starting with the need to support a different sort of hardware than Android normally runs on; target devices may have no display or input devices, but might well have "fun buses" to drive interesting peripherals. The mix of vendors interested in this area is different; handset vendors are present, but many more traditional embedded vendors can also be found there. Brillo is still in an early state of development.
There are a number of longstanding problems endemic to this area. Each
device has its own special, static kernel version mixing changes from
multiple trees, including the mainline, the common Android tree, and any
vendor-specific trees. Fixes and new features must be backported to this
kernel, and out-of-tree drivers must be carried forward for future
products. As the number of products increases, the number of combinations
of kernels, patch sets, and hardware configurations grows exponentially,
leading to maintenance problems. This growth is relatively manageable
when the problems are small, but one of Brillo's requirements is device support
for at least five years after the last unit is sold. On that sort of time
scale, exponential growth in maintenance issues is simply not sustainable.
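To make that growth concrete, here is a minimal sketch under a deliberately simple assumption that does not come from the talk: each device picks one base kernel and any subset of the optional patch sets. The function names, kernel versions, and patch-set names are all hypothetical.

```python
def possible_trees(base_kernels, optional_patch_sets):
    """Upper bound on distinct kernel trees when each device picks one base
    kernel and any subset of the optional patch sets (assumed model)."""
    return len(base_kernels) * 2 ** len(optional_patch_sets)


def backport_jobs(num_fixes, num_trees):
    """Each fix has to be applied to, and tested on, every supported tree."""
    return num_fixes * num_trees


# Hypothetical inputs, chosen only for illustration.
kernels = ["3.18", "4.1", "4.4"]
patch_sets = ["android-common", "vendor-a", "vendor-b", "vendor-c"]

per_device_trees = possible_trees(kernels, patch_sets)
single_tree = 1  # the single-kernel model described in the article

print("per-device model:", per_device_trees, "trees,",
      backport_jobs(10, per_device_trees), "backport jobs for 10 fixes")
print("single-kernel model:", single_tree, "tree,",
      backport_jobs(10, single_tree), "backport jobs for 10 fixes")
```

Under this model, every additional optional patch set doubles the number of possible trees, while the single-kernel approach keeps the count at one no matter how many products ship.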
The solution to this problem, according to Cook, is a simple matter of making two changes:
- Maintain a single kernel for all systems, reducing patch combinations
and backporting work.
- Keep everything in the mainline kernel, reducing forward-porting work when a new kernel comes along.
Cook allowed that those principles might scare some vendors but, he said, if this approach seems too scary, "you're not testing enough."
Brillo is thus built in a single kernel tree containing the Android patches and all necessary vendor patches. This adds an interesting constraint, as it requires the vendors to all play well together with their own patches. These vendor patches should preferably be upstream anyway but, in any case, they must have been sent upstream for consideration. The kernel itself is the latest long-term support kernel from Greg Kroah-Hartman, and it follows the -stable updates as they are released. When a new long-term support kernel comes out, everything moves forward to that release.
Part of making this idea work is reducing the delta between the Brillo kernel and mainline. There are about 600 patches in the Android common kernel currently, Cook said; that has been reduced to fewer than 150 in the Brillo kernel. That was done by consolidating small patches, tweaking the Android user-space code to not need the patches in the first place, and upstreaming the patches that are easy to get merged.
The upgrade process has been tested once, in the move from the 4.1 to the 4.4 kernel. It went relatively easily and, happily, the list of add-on patches got quite a bit shorter, thanks to the upstreaming of a fair amount of vendor code. It was also possible to drop a whole bunch of backported patches, thanks to the newer kernel. This test may have only been run once so far but, Cook said, it demonstrates that the idea is "not entirely crazy."
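The bookkeeping behind such a move can be sketched as follows. This is not Brillo's actual tooling; it assumes, as a simplification, that a carried patch can be matched to an upstream commit by its subject line (tools such as `git cherry` compare patch contents instead). The function name and the patch subjects are invented for illustration.

```python
def plan_rebase(carried_patches, upstream_subjects):
    """Split the out-of-tree patch list into patches that must be carried
    forward and patches that can be dropped because the new base has them.

    Matching by subject line is an assumption made to keep the sketch short.
    """
    upstream = set(upstream_subjects)
    keep = [p for p in carried_patches if p not in upstream]
    drop = [p for p in carried_patches if p in upstream]
    return keep, drop


# Hypothetical patch subjects carried on top of the old long-term kernel.
carried = [
    "vendor-a: add widget bus driver",
    "android: low memory killer tweak",
    "backport: fix use-after-free in foo",
    "vendor-b: add sensor hub driver",
]

# Hypothetical subjects already present in the new long-term kernel.
in_new_base = [
    "vendor-a: add widget bus driver",
    "backport: fix use-after-free in foo",
]

keep, drop = plan_rebase(carried, in_new_base)
print("carry forward:", keep)
print("drop:", drop)
```

Both upstreamed vendor code and backported fixes land in the "drop" list, which is why the add-on patch list shrinks as the base kernel moves forward.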
For vendors who are afraid of regressions from kernel upgrades, Cook had some advice: get your code upstream. Then, create a better set of automatic tests to verify that everything is working. All vendors should be thinking about just what they fear might break and write tests to detect that when it happens. It is hard work, but it has to be done anyway to verify that things work in the first place; it also only has to be done once. Then perform regular testing on linux-next to catch problems before they end up in the next long-term support kernel.
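One way to picture that testing loop is a comparison of test results between the kernel a vendor ships today and a linux-next build. The sketch below is an assumption about how such a check might look, not something shown in the talk; the function name, the result format (test name mapped to pass or fail), and the test names are hypothetical.

```python
def find_regressions(baseline, candidate):
    """Return the tests that passed on the baseline kernel but did not pass
    on the candidate kernel.

    A test that is missing from the candidate results is counted as a
    regression too, since a test that silently stopped running hides breakage.
    """
    return sorted(
        name
        for name, passed in baseline.items()
        if passed and not candidate.get(name, False)
    )


# Hypothetical results: True means the test passed.
lts_results = {
    "boot": True,
    "i2c_sensor_read": True,
    "suspend_resume": True,
    "wifi_connect": False,  # already failing before the upgrade
}
linux_next_results = {
    "boot": True,
    "i2c_sensor_read": False,  # newly broken
    "wifi_connect": False,
    # "suspend_resume" did not run at all
}

print(find_regressions(lts_results, linux_next_results))
```

Tests that were already failing on the baseline are deliberately not reported, so the output stays focused on what the newer kernel broke.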
Will this approach work? He certainly hopes so, he said. Something has to be done to get out of the "backport treadmill" that vendors are on now. Most vendors, he said, have already agreed to this approach, and they are becoming more proactive about upstreaming their code. Some vendors fear the five-year support rule but, for many in the embedded world, five years looks relatively short and doesn't bother them at all. "Handset vendors panic" at the idea, he said, but, in the end, they are going to have to decide between paying the up-front costs of upstreaming their code or the long-term costs of supporting old code for far longer than they have been accustomed to.
[Thanks to LWN subscribers for supporting our travel to the event.]
Index entries for this article | |
---|---|
Conference | Linux Plumbers Conference/2016 |
Posted Nov 17, 2016 10:32 UTC (Thu)
by johnjones (guest, #5462)
How about support for a Raspberry Pi? Otherwise it seems like navel gazing...
Posted Nov 18, 2016 6:52 UTC (Fri)
by tomeu (guest, #64689)
Or even better, help kernelci.org find those regressions before they get into a release.
Posted Nov 26, 2016 10:11 UTC (Sat)
by marcH (subscriber, #57642)
I'm surprised Kees didn't mention that at least half of this kernel branching model is already in full swing in ChromeOS - a(nother) Google product far beyond its "early" development stage / on the shelves since forever: https://www.google.com/search?q=chromiumos+upstream+first
The other half that isn't there: https://groups.google.com/a/chromium.org/forum/#!msg/chro...