
The end of CONFIG_ANDROID

By Jonathan Corbet
July 4, 2022
The kernel has thousands of configuration options, many of which can change the kernel's behavior in subtle or surprising ways. Among those options is CONFIG_ANDROID, which one might expect to be relatively straightforward; its description reads, in its entirety: "Enable support for various drivers needed on the Android platform". It turns out that this option does more than that, to the surprise of some users. That has led to a plan to remove this option, but that has brought a surprise or two of its own — and some disagreement — as well.

The discussion started when Alex Xu reported a read-copy-update (RCU) error that was appearing on his system after resuming from suspend. Shortly thereafter, Xu realized that the problem was tied to the fact that his kernel had been built with CONFIG_ANDROID enabled; among other things, that option significantly reduces the time that can elapse before RCU starts putting out stall warnings. RCU maintainer Paul McKenney was not entirely sympathetic after this was revealed:

And let's face it, the intent and purpose of CONFIG_ANDROID=y is extremely straightforward and unmistakable. So perhaps people not running Android devices but wanting a little bit of the Android functionality should do something other than setting CONFIG_ANDROID=y in their .config files. Me, I am surprised that it took this long for something like this to bite you.

This response comes from a part of the discussion that does not appear directly in the archives, but can be seen quoted in Xu's answer, where he points out that both Debian and Fedora ship kernels with CONFIG_ANDROID enabled, since that is the only way to make the binder module available. Xu suggested that the intent of this option is not quite as "straightforward and unmistakable" as one might think; the one-line description mentions nothing about changing internal RCU timeout values. "If major distro vendors are consistently making this 'mistake', then perhaps the problem is elsewhere".
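
You can check whether a particular kernel was built this way by reading its configuration file. Below is a minimal Python sketch. It assumes the distribution installs that file as /boot/config-<release> (the usual Debian and Fedora convention) and that the file uses the standard "CONFIG_FOO=y" / "# CONFIG_FOO is not set" format. The helper names are illustrative, not an existing tool. CONFIG_ANDROID_BINDER_IPC is the binder driver's own option.

```python
import os
import platform


def parse_kconfig(text):
    """Parse kernel .config text into a dict of option name -> value.

    Lines of the form "# CONFIG_FOO is not set" are recorded as "n";
    other comments and blank lines are ignored.
    """
    suffix = " is not set"
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            options[name] = value.strip('"')  # drop quotes on string options
    return options


def is_enabled(options, name):
    """True if the option is built in ("y") or built as a module ("m")."""
    return options.get(name, "n") in ("y", "m")


if __name__ == "__main__":
    # Assumed location; distributions that do not install it will differ.
    path = f"/boot/config-{platform.release()}"
    if os.path.exists(path):
        with open(path) as f:
            opts = parse_kconfig(f.read())
        for name in ("CONFIG_ANDROID", "CONFIG_ANDROID_BINDER_IPC",
                     "CONFIG_PM_USERSPACE_AUTOSLEEP"):
            print(name, "enabled" if is_enabled(opts, name) else "not enabled")
    else:
        print("no kernel config found at", path)
```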

Christoph Hellwig was quick to show up with a patch removing CONFIG_ANDROID altogether, describing it as "obviously a bad idea". Greg Kroah-Hartman was equally quick to agree and queue the patch for the next merge window. Nobody else objected — until Jason Donenfeld pointed out that this option has other surprising effects, and that removing it could create problems on Android systems.

Specifically, both the random-number generator and the WireGuard VPN tunnel implementation will make changes in response to the system being suspended. The random-number generator will reseed itself after the system resumes, while WireGuard will clear its key material just prior to suspending. Both actions are intended to improve security, but they can be problematic on Android systems due to how power management is handled there. Devices running Android are narcoleptic; they will go to sleep at any opportunity in order to save power. Reseeding the random-number generator that frequently is inefficient at best, and clearing the WireGuard keys that often may disrupt communications entirely. To avoid such problems, these actions are not taken if the kernel has been built with CONFIG_ANDROID.
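
The gating described above fits in a few lines. This is a userspace Python sketch rather than kernel code. The two event names loosely mirror the kernel's PM_SUSPEND_PREPARE and PM_POST_SUSPEND notifier events. The function name and the action strings are made up for illustration.

```python
from enum import Enum, auto


class PMEvent(Enum):
    SUSPEND_PREPARE = auto()  # system is about to suspend (cf. PM_SUSPEND_PREPARE)
    POST_SUSPEND = auto()     # system has just resumed (cf. PM_POST_SUSPEND)


def pm_actions(event, config):
    """Return the security actions a kernel built with `config` takes on `event`.

    `config` is a set of enabled option names. As described in the article,
    CONFIG_ANDROID serves here as a stand-in for "this system suspends often".
    """
    if "CONFIG_ANDROID" in config:
        return []  # frequent-suspend assumption: skip both actions
    if event is PMEvent.SUSPEND_PREPARE:
        return ["wireguard: clear key material"]
    if event is PMEvent.POST_SUSPEND:
        return ["random: reseed"]
    return []


# A desktop kernel built with CONFIG_ANDROID=y, as Debian and Fedora ship,
# silently skips both actions on every suspend/resume cycle.
distro = {"CONFIG_ANDROID", "CONFIG_WIREGUARD"}
for ev in PMEvent:
    print(ev.name, pm_actions(ev, distro))
```

In the model, a single compile-time flag decides both unrelated behaviors. That coupling is the root of the confusion the article describes.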

The removal of CONFIG_ANDROID also removes that special behavior; this is a change that, Donenfeld feared, could create regressions in the future. He asked Hellwig for either an assurance that these problems would not result, or for an updated version of the patch that fixed those problems. There followed a not-entirely-pleasant discussion over whose responsibility it was to fix any problems, whether that use of CONFIG_ANDROID was correct, whether the removal constitutes a user-space ABI regression, and so on.

Eventually Kroah-Hartman signaled his agreement with Hellwig. Any problems experienced by Android devices, he said, would long since have been found and fixed by the time a patched kernel actually ships to such a device, while the change might fix desktop-related problems now. So this change appears to be headed toward the mainline.

At the core of the debate was the use of CONFIG_ANDROID as an indicator that the system will suspend and resume frequently. But, as has been seen, there are many systems with CONFIG_ANDROID enabled that do not exhibit that behavior, but which are getting the related changes anyway. There may also be systems that suspend frequently and that should see that behavior, but which are not running Android and do not have CONFIG_ANDROID enabled. The consensus seems to be that using CONFIG_ANDROID to regulate RCU stall timeouts or cryptographic power-management responses is a bug that is in need of fixing.

So, for the purposes of random numbers and WireGuard, some other way to indicate that the system will suspend frequently will be needed. There was talk of a new configuration option or a sysfs knob that could be written to from user space, which would allow the behavior to be changed at run time. Your editor's suggestion that the kernel could observe actual suspend behavior and do the right thing on its own was fairly quickly dismissed.

What will happen instead, it seems, is the addition of a new configuration option, called CONFIG_PM_USERSPACE_AUTOSLEEP, that prepares the kernel for a system that will suspend frequently and enables the (formerly) Android-specific behavior. This option has a more extensive help text describing what it actually does and warning that it "should not be enabled just for fun". The necessary Android changes have already been created, and this appears to be a solution that everybody involved is happy with.
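
The practical difference between the old and new arrangements shows up when both conditions are evaluated for the two kinds of systems discussed above. This Python sketch rests on a few assumptions. The predicate names are hypothetical. Android's configuration is assumed to enable the new option, as the Android changes mentioned above are meant to do. Only the options named in this article are modeled; the real kernel checks may consider others as well.

```python
def suspends_often_old(config):
    """Old rule: every kernel built with CONFIG_ANDROID is assumed to suspend often."""
    return "CONFIG_ANDROID" in config


def suspends_often_new(config):
    """New rule: only kernels that explicitly opt in are treated that way."""
    return "CONFIG_PM_USERSPACE_AUTOSLEEP" in config


# (config before the change, config after the change) for each kind of system.
# CONFIG_ANDROID no longer exists after the change, so it does not appear
# in the "after" configs.
systems = {
    "desktop distro kernel": ({"CONFIG_ANDROID"}, set()),
    "Android device kernel": ({"CONFIG_ANDROID"},
                              {"CONFIG_PM_USERSPACE_AUTOSLEEP"}),
}

for name, (old_cfg, new_cfg) in systems.items():
    # Keys are cleared on suspend only when the system is NOT assumed
    # to suspend often.
    print(f"{name}: clears keys before={not suspends_often_old(old_cfg)}, "
          f"after={not suspends_often_new(new_cfg)}")
```

Under these assumptions, the desktop kernel regains key clearing and reseeding after the change, while the Android kernel keeps its previous behavior.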

The way this solution came about could have been better, though. The kernel community works best when developers work together toward a common goal rather than argue over who is doing things incorrectly. That did eventually happen here, but it took some time to get to that point. It took multiple developers to endow CONFIG_ANDROID with its somewhat confusing semantics; it is unsurprising that it took more than one person to straighten it out.




The end of CONFIG_ANDROID

Posted Jul 4, 2022 18:27 UTC (Mon) by developer122 (guest, #152928)

Not to compound the peanut-gallery-ing, but couldn't you achieve forward secrecy in wireguard anyway?

Just do the clearing of keys before every suspend like you normally should, while tracking suspends.
If the last several suspends have been happening too frequently, then do the android thing and stop clearing keys (at least for a while).
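
This proposal amounts to a rate detector on suspend events. Below is a minimal Python sketch that keeps a sliding window of recent suspend timestamps. The class name, the 600-second window, and the threshold of five suspends are all hypothetical choices. The sketch also makes concrete the weakness raised in replies in this thread: anything able to trigger a burst of short sleeps will switch key clearing off.

```python
from collections import deque


class SuspendRateGate:
    """Decide whether to clear keys on suspend, based on the recent suspend rate.

    If `max_suspends` or more suspends (including the current one) occurred
    within the last `window` seconds, the system is treated as an
    "autosleep" device and keys are kept; otherwise they are cleared.
    """

    def __init__(self, window=600.0, max_suspends=5):
        self.window = window
        self.max_suspends = max_suspends
        self.recent = deque()  # timestamps of recent suspends, oldest first

    def on_suspend(self, now):
        """Record a suspend at time `now`; return True if keys should be cleared."""
        self.recent.append(now)
        # Forget suspends that have fallen out of the window.
        while self.recent and now - self.recent[0] > self.window:
            self.recent.popleft()
        return len(self.recent) < self.max_suspends


# Laptop-like pattern: one suspend per hour, so keys are always cleared.
laptop = SuspendRateGate()
print([laptop.on_suspend(t) for t in (0, 3600, 7200)])

# Phone-like pattern: a suspend every 30 seconds, so clearing stops once
# five suspends fall inside the window.
phone = SuspendRateGate()
print([phone.on_suspend(t) for t in range(0, 300, 30)])
```

Counting events in a window, rather than looking only at the gap since the previous suspend, keeps a single quick suspend/resume cycle from flipping the behavior. It does not stop a deliberate burst of suspends from doing so.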

The end of CONFIG_ANDROID

Posted Jul 4, 2022 18:51 UTC (Mon) by ttuttle (subscriber, #51118)

Would that not increase latency as Wireguard has to negotiate a new key on resume?

The end of CONFIG_ANDROID

Posted Jul 4, 2022 21:12 UTC (Mon) by developer122 (guest, #152928)

It already does it on non-android systems. There'd be some interruption caused on android devices until the kernel realized it was being suspended frequently, at which point it would start ignoring the suspends (the android behavior).

The end of CONFIG_ANDROID

Posted Jul 4, 2022 21:16 UTC (Mon) by developer122 (guest, #152928)

The one caveat to this I can think of is maybe this is a hypothetical attack vector? Like maybe an attacker could prevent key-switching by suspending the device (a laptop?) frequently?

Though this starts to sound at least a little like a physical-access scenario.

The end of CONFIG_ANDROID

Posted Jul 5, 2022 5:06 UTC (Tue) by zdzichu (subscriber, #17118)

Actually, it doesn't. It is supposed to, but as mentioned in the article, Fedora and Debian turn on CONFIG_ANDROID. So in the real world, the majority of users get Android-like behaviour, and WireGuard's gymnastics with key-cleaning are not executed.

The end of CONFIG_ANDROID

Posted Jul 5, 2022 5:02 UTC (Tue) by jmspeex (subscriber, #51639)

It seems to me like you would then have the issue that an attacker could "trick" the system into the less secure mode just by causing a few short sleeps. You could try to mitigate that too, but then the odds are it would come back to bite you at some point in the future.

The end of CONFIG_ANDROID

Posted Jul 5, 2022 13:08 UTC (Tue) by mathstuf (subscriber, #69389)

With reports of bluetooth "bouncing" machines in and out of sleep, this might happen more than one might assume at first.

The end of CONFIG_ANDROID

Posted Jul 5, 2022 14:32 UTC (Tue) by zx2c4 (subscriber, #82519)

This is the correct answer indeed.

FYI, before Corbet asked about this in that thread, somebody posed the same question with a bit more thoughtfulness on the WireGuard mailing list:

https://lore.kernel.org/wireguard/CAHmME9p2OYSTX2C5M0faKt...

If various people in these comments think they have worthwhile alternative ideas for this, that might be a good thread to jump in on and propose patches or different algorithms or whatever else.

The end of CONFIG_ANDROID

Posted Jul 4, 2022 19:10 UTC (Mon) by flussence (guest, #85566)

I thought this was what the existing CONFIG_PM_AUTOSLEEP and wakelock system was for, but it sounds like Android doesn't even use it, or else these codepaths would already be using it too. In that case, what's it good for?

The end of CONFIG_ANDROID

Posted Jul 5, 2022 2:23 UTC (Tue) by pabs (subscriber, #43278)

Now that there are mobile devices running Linux based non-Android distros and people using them that probably want frequent suspends to reduce power usage there but the corresponding distros won't be enabling CONFIG_PM_USERSPACE_AUTOSLEEP in their main builds that are used on desktops and servers, does that mean distros need a second Linux kernel build just for mobile devices that want the behaviour?

It seems like runtime configuration of this behaviour, or simply the automatic handling of too-frequent suspends proposed in comments above, are the more correct solutions to this problem.

The end of CONFIG_ANDROID

Posted Jul 5, 2022 7:56 UTC (Tue) by tonyblackwell (guest, #43641)

"It seems like runtime configuration of this behaviour, or simply the automatic handling of too-frequent suspends proposed in comments above, are the more correct solutions to this problem."

...which is possibly along the lines of what Jonathan had in mind, but it didn't gain traction then.

The end of CONFIG_ANDROID

Posted Jul 5, 2022 14:37 UTC (Tue) by zx2c4 (subscriber, #82519)

> It seems like runtime configuration of this behaviour, or simply the automatic handling of too-frequent suspends proposed in comments above, are the more correct solutions to this problem.

I suggested a runtime configuration knob and wrote a patch for it, but the Android developers didn't want to add new ABI and felt a compile time knob would be sufficient. It didn't really make a difference to me, so I said okay, and wrote a sample patch for a compile time knob too. But if somebody else has an expanded use case, this is a pretty darn easy thing to change into a runtime switch later.

As for your repeating of Corbet's comment that this is handled automatically, see https://lore.kernel.org/wireguard/CAHmME9p2OYSTX2C5M0faKt... for a thread where somebody asked the same thing, and we took it to its logical conclusion which is having laptops wake up in backpacks. If you have better ideas, though, feel free to reply on that mailing list and I'd be very happy to consider good alternatives.

The end of CONFIG_ANDROID

Posted Jul 5, 2022 14:56 UTC (Tue) by pabs (subscriber, #43278)

I think you have misinterpreted the suggestion in the second last post in the thread. They were not suggesting that expiry of the time period cause wakeups, but instead that when something else caused a wakeup, then before using the keys again, check if the minimum time has passed and do the key expiry and renegotiation then. Doing it that way would not cause laptops to wake up in backpacks. This does mean that keys stay around across suspend and maybe that is not desirable though?
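
The approach described here checks key age lazily, at the first use after a wakeup, instead of arming a timer that would wake the machine. It might look roughly like the Python sketch below. The class, the 120-second max_age, and the renegotiate callback are hypothetical stand-ins, not WireGuard's actual code. The clock must count time spent suspended, so the sketch prefers Linux's CLOCK_BOOTTIME; CLOCK_MONOTONIC does not advance during suspend. The sketch also exhibits the drawback raised in the reply: the old key stays in memory across the suspend until it is next needed.

```python
import itertools
import time


def boottime():
    """Seconds since boot, including time spent suspended where supported."""
    # CLOCK_BOOTTIME (Linux) keeps counting across suspend; time.monotonic()
    # is only a fallback and may not include time spent asleep.
    if hasattr(time, "CLOCK_BOOTTIME"):
        return time.clock_gettime(time.CLOCK_BOOTTIME)
    return time.monotonic()


class LazyExpiringKey:
    """Session key whose age is checked only when it is about to be used.

    No timer is armed, so a suspended machine is never woken just to expire
    the key; the check happens on the first use after wakeup.
    """

    def __init__(self, max_age, renegotiate, clock=boottime):
        self.max_age = max_age
        self.renegotiate = renegotiate  # callable returning fresh key material
        self.clock = clock
        self._new_key()

    def _new_key(self):
        self.key = self.renegotiate()
        self.created = self.clock()

    def key_for_use(self):
        """Return a usable key, renegotiating first if the current one is too old."""
        if self.clock() - self.created >= self.max_age:
            # The stale key is replaced only now, not at suspend time.
            self._new_key()
        return self.key


if __name__ == "__main__":
    now = [0.0]                # fake clock for the demonstration
    ids = itertools.count(1)   # stand-in for fresh key material
    k = LazyExpiringKey(120.0, lambda: next(ids), clock=lambda: now[0])
    print(k.key_for_use())     # fresh key from the initial negotiation
    now[0] = 3600.0            # an hour passes while suspended; nothing runs
    print(k.key_for_use())     # first use after resume renegotiates
```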

The end of CONFIG_ANDROID

Posted Jul 5, 2022 14:59 UTC (Tue) by zx2c4 (subscriber, #82519)

> This does mean that keys stay around across suspend and maybe that is not desirable though?

Yes. You're repeating the same exercise as the thread I just linked.

One Kernel For All

Posted Jul 11, 2022 15:34 UTC (Mon) by fratti (guest, #105722)

> Now that there are mobile devices running Linux based non-Android distros and people using them that probably want frequent suspends to reduce power usage there but the corresponding distros won't be enabling CONFIG_PM_USERSPACE_AUTOSLEEP in their main builds that are used on desktops and servers, does that mean distros need a second Linux kernel build just for mobile devices that want the behaviour?

I don't see an issue in requiring distros that ship mobile GUI frontends to also ship their own kernel. You probably already don't want to ship your server's kernel (see preemption settings) and you're going to be shipping completely different images for desktop vs. ARM handset already anyway. Asking a distro to have a second kernel package for an entirely different class of device really isn't that big of a deal, especially when said kernel likely needs certain things for powersaving and Android compatibility tuned for it.

For what it's worth, my experience in the ARM ecosystem is that we'll never get to have one kernel that fits all use cases there, simply due to all the errata and different device requirements.

One Kernel For All

Posted Jul 12, 2022 1:20 UTC (Tue) by pabs (subscriber, #43278)

We are already there for 64-bit ARM and 32-bit ARMv7: distros ship only one mainline Linux kernel image for all devices. Of course, for non-mainline devices it's a different situation, since you need one Linux kernel image per device, but most generic distros (as opposed to hardware-specific ones) do not support non-mainline devices.

I very much doubt distros would add a mobile variant just for one config option. Distros do not want to have multiple images for different use-cases; they want one Linux kernel image that boots on any hardware and works for any use-case. I think they might accept a boot-time option if automatic behaviour selection at boot time or run time really isn't going to be implemented, although a boot option is still a bit of a hack.

At least that is the situation for Debian arm64 and armhf. For arm64, Debian has a single image for all devices and also a -cloud image, a cut-down build with only cloud-related config to reduce the image size and boot time. For armhf, Debian ships two "multiplatform" kernel flavours: with and without LPAE. There are also -rt images for both, because RT can't be enabled or disabled at boot or run time.

https://www.debian.org/releases/stable/arm64/ch02s01.en.h...
https://wiki.debian.org/DebianKernel/ARMMP

$ apt-cache showsrc linux-signed-arm64 | grep '^ linux-image'
linux-image-5.18.0-2-arm64 deb kernel optional arch=arm64
linux-image-5.18.0-2-cloud-arm64 deb kernel optional arch=arm64
linux-image-5.18.0-2-rt-arm64 deb kernel optional arch=arm64
linux-image-arm64 deb kernel optional arch=arm64
linux-image-cloud-arm64 deb kernel optional arch=arm64
linux-image-rt-arm64 deb kernel optional arch=arm64
$ apt-cache showsrc linux | grep 'arch=armhf' | grep '^ linux-image' | grep -v -- '-dbg'
linux-image-5.18.0-2-armmp deb kernel optional arch=armhf profile=!pkg.linux.nokernel,!pkg.linux.quick,!stage1
linux-image-5.18.0-2-armmp-lpae deb kernel optional arch=armhf profile=!pkg.linux.nokernel,!pkg.linux.quick,!stage1
linux-image-5.18.0-2-rt-armmp deb kernel optional arch=armhf profile=!pkg.linux.nokernel,!pkg.linux.quick,!stage1
linux-image-armmp deb kernel optional arch=armhf profile=!pkg.linux.nokernel,!pkg.linux.nometa,!pkg.linux.quick,!stage1
linux-image-armmp-lpae deb kernel optional arch=armhf profile=!pkg.linux.nokernel,!pkg.linux.nometa,!pkg.linux.quick,!stage1
linux-image-rt-armmp deb kernel optional arch=armhf profile=!pkg.linux.nokernel,!pkg.linux.nometa,!pkg.linux.quick,!stage1

Needless editorializing

Posted Jul 5, 2022 14:27 UTC (Tue) by zx2c4 (subscriber, #82519)

> The way this solution came about could have been better, though. The kernel community works best when developers work together toward a common goal rather than argue over who is doing things incorrectly. That did eventually happen here, but it took some time to get to that point.

This commentary is hyperbolic and isn't supported by the actual timeline when you open up the thread and look at the time stamps.

It begins here, <https://lwn.net/ml/linux-kernel/20220629150102.1582425-2-...>, Wed, 29 Jun 2022 17:01:02 +0200. I immediately asked, "will this break my stuff? If not, can you mention why in the commit message?" Christoph didn't like this, and we went back and forth to the tune of "please don't intentionally break my stuff, or if you're not, say why not; I haven't researched it yet myself", "no, your situation is BS, I'll break what I want", "that's not helpful", "why should I have to explain myself", and so on. My last message to Christoph was at Wed, 29 Jun 2022 19:42:28 +0200, at which point he ducked out. That span comprised basically the entire thing related to Corbet's statement, which you can read again:

> The way this solution came about could have been better, though. The kernel community works best when developers work together toward a common goal rather than argue over who is doing things incorrectly. That did eventually happen here, but it took some time to get to that point.

Meanwhile, my entreaties not to intentionally break my stuff and some mentions of various Android source files garnered the attention of Android developers, and Kalesh jumped in at Wed, 29 Jun 2022 12:05:23 -0700. We had a discussion on whether this should be a runtime knob or a compile time switch. I wrote inline sample patches for each method, just so there'd be something concrete to work with and discuss, and mentioned whichever one they want is fine. There was some discussion on describing a device versus describing behavior. And then by the time I woke up in the morning, we had a viable patch (based on one of my earlier inline samples) for the kernel, and then for Android, and then Greg took the patches, and the whole thing was done.

So... I don't think Corbet's editorializing is even minutely correct:

> The way this solution came about could have been better, though. The kernel community works best when developers work together toward a common goal rather than argue over who is doing things incorrectly. That did eventually happen here, but it took some time to get to that point.

Alex raised the initial issue. Christoph wrote a patch for it. That patch broke my stuff, and Christoph didn't want to work on that. So he ducked out, and I took the baton and worked with the Android developers. And this all happened in the span of a day or two. This sounds a lot more like kernel developers working together toward a common goal and accomplishing something in a short span of time. Yes, there was some minor argument in there. But it didn't seem to affect anybody actually getting anything done in a swift and timely manner.

So while _maybe_ there's some minorly newsworthy content (one config knob is gone, now there's another, interesting forward secrecy considerations in the process), the dramatization and fingerwagging in this piece seems unnecessary.

Needless editorializing

Posted Jul 8, 2022 4:52 UTC (Fri) by marcH (subscriber, #57642)

Not commenting on the dramatization part, but on the pure content: I found it both educational and entertaining that CONFIG_ANDROID (1) existed, (2) did random things totally unrelated to each other, and (3) is used in very popular distros.

I already knew that Kconfig is a mess (less than in all other projects but still) but I really didn't expect something that bad to exist. Now I have been warned, thanks!


Copyright © 2022, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds