The end of CONFIG_ANDROID
The kernel's CONFIG_ANDROID configuration option is described as: "Enable support for various drivers needed on the Android platform". It turns out that this option does more than that, to the surprise of some users. That has led to a plan to remove this option, but that has brought a surprise or two of its own, and some disagreement, as well.
The discussion started when Alex Xu reported a read-copy-update (RCU) error that was appearing on his system after resuming from suspend. Shortly thereafter, Xu realized that the problem was tied to the fact that his kernel had been built with CONFIG_ANDROID enabled; among other things, that option significantly reduces the time that can elapse before RCU starts putting out stall warnings. RCU maintainer Paul McKenney was not entirely sympathetic after this was revealed:
> And let's face it, the intent and purpose of CONFIG_ANDROID=y is extremely straightforward and unmistakable. So perhaps people not running Android devices but wanting a little bit of the Android functionality should do something other than setting CONFIG_ANDROID=y in their .config files. Me, I am surprised that it took this long for something like this to bite you.
This response comes from a part of the discussion that does not appear directly in the archives, but can be seen quoted in Xu's answer, where he points out that both Debian and Fedora ship kernels with CONFIG_ANDROID enabled, since that is the only way to make the binder module available. Xu suggested that the intent of this option is not quite as "straightforward and unmistakable" as one might think; the one-line description mentions nothing about changing internal RCU timeout values. "If major distro vendors are consistently making this 'mistake', then perhaps the problem is elsewhere".
Christoph Hellwig was quick to show up with a patch removing CONFIG_ANDROID altogether, describing it as "obviously a bad idea". Greg Kroah-Hartman was equally quick to agree and queue the patch for the next merge window. Nobody else objected until Jason Donenfeld pointed out that this option has other surprising effects, and that removing it could create problems on Android systems.
Specifically, both the random-number generator and the WireGuard VPN tunnel implementation will make changes in response to the system being suspended. The random-number generator will reseed itself after the system resumes, while WireGuard will clear its key material just prior to suspending. Both actions are intended to improve security, but they can be problematic on Android systems due to how power management is handled there. Devices running Android are narcoleptic; they will go to sleep at any opportunity in order to save power. Resetting the random-number generator that frequently is inefficient at best, and clearing the WireGuard keys that often may disrupt communications entirely. To avoid such problems, these actions are not taken if the kernel has been built with CONFIG_ANDROID.
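The gating described above can be modeled in a few lines. This is a user-space Python sketch, not kernel code: the `SuspendHooks` class, its fields, and the `config_android` flag are hypothetical names standing in for a compile-time CONFIG_ANDROID check, and the real logic lives in the kernel's random-number and WireGuard code.

```python
from dataclasses import dataclass, field


@dataclass
class SuspendHooks:
    """Toy model of the suspend/resume behavior described in the article.

    config_android stands in for a kernel built with CONFIG_ANDROID;
    every name here is hypothetical and chosen for illustration only.
    """
    config_android: bool
    wireguard_keys: dict = field(default_factory=dict)
    rng_reseeds: int = 0

    def suspend(self) -> None:
        # WireGuard clears its key material just before suspending,
        # unless the build says that suspends are routine.
        if not self.config_android:
            self.wireguard_keys.clear()

    def resume(self) -> None:
        # The random-number generator reseeds after resume; this is
        # likewise skipped on frequently suspending systems.
        if not self.config_android:
            self.rng_reseeds += 1


def run_cycles(hooks: SuspendHooks, cycles: int) -> SuspendHooks:
    """Put the modeled system through a number of suspend/resume cycles."""
    for _ in range(cycles):
        hooks.suspend()
        hooks.resume()
    return hooks


desktop = run_cycles(SuspendHooks(False, {"peer": "session-key"}), 3)
phone = run_cycles(SuspendHooks(True, {"peer": "session-key"}), 3)
print(desktop.wireguard_keys, desktop.rng_reseeds)
print(phone.wireguard_keys, phone.rng_reseeds)
```

In this model the "desktop" build loses its keys on the first suspend and reseeds on every resume, while the "phone" build keeps its keys and never reseeds in response to a resume. That is the trade-off the option was quietly making on behalf of every kernel built with it.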
The removal of CONFIG_ANDROID also removes that special behavior; this is a change that, Donenfeld feared, could create regressions in the future. He asked Hellwig for either an assurance that these problems would not result, or for an updated version of the patch that fixed those problems. There followed a not-entirely-pleasant discussion over whose responsibility it was to fix any problems, whether that use of CONFIG_ANDROID was correct, whether the removal constitutes a user-space ABI regression, and so on.
Eventually Kroah-Hartman signaled his agreement with Hellwig. Any problems experienced by Android devices, he said, would be long since found and fixed by the time a patched kernel actually is shipped to such a device, but the change might fix desktop-related problems now. So this change appears to be headed toward the mainline.
At the core of the debate was the use of CONFIG_ANDROID as an indicator that the system will suspend and resume frequently. But, as has been seen, there are many systems with CONFIG_ANDROID enabled that do not exhibit that behavior, but which are getting the related changes anyway. There may also be systems that suspend frequently and that should see that behavior, but which are not running Android and do not have CONFIG_ANDROID enabled. The consensus seems to be that using CONFIG_ANDROID to regulate RCU stall timeouts or cryptographic power-management responses is a bug that is in need of fixing.
So, for the purposes of random numbers and WireGuard, some other way to indicate that the system will suspend frequently will be needed. There was talk of a new configuration option or a sysfs knob that could be written to from user space, which would allow the behavior to be changed at run time. Your editor's suggestion that the kernel could observe actual suspend behavior and do the right thing on its own was fairly quickly dismissed.
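For concreteness, a heuristic of the "observe actual suspend behavior" sort might look something like the sketch below. It is only an illustration of the idea, written as user-space Python: the `SuspendTracker` class and its `window_s` and `threshold` values are invented here and were not proposed in the discussion.

```python
from collections import deque


class SuspendTracker:
    """Decide whether suspends look frequent, based on recent history.

    window_s and threshold are made-up tuning values, not anything
    taken from the kernel or from the mailing-list thread.
    """

    def __init__(self, window_s: float = 600.0, threshold: int = 3) -> None:
        self.window_s = window_s
        self.threshold = threshold
        self.recent = deque()

    def record_suspend(self, now_s: float) -> bool:
        """Record a suspend at now_s; return True if keys should be cleared."""
        # Drop suspends that have aged out of the observation window.
        while self.recent and now_s - self.recent[0] > self.window_s:
            self.recent.popleft()
        self.recent.append(now_s)
        # Few recent suspends: treat this one as a rare event and clear keys.
        return len(self.recent) < self.threshold


tracker = SuspendTracker()
decisions = [tracker.record_suspend(t) for t in (0, 100, 200, 5000)]
print(decisions)
```

One weakness is visible right away: the first suspends of any burst still clear the keys, because the heuristic needs some history before it can react, and the thresholds have to be guessed for every class of device.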
What will happen instead, it seems, is the addition of a new configuration option, called CONFIG_PM_USERSPACE_AUTOSLEEP, that prepares the kernel for a system that will suspend frequently and enables the (formerly) Android-specific behavior. This option has a more extensive help text describing what it actually does and warning that it "should not be enabled just for fun". The necessary Android changes have already been created, and this appears to be a solution that everybody involved is happy with.
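Readers wondering which of these options their own kernel was built with can check its configuration file, which many distributions install as /boot/config-<version>. The following is a minimal sketch of such a check; the `enabled_options` helper and the sample text are made up for illustration, and the sample does not reflect any real distribution's configuration.

```python
def enabled_options(config_text: str) -> set:
    """Return the names of options set to y or m in kernel .config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if value in ("y", "m"):
            enabled.add(name)
    return enabled


# Hypothetical sample; a real check would read the file's contents instead.
sample = """\
CONFIG_ANDROID=y
# CONFIG_PM_USERSPACE_AUTOSLEEP is not set
CONFIG_WIREGUARD=m
"""
opts = enabled_options(sample)
for option in ("CONFIG_ANDROID", "CONFIG_PM_USERSPACE_AUTOSLEEP"):
    print(option, "enabled" if option in opts else "not enabled")
```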
The way this solution came about could have been better, though. The kernel community works best when developers work together toward a common goal rather than argue over who is doing things incorrectly. That did eventually happen here, but it took some time to get to that point. It took multiple developers to endow CONFIG_ANDROID with its somewhat confusing semantics; it is unsurprising that it took more than one person to straighten it out.
Posted Jul 4, 2022 18:27 UTC (Mon) by developer122 (guest, #152928)
Just do the clearing of keys before every suspend like you normally should, while tracking suspends.
Posted Jul 4, 2022 18:51 UTC (Mon) by ttuttle (subscriber, #51118)
Posted Jul 4, 2022 21:12 UTC (Mon) by developer122 (guest, #152928)
Posted Jul 4, 2022 21:16 UTC (Mon) by developer122 (guest, #152928)
Though this starts to sound at least a little like a physical-access scenario.
Posted Jul 5, 2022 5:06 UTC (Tue) by zdzichu (subscriber, #17118)
Posted Jul 5, 2022 5:02 UTC (Tue) by jmspeex (subscriber, #51639)
Posted Jul 5, 2022 13:08 UTC (Tue) by mathstuf (subscriber, #69389)
Posted Jul 5, 2022 14:32 UTC (Tue) by zx2c4 (subscriber, #82519)
FYI, before Corbet asked about this in that thread, somebody posed the same question with a bit more thoughtfulness on the WireGuard mailing list:
https://lore.kernel.org/wireguard/CAHmME9p2OYSTX2C5M0faKt...
If various people in these comments think they have worthwhile alternative ideas for this, that might be a good thread to jump in on and propose patches or different algorithms or whatever else.
Posted Jul 4, 2022 19:10 UTC (Mon) by flussence (guest, #85566)
Posted Jul 5, 2022 2:23 UTC (Tue) by pabs (subscriber, #43278)
It seems like runtime configuration of this behaviour, or simply the automatic handling of too-frequent suspends proposed in comments above, are the more correct solutions to this problem.
Posted Jul 5, 2022 7:56 UTC (Tue) by tonyblackwell (guest, #43641)
...which is possibly along the lines of what Jonathan had in mind, but it didn't get up then.
Posted Jul 5, 2022 14:37 UTC (Tue) by zx2c4 (subscriber, #82519)
I suggested a runtime configuration knob and wrote a patch for it, but the Android developers didn't want to add new ABI and felt a compile time knob would be sufficient. It didn't really make a difference to me, so I said okay, and wrote a sample patch for a compile time knob too. But if somebody else has an expanded use case, this is a pretty darn easy thing to change into a runtime switch later.
As for your repeating of Corbet's comment that this is handled automatically, see https://lore.kernel.org/wireguard/CAHmME9p2OYSTX2C5M0faKt... for a thread where somebody asked the same thing, and we took it to its logical conclusion which is having laptops wake up in backpacks. If you have better ideas, though, feel free to reply on that mailing list and I'd be very happy to consider good alternatives.
Posted Jul 5, 2022 14:56 UTC (Tue) by pabs (subscriber, #43278)
Posted Jul 5, 2022 14:59 UTC (Tue) by zx2c4 (subscriber, #82519)
Yes. You're repeating the same exercise as the thread I just linked.
Posted Jul 11, 2022 15:34 UTC (Mon) by fratti (guest, #105722)
I don't see an issue in requiring distros that ship mobile GUI frontends to also ship their own kernel. You probably already don't want to ship your server's kernel (see preemption settings) and you're going to be shipping completely different images for desktop vs. ARM handset already anyway. Asking a distro to have a second kernel package for an entirely different class of device really isn't that big of a deal, especially when said kernel likely needs certain things for powersaving and Android compatibility tuned for it.
For what it's worth, my experience in the ARM ecosystem is that we'll never get to have one kernel that fits all use cases there simply due to all the erratas and different device requirements.
Posted Jul 12, 2022 1:20 UTC (Tue) by pabs (subscriber, #43278)
I very much doubt distros would add a mobile variant just for one config option. Distros do not want to have multiple images for different use-cases; they want one Linux kernel image that boots on any hardware and works for any use-case. I think they might accept a boot-time option if automatic behaviour selection at boot time or run time really isn't going to be implemented, though a boot option is still a bit of a hack.
At least that is the situation for Debian arm64 and armhf. For arm64 it has a single image for devices and also a -cloud image that contains a cut-down image with only cloud related config to reduce the image size and boot time. For armhf Debian ships two "multiplatform" kernel flavours; with and without LPAE. Also -rt images for both because RT isn't able to be enabled/disabled at boot or runtime.
https://www.debian.org/releases/stable/arm64/ch02s01.en.h...
$ apt-cache showsrc linux-signed-arm64 | grep '^ linux-image'
Posted Jul 5, 2022 14:27 UTC (Tue) by zx2c4 (subscriber, #82519)
This commentary is hyperbolic and isn't supported by the actual timeline when you open up the thread and look at the time stamps.
It begins here, <https://lwn.net/ml/linux-kernel/20220629150102.1582425-2-...>, Wed, 29 Jun 2022 17:01:02 +0200. I immediately asked, "will this break my stuff? If not, can you mention why in the commit message?" Christoph didn't like this, and we went back and forth to the tune of "please don't intentionally break my stuff, or if you're not, say why not; I haven't researched it yet myself", "no, your situation is BS, I'll break what I want", "that's not helpful", "why should I have to explain myself", and so on. My last message to Christoph was at Wed, 29 Jun 2022 19:42:28 +0200, at which point he ducked out. That span comprised basically the entire thing related to Corbet's statement, which you can read again:
> The way this solution came about could have been better, though. The kernel community works best when developers work together toward a common goal rather than argue over who is doing things incorrectly. That did eventually happen here, but it took some time to get to that point.
Meanwhile, my entreaties not to intentionally break my stuff and some mentions of various Android source files garnered the attention of Android developers, and Kalesh jumped in at Wed, 29 Jun 2022 12:05:23 -0700. We had a discussion on whether this should be a runtime knob or a compile time switch. I wrote inline sample patches for each method, just so there'd be something concrete to work with and discuss, and mentioned whichever one they want is fine. There was some discussion on describing a device versus describing behavior. And then by the time I woke up in the morning, we had a viable patch (based on one of my earlier inline samples) for the kernel, and then for Android, and then Greg took the patches, and the whole thing was done.
So... I don't think Corbet's editorializing is even minutely correct:
> The way this solution came about could have been better, though. The kernel community works best when developers work together toward a common goal rather than argue over who is doing things incorrectly. That did eventually happen here, but it took some time to get to that point.
Alex raised the initial issue. Christoph wrote a patch for it. That patch broke my stuff, and Christoph didn't want to work on that. So he ducked out, and I took the baton and worked with the Android developers. And this all happened in the span of a day or two. This sounds a lot more like kernel developers working together toward a common goal and accomplishing something in a short span of time. Yes, there was some minor argument in there. But it didn't seem to affect anybody actually getting anything done in a swift and timely manner.
So while _maybe_ there's some minorly newsworthy content (one config knob is gone, now there's another, interesting forward secrecy considerations in the process), the dramatization and fingerwagging in this piece seems unnecessary.
Posted Jul 8, 2022 4:52 UTC (Fri) by marcH (subscriber, #57642)
I already knew that Kconfig is a mess (less than in all other projects but still) but I really didn't expect something that bad to exist. Now I have been warned, thanks!
If the last several suspends have been happening too frequently, then do the android thing and stop clearing keys (at least for a while).
One Kernel For All
https://wiki.debian.org/DebianKernel/ARMMP
linux-image-5.18.0-2-arm64 deb kernel optional arch=arm64
linux-image-5.18.0-2-cloud-arm64 deb kernel optional arch=arm64
linux-image-5.18.0-2-rt-arm64 deb kernel optional arch=arm64
linux-image-arm64 deb kernel optional arch=arm64
linux-image-cloud-arm64 deb kernel optional arch=arm64
linux-image-rt-arm64 deb kernel optional arch=arm64
$ apt-cache showsrc linux | grep 'arch=armhf' | grep '^ linux-image' | grep -v -- '-dbg'
linux-image-5.18.0-2-armmp deb kernel optional arch=armhf profile=!pkg.linux.nokernel,!pkg.linux.quick,!stage1
linux-image-5.18.0-2-armmp-lpae deb kernel optional arch=armhf profile=!pkg.linux.nokernel,!pkg.linux.quick,!stage1
linux-image-5.18.0-2-rt-armmp deb kernel optional arch=armhf profile=!pkg.linux.nokernel,!pkg.linux.quick,!stage1
linux-image-armmp deb kernel optional arch=armhf profile=!pkg.linux.nokernel,!pkg.linux.nometa,!pkg.linux.quick,!stage1
linux-image-armmp-lpae deb kernel optional arch=armhf profile=!pkg.linux.nokernel,!pkg.linux.nometa,!pkg.linux.quick,!stage1
linux-image-rt-armmp deb kernel optional arch=armhf profile=!pkg.linux.nokernel,!pkg.linux.nometa,!pkg.linux.quick,!stage1
Needless editorializing