
An end to uniprocessor configurations

By Jonathan Corbet
June 10, 2025
The Linux kernel famously scales from the smallest of systems to massive servers with thousands of CPUs. It was not always that way, though; the initial version of the kernel could only manage a single processor. That limitation was lifted, obviously, but single-processor machines have always been treated specially in the scheduler. That longstanding situation may soon come to an end, however, if this patch series from Ingo Molnar makes it upstream.

Initially, Linus Torvalds's goal with Linux was simply to get something working; he did not have much time to spare for hardware that he did not personally have. And he had no multiprocessor machine back then — almost nobody did. So, not only did the initial version of the kernel go out with no SMP support, the kernel lacked that support for some years. The 1.0 and 1.2 releases of the kernel, which came out in 1994 and 1995, respectively, only supported uniprocessor machines.

The beginnings of SMP support found their way into the 1.3.31 development release in late 1995; the associated documentation file included the warning: "This is experimental. Back up your disks first. Build only with gcc2.5.8". It took some time for the SMP work to stabilize properly; the dreaded big kernel lock, which ensured that only one CPU was running within the kernel at any time, wasn't even introduced until 1.3.54. But, by the time 2.0 was released in June 1996, Linux worked reasonably well on two-CPU systems, for some workloads, at least.

At that time, though, SMP systems were still relatively rare; most people running Linux did not have one. The majority of Linux users running on uniprocessor systems had little patience for the idea that their systems might be made to run slower in order to support those expensive SMP machines that almost nobody had. The tension between support for users of "big iron" and everybody else ran strong in those days, and a two-CPU system was definitely considered to be big iron.

As a result, the addition of SMP support was done under the condition that it not regress performance on uniprocessor systems. This is a theme that has been seen many times over the history of Linux kernel development. Perhaps most famously, the realtime preemption code was not allowed to slow down non-realtime systems; in the end, realtime preemption brought a lot of improvements for non-realtime systems as well. In the case of SMP, this rule was implemented with a lot of macro magic, #ifdef blocks, and similar techniques.

It is now nearly 30 years after the initial introduction of SMP support into the Linux kernel, and all of that structure that enables the building of special kernels for uniprocessor systems remains, despite the fact that one would have to look hard to find a uniprocessor machine. Machines with a single CPU are now the outlier case; in 2025, we all are big-iron users. Many of the uniprocessor systems that are in use (low-end virtual servers, for example) are likely to be running SMP kernels anyway. Maintaining a separate uniprocessor kernel is usually more trouble than it is worth, and few distributors package them anymore.

As Molnar pointed out in his patch series, there are currently 175 separate #ifdef blocks in the scheduler code that depend on CONFIG_SMP. They add complexity to the scheduler, and the uniprocessor code often breaks because few developers test it. As he put it: "It's rare to see a larger scheduler patch series that doesn't have some sort of build complication on !SMP". It is not at all clear that these costs are justified at this point, given how little use there is of the uniprocessor configuration.

So Molnar proposes that uniprocessor support be removed. The 43-part patch series starts with a set of cleanups designed to make the subsequent surgery easier, then proceeds to remove the uniprocessor versions of the code. Once it is complete, the SMP scheduler is used on all systems, though parts of it (such as load balancing) will never be executed on a machine with a single CPU. Once the work is done, nearly 1,000 lines of legacy code have been removed, and the scheduler is far less of a #ifdef maze than before.
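The shape of that change can be illustrated with a small sketch. This is not code from the patch series; `demo_rq`, `demo_select_cpu_ifdef()`, `demo_select_cpu()`, and `DEMO_CONFIG_SMP` are hypothetical names invented for the example. The first function shows the #ifdef style, with a separate trivial uniprocessor branch chosen at build time; the second shows the unified style, where the general code is built everywhere and the balancing loop simply has nothing to do when there is only one CPU.

```c
#include <stddef.h>

/* Hypothetical per-CPU run queue; only the load figure matters here. */
struct demo_rq {
	unsigned int nr_running;
};

/* Old style: two implementations, selected at build time. */
#ifdef DEMO_CONFIG_SMP
size_t demo_select_cpu_ifdef(const struct demo_rq *rqs, size_t nr_cpus)
{
	size_t best = 0;

	for (size_t cpu = 1; cpu < nr_cpus; cpu++)
		if (rqs[cpu].nr_running < rqs[best].nr_running)
			best = cpu;
	return best;
}
#else
size_t demo_select_cpu_ifdef(const struct demo_rq *rqs, size_t nr_cpus)
{
	(void)rqs;
	(void)nr_cpus;
	return 0;	/* uniprocessor build: CPU 0 is the only choice */
}
#endif

/*
 * Unified style: one implementation for every build.  With nr_cpus == 1
 * the loop body never runs, so the "balancing" work is present in the
 * binary but is never executed.
 */
size_t demo_select_cpu(const struct demo_rq *rqs, size_t nr_cpus)
{
	size_t best = 0;

	for (size_t cpu = 1; cpu < nr_cpus; cpu++)
		if (rqs[cpu].nr_running < rqs[best].nr_running)
			best = cpu;
	return best;
}
```

The unified version costs a little text size and a few instructions on a single-CPU machine, which is the kind of tradeoff the series makes in exchange for having only one code path to build and test.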

Switching to the SMP kernel will not be free on uniprocessor systems; all that care that was taken with the uniprocessor scheduler did have an effect on its performance. A scheduler benchmark run using the SMP-only kernel on a uniprocessor system showed a roughly 5% performance regression. There is also a 0.3% growth in the size of the kernel text (built with the defconfig x86 configuration) when uniprocessor support is removed. This is a cost that, once upon a time, would have been unacceptable but, in 2025, Molnar said, things have changed:

"But at this point I think the burden of proof and the burden of work needs to be reversed: and anyone who cares about UP performance or size should present sensible patches to improve performance/size."

He described the series as "lightly tested", which is not quite the standard one normally wants to see for an invasive scheduler patch; filling out that testing will surely be required before this change can be accepted. But, so far, there have been no objections to the change; there are no uniprocessor users showing up to advocate for keeping their special configuration — yet. Times truly have changed, to the point that it would be surprising if this reversal of priorities didn't make it into the kernel in the relatively near future.

Index entries for this article
Kernel: Scheduler



maxcpus=1?

Posted Jun 10, 2025 20:06 UTC (Tue) by marcH (subscriber, #57642) [Link] (2 responses)

This was probably too obvious to even mention but just to be on the safe side... will maxcpus=1 still be available? It's very useful for testing purposes (finding races etc.)

maxcpus=1?

Posted Jun 11, 2025 6:42 UTC (Wed) by tamiko (subscriber, #115350) [Link] (1 responses)

This should continue to work. The patch set is about removing the CONFIG_SMP configuration option.

maxcpus=1?

Posted Jun 11, 2025 9:47 UTC (Wed) by arnd (subscriber, #8866) [Link]

Not even that, the series specifically removes the special case for CONFIG_SMP=n in the scheduler, not anywhere else.

You can still build a kernel with SMP disabled, and it will still use the trivial implementation of per-cpu data, spinlocks, smp barriers etc, which is where most of the performance and size advantages are for non-SMP builds.
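As a rough illustration of what a "trivial implementation" means for a lock, here is a minimal sketch. It is not the kernel's spinlock code; `DEMO_CONFIG_SMP`, `struct demo_lock`, `demo_lock_acquire()`, `demo_lock_release()`, and `demo_counter_inc()` are hypothetical names. It also ignores preemption and interrupts, which a real uniprocessor kernel still has to handle.

```c
#include <stdatomic.h>

/* Hypothetical switch standing in for the kernel's CONFIG_SMP option. */
/* #define DEMO_CONFIG_SMP 1 */

#ifdef DEMO_CONFIG_SMP
/* SMP build: a real lock word that other CPUs can contend for. */
struct demo_lock { atomic_flag held; };
#define DEMO_LOCK_INIT { ATOMIC_FLAG_INIT }

static inline void demo_lock_acquire(struct demo_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		;	/* spin until the current holder releases the lock */
}

static inline void demo_lock_release(struct demo_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}
#else
/* Uniprocessor build: no other CPU can race, so both operations are no-ops. */
struct demo_lock { char unused; };
#define DEMO_LOCK_INIT { 0 }

static inline void demo_lock_acquire(struct demo_lock *l) { (void)l; }
static inline void demo_lock_release(struct demo_lock *l) { (void)l; }
#endif

static struct demo_lock counter_lock = DEMO_LOCK_INIT;
static long counter;

/* Callers are written once; the build option decides what the lock costs. */
long demo_counter_inc(void)
{
	long v;

	demo_lock_acquire(&counter_lock);
	v = ++counter;
	demo_lock_release(&counter_lock);
	return v;
}
```

In the uniprocessor variant the compiler can discard the lock calls entirely, which is where savings of the kind described above come from.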

There is currently no way to build an SMP kernel for a lot of the older embedded architectures that lack the required CPU instructions or the irqchip for SMP: ARMv5, most MIPS32r2, PowerPC8xx, m68k, SH3/SH4, ARCompact, microblaze, nios2, and xtensa.

Wi-Fi access points and home routers: Single-core and size constraints

Posted Jun 10, 2025 20:08 UTC (Tue) by hailfinger (subscriber, #76962) [Link] (12 responses)

There are quite a few Wi-Fi access points and home routers with only a single CPU core. OpenWrt supports them and gives them a new life even after the manufacturer doesn't care anymore.

A prominent recent example is the TP-Link Archer C6 v2 router/access point with 802.11ac Wi-Fi. At least in Europe, this device is rather popular among OpenWrt users (affordable, reliable, reasonable range, somewhat recent Wi-Fi). It only has a single core and crucially only has 8 MiB flash space where you have to fit a kernel and all userspace software. For that configuration, a size difference of a few kilobytes may be the difference between shipping a standard OpenWrt or a trimmed down version with reduced functionality.

Wi-Fi access points and home routers: Single-core and size constraints

Posted Jun 10, 2025 20:19 UTC (Tue) by mb (subscriber, #50428) [Link] (8 responses)

You can still pick an LTS kernel that doesn't have this change and go on for another couple of years.

There's a cost to maintaining support for features that almost nobody uses.
And at some point the people needing this feature should pay the cost or let it go.
OpenWRT is free to patch a custom small specialized scheduler into their kernel, too. I doubt it's worth it, though.

Wi-Fi access points and home routers: Single-core and size constraints

Posted Jun 10, 2025 20:29 UTC (Tue) by daroc (editor, #160859) [Link] (2 responses)

Also, the kernel will continue to run on uniprocessor machines; scheduling will just be less efficient because it will use the multiprocessor-adapted scheduler.

Wi-Fi access points and home routers: Single-core and size constraints

Posted Jun 11, 2025 8:23 UTC (Wed) by Wol (subscriber, #4433) [Link] (1 responses)

All the more reason for being able to swap schedulers in and out?

Shouldn't a uni-processor scheduler just be an option? If the users can't be bothered to maintain it, it'll bit-rot. And as a compile-time option or whatever, the cost will be borne by the people who use it, which is as it should be.

Cheers,
Wol

Wi-Fi access points and home routers: Single-core and size constraints

Posted Jun 11, 2025 15:39 UTC (Wed) by nix (subscriber, #2304) [Link]

The cost is 5% *during a benchmark that stresses the scheduler*. Almost no uniprocessor use cases do that, because if they stress the scheduler routinely, they are probably overloaded, correspondingly slow at doing any of the jobs they're doing, and are likely to be replaced with something multicore as soon as anyone notices.

The cost here is borne by the maintainers, no matter what. Keeping the uniprocessor scheduler as an option means either keeping the nightmare ifdef maze (no thanks), or *duplicating* the scheduler via unifdef or something, and hoping that changes outside the ifdef maze are maintained in parallel (and usually they have to be maintained or the uniprocessor scheduler will break, hardly anyone will notice, and we're right back where we started). I doubt anyone would be terribly happy with that approach, either...

Wi-Fi access points and home routers: Single-core and size constraints

Posted Jun 11, 2025 14:24 UTC (Wed) by marcH (subscriber, #57642) [Link] (2 responses)

> There's a cost to maintaining support for features that almost nobody uses.
> And at some point the people needing this feature should pay the cost or let it go.

Speaking of which, I don't find the "0.3% size increase" a useful metric... Doesn't this scheduler change have a mostly _fixed_ size cost, mostly independent of how many gazillion drivers you enable?

Wi-Fi access points and home routers: Single-core and size constraints

Posted Jun 11, 2025 15:52 UTC (Wed) by mb (subscriber, #50428) [Link] (1 responses)

That's true. That's why you can find the absolute numbers in the patch description.

Wi-Fi access points and home routers: Single-core and size constraints

Posted Jun 12, 2025 18:25 UTC (Thu) by marcH (subscriber, #57642) [Link]

Thanks, it's about 100K increase with x86-defconfig. Probably smaller for embedded systems.

Wi-Fi access points and home routers: Single-core and size constraints

Posted Jun 11, 2025 18:47 UTC (Wed) by hmh (subscriber, #3838) [Link] (1 responses)

It would be quite enough to delay the merge of UP removal to after the next long-term-support kernel is cut, I think.

Current LTS kernels are going EOL in 2027 according to kernel.org, which is a bit too close for comfort, IMO.

Wi-Fi access points and home routers: Single-core and size constraints

Posted Jun 13, 2025 7:51 UTC (Fri) by taladar (subscriber, #68407) [Link]

But if you delay that it also effectively means maintaining the uniprocessor ifdef mess for many more years and most likely having backport headaches around the removal of that in newer releases.

LTS users always think of the pain in losing features but not the pain of keeping them for everyone else.

Wi-Fi access points and home routers: Single-core and size constraints

Posted Jun 11, 2025 2:29 UTC (Wed) by PengZheng (subscriber, #108006) [Link]

As an embedded system developer, I see lots of uniprocessor systems, and I don't expect this situation to change in the foreseeable future.

Wi-Fi access points and home routers: Single-core and size constraints

Posted Jun 11, 2025 16:16 UTC (Wed) by parametricpoly (subscriber, #143903) [Link] (1 responses)

I think OpenWRT is currently still at kernel 5.15. The latest release is 6.15. It will take a while before they'll adopt this kernel with mandatory SMP support. There are still 3 more recent LTS kernels available to choose from before they must adopt this.

Also that crap < $50 device is almost 7 years old already.

Wi-Fi access points and home routers: Single-core and size constraints

Posted Jun 11, 2025 20:23 UTC (Wed) by mr_bean (subscriber, #5398) [Link]

No, OpenWRT 24.10.1 (the current release) is at 6.6.74 and snapshot is at 6.12

Non-Wintel architecture?

Posted Jun 10, 2025 22:36 UTC (Tue) by Kamilion (guest, #42576) [Link]

I'm confused. How would this affect something like the Rockchip RV1103G2 or RV1106? Those are uniprocessor ARM Cortex designs with a clock speed around 1 GHz. Does it just mean it will be running an SMP kernel with the additional memory usage by the structures? Or does it mean it becomes unsupportable post 6.16/6.17 with this patchset? What about all the M68K systems? riscv32imac?

[Research begins in another tab...]

Seems I've answered my own question. Having a look at their kernel config, I can see that SMP is already enabled in their buildroot.

Okay, yeah, I can get behind the mentality here of "finding a uniprocessor doing something non-trivial in the wild, running tip of mainline linux, is incredibly rare". My luckfox pico pretty much only runs gpsd, and only because that was tremendously saner than trying to write my own firmware image with micropython on a pi pico, for a minimal difference in cost.

I see a *lot* of uniprocessor devices with an SDK along for the ride, typically with an android-derived kernel 5.10 -- so that's pretty distant from tip of mainline, and thus easier to find in the wild.

Optimizing for one CPU

Posted Jun 10, 2025 23:25 UTC (Tue) by iabervon (subscriber, #722) [Link] (11 responses)

I'm a bit surprised, upon thinking about it, that C doesn't have an unsigned 0-bit integer type that you'd use if you set NR_CPUS to 1. It seems to me like a great way of getting code that your compiler can optimize really well: as an lvalue, you can only set it to 0, which doesn't have any effect, and as an rvalue, it's a compile-time constant 0. Then, a ton of code just goes away that would have been necessary if the size wasn't 0.

Optimizing for one CPU

Posted Jun 11, 2025 0:24 UTC (Wed) by wahern (subscriber, #37304) [Link] (1 responses)

What does NR_CPUS == 0 mean?

Optimizing for one CPU

Posted Jun 11, 2025 2:24 UTC (Wed) by iabervon (subscriber, #722) [Link]

I was thinking of NR_CPUS == 1, and I was thinking that the reason this wasn't supported was that cpuids were put in a field that is log(NR_CPUS) bits, which doesn't work in C because you can't have 0-bit fields. But it looks like it's actually fine to have NR_CPUS == 1 and should optimize pretty well, although the compiler probably won't figure out that any two cpuids are the same if NR_CPUS == 1.
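The arithmetic behind that observation can be sketched in a few lines of C. This is only an illustration; `demo_cpu_id_bits()` is a hypothetical helper, not a kernel function, and it assumes that `unsigned int` is at least 32 bits wide.

```c
/*
 * Number of bits needed to store a CPU id in the range 0 .. nr_cpus - 1,
 * i.e. ceil(log2(nr_cpus)).  For a single CPU the answer is 0: the only
 * possible id is 0, so there is nothing to store.
 */
unsigned int demo_cpu_id_bits(unsigned int nr_cpus)
{
	unsigned int bits = 0;

	/* The bound check comes first, so the shift count never reaches 32. */
	while (bits < 32 && (1u << bits) < nr_cpus)
		bits++;
	return bits;
}

/*
 * A structure cannot simply declare a field of that width when it is 0:
 * C requires a zero-width bit-field to be unnamed, so a declaration such as
 *
 *	unsigned int cpu : 0;
 *
 * is rejected by the compiler.
 */
```

With this helper, one CPU needs 0 bits, two CPUs need 1, eight need 3, and nine need 4.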

Optimizing for one CPU

Posted Jun 11, 2025 10:49 UTC (Wed) by alx.manpages (subscriber, #145117) [Link] (7 responses)

We're talking in the C Committee about adding _BitInt(1) (C23 only supports _BitInt(2) and unsigned _BitInt(1), but not _BitInt(1)). Maybe it would be interesting to discuss adding support for 0 bits as well. I can forward this idea to the committee.

Optimizing for one CPU

Posted Jun 11, 2025 14:28 UTC (Wed) by willy (subscriber, #9762) [Link] (1 responses)

_BitInt(1) would support two values, 0 and -1? I can see the use, but also the confusion. I think there were similar problems with "signed int x:1;"

Optimizing for one CPU

Posted Jun 11, 2025 15:45 UTC (Wed) by alx.manpages (subscriber, #145117) [Link]

Yes, for consistency with 2's complement used elsewhere, the non-zero value in _BitInt(1) must be -1. Consensus seems to be forming that if we standardize it, it should be -1 and not 1.
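The bit-field case mentioned above behaves the same way and can be tried with any current compiler. A minimal sketch follows, assuming a two's-complement implementation (which C23 requires); `struct demo_flags` and the two helper functions are hypothetical names used only here.

```c
struct demo_flags {
	signed int s : 1;	/* one bit, and that bit is the sign bit */
	unsigned int u : 1;	/* one bit, no sign */
};

/* The only non-zero value a signed 1-bit field can hold is -1. */
int demo_signed_one_bit(void)
{
	struct demo_flags f = { 0 };

	f.s = -1;
	return f.s;
}

/* An unsigned 1-bit field holds 0 or 1, as most people expect. */
int demo_unsigned_one_bit(void)
{
	struct demo_flags f = { 0 };

	f.u = 1;
	return f.u;
}
```

The sketch stores -1 rather than 1 in the signed field because 1 is out of range for it, and the result of that conversion is implementation-defined.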

Optimizing for one CPU

Posted Jun 14, 2025 15:02 UTC (Sat) by quotemstr (subscriber, #45331) [Link] (4 responses)

If you're going to do that, please also consider making void a vacuous value type instead of a special keyword. Simple and surprisingly powerful change. You'd be able to write, say, void x = foo(); bar(x).

Optimizing for one CPU

Posted Jun 14, 2025 17:25 UTC (Sat) by khim (subscriber, #9252) [Link] (3 responses)

> Simple and surprisingly powerful change.

Yes. Powerful enough to destroy countless programs.

You may take a look at Rust: it does have zero-sized types (and Rust's equivalent of void is also a zero-sized tuple). To make that work they had to include tons of checks everywhere.

Simply because now you can suddenly double your vector's size as many times as you want without ever running out of memory. And do a bazillion other similar things.

Plus clang/gcc have already adopted pointer arithmetic for void*; having an actual zero-sized type would cause compatibility issues there, too.
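The extension in question is easy to demonstrate. The sketch below relies on the GNU C behavior of treating the size of void as 1 for pointer arithmetic, so it is not standard C and will draw a diagnostic under -Wpointer-arith or -pedantic; `demo_void_step()` is a hypothetical name.

```c
#include <stddef.h>

/* Returns how many bytes "p + 1" advances when p is a void pointer. */
ptrdiff_t demo_void_step(void)
{
	static char buf[4];
	void *p = buf;
	void *q = p + 1;	/* GNU extension: steps by one byte */

	return (char *)q - (char *)p;
}
```

With GCC or clang in their default modes the function returns 1. A truly zero-sized void would have to return 0 here, which is the compatibility problem described above.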

Optimizing for one CPU

Posted Jun 14, 2025 18:03 UTC (Sat) by quotemstr (subscriber, #45331) [Link] (1 responses)

Yeah. You're right. The void* arithmetic thing settles it. It's a shame; the void as value C++ paper from a few years ago would have filed off a lot of metaprogramming edges.

Maybe it'd at least make sense to let _Generic work with void as an exception to the general rule against incomplete types?

Optimizing for one CPU

Posted Jun 14, 2025 18:15 UTC (Sat) by alx.manpages (subscriber, #145117) [Link]

_Generic already works with void in GCC 15:

    alx@debian:~/tmp$ cat g.c
    int
    main(void)
    {
    	return _Generic(void, void: 1);
    }
    alx@debian:~/tmp$ /opt/local/gnu/gcc/countof/bin/gcc -Wall -Wextra g.c
    alx@debian:~/tmp$ ./a.out; echo $?
    1

Optimizing for one CPU

Posted Jun 14, 2025 20:17 UTC (Sat) by iabervon (subscriber, #722) [Link]

You could just say that a 0-bit integer has enough padding to use up one whole address, so its size is 1. 1-bit integers have enough padding to have an integer size, and you could just say that any type that doesn't need any space for data at all still has to get to size(type) >= 1 by having padding. That would still allow for entirely optimizing out void local variables whose addresses aren't taken, and specifying ABIs such that void arguments are passed in registers without using any registers. And, actually, the clang/gcc pointer arithmetic for void* would make sense if each void item was enough padding to get a new address without any space used to store data.

Optimizing for one CPU

Posted Jun 14, 2025 17:20 UTC (Sat) by khim (subscriber, #9252) [Link]

C doesn't even have a zero-sized type, and that one is much more useful.

Unfortunately the assumption that any type has a size of at least one is used in a bazillion places, thus it's hard to change that.

A reasonable move

Posted Jun 12, 2025 16:25 UTC (Thu) by wtarreau (subscriber, #51152) [Link] (2 responses)

Indeed I also agree that it's acceptable to lose a bit of performance on older systems for the sake of simplicity and possibly improving support for newer systems. The size increase might be a bit more controversial for some deeply embedded environments, but that's probably not much more than what comes with each new kernel version anyway. The old days of a 500kB compressed kernel image are long gone I think.

Kernel size

Posted Jun 12, 2025 18:18 UTC (Thu) by geert (subscriber, #98403) [Link] (1 responses)

Every new kernel release increases binary kernel size by ca. 25 KiB.

An m68k atari_defconfig kernel gained 32 KiB between v6.15 and v6.16-rc1, of which ca. 10 KiB can be attributed to the console Unicode fixes.

Kernel size

Posted Jun 13, 2025 16:45 UTC (Fri) by andy_shev (subscriber, #75870) [Link]

And 20 releases ca. +1 Mb, from v3.0 to v6.19 will be +4 Mb :-) At least the kernel became so big somewhere in the v6.x cycle that it made me look deeper into an issue on one of the x86 boards... For the curious: https://lore.kernel.org/all/20230830102434.xnlh66omhs6nin...


Copyright © 2025, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds