An end to uniprocessor configurations
Initially, Linus Torvalds's goal with Linux was simply to get something working; he did not have much time to spare for hardware that he did not personally have. And he had no multiprocessor machine back then — almost nobody did. So, not only did the initial version of the kernel go out with no SMP support, the kernel lacked that support for some years. The 1.0 and 1.2 releases of the kernel, which came out in 1994 and 1995, respectively, only supported uniprocessor machines.
The beginnings of SMP support found their way into the 1.3.31 development
release in late 1995; the associated documentation
file included the warning: "This is experimental. Back up your disks
first. Build only with gcc2.5.8". It took some time for the SMP work
to stabilize properly; the dreaded big kernel
lock, which ensured that only one CPU was running within the kernel at
any time, wasn't even introduced until 1.3.54. But, by the time 2.0 was
released in June 1996, Linux worked reasonably well on two-CPU systems, for
some workloads, at least.
At that time, though, SMP systems were still relatively rare; most people running Linux did not have one. That uniprocessor majority had little patience for the idea that their systems might be made to run slower in order to support those expensive SMP machines that almost nobody had. The tension between support for users of "big iron" and everybody else ran strong in those days, and a two-CPU system was definitely considered to be big iron.
As a result, the addition of SMP support was done under the condition that it not regress performance on uniprocessor systems. This is a theme that has been seen many times over the history of Linux kernel development. Perhaps most famously, the realtime preemption code was not allowed to slow down non-realtime systems; in the end, realtime preemption brought a lot of improvements for non-realtime systems as well. In the case of SMP, this rule was implemented with a lot of macro magic, #ifdef blocks, and similar techniques.
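To give a feel for that kind of macro magic, here is a minimal sketch; it is not kernel code. The names `DEMO_SMP`, `demo_lock_t`, `demo_lock()`, `demo_unlock()`, and `demo_increment()` are all hypothetical, with `DEMO_SMP` standing in for the kernel's CONFIG_SMP option. The idea is that callers are written once, while the locking primitive compiles down to nothing when only one CPU can ever be running. A real kernel also has to worry about preemption and interrupts, which this sketch ignores.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * DEMO_SMP is a hypothetical stand-in for CONFIG_SMP; build with
 * -DDEMO_SMP to select the multiprocessor variant.
 */
#ifdef DEMO_SMP

/* SMP variant: a real (if naive) test-and-set spinlock. */
typedef struct { atomic_flag locked; } demo_lock_t;
#define DEMO_LOCK_INIT { ATOMIC_FLAG_INIT }

static inline void demo_lock(demo_lock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire))
        ; /* spin until the current holder releases the lock */
}

static inline void demo_unlock(demo_lock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

#else /* !DEMO_SMP */

/* Uniprocessor variant: no other CPU can contend, so locking is a no-op. */
typedef struct { char unused; } demo_lock_t;
#define DEMO_LOCK_INIT { 0 }

static inline void demo_lock(demo_lock_t *l)   { (void)l; }
static inline void demo_unlock(demo_lock_t *l) { (void)l; }

#endif /* DEMO_SMP */

static demo_lock_t counter_lock = DEMO_LOCK_INIT;
static long counter;

/* The caller looks identical in both configurations. */
long demo_increment(void)
{
    long value;

    demo_lock(&counter_lock);
    value = ++counter;
    demo_unlock(&counter_lock);

    assert(value > 0); /* the counter only ever grows in this sketch */
    return value;
}
```

Built without `-DDEMO_SMP`, the lock calls vanish entirely, which is where the uniprocessor savings come from; the price is that every such primitive needs two implementations that must both be kept building.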
It is now nearly 30 years after the initial introduction of SMP support into the Linux kernel, and all of that structure that enables the building of special kernels for uniprocessor systems remains, despite the fact that one would have to look hard to find a uniprocessor machine. Machines with a single CPU are now the outlier case; in 2025, we all are big-iron users. Many of the uniprocessor systems that are in use (low-end virtual servers, for example) are likely to be running SMP kernels anyway. Maintaining a separate uniprocessor kernel is usually more trouble than it is worth, and few distributors package them anymore.
As Ingo Molnar pointed out in his patch series, there are currently 175 separate
#ifdef blocks in the scheduler code that depend on
CONFIG_SMP. They add complexity to the scheduler, and the
uniprocessor code often breaks because few developers test it. As he
put it: "It's rare to see a larger scheduler patch series that doesn't
have some sort of build complication on !SMP". It is not at all clear
that these costs are justified at this point, given how little use there is
of the uniprocessor configuration.
So Molnar proposes that uniprocessor support be removed from the scheduler. The 43-part patch series starts with a set of cleanups designed to make the subsequent surgery easier, then proceeds to remove the uniprocessor versions of the code. With the series applied, the SMP scheduler is used on all systems, though parts of it (such as load balancing) will never be executed on a machine with a single CPU. In total, nearly 1,000 lines of legacy code are removed, and the scheduler is far less of an #ifdef maze than before.
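One way to picture SMP code that is present but never runs on a single-CPU machine is a runtime check in place of a compile-time #ifdef. The following is only a toy sketch under that assumption; `struct demo_rq` and `demo_load_balance()` are hypothetical names, and nothing here is taken from the actual patch series, whose mechanisms may well differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU run queue; only the task count matters here. */
struct demo_rq {
    int nr_running;
};

/*
 * Toy load balancer: move one task from the busiest queue to the idlest
 * one when they differ by at least two tasks. Returns the number of
 * tasks moved (0 or 1).
 */
int demo_load_balance(struct demo_rq *rqs, size_t nr_cpus)
{
    size_t busiest = 0, idlest = 0, i;

    assert(rqs != NULL && nr_cpus > 0);

    /* Runtime check instead of an #ifdef: with one CPU there is no work. */
    if (nr_cpus < 2)
        return 0;

    for (i = 1; i < nr_cpus; i++) {
        if (rqs[i].nr_running > rqs[busiest].nr_running)
            busiest = i;
        if (rqs[i].nr_running < rqs[idlest].nr_running)
            idlest = i;
    }

    /* Moving a task only helps if the imbalance is at least two. */
    if (rqs[busiest].nr_running - rqs[idlest].nr_running < 2)
        return 0;

    rqs[busiest].nr_running--;
    rqs[idlest].nr_running++;
    return 1;
}
```

On a single-CPU system the function returns immediately, so the balancing logic costs some kernel text but essentially no run time. The trade-off is that the code is always compiled, and thus always build-tested.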
Switching to the SMP kernel will not be free on uniprocessor systems; all that care that was taken with the uniprocessor scheduler did have an effect on its performance. A scheduler benchmark run using the SMP-only kernel on a uniprocessor system showed a roughly 5% performance regression. There is also a 0.3% growth in the size of the kernel text (built with the defconfig x86 configuration) when uniprocessor support is removed. This is a cost that, once upon a time, would have been unacceptable but, in 2025, Molnar said, things have changed:
"But at this point I think the burden of proof and the burden of work needs to be reversed: and anyone who cares about UP performance or size should present sensible patches to improve performance/size."
He described the series as "lightly tested", which is not quite the
standard one normally wants to see for an invasive scheduler patch; filling
out that testing will surely be required before this change can be
accepted. But, so far, there have been no objections to the change; there
are no uniprocessor users showing up to advocate for keeping their special
configuration — yet. Times truly have changed, to the point that it would be
surprising if this reversal of priorities didn't make it into the kernel
in the relatively near future.
Posted Jun 10, 2025 20:06 UTC (Tue)
by marcH (subscriber, #57642)
Posted Jun 11, 2025 6:42 UTC (Wed)
by tamiko (subscriber, #115350)
Posted Jun 11, 2025 9:47 UTC (Wed)
by arnd (subscriber, #8866)
You can still build a kernel with SMP disabled, and it will still use the trivial implementation of per-CPU data, spinlocks, SMP barriers, etc., which is where most of the performance and size advantages are for non-SMP builds.
There is currently no way to build an SMP kernel for a lot of the older embedded architectures that lack the required CPU instructions or the irqchip for SMP: ARMv5, most MIPS32r2, PowerPC8xx, m68k, SH3/SH4, ARCompact, microblaze, nios2, and xtensa.
Posted Jun 10, 2025 20:08 UTC (Tue)
by hailfinger (subscriber, #76962)
A prominent recent example is the TP-Link Archer C6 v2 router/access point with 802.11ac Wi-Fi. At least in Europe, this device is rather popular among OpenWrt users (affordable, reliable, reasonable range, somewhat recent Wi-Fi). It only has a single core and crucially only has 8 MiB flash space where you have to fit a kernel and all userspace software. For that configuration, a size difference of a few kilobytes may be the difference between shipping a standard OpenWrt or a trimmed down version with reduced functionality.
Posted Jun 10, 2025 20:19 UTC (Tue)
by mb (subscriber, #50428)
There's a cost to maintaining support for features that almost nobody uses.
Posted Jun 10, 2025 20:29 UTC (Tue)
by daroc (editor, #160859)
Posted Jun 11, 2025 8:23 UTC (Wed)
by Wol (subscriber, #4433)
Shouldn't a uni-processor scheduler just be an option? If the users can't be bothered to maintain it, it'll bit-rot. And as a compile-time option or whatever, the cost will be borne by the people who use it, which is as it should be.
Cheers,
Posted Jun 11, 2025 15:39 UTC (Wed)
by nix (subscriber, #2304)
The cost here is borne by the maintainers, no matter what. Keeping the uniprocessor scheduler as an option means either keeping the nightmare ifdef maze (no thanks), or *duplicating* the scheduler via unifdef or something, and hoping that changes outside the ifdef maze are maintained in parallel (and usually they have to be maintained or the uniprocessor scheduler will break, hardly anyone will notice, and we're right back where we started). I doubt anyone would be terribly happy with that approach, either...
Posted Jun 11, 2025 14:24 UTC (Wed)
by marcH (subscriber, #57642)
Speaking of which, I don't find the "0.3% size increase" a useful metric... Doesn't this scheduler change have a mostly _fixed_ size cost, mostly independent of how many gazillions of drivers you enable?
Posted Jun 11, 2025 15:52 UTC (Wed)
by mb (subscriber, #50428)
Posted Jun 12, 2025 18:25 UTC (Thu)
by marcH (subscriber, #57642)
Posted Jun 11, 2025 18:47 UTC (Wed)
by hmh (subscriber, #3838)
Current LTS kernels are going to EOL in 2027 according to kernel.org, which is a bit too close for comfort, IMO.
Posted Jun 13, 2025 7:51 UTC (Fri)
by taladar (subscriber, #68407)
LTS users always think of the pain in losing features but not the pain of keeping them for everyone else.
Posted Jun 11, 2025 2:29 UTC (Wed)
by PengZheng (subscriber, #108006)
Posted Jun 11, 2025 16:16 UTC (Wed)
by parametricpoly (subscriber, #143903)
Also that crap < $50 device is almost 7 years old already.
Posted Jun 11, 2025 20:23 UTC (Wed)
by mr_bean (subscriber, #5398)
Posted Jun 10, 2025 22:36 UTC (Tue)
by Kamilion (guest, #42576)
[Research begins in another tab...]
Seems I've answered my own question. Having a look at their kernel config, I can see that SMP is already enabled in their buildroot.
Okay, yeah, I can get behind the mentality here of "finding a uniprocessor doing something non-trivial in the wild, running tip of mainline linux, is incredibly rare". My luckfox pico pretty much only runs gpsd, and only because that was tremendously saner than trying to write my own firmware image with micropython on a pi pico, for a minimal difference in cost.
I see a *lot* of uniprocessor devices with an SDK along for the ride, typically with an Android-derived 5.10 kernel, so that's pretty distant from the tip of mainline, and thus easier to find in the wild.
Posted Jun 10, 2025 23:25 UTC (Tue)
by iabervon (subscriber, #722)
Posted Jun 11, 2025 0:24 UTC (Wed)
by wahern (subscriber, #37304)
Posted Jun 11, 2025 2:24 UTC (Wed)
by iabervon (subscriber, #722)
Posted Jun 11, 2025 10:49 UTC (Wed)
by alx.manpages (subscriber, #145117)
Posted Jun 11, 2025 14:28 UTC (Wed)
by willy (subscriber, #9762)
Posted Jun 11, 2025 15:45 UTC (Wed)
by alx.manpages (subscriber, #145117)
Posted Jun 14, 2025 15:02 UTC (Sat)
by quotemstr (subscriber, #45331)
Posted Jun 14, 2025 17:25 UTC (Sat)
by khim (subscriber, #9252)
Yes. Powerful enough to destroy countless programs. Take a look at Rust: it does have zero-sized types (and Rust's void is also a zero-sized tuple). To make that work they had to include tons of checks everywhere, simply because you can now, suddenly, double your vector's size as many times as you want without ever running out of memory. And do a bazillion other similar things. Plus clang/gcc have already adopted pointer arithmetic for
Posted Jun 14, 2025 18:03 UTC (Sat)
by quotemstr (subscriber, #45331)
Maybe it'd at least make sense to let _Generic work with void as an exception to the general rule against incomplete types?
Posted Jun 14, 2025 18:15 UTC (Sat)
by alx.manpages (subscriber, #145117)
alx@debian:~/tmp$ cat g.c
Posted Jun 14, 2025 20:17 UTC (Sat)
by iabervon (subscriber, #722)
Posted Jun 14, 2025 17:20 UTC (Sat)
by khim (subscriber, #9252)
C doesn't even have a zero-sized type, and that one is much more useful. Unfortunately, the assumption that any type has a size of at least one is used in a bazillion places, so it's hard to change that.
Posted Jun 12, 2025 16:25 UTC (Thu)
by wtarreau (subscriber, #51152)
Posted Jun 12, 2025 18:18 UTC (Thu)
by geert (subscriber, #98403)
An m68k atari_defconfig kernel gained 32 KiB between v6.15 and v6.16-rc1, of which ca. 10 KiB can be attributed to the console Unicode fixes.
Posted Jun 13, 2025 16:45 UTC (Fri)
by andy_shev (subscriber, #75870)
maxcpus=1?
Wi-Fi access points and home routers: Single-core and size constraints
And at some point the people needing this feature should pay the cost or let it go.
OpenWrt is free to patch a custom, small, specialized scheduler into their kernel, too. I doubt it's worth it, though.
Wol
> And at some point the people needing this feature should pay the cost or let it go.
Non-Wintel architecture?
Optimizing for one CPU
> Simple and surprisingly powerful change.
void*, having actual zero-sized type would cause compatibility issues there, too.
int
main(void)
{
	return _Generic(void, void: 1);
}
alx@debian:~/tmp$ /opt/local/gnu/gcc/countof/bin/gcc -Wall -Wextra g.c
alx@debian:~/tmp$ ./a.out; echo $?
1
A reasonable move
Kernel size