|
|
Subscribe / Log in / New account

How many ways are there to configure the Linux kernel?

By Daroc Alden
September 10, 2025

There are a large number of ways to configure the 6.16 Linux kernel. It has 32,468 different configuration options on x86_64, and a comparable number for other platforms. Exploring the ways the kernel can be configured is sufficiently difficult that it requires specialized tools. These show the number of possible configurations that options can be combined in has 6,550 digits. How has that number changed over the history of the kernel, and what does it mean for testing?

Analyzing Kconfigs

With so many configuration options to consider, it can be difficult to even figure out which options are redundant, unused, or otherwise amenable to simplification. For our problem of counting the number of ways to configure the kernel, we need to consider the relationships between different configuration options as well. Happily, there is some software to manage the growing complexity.

The kernel uses a custom configuration language, Kconfig, to describe how the build can be customized. While Kconfig has spread to a handful of related projects, it remains a mostly home-grown, Linux-specific system. Paul Gazzillo, along with a handful of other contributors, wrote kmax, a GPL-2.0-licensed software suite for analyzing Kconfig declarations. That project includes tools to determine which configuration options need to be enabled in order to test a patch, to find configuration bugs, and to export Kconfig declarations in a language that can be understood by more tools. It has been integrated into the Intel 0-day test suite, finding a number of bugs in the kernel's configuration in the process.

One of the project's programs, klocalizer, can output the set of relationships between different options in SMT-LIB 2 or DIMACS format — the standard input formats for modern SAT solvers. This makes it much simpler to write custom ad-hoc analysis of the kernel's configuration options. For example, one could use Z3's Python bindings to load the output from klocalizer and explore how options relate to one another. A SAT solver such as Z3 produces (when possible) a single solution to a set of constraints — in this case, a single valid way to configure the kernel.

Calculating the number of valid solutions is a related but much harder problem called #SAT. While there are a number of approximation techniques that can be used to solve #SAT probabilistically, in this case the constraints in the kernel's configuration are simple enough for a precise counter called ganak to solve the problem exactly.

The numbers

So, with these tools in hand, one can analyze how many ways to configure the kernel there really are. The growth in the number of configuration parameters in the kernel has been roughly linear, leading to exponential growth in the number of ways the kernel can be configured. The following chart shows two trend lines: the logarithm of the actual number of ways to configure the kernel in blue, and the number of configuration options (scaled appropriately to sit on the same axis; see below) in red. The red line represents the theoretical maximum that the blue line could obtain. Plotting 6,000 digit numbers directly proved to be too much for both LibreOffice Calc and Google Sheets, hence the logarithms.

[A chart of the number of possible configurations versus version]

There are a few things that can be immediately seen from this chart. For example, there are more than twice as many configuration options available in version 6.16 compared to version 2.6.37 (the earliest version that kmax can understand). In some sense, this is not surprising — the Linux kernel keeps adding support for new features and new hardware; the number of configuration options is just growing along with the code. But there is a subtler trend that is arguably more important: configuration options are becoming more interdependent over time.

If every configuration option could be enabled independently of every other configuration option, there would be 2N ways to configure the kernel given N configuration options. The red line is the number of configuration options scaled by ln(2)ln(10) — that is, it represents the logarithm of 2N. That way the two lines are directly comparable. The actual number of ways to configure the kernel sits below this theoretical maximum because configuration options can depend on or exclude each other. This chart shows the difference between the red and blue lines:

[A chart showing the difference between actual and theoretical numbers of configurations]

Note that this result is still on a logarithmic scale, so this value is the logarithm of the number of ways to configure the kernel that are ruled out by dependencies between different options. As the difference grows, it shows that more configuration options are interacting with each other. This is not, in itself, a problem — but those interactions are likely associated with interdependent pieces of code. The number of places where kernel code depends on multiple related configuration options is growing.

With the number of possible configurations, it is certain that not all of them have been tested individually. If the configuration options were independent, testing a kernel with everything enabled and one with everything disabled would theoretically suffice. Many options, especially ones that are specific to a particular hardware driver, are genuinely independent. But when options can influence each other and interact (even outside of Kconfig interactions), the number of scenarios that need to be tested grows quickly.

Linux's testing is largely distributed — it is done by individual patch contributors and reviewers, several different automatic unit testing and fuzzing robots, distribution QA teams, companies, and normal Linux users. With the decentralized nature of open source, and the large number of possible configurations, it's hard to tell exactly how many configurations are being covered. The kernels used by popular distributions are well-tested, but enabling other configuration options is a fast route into uncharted territory. The Linux kernel isn't going to stop getting new features — so the only real solution to the problem is to try to simplify existing configuration options when possible. The recent work to remove special uniprocessor support from the scheduler is an example of that.

Ultimately, among the many challenges of kernel development, this is not the most urgent or important. It is worth keeping an eye on, though, for the sake of the sanity of anyone trying to configure a kernel, if nothing else.



to post comments

What percentage of valid configuration options successfully build?

Posted Sep 10, 2025 17:14 UTC (Wed) by nickodell (subscriber, #125165) [Link] (7 responses)

It has 32,468 different configuration options on x86_64
Darn it, so close to a pleasing round number. Could someone add 300 more configuration options real quick?
With the number of possible configurations, it is certain that not all of them have been tested individually. If the configuration options were independent, testing a kernel with everything enabled and one with everything disabled would theoretically suffice.
I wonder what percentage of these configurations successfully compile. Of course, it wouldn't be possible to try all configurations, but it seems like it would be possible to randomly sample 100 legal configurations from the set of possible configurations, and test if they successfully compile. My intuition for this is that maybe 20% of randomly sampled configurations would fail to compile, but I'm curious what others think.

What percentage of valid configuration options successfully build?

Posted Sep 10, 2025 17:19 UTC (Wed) by daroc (editor, #160859) [Link] (5 responses)

There is actually a make target in the kernel for selecting a random configuration: "make randconfig". So you can try this experiment yourself. I have seen build failures while playing around with it, but I didn't think to collect rigorous statistics. Generally, I've found that "make randconfig" usually builds, though.

What percentage of valid configuration options successfully build?

Posted Sep 10, 2025 19:46 UTC (Wed) by vegard (subscriber, #52330) [Link] (3 responses)

It should be noted that the "randconfig" target does not select configs uniformly from the whole space of possible configs. If you have N options that do "select X" and each of those options have a 50% probability of being set, then X has a 1 - 1/(2^N) probability of being set, i.e. it is overwhelmingly probable that it will be set in any config produced by randconfig. In other words, randconfig is heavily biased. You need to run it VERY many times in order to hit certain combinations.

There are also a bunch of configs that will rely on the host configuration, e.g. choosing lz4 compression for vmlinux when no lz4 utilities are installed on the host, and which will fail to build successfully for those reasons. Many more combinations of options will fail to boot for various reasons that are not necessarily bugs. There are things you can do to minimize this kind of thing, either you can apply some patches so that those combinations never appear or you can run scripts/config after randconfig to enable/disable specific options as needed.

What percentage of valid configuration options successfully build?

Posted Sep 11, 2025 12:44 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (2 responses)

In that case, you need to also be able to say "option X will be off" and ignore/force anything that would turn it on when generating random configurations. Might need additional logic in Kconfig to support such operations though.

But you need the existing mode too as an option that requires N things will then have 1/2^N chance of being allowed on. Though…perhaps one way would be to:

- make a bag of all options
- weight each option based on the number of other options that either require it or it requires
- pull an option from the bag at random
- choose a value for it
- remove options from the bag that are forced to be a value based on the selected value
- repeat until bag is empty

Maybe just having the bag have pairs of (option, value) each individually weighted would be better?

What percentage of valid configuration options successfully build?

Posted Sep 11, 2025 12:59 UTC (Thu) by daroc (editor, #160859) [Link] (1 responses)

There's actually a tool by the same people who made ganak that will randomly sample from a constrained space of options with negligible bias. So you might be able to use kmax to convert the Kconfigs to a SAT problem, use that tool (csb) to sample a random model, and then convert that back to a .config.

What percentage of valid configuration options successfully build?

Posted Sep 11, 2025 22:43 UTC (Thu) by quotemstr (subscriber, #45331) [Link]

There's also the venerable PICT from Microsoft: https://github.com/microsoft/pict

It solves the problem of generating useful "pairwise testing" configurations by picking from large numbers of options with constraints. Surprisingly non-trivial yet very useful.

What percentage of valid configuration options successfully build?

Posted Sep 11, 2025 17:59 UTC (Thu) by make (subscriber, #62794) [Link]

A few years ago, I had "make randconfig" running in a loop for a few days and I found few (weird) configurations that resulted in build failures. I submitted simple fixes for these (and iirc most of them were merged).

What percentage of valid configuration options successfully build?

Posted Sep 18, 2025 17:56 UTC (Thu) by yanjun.zhu (guest, #173290) [Link]

I have the same reaction when I saw the number 32478.

And then there's runtime options ...

Posted Sep 10, 2025 17:36 UTC (Wed) by willy (subscriber, #9762) [Link] (2 responses)

Both kernel boot parameters and sysfs knobs you can twiddle.

To say nothing of various syscalls you can make.

The search space for "is the kernel correct" (for literally any definition of the word "correct") is mind-numbingly big. And that shouldn't surprise us; BB-6 is unknown, but larger than 10↑↑15 (for those not familiar, that's 10^ (10^ (10 ^ ...)) to a height of 15). The kernel is many orders of magnitude larger than BB-6.

https://en.wikipedia.org/wiki/Busy_beaver

And then there's runtime options ...

Posted Sep 10, 2025 19:11 UTC (Wed) by tux3 (subscriber, #101245) [Link]

Whether the correctness of the kernel is independent of the ZFC axioms of set theory is left as an exercise to the reader (BB(n) acquires this property for *some* n<=745)

Formal verification techniques

Posted Sep 11, 2025 8:06 UTC (Thu) by DemiMarie (subscriber, #164188) [Link]

Search-based approaches to formal verification fail to scale, and I believe it is for exactly this reason. Interactive theorem proving *can* scale, but it is still quadratic in the size of a module. Linux is a quite modular program, but I expect the proof effort would be in the hundreds of billions of dollars, if not trillions.

Log Scale

Posted Sep 10, 2025 22:38 UTC (Wed) by jrtc27 (subscriber, #107748) [Link]

> Plotting 6,000 digit numbers directly proved to be too much for both LibreOffice Calc and Google Sheets, hence the logarithms.

I am disappointed to learn that Daroc is a quitter when it comes to making their point properly: https://xkcd.com/1162/ :)

fake options

Posted Sep 11, 2025 6:03 UTC (Thu) by adobriyan (subscriber, #30858) [Link]

Some of the options aren't user selectable, things like CONFIG_ARCH_HAS_GIGANTIC_PAGE which arguably should live in their own namespace.

Why so many 0s at the end?

Posted Sep 11, 2025 7:23 UTC (Thu) by wsy (subscriber, #121706) [Link] (2 responses)

This seems so strange since 2^n don't tend to get us 0s

Why so many 0s at the end?

Posted Sep 11, 2025 7:32 UTC (Thu) by Sesse (subscriber, #53779) [Link] (1 responses)

Some options are of the type “pick either A or B or C or D or E”, giving you a factor 5.

Why so many 0s at the end?

Posted Sep 11, 2025 9:59 UTC (Thu) by pbonzini (subscriber, #60935) [Link]

Also "depends on" clauses can result in the easiest cases in 2^N+1 configurations. This can end with 3, 5, 7 or 9 and introduce multiples of 5. With over 6000 options, it's not unlikely that this happens 10-20 times.

32-bit KCONFIG_SEED

Posted Sep 15, 2025 23:12 UTC (Mon) by calvinowens (subscriber, #100757) [Link]

Random thought: given the randconfig seed is 32-bit, aren't the overwhelming majority of the set of all possible random kernel configurations impossible to reproduce with `make randconfig` as it is written today? The code calls rand() iteratively, so the reproducible set changes every time any kconfig option in the kernel changes in any way, but on a particular git SHA it's a fixed set, isn't it?

It's an interesting consequence of the seeded approach, but maybe it doesn't matter, I suppose 2^32 is enough that people testing randconfigs in the wild aren't generating the same randconfigs.


Copyright © 2025, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds