A prediction with no data to support it
Posted Feb 26, 2025 12:18 UTC (Wed) by tialaramex (subscriber, #21167) In reply to: A prediction with no data to support it by butlerm
Parent article: A change in maintenance for the kernel's DMA-mapping layer
It's a community which only wants to write the happy path. Exceptions enable dilution of responsibility. If I write C++ code which just throws in the unhappy path and you write C++ code which calls my function, both of us can claim at review that it wasn't our job to handle the error. Somebody else should do that, the happy path code I wrote was difficult enough. In Rust whoever panics gets to explain why, and code where nobody handled the error case at all doesn't compile.
That doesn't make the handling magically correct - but it's much less likely that some of the really wild effects happen when you know you're writing error handling code, than when the "handling" is the consequence of a missed check.
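A minimal sketch of what I mean (hypothetical parse_port helper, nothing from any real codebase):

    use std::num::ParseIntError;

    // A fallible operation: the error is part of the signature.
    fn parse_port(s: &str) -> Result<u16, ParseIntError> {
        s.trim().parse::<u16>()
    }

    fn main() {
        // This line does not compile: a Result<u16, _> is not a u16,
        // so "nobody handled the error" is a type error, not a latent bug.
        // let port: u16 = parse_port("8080");

        // The caller has to decide, visibly, what the unhappy path does.
        let port: u16 = match parse_port("8080") {
            Ok(p) => p,
            Err(e) => {
                eprintln!("bad port: {e}");
                return;
            }
        };
        println!("listening on {port}");
    }

Whoever writes the match arm, the unwrap() or the panic gets to defend it at review; silently dropping the error is not an option the language offers for a value you actually need.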
I can believe a "C only Linux" fork could exist, particularly if the way Linux gets to 100% Rust platform support is via removing some older platforms some years in the future. If you're involved in maintaining Linux for a CPU architecture that hasn't been made since last century you might well have zero interest in Rust and plenty of reason to fork the last Linux which built correctly for your favourite machine.
Posted Feb 26, 2025 12:50 UTC (Wed)
by excors (subscriber, #95769)
[Link]
In practice a fair amount of C++ code is built with exceptions disabled anyway. That does cause a bit of friction when some parts of the C++ standard library and language are designed around the assumption that you have exceptions, but in practice it works okay (or at least it's no more problematic than several other aspects of C++).
I can't imagine the Linux kernel actually adopting C++ though, because it would have pretty much all the same technical challenges and cultural pushback as Rust, with significantly fewer benefits to make it seem worthwhile.
Posted Feb 26, 2025 14:10 UTC (Wed)
by butlerm (subscriber, #13312)
[Link] (10 responses)
It like "oh well, just ship this code or release it into production because the catch all exception handler will handle the problem and the user can either try again or we can fix any issue we find or someone reports after the fact in a month or two or maybe sooner if it is really serious." And that is if the problem ever gets fixed at all, within the lifetime of the project, the product, the service, the volunteers (where applicable), managers, leaders, or the developers in question.
I used to write video games in C and assembly language and in my view good code should perform according to specification and be usable a century from now if committed to ROM and sold on store shelves or shipped in products that way. Does anyone doubt that most Nintendo, Sega, or Atari 7800 games will actually work with the appropriate hardware without major malfunctions decades from now? What about something like Netware (which was originally mostly written in 80386 assembly language) or the Amiga operating system (originally written in a mixture of C, BCPL, and 68K assembly language) or a number of other things, at least if deployed into a non-hostile environment?
You can see this problem in web applications written in Javascript all the time these days, especially on the websites of banks that are not among the largest in the country, or on the websites of most non-bank credit card issuers and lenders as well. I use websites on a regular basis where it is a fifty-fifty chance that a login with the correct credentials supplied will actually succeed. The same goes for many other actions, where the user is often required to do things like enter their credit card information twice to make a payment, because of mysterious "an error occurred" problems that are cured simply by repeating the process. Or worse, where a payment will not go through at all for weeks, for other and never-documented reasons not explained to the user. There is a major funds transfer webapp whose name you would all recognize that often behaves that way these days.
I believe this is in large part the result of the libraries and code included in many modern Javascript applications being so extensive that either the exceptions are undocumented or you have to be an expert to handle them properly, and entry-level developers are often not given enough time or resources to fix the problem. That was my experience for the few years when I was in the unfortunate position of having to maintain and develop code for a moderately sophisticated web application that was originally programmed to use Javascript only where necessary. When you have a dozen or more developers working on a project it is that much worse.
Anyway, I am not surprised that a large team has a difficult time writing safe, correct, and decently performing C, C++, or Java code, and I don't really see any solution to that other than compilers and static analysis tools that identify the problems and produce and optimize code better than most developers can write by hand, even after they stare at a problem for hours at a time. In a project as big as the Linux kernel, or something like a modern database or web browser, it would in my view be worth it to write static analysis tools that are hard coded, if necessary, to describe and enforce the constraints and rules that govern that project. A more general tool would be nice, but apparently no one has written one yet, or at least not one capable enough, or used enough, to find the memory safety, locking, and other problems that still make it into deployed production kernels and have to be corrected after the fact, in some cases after making national or international news, due to problems that ought to be straightforward to analyze and detect.
Finally, although this almost certainly could not be done well or perfectly without heavy use of a new series of #pragmas or language extensions, my idea of a usable C or C++ compiler for a large project is one that refuses to compile code with undefined behavior at all, and requires the developer to supply machine architecture and memory-model targeting information to make those behaviors implementation defined or configuration defined if he or she wants to write almost anything that would otherwise result in undefined behavior. As it stands, the developers, vendors, and publishers of contemporary C and C++ compilers feel they have a license to do anything for any reason when undefined behavior is present, such as deleting entire code sections or skipping appropriate if statements and safety checks, as we have read about here from time to time with regard to C compiler optimizers causing serious problems for exactly that reason. That is my two cents on this question.
Posted Feb 26, 2025 17:02 UTC (Wed)
by matthias (subscriber, #94967)
[Link] (9 responses)
C and C++ are not designed for this. Of course many cases of UB in C and C++ can be made implementation defined, like integer overflow. But there are certain operations that are already UB on the machine code level:
- dereferencing a dangling pointer
- data races between two threads that access the same memory where at least one access is a write
- probably a few more (but not many)
The Rust way of eliminating this kind of UB is the borrow checker, which verifies at compile time that all references are sound. I really do not see any reason why this should be done in C or C++. If you add borrow checking to these languages, they are not really the same language any more. Instead it would be much better to use Rust directly, which has been developed with this feature in mind from the start.
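To make that concrete, a toy sketch (not kernel code) of what the borrow checker rejects versus what it accepts:

    // Rejected at compile time: the reference would dangle.
    // fn dangling() -> &'static i32 {
    //     let x = 42;
    //     &x   // error[E0515]: cannot return reference to local variable `x`
    // }

    // What the compiler pushes you towards instead: return by value,
    // or give the data an owner that outlives every reference to it.
    fn not_dangling() -> i32 {
        let x = 42;
        x
    }

    fn main() {
        println!("{}", not_dangling());
    }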
Of course you can also use the good old -O0 approach of forbidding any optimizations that could result in UB. Except you also have to prevent UB on the machine code level. So all data accesses need to be atomic to prevent the CPU from doing crazy reorderings that are only sound in the absence of data races. The resulting performance would be worse than -O0.
Then there is the JVM way of doing things. Use a virtual machine and only code against the virtual machine. I do not see how this should work in the kernel. Also you need a language to write the virtual machine in.
In my opinion, rust already is this hypothetical C++ language without UB. Maybe at some point a clever person will find better alternatives, but I do not see a way to get rid of UB without the borrow checker. And it is really the borrow checker that defines what kind of language rust is. There are of course other differences, but the borrow checker is the most prominent one.
Posted Feb 27, 2025 0:50 UTC (Thu)
by neggles (subscriber, #153254)
[Link]
Well that's essentially what eBPF is, a virtual machine model and runtime environment that's suitable for use in the kernel. But the limitations of eBPF (and wasm for that matter, since a number of people are of the opinion that eBPF is "just worse wasm") show why that's not a practical model for the kernel as a whole.
As an aside, it might be an interesting project to try to write a microkernel almost entirely in eBPF, where (say) each individual microkernel service is a verified eBPF program and only the base message-passing layer / helper functions aren't. Probably a Ph.D. or two to be had there.
Posted Feb 27, 2025 1:56 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (6 responses)
> dereferencing a dangling pointer
This is not, strictly speaking, UB on the machine code level (at least in the general case). Depending on what you mean by "dangling," it could be well-defined as having either of the following meanings:
* You access some area of memory that you did not intend to access, but it's still within your address space. It is a perfectly well-defined operation. By assumption, it is not the well-defined operation that you intended to do, but that doesn't make it UB.
* You trap, and the OS does something about it (in practice, usually it kills the offending process, but page faults can use a similar or identical mechanism depending on the architecture, and a page fault is not even a real error). This is also a perfectly well-defined operation (regardless of how the OS decides to respond to it).
Remember, the heap is entirely a construct of libc, and the stack is mostly a construct of libc. The notion of "corrupting" either of them does not exist at the machine code level, because at the machine code level, memory is memory and you can read or write whatever bytes you want at whatever address you want in your address space. If you write the wrong bytes to the wrong address, and confuse some other part of your program, that's your problem. It does not magically cause the CPU to believe that your program is invalid, and to start doing things other than what your machine code tells it to do (or, in the case where the instruction pointer is no longer pointing at your original machine code, whatever the new code tells it to do).
> data races between two threads that access the same memory where at least one access is a write
Most architectures do not provide the full semantics of the C abstract machine under the as-if rule. That is, most architectures are at least willing to promise that you get some sort of value when you execute a data race. It's probably the wrong value, it's probably nondeterministic-but-not-in-a-cryptographically-useful-way, and it might not look like any of the values you would "logically expect" to see (e.g. because of tearing), but it is still not quite the same thing as UB.
UB specifically means "an optimizing compiler is allowed to assume that this never happens." It cannot exist at the machine code level, because there is no compiler. The closest we can get (within the context of the C and C++ standards) is implementation-defined behavior, which roughly translates from the standardese to "if this happens, we don't know what your system will do, but you can read your compiler, CPU, and OS manuals and figure it out if you really want to."
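(Rust even exposes that exact contract as an explicit opt-in, which makes the definition concrete; a toy sketch, not anything from a real project:)

    fn div(a: u32, b: u32) -> u32 {
        if b == 0 {
            // An explicit promise to the optimizer: "this never happens."
            // If it does happen anyway, that is undefined behaviour, and
            // the compiler was entitled to plan around its absence.
            unsafe { std::hint::unreachable_unchecked() }
        }
        // Given the promise above, the compiler may drop its own
        // divide-by-zero check here.
        a / b
    }

    fn main() {
        println!("{}", div(10, 2));
    }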
The C and C++ standards committees could, at any time, wave a magic wand and eliminate all UB from their respective languages. The reason that nobody is seriously advocating for that is not because it would not work, but because it would necessarily involve saying something like "all UB is hereby reclassified as IB," and (this general category of) IB is almost as much of a problem as UB. It also requires more documentation that nobody is actually going to read (do *you* want to carefully study a heap diagram for your particular libc's malloc, just so you know what happens if the heap is corrupted?), since all IB must be documented by each implementation (that's the "you can read your manuals" bit). So you'd lose a lot of optimization opportunities, and waste a lot of the implementers' time, in exchange for practically nothing.
Posted Feb 27, 2025 7:31 UTC (Thu)
by matthias (subscriber, #94967)
[Link] (2 responses)
So what are the semantics if you corrupt the stack and as a consequence jump to uninitialized memory, or to memory that you intentionally filled with random data to construct a key, or even worse, to memory filled with data controlled by an attacker? By the very definition of the instruction set anything can happen. You can call the resulting behavior whatever you like, but it is essentially as undefined as it can possibly get.
And independently of what you call this behavior, it is clearly behavior that has to be avoided. Corrupting the stack clearly leads to exploits, so the UB-free variant of C(++) that we are talking about has to avoid it. So we are back at square one and we need the borrow checker to avoid this.
> UB specifically means "an optimizing compiler is allowed to assume that this never happens." It cannot exist at the machine code level, because there is no compiler.
But you have a very similar thing. An optimizing out-of-order architecture in the CPU. And this architecture makes similar assumptions on what can happen vs. what cannot happen. And again, you can call this behavior by different names, but it is essentially undefined. The CPU does not have the global sense of what is going on as the compiler, but messing up locally is enough to corrupt your data. And again, we effectively need the borrow checker to prevent data races. You can get rid of some of this behavior if you make each and every data access atomic, but this is obviously undesirable and I am not even sure that this would be enough.
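A toy illustration of the compile-time side (nothing kernel specific): unsynchronized shared mutation across threads is rejected outright, while the same logic behind a Mutex is accepted:

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        // Rejected by the compiler: it refuses to let the closure capture
        // `counter` by mutable reference while the main thread keeps using it.
        //
        // let mut counter = 0;
        // let t = thread::spawn(|| counter += 1);
        // counter += 1;
        // t.join().unwrap();

        // Accepted: the data race is ruled out by construction.
        let counter = Arc::new(Mutex::new(0));
        let c = Arc::clone(&counter);
        let t = thread::spawn(move || {
            *c.lock().unwrap() += 1;
        });
        *counter.lock().unwrap() += 1;
        t.join().unwrap();
        println!("{}", *counter.lock().unwrap());
    }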
> ...saying something like "all UB is hereby reclassified as IB," and (this general category of) IB is almost as much of a problem as UB.
It is essentially this, giving a new name to the same behavior. And it is not almost as much of a problem as UB, it is exactly as much of a problem as UB, as it can still lead to the same "if you do not follow the rules, I am allowed to format your hard drive" kind of behavior.
I would be absolutely in favor of the committee eliminating all the nonsense kinds of UB, like integer arithmetic being UB. But once you try to avoid the UB of dangling pointers and data races, you essentially have to construct a whole new language.
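For the arithmetic kind the fix is easy to picture; Rust just makes the choice explicit per operation (a toy sketch):

    fn main() {
        let a: u8 = 250;
        let b: u8 = 10;

        // Each would-be overflow gets an explicitly chosen, defined behavior:
        assert_eq!(a.wrapping_add(b), 4);          // modular arithmetic
        assert_eq!(a.checked_add(b), None);        // overflow reported to the caller
        assert_eq!(a.saturating_add(b), u8::MAX);  // clamp at the type's maximum

        // Plain `a + b` is never UB: it panics in a debug build and
        // wraps in a default release build.
        println!("ok");
    }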
Posted Feb 28, 2025 8:24 UTC (Fri)
by anton (subscriber, #25547)
[Link] (1 responses)
> So what are the semantics if you corrupt the stack and as a consequence jump to uninitialized memory, or to memory that you intentionally filled with random data to construct a key, or even worse, to memory filled with data controlled by an attacker? By the very definition of the instruction set anything can happen.
Not at all. First of all, the architectural effects of every instruction up to that point continue to hold, while, e.g., in C++ undefined behaviour is reportedly allowed to time-travel. Next, in a well-designed architecture what happens then is defined by the actual content of the memory and the architecture description, which does not contain undefined behaviour (remember, we are discussing well-designed architectures). Maybe you as programmer do not deem it worth reasoning about this case and just want to put the label "undefined behaviour" on it, but as far as the architecture is concerned, the behaviour is defined.
> An optimizing out-of-order architecture in the CPU.
The architecture does not specify out-of-order execution, on the contrary, it specifies that each instruction is executed one by one. There may be a microarchitecture with out-of-order execution like the Pentium Pro below it, or a microarchitecture with in-order execution like the 486, but the end result of executing a sequence of instructions is the same (except for the few cases where the architectures differ; IIRC the CMOVcc instructions were in the Pentium Pro, but not the 486).
> And this [micro]architecture makes similar assumptions on what can happen vs. what cannot happen. And again, you can call this behavior by different names, but it is essentially undefined.
Computer architects have learned what later became Hyrum's law long ago, and therefore define completely (or almost completely for not-so-well designed architectures) what happens under what circumstances. Microarchitectures implement the architectures, and they do not assume that something cannot happen when it actually can. When the microarchitects fail at implementing the architecture, as with Zenbleed, that's a bug.
> The CPU does not have the global sense of what is going on as the compiler, but messing up locally is enough to corrupt your data.
Microarchitectures with out-of-order execution do not commit any changes that do not become architectural, and therefore do not corrupt data (rare architecture-implementation bugs like Zenbleed excepted).
Posted Feb 28, 2025 14:01 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
> in C++ undefined behaviour is reportedly allowed to time-travel
This has been argued, but it seems that no one has been able to show an instance of a compiler actually doing so. There are some solutions for it in the works (by saying "it's not allowed"), but it is practically a no-op, as compilers already behave that way (though I am certainly not well-steeped in the details):
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/...
Posted Feb 27, 2025 12:31 UTC (Thu)
by excors (subscriber, #95769)
[Link] (2 responses)
> you can read your compiler, CPU, and OS manuals and figure it out if you really want to
I don't think that's really true. x86 and Arm have a number of things that are explicitly documented as "undefined" or "unpredictable" in the architecture references, and are not documented in CPU-specific manuals (as far as I can see), so you can't figure out the behaviour even if you really want to.
E.g. on x86 there's the BSF/BSR instructions ("If the content of the source operand is 0, the content of the destination operand is undefined"). Many instructions leave flags in an undefined state. With memory accesses to I/O address space, "The exact order of bus cycles used to access unaligned ports is undefined". Running the same machine code on different CPUs can give different behaviour, in the same way that running the same C code through different compilers (or the same compiler with different optimisation flags) can give different behaviour, with no documentation of what will happen, so I think it's reasonable to equate that to C's concept of UB.
(And the C standard says UB specifically means "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements", so it's not literally dependent on there being an optimising compiler.)
In practice, all the undefined/unpredictable CPU behaviour that's accessible from userspace is probably documented internally by Intel/Arm for backward compatibility and security reasons, since the CPU is designed to run untrusted machine code (unlike C compilers, which are designed to compile only trusted code). Armv8-A has a lot of "constrained unpredictable", where it's documented that an instruction might e.g. raise an exception or be treated as NOP or set the destination register to an unknown value but it isn't allowed to have any other side effects; but there's still plenty of non-constrained "unpredictable" behaviours. They're not fully unconstrained: they are documented as obeying privilege levels, but they can have arbitrary behaviour that would be achievable by any code within that privilege level, which is the same as C's UB in practice (e.g. UB in an application is not allowed to break the kernel). So I think it's very much like C's UB.
Posted Feb 28, 2025 8:55 UTC (Fri)
by taladar (subscriber, #68407)
[Link]
Posted Feb 28, 2025 9:18 UTC (Fri)
by anton (subscriber, #25547)
[Link]
> E.g. on x86 there's the BSF/BSR instructions ("If the content of the source operand is 0, the content of the destination operand is undefined"). Many instructions leave flags in an undefined state. With memory accesses to I/O address space, "The exact order of bus cycles used to access unaligned ports is undefined". Running the same machine code on different CPUs can give different behaviour, in the same way that running the same C code through different compilers (or the same compiler with different optimisation flags) can give different behaviour, with no documentation of what will happen, so I think it's reasonable to equate that to C's concept of UB.
C language lawyers make a fine-grained difference between different forms of lack of specification in the C standard. IIRC they have "unspecified value" for cases where the result of an operation is unspecified (as in the BSF/BSR case and the unspecified flags results). I think they do not have a special name for an unspecified order.
> (And the C standard says UB specifically means "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements", so it's not literally dependent on there being an optimising compiler.)
And while I agree with the idea that the C standards committee originally used "undefined behaviour" for cases where different implementations produced different behaviour, and where they did not have a more specific term (such as "unspecified value"), for several decades C compiler writers have used "undefined behaviour" to assume that this behaviour does not occur in the programs they support (unless the program is "relevant" for some reason), and there are people around that advocate the position that this has been the intent of "undefined behaviour" from the start.
And the latter form of "undefined behaviour" has quite different results from the former; e.g., with the latter form a loop with an out-of-bounds access can be "optimized" into an endless loop, while with the former form it will perform the memory access, either giving a result, or producing something like a SIGSEGV.
> In practice, all the undefined/unpredictable CPU behaviour that's accessible from userspace is probably documented internally by Intel/Arm for backward compatibility and security reasons
Especially backwards-compatibility; the security benefits fall out from that. As for the bad design in the ARM architectures, maybe they have had too much contact with compiler people and become infected by them. I expect that at some point the implementors of ARM architectures will find that existing programs break when they implement some of the ARM-undefined behaviour in a way different than earlier implementations of that architecture, and that behaviour then becomes an unofficial part of the architecture, as for the Intel and AMD cases mentioned above. A well-designed architecture avoids this pitfall from the start.
Posted Feb 27, 2025 3:18 UTC (Thu)
by raof (subscriber, #57409)
[Link]
> Then there is the JVM way of doing things. Use a virtual machine and only code against the virtual machine. I do not see how this should work in the kernel. Also you need a language to write the virtual machine in.
There was a very interesting research OS at Microsoft that did exactly this - Singularity. A bit of bootstrap written in assembly, then jumping into a fully managed environment written in a variant of C# (called Sing#, which was the source of a bunch of C# features over time). Being fully managed meant that one of the core weaknesses of microkernels - context switch overhead - didn't exist, because it just didn't use the process-isolation hardware.
There's a really interesting series of blog posts about Midori, the very-nearly-complete project to replace Windows with a Singularity-derived codebase.
Posted Feb 26, 2025 22:23 UTC (Wed)
by jmalcolm (subscriber, #8876)
[Link] (3 responses)
Today, where Rust is going in is the drivers. Drivers are often fairly platform specific already. You can also have competing drivers for the same hardware if it turns out that there needs to be a mainstream and a niche option. But the fact that Apple Silicon users are writing their GPU drivers in Rust is not going to threaten Linux support for my niche architecture.
Rust support is also being added to GCC (gccrs). That may take a while to bake but I expect it to mature before we start seeing Rust in core Linux systems that are non-optional across platforms. In other words, Rust in the kernel will not threaten platform support as long as your platform is supported by either GCC or Clang (LLVM).
What platforms are we worried about that cannot be targeted by GCC or Clang? Can Linux run there now?
As a final back-stop, there is mrustc. This allows Rust to target any system with a capable C++ compiler.
By the time Rust becomes non-optional in Linux, Rust will be as portable as C or C++.
Posted Mar 1, 2025 18:32 UTC (Sat)
by mfuzzey (subscriber, #57966)
[Link] (2 responses)
> Drivers are often fairly platform specific already
Not really. Most drivers are shared across platforms rather than being platform specific, and this applies to virtually all drivers for hardware that isn't in the SoC itself (eg chips connected to the CPU using busses like I2C / SPI / PCI / USB).
Even when the hardware is actually inside the SoC it's quite common for IP blocks to be reused in multiple SoCs, even ones from different manufacturers (because manufacturers often buy the IP for an ethernet controller, USB controller or whatever and integrate it in their SoC). In that case the register interface is the same, so the driver code is the same, but the registers will be at different addresses (and that's taken care of by injecting the appropriate base address via DT / ACPI).
So, in many cases, having drivers in Rust will impact Linux support for platforms that don't yet have a Rust implementation. And while it is indeed possible to have competing implementations, this is usually frowned upon in the kernel for duplication / maintenance reasons and usually exists only temporarily.
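To illustrate the IP-block point with a hypothetical sketch (not the kernel's actual Rust abstractions): the register layout lives in the shared driver code, and only the base address is supplied per platform via DT / ACPI:

    // Hypothetical MMIO block for a reusable IP core. The offsets are a
    // property of the IP and identical on every SoC that integrates it;
    // only `base` differs, handed in by the platform layer (DT / ACPI).
    pub struct FooUart {
        base: *mut u8,
    }

    const REG_DATA: usize = 0x00;
    const REG_STATUS: usize = 0x04;

    impl FooUart {
        /// Safety: `base` must point to this device's mapped register window.
        pub unsafe fn new(base: *mut u8) -> Self {
            FooUart { base }
        }

        pub fn status(&self) -> u32 {
            // Safety: REG_STATUS stays inside the window promised in `new`.
            unsafe { core::ptr::read_volatile(self.base.add(REG_STATUS) as *const u32) }
        }

        pub fn send(&self, word: u32) {
            unsafe { core::ptr::write_volatile(self.base.add(REG_DATA) as *mut u32, word) }
        }
    }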
Posted Mar 3, 2025 10:24 UTC (Mon)
by taladar (subscriber, #68407)
[Link]
Posted Mar 5, 2025 0:59 UTC (Wed)
by edgewood (subscriber, #1123)
[Link]
Posted Feb 27, 2025 0:44 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Hm. Right now Rust is missing the following in-tree archs: sh, parisc, openrisc, nios2, microblaze, csky, arc, alpha.
Out of these architectures, only sh is still being manufactured. And maybe arc (from Synopsys). I'd be surprised if these architectures stay in-tree by the time Rust becomes mandatory. Except for Alpha; people love it for some reason.
