A change in maintenance for the kernel's DMA-mapping layer
Posted Feb 25, 2025 21:32 UTC (Tue)
by amarao (guest, #87073)
[Link] (29 responses)
Posted Feb 26, 2025 14:44 UTC (Wed)
by wkudla (guest, #116550)
[Link] (27 responses)
Posted Feb 26, 2025 15:55 UTC (Wed)
by pizza (subscriber, #46)
[Link] (26 responses)
You are missing a critical point here -- Linus has effectively stripped Hellwig of his authority/power as a maintainer of a critical subsystem. Why would anyone want to continue to be officially responsible for something they do not have the authority/power to make decisions over?
> He just didn't like it so quit in a tantrum.
Stepping down quietly is the polar opposite of a tantrum.
Posted Feb 26, 2025 16:28 UTC (Wed)
by garyvdm (subscriber, #82325)
[Link] (9 responses)
Ah... no. That's not how I read things.
Linus wrote:
Remember that the Rust DMA bindings being asked for merge sat *outside* the DMA subsystem.
Where does Linus say he no longer has authority/power over the DMA subsystem?
Posted Feb 26, 2025 20:39 UTC (Wed)
by smurf (subscriber, #17840)
[Link] (8 responses)
He doesn't. Mr Hellwig essentially wrote, "I won't work with the Rust people and I don't care about your Rust policy", Linus replied with "I don't care that you don't care", and Hellwig's reaction to that was not "OK let's see how things *actually* work out before I say 'I told you so'" but "OK fine then I quit".
He's fine to do that of course, nobody forces him to do anything, but the flip side is that nobody forces me (or anybody else of course) to have a particularly high opinion of somebody who chooses to deal with imaginary problems that way.
And yes they are imaginary problems at this point. They don't become non-imaginary unless and until there actually *is* a (technical) problem that needs maintainer interaction, and things have not progressed that far yet AFAIK.
Posted Feb 26, 2025 22:03 UTC (Wed)
by jmalcolm (subscriber, #8876)
[Link] (7 responses)
He was demanding that the Rust bindings not exist, at least not officially, in the kernel.
He tried to say that they could not work with his code. Nobody was asking him to work with their code.
Linus did not have to strip him of authority OUTSIDE of dma. He never had any to begin with.
If he does not want to work on a project that allows Rust code, that is his choice. Let's call it what it is though.
Posted Feb 26, 2025 22:10 UTC (Wed)
by pizza (subscriber, #46)
[Link] (3 responses)
30+ years of Linux history shows that maintainers _are_ forced to work with the in-tree users of their subsystems.
Sorry.
Posted Feb 27, 2025 0:14 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (2 responses)
But that was explicitly changed for Rust - if your C changes broke Rust, it was down to the Rust guys to fix it.
And as a maintainer of ANY subsystem, you need a clearly defined API. That's the problem here - the DMA code had (has) a contradictory API. Forget Rust, that's something that needs fixing.
The deal as it stands is that if your C changes break C, you're still expected to fix it. Sane programming demands that if you change an API, you document it. Leave Rust out of it: the deal is that if you have an API, you should document it - and then you can forget about Rust, because it's their problem to follow your API.
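To make the "document your API, and Rust follows it" point concrete, here is a minimal, hypothetical sketch (the function names are invented for illustration, not taken from the kernel): the contract of an unsafe C-style entry point is written down once, and the safe Rust wrapper encodes that contract in the type system so safe callers cannot violate it.

```rust
/// Hypothetical low-level entry point, in the style of a C binding.
///
/// # Safety
/// `buf` must point to at least `len` valid, writable bytes.
unsafe fn fill(buf: *mut u8, len: usize, value: u8) {
    for i in 0..len {
        // Sound because the caller promised `len` valid bytes.
        unsafe { *buf.add(i) = value };
    }
}

/// Safe wrapper: the slice type carries the documented precondition
/// (valid pointer + length), so the contract is enforced, not just stated.
fn fill_slice(buf: &mut [u8], value: u8) {
    unsafe { fill(buf.as_mut_ptr(), buf.len(), value) }
}

fn main() {
    let mut b = [0u8; 4];
    fill_slice(&mut b, 7);
    assert_eq!(b, [7, 7, 7, 7]);
    println!("{b:?}");
}
```

The `# Safety` section is the conventional place where such an interface contract gets written down once, which is the kind of documentation being argued about here.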
Cheers,
Posted Feb 27, 2025 7:42 UTC (Thu)
by Mook (subscriber, #71173)
[Link] (1 responses)
Down thread of this week's quotes of the week: https://lwn.net/ml/all/CAHk-=wjg1PJ81E23DB1QbvPBQ04wCf7mJ...
> The most common situation is that something doesn't build for me, [snip…]
> My build testing is trying to be wide-ranging in the sense that yes, I do an allmodconfig build on x86-64 [snip…]
See also the parent of that mail.
Posted Feb 27, 2025 9:17 UTC (Thu)
by pbonzini (subscriber, #60935)
[Link]
Posted Feb 28, 2025 11:46 UTC (Fri)
by LtWorf (subscriber, #124958)
[Link] (2 responses)
True. The alternative being quitting. Which he did.
Posted Feb 28, 2025 11:49 UTC (Fri)
by amarao (guest, #87073)
[Link] (1 responses)
It's a very slippery slope when you reject people based on their preferences in language, mascot, and deity.
Posted Feb 28, 2025 12:17 UTC (Fri)
by LtWorf (subscriber, #124958)
[Link]
It is happening to C people…
Posted Feb 26, 2025 16:30 UTC (Wed)
by wkudla (guest, #116550)
[Link] (15 responses)
Could you expand on that? My understanding was that Hellwig would not be responsible for Rust code in the slightest. He was opposing Rust targeting his APIs. But I might be missing something important here.
Posted Feb 26, 2025 16:57 UTC (Wed)
by judas_iscariote (guest, #47386)
[Link]
Posted Feb 26, 2025 17:28 UTC (Wed)
by pizza (subscriber, #46)
[Link] (13 responses)
Maintainers have _always_ been responsible for all [1] in-tree users of their subsystems -- they change an API, all users need to be fixed up. Additionally, they're the point of contact of bug reports and other problems.
Saying "no, you're not responsible for _that_ class of in-tree users" directly contradicts longstanding mainline Linux policy, and leads to spider-man meme situations.
[1] And I do mean _all_. Even the "optional" [2] components such as every device driver.
Posted Feb 26, 2025 17:33 UTC (Wed)
by pbonzini (subscriber, #60935)
[Link]
I expect that Rust will not break Linus's tree except in extremely rare cases that are more mistakes than policy, because the development process is already designed to allow and coordinate tree-wide changes (which aren't that frequent anyway).
Posted Feb 26, 2025 18:42 UTC (Wed)
by koverstreet (✭ supporter ✭, #4296)
[Link]
That's not the rule for maintainers, that's the rule for everyone because we work in a monorepo.
Maintaining a critical subsystem cannot give you absolute veto power over the rest of the kernel, it has _never_ worked that way.
There's no absolute rules here, except for maybe - try to work with your fellow engineers, be reasonable, and keep the whole thing working. No one person's interests or wishes override everyone else's, we have to balance everyone's priorities.
And to keep the spider man memes going, with power comes responsibility.
Posted Feb 27, 2025 9:49 UTC (Thu)
by amarao (guest, #87073)
[Link] (10 responses)
As far as I know the story, there is a condition for Rust code: maintainers can break it and change things without thinking about Rust problems, and the Rust people will fix their code to match the breaking changes. So it's not a "usual in-tree user". But perhaps the mere existence of Rust code was unpleasant.
Posted Feb 27, 2025 11:00 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (9 responses)
If a C maintainer changes an API, it's now no longer enough to update all the (C) callers of that API; the guy now has to document the API as well!
Seriously? And they expect the lack of documentation to be acceptable?
Cheers,
Posted Feb 27, 2025 11:21 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (8 responses)
How often has our dear editor remarked that some fancy new merged behavior of the kernel is "rigorously undocumented" (or along those lines)?
Posted Feb 27, 2025 11:54 UTC (Thu)
by amarao (guest, #87073)
[Link] (7 responses)
Some things are just plainly not documented. And I'm not talking about sysctl.obscure.mode; I'm talking about big things like nftables. There is a wiki, but the kernel repo itself literally has nothing about it: not how to use it, not how to program it. There are just a few mentions of netfilter, and that's all.
Posted Feb 27, 2025 13:34 UTC (Thu)
by daroc (editor, #160859)
[Link] (6 responses)
And, if you think you can do that documentation in the form of an LWN.net article of about 1500 words, we will even pay you for it. See the "Write for us" link in the sidebar. Lots of the kernel's official documentation links to LWN.net articles, so it's not without precedent.
Posted Feb 27, 2025 14:04 UTC (Thu)
by amarao (guest, #87073)
[Link] (5 responses)
But we definitely should try. Better to have incomplete and jerky docs than serene and concise "no docs".
Posted Feb 27, 2025 14:53 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (4 responses)
Apologies for getting on my high horse here - but has anybody noticed what happened to the kernel RAID wiki? It was aimed at USERS - people running RAID, people who didn't know what RAID was, people who wanted to set up a system. It quite deliberately did NOT attempt to duplicate the official kernel documentation.
So someone came along and archived it, with a big notice saying "this is obsolete, refer to the official kernel documentation if you need anything". WTF!!!
For the target readers of the wiki, the official kernel documentation is probably written in something worse than double Dutch!
And this is a major problem with modern documentation - it usually completely ignores the user, being written BY experts, FOR experts. Which is why most user documentation is a case of the blind leading the blind. (My new TV is a case in point - it's a complicated computer, and the user documentation consists pretty much entirely of "plug the power lead here, the network cable there, and the aerial this other place". There's loads of fancy stuff we don't have a clue how to use!)
PLEASE, KERNEL GUYS - *DON'T* piss off users who are trying to teach people how to USE your software. Without users, there's no point in writing it!
Cheers,
Wol
Posted Feb 28, 2025 8:48 UTC (Fri)
by taladar (subscriber, #68407)
[Link] (3 responses)
However, having expert documentation for experts would at least be a good first step, allowing someone else with a different skill set to write the documentation for users. Having no documentation at all requires both the skill set to read undocumented code (which is much harder than writing undocumented code) and the skill set to write good documentation in the same person.
Posted Feb 28, 2025 14:06 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
Posted Feb 28, 2025 15:44 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (1 responses)
Actually no. From the user's PoV, (a) that documentation is probably written in double Dutch, and (b) it addresses completely the wrong problem anyway. If I want to learn to be a chauffeur, why on earth do I want a detailed schematic of an internal combustion engine? (Yes, knowing that schematic might be useful, but it's completely orthogonal to the problem at hand.)
The gap between the maker's understanding, and the user's understanding, of any product grows wider much faster than I suspect many of us here realise. A good maker is curious about what they're making. A user usually cares very little beyond "how do I get this to work" (because they don't have time for much more). Assuming the documentation will be cross-comprehensible between the two groups is asking for problems ...
Cheers,
Posted Feb 28, 2025 18:34 UTC (Fri)
by draco (subscriber, #1792)
[Link]
It's not saying that "experts documenting for experts, full stop" equals "user documentation, just of lower quality".
It's saying that "an expert who is bad at user documentation writing for experts who can't figure it out from raw code" leads to "experts who are good at user documentation creating decent user documentation" with higher probability than the alternative.
Now, if you're saying that documenting how it works instead of intended behavior for end users doesn't necessarily help describe the intended behavior, that's a fair statement, but I'd argue that if you don't have both, you don't have adequate expert documentation either.
Posted Feb 27, 2025 11:27 UTC (Thu)
by CChittleborough (subscriber, #60775)
[Link]
Humans have a tendency to leap to negative conclusions. We should all fight that tendency.
Posted Feb 25, 2025 21:39 UTC (Tue)
by PeeWee (guest, #175777)
[Link] (1 responses)
Posted Feb 26, 2025 13:26 UTC (Wed)
by daroc (editor, #160859)
[Link]
Posted Feb 25, 2025 23:12 UTC (Tue)
by alphyr (subscriber, #173368)
[Link]
Posted Feb 26, 2025 0:24 UTC (Wed)
by jmalcolm (subscriber, #8876)
[Link]
Posted Feb 26, 2025 1:07 UTC (Wed)
by Phantom_Hoover (subscriber, #167627)
[Link] (11 responses)
Posted Feb 28, 2025 11:58 UTC (Fri)
by LtWorf (subscriber, #124958)
[Link] (10 responses)
Posted Feb 28, 2025 13:30 UTC (Fri)
by pizza (subscriber, #46)
[Link] (6 responses)
This is my take too.
...Replace the word "Rust" with anything else, and there's a long, long, long history of Linus utterly flaming folks for it.
If those "reassurances" are true, they represent a _massive_ change to how the process of development and maintainership is supposed to work. The (now-former) processes weren't arbitrary; they were carefully honed and battle-tested in no small part to prevent "issues down the line".
Posted Feb 28, 2025 17:41 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (5 responses)
I think you're right ... (not the way you think you are)
> The (now-former) processes weren't arbitrary; they were carefully honed and battle-tested in no small part to prevent "issues down the line".
At which they failed miserably. Rust *FORCES* you to document your interfaces. The fact that this is not true of C leads to the steady stream of CVEs and exploits all C projects seem to suffer from, as people on both sides of an interface misunderstand each other (and you can see that here in this storm).
Linus has learned that not documenting things is costly. I know that writing good documentation is difficult (and costly) - I can understand people not wanting to do it. Unfortunately, the attitude of "I know what I'm doing, why do I need to tell other people" is no longer acceptable. I've spent most of today getting extremely frustrated with (a) users who can't explain what they want, and (b) Excel formulae which can't explain what they're doing! Oh for some decent documentation! Excel actively frustrates attempts at decent documentation!
At the end of the day, Christoph has paid the price for not working well with others. Linus has always been a good people manager, and maybe he's now realising that prima donnas are more trouble than they're worth (or maybe he always knew that, he may just be being forced to face up to it).
Cheers,
Posted Feb 28, 2025 19:23 UTC (Fri)
by pizza (subscriber, #46)
[Link] (4 responses)
(Funny you say that. Linux is routinely held up as the most successful software engineering project of all time, in no small part due to the development methodology and processes that you're calling a miserable failure)
Meanwhile, Rust, in and of itself, does not force anything of the sort; one can easily commit all manner of horrible sins in Rust. What gets committed depends on what the project considers acceptable. The same goes for interface documentation.
If Linus wants to change the development model/processes of Linux, then he needs to be explicit about it, and the discussion can revolve around the pros and cons of those changes, and from there the pros/cons of various approaches.
But this way is bass-ackwards and reeks of disingenuousness. Be honest about your intentions and goals up front, because the *only* logical outcome of this entire effort is Rust-in-the-core-kernel and completely deprecating (new) C. Anything less gives you the worst of both worlds.
Posted Mar 1, 2025 22:37 UTC (Sat)
by raven667 (subscriber, #5198)
[Link]
I don't think this is accurate; it's not that black and white, either/or, win/lose. I would take it at face value that introducing Rust to the kernel is a hopeful experiment: new drivers can be written in C or Rust, and while Rust is experimental, the Rust maintainers - not just the person making an interface change - are responsible for keeping the bindings in sync with other kernel interfaces as they change. I would expect both styles to coexist for a long time, and only if the predominant consensus of the kernel developer community changes would new work in C be *forbidden*. Even after Rust is no longer experimental, I think there will still be a long period where predominantly Rust developers are needed to maintain the internal bindings and wrappers for the predominantly C core; having C developers update Rust bindings depends on how many see value in spending the time to learn Rust.
Posted Mar 1, 2025 22:57 UTC (Sat)
by khim (subscriber, #9252)
[Link]
It kinda-sorta does. Lifetime markup may be perceived as part of the code, but in reality it's the documentation. Proof: mrustc ignores it yet generates valid code. What I find really strange is such an active resistance to it. Linux was doing that same thing for years with sparse. Rust just turns that same thing (that was part of Linux development process for more than 20 years!) “up to eleven”.
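As a small illustration of lifetimes-as-documentation (a hypothetical example, not kernel code): the lifetime annotation in the signature below records which input the result borrows from, and the compiler checks every caller against that claim while emitting no code for the annotation itself.

```rust
// The `'a` markup is machine-checked documentation: it states that the
// returned slice borrows from `haystack`, not from `needle`. Erasing it
// changes nothing about the generated code, only what is promised.
fn find_after<'a>(haystack: &'a str, needle: &str) -> Option<&'a str> {
    haystack
        .find(needle)
        .map(|i| &haystack[i + needle.len()..])
}

fn main() {
    let text = String::from("key=value");
    // Because the signature ties the result only to `haystack`,
    // the pattern argument may be a short-lived temporary.
    let v = find_after(&text, "=");
    assert_eq!(v, Some("value"));
    println!("{v:?}");
}
```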
Posted Mar 2, 2025 0:41 UTC (Sun)
by Wol (subscriber, #4433)
[Link] (1 responses)
If you want honesty, don't accuse others of dishonesty! There are far too many people who take the attitude "I've made up my mind, I don't care about the facts!". What's that saying? "What you see in others is a mirror of yourself"?
I'll be honest with you - I suspect that what you claim will turn out to be the future. I do NOT think that is the aim of this "Rust in the kernel" experiment, I just expect it will be what ends up happening. Because when people realise they can write a mostly-bug-free driver in Rust in two weeks, but it takes two years to write a similar driver in C, they will refuse to use C.
Which is why the Rust refuseniks are making such a fuss. They can see the writing on the wall just as clearly as you or me, and they don't want that future.
Cheers,
Posted Mar 2, 2025 0:51 UTC (Sun)
by corbet (editor, #1)
[Link]
Posted Feb 28, 2025 19:27 UTC (Fri)
by draco (subscriber, #1792)
[Link] (2 responses)
But another part of it could easily be to give Rust more time to prove its worth (or lack). For example, meaningful metrics on CVE/bug counts correlated to changes due to adding Rust (whether it's drivers being written in it or API correctness fixes/documentation to support it) would have a significant impact on the discussion. Or improvements in review bandwidth for maintainers that decide to embrace it, or the lack of that. Or coming through with better platform support and feature stabilization (or not).
It's okay to delay arguments if doing so gives everyone better information & circumstances and doesn't make things substantially worse in the meantime. It's avoidance for the sake of it while letting things get worse (the typical situation) that's bad.
Posted Feb 28, 2025 19:43 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (1 responses)
"Don't let the facts spoil a good argument". I think we already have those metrics.
The Rust video driver(s) have pretty much 0 CVEs I believe, and were written in much less time than equivalent C drivers. I expect the stats will be the same for other Rust drivers.
Cheers,
Posted Feb 28, 2025 19:59 UTC (Fri)
by corbet (editor, #1)
[Link]
Posted Feb 26, 2025 2:01 UTC (Wed)
by dowdle (subscriber, #659)
[Link] (44 responses)
Linux surpassed 40 million lines of code a while back and has consistently broken all records and norms... so those who proclaim that multi-language projects do worse may be right... or maybe not. One of the common-knowledge tenets of development is that you have to be prepared to throw the first version away and start over. That has been very common with various things in the kernel that didn't work out: they got yanked and replaced by something better. Rust may end up being an experiment that didn't work out... or maybe not. Ups and downs are to be expected. Who remembers how long it took the 2.4.x kernel to stabilize? A lot is going to change over the next 20 years, as those who manage things now will have long moved on, assuming they are even still on this earth. The next gen gotta next gen.
Posted Feb 26, 2025 3:07 UTC (Wed)
by dralley (subscriber, #143766)
[Link] (1 responses)
All this is to say that it's not a bad bet long-term, but also that it's not actually necessary to touch the core kernel to make a big dent in those 40 million lines. But if the pipeline for new kernel developers is dominated by Rust, then yes, it probably will make its way into the core kernel over time.
Posted Feb 26, 2025 9:11 UTC (Wed)
by taladar (subscriber, #68407)
[Link]
So I would really expect the core subsystems to get bindings (or Rust replacements once it is clear that Rust is necessary to build the kernel anyway) first before the majority of existing drivers are replaced by Rust ones, especially considering how the Rust community tends to lean towards thinking things through and doing things in the right order with the RFC process for language changes.
Posted Feb 26, 2025 6:58 UTC (Wed)
by Alterego (guest, #55989)
[Link] (7 responses)
It seems Rust is good for
Posted Feb 26, 2025 8:46 UTC (Wed)
by jengelh (guest, #33263)
[Link] (6 responses)
Posted Feb 26, 2025 10:13 UTC (Wed)
by danieldk (guest, #27876)
[Link] (5 responses)
Posted Feb 26, 2025 10:56 UTC (Wed)
by jengelh (guest, #33263)
[Link] (4 responses)
I guess we'll find out sooner or later, for the lovely price of one Linux project. And perhaps we can then tell everybody "I told you so" (or not).
Posted Feb 26, 2025 12:12 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (1 responses)
And how many (especially language) projects are bootstrapped in one language, and then rewritten in another?
I suspect the big impact Rust will have (and is already having) on the kernel is to force people to clearly define the interfaces. And that has to be a good thing, no?
Cheers,
Posted Feb 26, 2025 20:56 UTC (Wed)
by edomaur (subscriber, #14520)
[Link]
typically Rust, which was originally written in OCaml :-D
Posted Feb 26, 2025 12:19 UTC (Wed)
by danieldk (guest, #27876)
[Link]
Posted Feb 28, 2025 0:37 UTC (Fri)
by rgmoore (✭ supporter ✭, #75)
[Link]
Posted Feb 26, 2025 10:12 UTC (Wed)
by butlerm (subscriber, #13312)
[Link] (19 responses)
Posted Feb 26, 2025 11:13 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
As a result, the fork is either going to be neutral (no gain for the mainline, no loss either, since the people working on it wouldn't work on mainline if they had to deal with mainline's choice of languages), or beneficial (since mainline can take the improvements from them).
Posted Feb 26, 2025 12:18 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link] (17 responses)
It's a community which only wants to write the happy path. Exceptions enable dilution of responsibility. If I write C++ code which just throws in the unhappy path and you write C++ code which calls my function, both of us can claim at review that it wasn't our job to handle the error. Somebody else should do that, the happy path code I wrote was difficult enough. In Rust whoever panics gets to explain why, and code where nobody handled the error case at all doesn't compile.
That doesn't make the handling magically correct - but it's much less likely that some of the really wild effects happen when you know you're writing error handling code, than when the "handling" is the consequence of a missed check.
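A minimal sketch of that point, with invented names: in Rust the unhappy path is part of the function's type, so the caller has to write something for it - a match arm, or an explicit, attributable panic.

```rust
// A fallible operation returns Result; the error case is in the signature.
fn parse_port(s: &str) -> Result<u16, std::num::ParseIntError> {
    s.parse::<u16>()
}

fn main() {
    // Extracting the value requires visibly naming the error case;
    // silently dropping the Result trips the `unused_must_use` lint.
    match parse_port("8080") {
        Ok(p) => println!("port {p}"),
        Err(e) => println!("bad port: {e}"),
    }

    // Whoever opts into a panic has to write it out - "whoever panics
    // gets to explain why":
    let p = parse_port("8080").expect("port must be numeric");
    assert_eq!(p, 8080);
}
```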
I can believe a "C only Linux" fork could exist, particularly if the way Linux gets to 100% Rust platform support is via removing some older platforms some years in the future. If you're involved in maintaining Linux for a CPU architecture that hasn't been made since last century you might well have zero interest in Rust and plenty of reason to fork the last Linux which built correctly for your favourite machine.
Posted Feb 26, 2025 12:50 UTC (Wed)
by excors (subscriber, #95769)
[Link]
That does cause a bit of friction when some parts of the C++ standard library and language are designed around the assumption that you have exceptions, but in practice it works okay (or at least it's no more problematic than several other aspects of C++).
I can't imagine the Linux kernel actually adopting C++ though, because it would have pretty much all the same technical challenges and cultural pushback as Rust, with significantly fewer benefits to make it seem worthwhile.
Posted Feb 26, 2025 14:10 UTC (Wed)
by butlerm (subscriber, #13312)
[Link] (10 responses)
It's like "oh well, just ship this code or release it into production, because the catch-all exception handler will handle the problem, and the user can either try again or we can fix any issue we find or that someone reports after the fact, in a month or two - or maybe sooner if it is really serious." And that is if the problem ever gets fixed at all within the lifetime of the project, the product, the service, the volunteers (where applicable), the managers, the leaders, or the developers in question.
I used to write video games in C and assembly language, and in my view good code should perform according to specification and be usable a century from now if committed to ROM and sold on store shelves or shipped in products that way. Does anyone doubt that most Nintendo, Sega, or Atari 7800 games will actually work with the appropriate hardware, without major malfunctions, decades from now? What about something like NetWare (which was originally mostly written in 80386 assembly language), or the Amiga operating system (originally written in a mixture of C, BCPL, and 68K assembly language), or a number of other things, at least if deployed into a non-hostile environment?
You can see this problem in web applications written in JavaScript all the time these days, especially on the websites of banks that are not among the largest in the country, or on the websites of most non-bank credit-card issuers and lenders as well. I use websites on a regular basis where it is a fifty-fifty chance that a login with the correct credentials supplied will actually succeed. And that goes for many other actions as well, where the user is often required to do things like enter their credit-card information twice to make a payment, because of mysterious "an error occurred" problems that are cured simply by repeating the process. Or worse, where a payment will not go through at all for weeks, for other, never-documented reasons not explained to the user. There is a major funds-transfer webapp whose name you would all recognize that often behaves that way these days.
I believe that this is likely, in large part, the result of libraries and code included in many modern JavaScript applications that are so extensive that either the exceptions are undocumented or you have to be an expert to handle them properly - and often entry-level developers are not given enough time or resources to fix the problem. That was my experience for a few years, when I was in the unfortunate position of having to maintain and develop code for a moderately sophisticated web application that was originally programmed to use JavaScript only where necessary. When you have a dozen or more developers working on a project, it is that much worse.
Anyway, I am not surprised that a large team has a difficult time writing safe, correct, and decently performing C, C++, or Java code, and I don't really see any solution to that other than compilers and static-analysis tools that identify the problems and produce and optimize code better than most developers can write by hand, even after they stare at a problem for hours at a time. In a project as big as the Linux kernel, or something like a modern database or web browser, in my view it would be worth it to write static-analysis tools that are hard-coded, if necessary, to describe and enforce the constraints and rules that govern that project. A more general tool would be nice, but apparently no one has written one yet - at least not one capable or used enough to find the memory-safety, locking, and other problems that still make it into deployed production kernels and have to be corrected after the fact, in some cases after making national or international news, over problems that ought to be straightforward to analyze and detect.
Finally - although this almost certainly could not be done well, or perfectly, without heavy use of a new series of #pragmas or language extensions - my idea of a usable C or C++ compiler for a large project is one that refuses to compile code with undefined behavior at all, and requires the developer to supply machine-architecture and memory-model targeting information to make those behaviors implementation- or configuration-defined if he or she wants to write almost anything that would otherwise be undefined behavior. As things stand, the developers, vendors, and publishers of contemporary C and C++ compilers feel they have a license to do anything for any reason - such as deleting entire code sections or skipping appropriate if statements and safety checks, as we have read about here from time to time when C compiler optimizers cause serious problems. That is my two cents on this question.
Posted Feb 26, 2025 17:02 UTC (Wed)
by matthias (subscriber, #94967)
[Link] (9 responses)
C and C++ are not designed for this. Of course, many cases of UB in C and C++ can be made implementation-defined, like integer overflow. But there are certain operations that are already UB on the machine-code level:
* accesses through dangling pointers or references (use after free, corrupting the stack, and the like);
* data races between two threads that access the same memory where at least one access is a write.
The Rust way of eliminating this kind of UB is the borrow checker, which verifies at compile time that all references are sound. I really do not see any reason why this should be done in C or C++: if you add borrow checking to these languages, they are not really the same languages any more. Instead, it would be much better to use Rust directly, which has been developed with this feature in mind from the start.
Of course you can also use the good old -O0 approach of forbidding any optimizations that could result in UB. Except you also have to prevent UB on the machine code level. So all data accesses need to be atomic to prevent the CPU from doing crazy reorderings that are only sound in the absence of data races. The resulting performance would be worse than -O0.
Then there is the JVM way of doing things. Use a virtual machine and only code against the virtual machine. I do not see how this should work in the kernel. Also you need a language to write the virtual machine in.
In my opinion, Rust already is this hypothetical C++ language without UB. Maybe at some point a clever person will find better alternatives, but I do not see a way to get rid of UB without the borrow checker. And it is really the borrow checker that defines what kind of language Rust is. There are of course other differences, but the borrow checker is the most prominent one.
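To illustrate the claim (a toy example, not from the thread): the classic dangling-reference bug is rejected by the borrow checker before any code is generated, so this class of UB never reaches the machine-code level at all.

```rust
// This (hypothetical) variant does not compile: the local `s` dies when
// the function returns, so handing out `&s` would be a dangling reference.
//
//     fn broken() -> &String {          // error[E0106]: missing lifetime
//         let s = String::from("oops"); // specifier - there is nothing
//         &s                            // the result could validly borrow
//     }
//
// The accepted fix transfers ownership instead of borrowing:
fn fixed() -> String {
    let s = String::from("oops");
    s // ownership moves to the caller; no reference outlives `s`
}

fn main() {
    let v = fixed();
    assert_eq!(v, "oops");
    println!("{v}");
}
```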
Posted Feb 27, 2025 0:50 UTC (Thu)
by neggles (subscriber, #153254)
[Link]
Well that's essentially what eBPF is, a virtual machine model and runtime environment that's suitable for use in the kernel. But the limitations of eBPF (and wasm for that matter, since a number of people are of the opinion that eBPF is "just worse wasm") show why that's not a practical model for the kernel as a whole.
As an aside, it might be an interesting project to try to write a microkernel almost entirely in eBPF, where (say) each individual microkernel service is a verified eBPF program and only the base message-passing layer / helper functions aren't. Probably a Ph.D. or two to be had there.
Posted Feb 27, 2025 1:56 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (6 responses)
This is not, strictly speaking, UB on the machine code level (at least in the general case). Depending on what you mean by "dangling," it could be well-defined as having either of the following meanings:
* You access some area of memory that you did not intend to access, but it's still within your address space. That is a perfectly well-defined operation. By assumption, it is not the well-defined operation that you intended to do, but that doesn't make it UB.
* You access an address that is not mapped in your address space, and the CPU raises a fault that the OS delivers to the process (on Linux, typically as SIGSEGV). That, too, is well-defined at the architectural level.
Remember, the heap is entirely a construct of libc, and the stack is mostly a construct of libc. The notion of "corrupting" either of them does not exist at the machine code level, because at the machine code level, memory is memory and you can read or write whatever bytes you want at whatever address you want in your address space. If you write the wrong bytes to the wrong address, and confuse some other part of your program, that's your problem. It does not magically cause the CPU to believe that your program is invalid, and to start doing things other than what your machine code tells it to do (or, in the case where the instruction pointer is no longer pointing at your original machine code, whatever the new code tells it to do).
> data races between two threads that access the same memory where at least one access is a write
Most architectures do not provide the full semantics of the C abstract machine under the as-if rule. That is, most architectures are at least willing to promise that you get some sort of value when you execute a data race. It's probably the wrong value, it's probably nondeterministic-but-not-in-a-cryptographically-useful-way, and it might not look like any of the values you would "logically expect" to see (e.g. because of tearing), but it is still not quite the same thing as UB.
UB specifically means "an optimizing compiler is allowed to assume that this never happens." It cannot exist at the machine code level, because there is no compiler. The closest we can get (within the context of the C and C++ standards) is implementation-defined behavior, which roughly translates from the standardese to "if this happens, we don't know what your system will do, but you can read your compiler, CPU, and OS manuals and figure it out if you really want to."
The C and C++ standards committees could, at any time, wave a magic wand and eliminate all UB from their respective languages. The reason that nobody is seriously advocating for that is not because it would not work, but because it would necessarily involve saying something like "all UB is hereby reclassified as IB," and (this general category of) IB is almost as much of a problem as UB. It also requires more documentation that nobody is actually going to read (do *you* want to carefully study a heap diagram for your particular libc's malloc, just so you know what happens if the heap is corrupted?), since all IB must be documented by each implementation (that's the "you can read your manuals" bit). So you'd lose a lot of optimization opportunities, and waste a lot of the implementers' time, in exchange for practically nothing.
Posted Feb 27, 2025 7:31 UTC (Thu)
by matthias (subscriber, #94967)
[Link] (2 responses)
So what are the semantics if you corrupt the stack and as a consequence jump to uninitialized memory or memory that you intentionally filled with random data to construct a key or even worse, memory filled by data controlled by an attacker. By the very definition of the instruction set anything can happen. You can call the resulting behavior whatever you like, but it is essentially as undefined as it can possibly get.
And independently of what you call this behavior, it clearly has to be avoided. Corrupting the stack plainly leads to exploits, so the UB-free variant of C(++) that we are talking about has to prevent it. So we are back at square one, and we need the borrow checker to avoid this.
> UB specifically means "an optimizing compiler is allowed to assume that this never happens." It cannot exist at the machine code level, because there is no compiler.
But you have a very similar thing. An optimizing out-of-order architecture in the CPU. And this architecture makes similar assumptions on what can happen vs. what cannot happen. And again, you can call this behavior by different names, but it is essentially undefined. The CPU does not have the global sense of what is going on as the compiler, but messing up locally is enough to corrupt your data. And again, we effectively need the borrow checker to prevent data races. You can get rid of some of this behavior if you make each and every data access atomic, but this is obviously undesirable and I am not even sure that this would be enough.
> ...saying something like "all UB is hereby reclassified as IB," and (this general category of) IB is almost as much of a problem as UB.
It is essentially this: giving a new name to the same behavior. And it is not almost as much of a problem as UB, it is exactly as much of a problem as UB, as it can still lead to the same "if you do not follow the rules, I am allowed to format your hard drive" kind of behavior.
I would be absolutely in favor of the committee eliminating all of the nonsense kinds of UB, such as integer arithmetic being able to be UB. But once you try to remove the UB of dangling pointers and data races, you essentially have to construct a whole new language.
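Rust already takes this route for integer arithmetic: overflow is never UB, and the programmer chooses the semantics explicitly. A minimal sketch (plain `+` would panic in debug builds and wrap in release builds):

```rust
fn main() {
    // In Rust, overflow semantics are explicit and defined, never UB:
    assert_eq!(i32::MAX.wrapping_add(1), i32::MIN);   // two's-complement wrap, on request
    assert_eq!(i32::MAX.checked_add(1), None);        // overflow detected, reported as a value
    assert_eq!(i32::MAX.saturating_add(1), i32::MAX); // clamp to the maximum instead
    println!("ok");
}
```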
Posted Feb 28, 2025 8:24 UTC (Fri)
by anton (subscriber, #25547)
[Link] (1 responses)
Posted Feb 28, 2025 14:01 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
This has been argued, but it seems that no one has been able to show an instance of a compiler actually doing so. There are some solutions for it in the works (by saying "it's not allowed"), but it is practically a no-op as compilers have already behaved that way (though I am certainly not well-versed in the details):
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/...
Posted Feb 27, 2025 12:31 UTC (Thu)
by excors (subscriber, #95769)
[Link] (2 responses)
I don't think that's really true. x86 and Arm have a number of things that are explicitly documented as "undefined" or "unpredictable" in the architecture references, and are not documented in CPU-specific manuals (as far as I can see), so you can't figure out the behaviour even if you really want to.
E.g. on x86 there's the BSF/BSR instructions ("If the content of the source operand is 0, the content of the destination operand is undefined"). Many instructions leave flags in an undefined state. With memory accesses to I/O address space, "The exact order of bus cycles used to access unaligned ports is undefined". Running the same machine code on different CPUs can give different behaviour, in the same way that running the same C code through different compilers (or the same compiler with different optimisation flags) can give different behaviour, with no documentation of what will happen, so I think it's reasonable to equate that to C's concept of UB.
(And the C standard says UB specifically means "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements", so it's not literally dependent on there being an optimising compiler.)
In practice, all the undefined/unpredictable CPU behaviour that's accessible from userspace is probably documented internally by Intel/Arm for backward compatibility and security reasons, since the CPU is designed to run untrusted machine code (unlike C compilers, which are designed to compile only trusted code). Armv8-A has a lot of "constrained unpredictable", where it's documented that an instruction might e.g. raise an exception or be treated as NOP or set the destination register to an unknown value but it isn't allowed to have any other side effects; but there's still plenty of non-constrained "unpredictable" behaviours. They're not fully unconstrained: they are documented as obeying privilege levels, but they can have arbitrary behaviour that would be achievable by any code within that privilege level, which is the same as C's UB in practice (e.g. UB in an application is not allowed to break the kernel). So I think it's very much like C's UB.
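Language designers can paper over exactly this kind of instruction-level undefinedness. Rust's count-trailing-zeros operation, for instance, which maps onto BSF/TZCNT on x86, is simply given a defined result for a zero input (the bit width), so the compiler emits a fix-up where the hardware would leave the destination undefined:

```rust
fn main() {
    // BSF leaves the destination undefined for a zero source operand;
    // Rust defines the result of trailing_zeros()/leading_zeros() on zero
    // to be the type's bit width instead.
    assert_eq!(0u32.trailing_zeros(), 32);
    assert_eq!(1u32.trailing_zeros(), 0);
    assert_eq!(0u64.leading_zeros(), 64);
}
```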
Posted Feb 28, 2025 8:55 UTC (Fri)
by taladar (subscriber, #68407)
[Link]
Posted Feb 28, 2025 9:18 UTC (Fri)
by anton (subscriber, #25547)
[Link]
(And the C standard says UB specifically means "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements", so it's not literally dependent on there being an optimising compiler.)
And while I agree with the idea that the C standards committee originally used "undefined behaviour" for cases where different implementations produced different behaviour, and where they did not have a more specific term (such as "unspecified value"), for several decades C compiler writers have treated "undefined behaviour" as license to assume that this behaviour does not occur in the programs they support (unless the program is "relevant" for some reason), and there are people around who advocate the position that this has been the intent of "undefined behaviour" from the start.
And the latter form of "undefined behaviour" has quite different results from the former; e.g., with the latter form a loop with an out-of-bounds access can be "optimized" into an endless loop, while with the former form it will perform the memory access, either giving a result, or producing something like a SIGSEGV.
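Rust resolves the same question by defining the out-of-bounds access itself: safe indexing is bounds-checked, and a violation is a deterministic, catchable panic rather than something the optimizer may assume away. A small sketch:

```rust
fn main() {
    let v = vec![1, 2, 3];
    // Safe indexing past the end is a defined panic, not UB:
    let result = std::panic::catch_unwind(|| v[10]);
    assert!(result.is_err());
    // The checked accessor reports the failure as a value instead:
    assert_eq!(v.get(10), None);
    assert_eq!(v.get(2), Some(&3));
}
```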
Posted Feb 27, 2025 3:18 UTC (Thu)
by raof (subscriber, #57409)
[Link]
There was a very interesting research OS at Microsoft that did exactly this - Singularity. A bit of bootstrap written in assembly, then jumping into a fully managed environment written in a variant of C# (called Sing#, which was the source of a bunch of C# features over time). Being fully managed meant that one of the core weaknesses of microkernels - context switch overhead - didn't exist, because it just didn't use the process-isolation hardware.
There's a really interesting series of blog posts about Midori, the very-nearly-complete project to replace Windows with a Singularity-derived codebase.
Posted Feb 26, 2025 22:23 UTC (Wed)
by jmalcolm (subscriber, #8876)
[Link] (3 responses)
Today, where Rust is going in is the drivers. Drivers are often fairly platform specific already. You can also have competing drivers for the same hardware if it turns out that there needs to be a mainstream and a niche option. But the fact that Apple Silicon users are writing their GPU drivers in Rust is not going to threaten Linux support for my niche architecture.
Rust support is also being added to GCC (gccrs). That may take a while to bake but I expect it to mature before we start seeing Rust in core Linux systems that are non-optional across platforms. In other words, Rust in the kernel will not threaten platform support as long as your platform is supported by either GCC or Clang (LLVM).
What platforms are we worried about that cannot be targeted by GCC or Clang? Can Linux run there now?
As a final back-stop, there is mrustc. This allows Rust to target any system with a capable C++ compiler.
By the time Rust becomes non-optional in Linux, Rust will be as portable as C or C++.
Posted Mar 1, 2025 18:32 UTC (Sat)
by mfuzzey (subscriber, #57966)
[Link] (2 responses)
This applies to virtually all drivers for hardware that isn't in the SoC itself (eg chips connected to the CPU using busses like I2C / SPI / PCI / USB ).
Even when the hardware is actually inside the SoC, it's quite common for IP blocks to be reused in multiple SoCs, even ones from different manufacturers (because manufacturers often buy the IP for an ethernet controller, USB controller or whatever and integrate it in their SoC). In that case the register interface is the same, so the driver code is the same, but the registers will be at different addresses (and that's taken care of by injecting the appropriate base address via DT / ACPI).
So, in many cases, having drivers in Rust will impact Linux support for platforms that don't yet have a Rust implementation. And while it is indeed possible to have competing implementations, this is usually frowned upon in the kernel for duplication and maintenance reasons, and such duplicates usually exist only temporarily.
Posted Mar 3, 2025 10:24 UTC (Mon)
by taladar (subscriber, #68407)
[Link]
Posted Mar 5, 2025 0:59 UTC (Wed)
by edgewood (subscriber, #1123)
[Link]
Posted Feb 27, 2025 0:44 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Hm. Right now Rust is missing the following in-tree archs: sh, parisc, openrisc, nios2, microblaze, csky, arc, alpha.
Out of these architectures, only sh is still being manufactured. And maybe arc (from Synopsys). I'd be surprised if these architectures stay in-tree by the time Rust becomes mandatory. Except for Alpha, people love it for some reason.
Posted Feb 26, 2025 17:28 UTC (Wed)
by magnus (subscriber, #34778)
[Link] (13 responses)
Posted Feb 26, 2025 17:49 UTC (Wed)
by mb (subscriber, #50428)
[Link] (11 responses)
The key concept here is to create zero-cost abstraction layers just above your unsafe hardware and to implement basic primitives in partial unsafe code and use these safe primitives in your implementation.
It's easily possible to write bare-metal Rust code without a single line of unsafe code.
The PAC is an extremely simple and mostly auto-generated zero-cost abstraction of the microcontroller hardware. The HAL is one layer above that, putting together higher-level hardware abstractions with only a little unsafe code. Think of driver code for hardware primitives like I2C, SPI, etc.
Typical kernel code is *much* more high level than that. Even in the core kernel.
With these concepts it's easy to write irq-handling, scheduling, mm, traps, etc... in safe Rust code.
*PAC = Peripheral Access Crate.
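The pattern is simple enough to sketch. In a real PAC the register addresses come from the vendor's SVD file and construction is restricted (a singleton `Peripherals::take()`); here everything is illustrative, and a plain variable stands in for a memory-mapped register so the sketch is runnable:

```rust
use core::ptr::{read_volatile, write_volatile};

/// Hypothetical PAC-style register wrapper: the only unsafe code is the
/// volatile access, hidden behind a safe, zero-cost API. `Reg` and
/// `ENABLE` are made-up names, not from any real PAC.
struct Reg(*mut u32);

impl Reg {
    fn read(&self) -> u32 {
        // SAFETY: whoever constructed this Reg promised a valid MMIO address.
        unsafe { read_volatile(self.0) }
    }
    fn write(&self, val: u32) {
        // SAFETY: as above.
        unsafe { write_volatile(self.0, val) }
    }
}

fn main() {
    // Stand-in for a memory-mapped hardware register.
    let mut fake_mmio: u32 = 0;
    let ctrl = Reg(&mut fake_mmio as *mut u32);
    const ENABLE: u32 = 1 << 0;
    ctrl.write(ctrl.read() | ENABLE); // safe code, no `unsafe` at the call site
    assert_eq!(ctrl.read(), 1);
}
```

All driver code above this layer manipulates registers through safe calls only.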
Posted Feb 26, 2025 21:56 UTC (Wed)
by Wol (subscriber, #4433)
[Link]
Eggsackerly.
All you need is a properly defined API. And that's all the Rust guys were asking for. All I care about when using 3rd party code is that I have a definition of the interface I can comply with. Anything else is an opaque box I don't want to have to give a monkeys about.
Lack of such interfaces generally indicates badly designed (or implemented) spaghetti code. One only has to think back to Alan Cox and the tty drivers (okay, the younger linux kernel guys are probably too young to remember ... :-).
The more Rust makes people define their interfaces, the easier it will be to replace chunks of the kernel - with C++, PL/1, Basic, ... or Rust. Doesn't matter. The cleaner the boundaries, the easier it will be to replace bits.
Cheers,
Posted Feb 26, 2025 22:01 UTC (Wed)
by magnus (subscriber, #34778)
[Link] (9 responses)
Posted Feb 27, 2025 3:03 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (7 responses)
Rust is generally pretty good at expressing rules like the following (which your wrapper would selectively combine in such a way as to prohibit incorrect use of the underlying unsafe primitive, whatever that may happen to look like):
* Before you can [do the thing], you must [do the other thing].
These are type-safety rules, so they officially apply to objects rather than verbs. But it is easy enough to tie an object to a verb (by making it a required argument or the return type, as appropriate), and it is also easy enough to make zero-cost wrappers or zero-byte objects in Rust, so this is not a real restriction.
Rust can fully express the following, but it is painfully complicated to do so (see std::pin or core::pin, and note there are multiple people looking into ways of making this less painful):
* [The thing] must live at a fixed address, and may not be relocated (moved or copied) under any circumstances.
Rust can express the following in most cases, but there is no direct support for it, and the most common workaround (an API like std::thread::scope) is a bit more involved and restrictive than it ideally should be (search for "affine types" if you want to find the people who are looking into improving this one):
* After you [do the thing], you must [do the other thing].
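The first kind of rule is usually encoded with a zero-sized "proof token" type: the verb that must happen first returns the token, and the gated verb demands it as an argument. A minimal sketch with made-up names:

```rust
/// Zero-sized token proving that initialization has happened.
/// (All names here are illustrative.)
struct Initialized(());

fn init_hw() -> Initialized {
    // ... perform the real initialization here ...
    Initialized(())
}

/// "Before you can [do the thing], you must [do the other thing]":
/// requiring the token turns the rule into a compile-time check.
fn do_the_thing(_proof: &Initialized) -> u32 {
    42
}

fn main() {
    // do_the_thing() cannot even be called until init_hw() has run,
    // because nothing else can produce an `Initialized`.
    let token = init_hw();
    assert_eq!(do_the_thing(&token), 42);
}
```

The token occupies no memory and vanishes after monomorphization, so the check is zero-cost at runtime.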
Posted Feb 27, 2025 3:17 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link]
Posted Feb 27, 2025 4:31 UTC (Thu)
by draco (subscriber, #1792)
[Link]
I don't find the terminology terribly helpful either 😂
Posted Feb 28, 2025 17:23 UTC (Fri)
by magnus (subscriber, #34778)
[Link] (4 responses)
Posted Mar 3, 2025 11:23 UTC (Mon)
by laarmen (subscriber, #63948)
[Link]
I'm pretty sure such situations will come up, but I don't see this as a problem. Again, the goal is not to avoid `unsafe` entirely. The entire point of `unsafe` is to have a way to communicate that there are invariants that cannot be upheld by the compilers that guarantee the safety of operations that are in general not safe.
Posted Mar 3, 2025 12:02 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (2 responses)
It does tend to be true in practice that you can encapsulate your unsafe blocks behind zero-overhead safe abstractions (e.g. having a lock type that is unsafe to construct because it depends on all CPUs running kernel code, not userspace), but that's an observation about code, not a requirement for benefiting; even without that encapsulation, you benefit by reducing the scope of unsafe blocks so that it's easier to verify the remaining bits.
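A toy example of the encapsulation pattern (entirely hypothetical code): the `unsafe` block's soundness rests on an invariant that the surrounding safe code visibly enforces, so a reviewer only needs local reasoning, and callers never write `unsafe` at all:

```rust
/// Illustrative wrapper: the unchecked access inside sum() is sound
/// because the loop condition enforces the bounds invariant locally.
struct EveryOther<'a> {
    data: &'a [u32],
}

impl<'a> EveryOther<'a> {
    fn new(data: &'a [u32]) -> Self {
        EveryOther { data }
    }

    /// Sum of every other element.
    fn sum(&self) -> u32 {
        let mut total = 0;
        let mut i = 0;
        while i < self.data.len() {
            // SAFETY: the loop condition guarantees i < data.len().
            total += unsafe { *self.data.get_unchecked(i) };
            i += 2;
        }
        total
    }
}

fn main() {
    let v = [1, 2, 3, 4, 5];
    assert_eq!(EveryOther::new(&v).sum(), 9); // 1 + 3 + 5
}
```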
Posted Mar 4, 2025 3:50 UTC (Tue)
by magnus (subscriber, #34778)
[Link] (1 responses)
For example, if the ownership or concurrency management of something central, like a struct page, depends on a lot of tangled state that the Rust compiler cannot verify, you can create abstractions that modify it while also requiring that the other conditions hold. If they are called at the wrong time, they would still compile but yield unsafe behavior.
On the other hand the hard distinction between safe/unsafe and logic bug may not make much sense deep in the kernel anyway as any logic bug would be both a safety and functional issue.
Posted Mar 4, 2025 10:23 UTC (Tue)
by farnz (subscriber, #17727)
[Link]
And even in the kernel, the hard distinction makes a lot of sense; the point of unsafe is that the "unsafe superpowers" allow you to cause "spooky action at a distance" by breaking the rules of the abstract machine. The core purpose of the unsafe/safe distinction is to separate out these two classes of code, so that you can focus review efforts on ensuring that unsafe code doesn't break the rules of the machine, and hence that bugs can be found by local reasoning around the area of code with a bug.
The problem that Unsafe Rust and C both share is that a bug doesn't have to have local symptoms; for example, a bug in an in-kernel cipher used for MACsec can result in corruption of another process's VMA structures, causing the damaged process to crash for no apparent reason. That means that you have to understand the entire codebase to be certain of finding the cause of a bug; Safe Rust at least constrains the impact of a bug in the code to the things that code is supposed to be touching, so you can (e.g.) rule out all parts of the Safe Rust codebase that don't touch VMAs if what you're seeing is a corrupt VMA.
Posted Feb 27, 2025 5:50 UTC (Thu)
by mb (subscriber, #50428)
[Link]
These things already have abstraction layers in the C code today.
If you have SMP concurrency in C code today, you can't express that in plain C. You have to use the abstraction layers that the kernel provides. Their inner workings mostly come from "arch" where the hardware specific magic happens.
>Linux is a lot more complex than a basic bare metal os.
Yes. But I wasn't talking about the OS itself. I was talking about the primitives required to build an OS.
Rust is *not* about avoiding unsafe code.
Unsafe blocks are not magic. They don't let you magically write low level code and ignore all Rust safety rules.
Posted Feb 26, 2025 18:10 UTC (Wed)
by asahilina (subscriber, #166071)
[Link]
https://github.com/AsahiLinux/linux/blob/gpu/rust-wip/dri...
The idea that low-level memory management ends up being one big unsafe blob is a myth. Even for low-level mm/kernel code, you still only end up with the unsafe bits walled off into small sections. And you can build safe abstractions around them as necessary, like I did there with `with_pages()` which is used as the basic primitive for PTE walking and mutation (that's a generic function, so it gets monomorphized/optimized into a separate variant for each usage in the rest of the file, which means I only have to write the error-prone page table walking code once for every possible op).
For core IRQ handling you don't even need any unsafe code at all (in principle, other than the vectors of course but that ends up written in assembly anyway). There's nothing memory-unsafe about IRQ management.
Of course, you can introduce memory safety issues outside the unsafe blocks when you're writing core kernel code (and some driver code), such as by mapping the wrong physical memory pages into the page tables. Rust doesn't protect against that, but it does provide many more convenient tools to make things like address math less error-prone and build safer zero-cost abstractions to handle things, so you still get a lot of benefits over C.
This is not unique to kernels either. In general, soundness is defined/expected on crate/module boundaries, so the safety of unsafe blocks within a crate is allowed to be conditional on invariants maintained by "safe" code (in this case, the safety of the unsafe blocks in PT management is conditional on the incoming address inputs being correct). Kernel code is more challenging than most Rust code in this regard, but it's not a total free-for-all like C at all. You still get a lot of mileage out of not having unsafe code that can cause memory unsafety in parts of the code that aren't doing low-level things, and you also get a lot of mileage out of the powerful abstractions.
As long as your interface boundaries are as safe and sound as possible (ideally fully sound at the lowest level of abstraction you can manage), the potential for bugs goes way down and the reliability of code review goes way up. For example, in my GPU driver, the GPU MM memory unsafety is limited to the page table code and the next higher level module which handles tracking mappings at the object level (mmu.rs). Above that, excluding some rare special-case operations like mapping a raw I/O page, the rest of the driver has no ability to accidentally supply the wrong physical address or free a GPU memory page without first unmapping it and flushing the TLBs. The Rust lifetime and ownership rules guarantee that if a GPU object is being freed, it can have no active GPU mappings.
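The real `with_pages()` lives in the Asahi GPU driver; the shape of the idea can be sketched in a few lines (everything below is a simplified, hypothetical stand-in, with a `Vec` playing the role of page-table memory). The error-prone range arithmetic is written once, and each call site's closure is monomorphized into its own optimized variant:

```rust
const PAGE_WORDS: usize = 4; // toy page size, purely illustrative

/// Simplified stand-in for page-table memory.
struct PageTable {
    backing: Vec<u32>,
}

impl PageTable {
    /// `with_pages()`-style primitive: hands the closure a safe &mut view
    /// of one page; all bounds/overflow handling is confined to this one
    /// function, and the generic `op` is monomorphized per call site.
    fn with_page<R>(&mut self, idx: usize, op: impl FnOnce(&mut [u32]) -> R) -> Option<R> {
        let start = idx.checked_mul(PAGE_WORDS)?;
        let end = start.checked_add(PAGE_WORDS)?;
        let page = self.backing.get_mut(start..end)?; // checked exactly once
        Some(op(page))
    }
}

fn main() {
    let mut pt = PageTable { backing: vec![0; 16] };
    let _ = pt.with_page(1, |page| page[0] = 0xdead_beef);
    assert_eq!(pt.backing[4], 0xdead_beef);
    // An out-of-range index is reported safely instead of corrupting memory:
    assert!(pt.with_page(99, |_| ()).is_none());
}
```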
Posted Feb 26, 2025 8:59 UTC (Wed)
by jezuch (subscriber, #52988)
[Link]
That commit looks more "passive-aggressive" than quiet to me.
I hope Christoph Hellwig finds peace now :) The kernel moves on, despite the fear mongering that everything will fall apart without him.
Posted Feb 26, 2025 9:12 UTC (Wed)
by hailfinger (subscriber, #76962)
[Link] (1 responses)
Posted Feb 26, 2025 11:29 UTC (Wed)
by tlamp (subscriber, #108540)
[Link]
And while I might not agree with all his opinions and stances, especially as we already use Rust successfully at work a lot, I am also a bit appalled by the amount of (seemingly hive-mind) toxicity that one can find in the various threads and in the comment sections of platforms like this one.
I mean, sure, everyone should feel free to disagree with others' acts or opinions. But if one cannot do that without personal attacks or "schadenfreude", or without ignoring all the (past) effort that maintainers have poured into projects by basically saying "good riddance", then I'm really unsure how anybody can think that adding fuel here is better than staying silent, especially if one is not directly involved in these things anyway.
Posted Feb 26, 2025 10:15 UTC (Wed)
by sdalley (subscriber, #18550)
[Link] (1 responses)
Posted Feb 27, 2025 11:59 UTC (Thu)
by taladar (subscriber, #68407)
[Link]
rust?
> You are not forced to take any Rust code, or care about any Rust code in the DMA code. You can ignore it.
>
> But "ignore the Rust side" automatically also means that you don't have any *say* on the Rust side.
>
> You can't have it both ways. You can't say "I want to have nothing to do with Rust", and then in the very next sentence say "And that means that the Rust code that I will ignore cannot use the C interfaces I maintain".
Wol
[2] Rust is only "optional" until something you need is written in it. Such as a display driver for those extremely rare Arm Macs.
Wol
Or maybe ...
Maintainer != Contributor
Not only DMA-mapping, but configfs too
Welcome Marek
Well
> Meanwhile, Rust, in of itself, does not force anything of the sort
Wol, I have asked you this before. Please assume good faith on the part of the developers involved in these discussions. Please do not attribute such base and cowardly motives to people who have worked for years to build the kernel you use and who are concerned about its ongoing maintenance. The people who are worried about Rust may well turn out to be wrong (I suspect they will), but they are not driven by fear of developers who can write a driver faster than they can. Seriously. That kind of stuff just makes the conversation harder for no good purpose.
Assume good faith, please
Not all delay is bad
The "Rust video drivers" are not part of any kernel release at this point, how would you expect them to accumulate CVEs? Please, arguing for the sake of argument doesn't help anybody.
A prediction with no data to support it
- drivers
- get fresh blood in kernel ecosystem
- keep an open mind collectively (like described by Mr Greg K-H)
- causing a stir
In fact, the Linux kernel is already a multi-language project, written in C and assembly. It doesn't run into the multi-language problem because the assembly is restricted to places where C is impractical, and nobody is threatening to rip out functioning C code to replace it with assembly.
The limiting factor on forks is always going to be volunteer power. If there's enough people to maintain a "no Rust" (or "no C", or "no C++", or "no Zig", or "only FSF-approved licensing") fork of the kernel, it'll happen; further, the kernel's development methodology (going right back to the Alan Cox forks of the kernel) has always demonstrated a talent for merging in bits from forks wherever there's an advantage to doing so.
- dereferencing a dangling pointer
- data races between two threads that access the same memory where at least one access is a write
- probably a few more (but not many)
* You trap, and the OS does something about it (in practice, usually it kills the offending process, but page faults can use a similar or identical mechanism depending on the architecture, and a page fault is not even a real error). This is also a perfectly well-defined operation (regardless of how the OS decides to respond to it).
Architecture, microarchitecture, and undefined behaviour
So what are the semantics if you corrupt the stack and as a consequence jump to uninitialized memory or memory that you intentionally filled with random data to construct a key or even worse, memory filled by data controlled by an attacker. By the very definition of the instruction set anything can happen.
Not at all. First of all, the architectural effects of every instruction up to that point continue to hold, while, e.g., in C++ undefined behaviour is reportedly allowed to time-travel. Next, in a well-designed architecture what happens then is defined by the actual content of the memory and the architecture description, which does not contain undefined behaviour (remember, we are discussing well-designed architectures). Maybe you as programmer do not deem it worth reasoning about this case and just want to put the label "undefined behaviour" on it, but as far as the architecture is concerned, the behaviour is defined.
An optimizing out-of-order architecture in the CPU.
The architecture does not specify out-of-order execution, on the contrary, it specifies that each instruction is executed one by one. There may be a microarchitecture with out-of-order execution like the Pentium Pro below it, or a microarchitecture with in-order execution like the 486, but the end result of executing a sequence of instructions is the same (except for the few cases where the architectures differ; IIRC the CMOVcc instructions were in the Pentium Pro, but not the 486).
And this [micro]architecture makes similar assumptions on what can happen vs. what cannot happen. And again, you can call this behavior by different names, but it is essentially undefined.
Computer architects have learned what later became Hyrum's law long ago, and therefore define completely (or almost completely for not-so-well designed architectures) what happens under what circumstances. Microarchitectures implement the architectures, and they do not assume that something cannot happen when it actually can. When the microarchitects fail at implementing the architecture, as with Zenbleed, that's a bug.
The CPU does not have the global sense of what is going on as the compiler, but messing up locally is enough to corrupt your data.
Microarchitectures with out-of-order execution do not commit any changes that do not become architectural, and therefore do not corrupt data (rare architecture-implementation bugs like Zenbleed excepted).
E.g. on x86 there's the BSF/BSR instructions ("If the content of the source operand is 0, the content of the destination operand is undefined"). Many instructions leave flags in an undefined state. With memory accesses to I/O address space, "The exact order of bus cycles used to access unaligned ports is undefined". Running the same machine code on different CPUs can give different behaviour, in the same way that running the same C code through different compilers (or the same compiler with different optimisation flags) can give different behaviour, with no documentation of what will happen, so I think it's reasonable to equate that to C's concept of UB.
C language lawyers make a fine-grained difference between different forms of lack of specification in the C standard. IIRC they have "unspecified value" for cases where the result of an operation is unspecified (as in the BSF/BSR case and the unspecified flags results). I think they do not have a special name for an unspecified order.
In practice, all the undefined/unpredictable CPU behaviour that's accessible from userspace is probably documented internally by Intel/Arm for backward compatibility and security reasons
Especially backwards-compatibility; the security benefits fall out from that. As for the bad design in the ARM architectures, maybe they have had too much contact with compiler people and become infected by them. I expect that at some point the implementors of ARM architectures will find that existing programs break when they implement some of the ARM-undefined behaviour in a way different than earlier implementations of that architecture, and that behaviour then becomes an unofficial part of the architecture, as for the Intel and AMD cases mentioned above. A well-designed architecture avoids this pitfall from the start.
Then there is the JVM way of doing things. Use a virtual machine and only code against the virtual machine. I do not see how this should work in the kernel. Also you need a language to write the virtual machine in.
Rust will not reduce platforms
There are more than enough examples for this.
Almost all of the unsafe code is in the PAC* and (if you use one) the HAL*.
*HAL = Hardware Abstraction Layer.
> There are more than enough examples for this.
Wol
* After you [do the thing], you may no longer [do the other thing].
* Only one thread may [do the thing] at a time.
* If any thread can [do the thing], then no (other) thread is allowed to [do the other thing].
* [The thing] may not outlive [the other thing].
* You may only [do the thing] on the same thread that [did the other thing].
* Your callback function must [do the thing].
* You may only [do the thing] from a (specific) callback.
* Many variations of "you may only [do the thing] if I say so" (which could give rise to runtime checking, if you are so inclined).
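Several of the invariants listed above map directly onto Rust's type system: "[the thing] may not outlive [the other thing]" is a lifetime, "only one thread may [do the thing]" is `&mut` or a `Mutex`, and "after you [do the thing], you may no longer [do the other thing]" can be encoded by a method that consumes `self`. A purely illustrative sketch of that last one (the `Connection` type and its methods are made-up names):

```rust
struct Connection {
    sent: u32,
}

impl Connection {
    fn send(&mut self, _byte: u8) {
        self.sent += 1;
    }

    // Taking `self` by value encodes "after close, no further sends":
    // the compiler rejects any use of the handle after this call.
    fn close(self) -> u32 {
        self.sent
    }
}

fn main() {
    let mut c = Connection { sent: 0 };
    c.send(1);
    c.send(2);
    let total = c.close();
    assert_eq!(total, 2);
    // c.send(3); // would not compile: `c` was moved by close()
    println!("sent {total}");
}
```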
The goal isn't to remove unsafe completely when you're writing a kernel; rather, you want to constrain it to small chunks of code that are easily verified by a human reader. For example, it's completely reasonable to require unsafe when you're changing paging-related registers, since you're changing something underneath yourself that the compiler cannot check, and that can completely break all the safety promises Rust has verified.
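The "small, auditable chunk" pattern looks something like this (an illustrative userspace sketch, not kernel code): the unsafe block is one line, its precondition is checked immediately before it, and callers only ever see the safe function.

```rust
/// Safe wrapper: the only unsafe code is a single, easily audited block.
fn first_byte(s: &[u8]) -> Option<u8> {
    if s.is_empty() {
        return None;
    }
    // SAFETY: we just checked that the slice is non-empty, so reading
    // one byte through the raw pointer is in bounds.
    Some(unsafe { *s.as_ptr() })
}

fn main() {
    assert_eq!(first_byte(b"abc"), Some(b'a'));
    assert_eq!(first_byte(b""), None);
    println!("ok");
}
```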
Use of unsafe in kernel Rust code
You may have to weaken it compared to #![forbid(unsafe_code)], but that's still a lot stronger than what you get from plain C. Bubbling up unsafe to a high level is absolutely fine, though - it just tells callers that there are safety promises that Rust can't check, but relies on you to check them manually instead.
And in most cases Rust code can just use the existing abstractions (putting another zero-cost, or in some cases nearly zero-cost, safe Rust abstraction around them).
Primitives like dispatching an interrupt from the bare metal into a safe and well-defined high-level language routine.
Unsafe code is the tool to write your lowest level primitives that your whole stack builds upon.
All Rust safety rules also apply inside unsafe blocks, and they are all still enforced by the compiler there. Adding an unsafe block to existing safe code changes nothing.
But unsafe blocks give you a couple more tools (mainly raw pointers) that are needed to build safe abstractions (e.g. with safe references on the outside instead of raw pointers).
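A minimal sketch of that "raw pointers inside, safe references outside" pattern, assuming nothing beyond the standard allocator (a toy Box-like owner, purely illustrative):

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Raw pointers stay private; users only ever see safe references.
struct MyBox {
    ptr: *mut u32,
}

impl MyBox {
    fn new(v: u32) -> MyBox {
        let layout = Layout::new::<u32>();
        // SAFETY: u32 has a non-zero-sized layout; we check for
        // allocation failure before writing through the pointer.
        let ptr = unsafe { alloc(layout) as *mut u32 };
        assert!(!ptr.is_null(), "allocation failed");
        unsafe { ptr.write(v) };
        MyBox { ptr }
    }

    fn get(&self) -> &u32 {
        // SAFETY: ptr is valid and initialized for as long as self lives.
        unsafe { &*self.ptr }
    }
}

impl Drop for MyBox {
    fn drop(&mut self) {
        // SAFETY: ptr was allocated in new() with this same layout.
        unsafe { dealloc(self.ptr as *mut u8, Layout::new::<u32>()) };
    }
}

fn main() {
    let b = MyBox::new(42);
    assert_eq!(*b.get(), 42);
    println!("ok");
}
```

The unsafe blocks are confined to the constructor, accessor, and destructor, each with a stated precondition; everything outside this type deals only in safe `&u32` references.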
Quiet quitting
Thank you, Christoph.
From an outsider's point of view, some of those cleanups have probably also benefited the effort to include Rust in the kernel, because (some of) the burden of cleaning up interfaces before others could create Rust bindings was shouldered by Christoph.
A sincere thank-you to Christoph for years of hard work in a difficult and critical area
