A change in maintenance for the kernel's DMA-mapping layer
Posted Feb 25, 2025 21:32 UTC (Tue)
by amarao (guest, #87073)
[Link] (29 responses)
Posted Feb 26, 2025 14:44 UTC (Wed)
by wkudla (guest, #116550)
[Link] (27 responses)
Posted Feb 26, 2025 15:55 UTC (Wed)
by pizza (subscriber, #46)
[Link] (26 responses)
You are missing a critical point here -- Linus has effectively stripped Hellwig of his authority/power as a maintainer of a critical subsystem. Why would anyone want to continue to be officially responsible for something they do not have the authority/power to make decisions over?
> He just didn't like it so quit in a tantrum.
Stepping down quietly is the polar opposite of a tantrum.
Posted Feb 26, 2025 16:28 UTC (Wed)
by garyvdm (subscriber, #82325)
[Link] (9 responses)
Ah... no. That's not how I read things.
Linus wrote:
Remember that the Rust DMA bindings being asked for merge sat *outside* the DMA subsystem.
Where does Linus say he no longer has authority/power over the DMA subsystem?
Posted Feb 26, 2025 20:39 UTC (Wed)
by smurf (subscriber, #17840)
[Link] (8 responses)
He doesn't. Mr Hellwig essentially wrote, "I won't work with the Rust people and I don't care about your Rust policy", Linus replied with "I don't care that you don't care", and Hellwig's reaction to that was not "OK let's see how things *actually* work out before I say 'I told you so'" but "OK fine then I quit".
He's fine to do that of course, nobody forces him to do anything, but the flip side is that nobody forces me (or anybody else of course) to have a particularly high opinion of somebody who chooses to deal with imaginary problems that way.
And yes they are imaginary problems at this point. They don't become non-imaginary unless and until there actually *is* a (technical) problem that needs maintainer interaction, and things have not progressed that far yet AFAIK.
Posted Feb 26, 2025 22:03 UTC (Wed)
by jmalcolm (subscriber, #8876)
[Link] (7 responses)
He was demanding that the Rust bindings not exist, at least not officially, in the kernel.
He tried to say that they could not work with his code. Nobody was asking him to work with their code.
Linus did not have to strip him of authority OUTSIDE of dma. He never had any to begin with.
If he does not want to work on a project that allows Rust code, that is his choice. Let's call it what it is though.
Posted Feb 26, 2025 22:10 UTC (Wed)
by pizza (subscriber, #46)
[Link] (3 responses)
30+ years of Linux history shows that maintainers _are_ forced to work with the in-tree users of their subsystems.
Sorry.
Posted Feb 27, 2025 0:14 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (2 responses)
But that was explicitly changed for Rust - if your C changes broke Rust, it was down to the Rust guys to fix it.
And as a maintainer of ANY subsystem, you need a clearly defined API. That's the problem here - the DMA code had (has) a contradictory API. Forget Rust, that's something that needs fixing.
The deal as it stands is that if your C changes break C, you're still expected to fix it. Sane programming demands that if you change an API, you document it. Leave Rust out of it: the deal is that if you have an API, you should document it - and then you can forget about Rust, because it's their problem to follow your API.
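To make the "document your API, and Rust follows it" point concrete, here is a minimal, hypothetical sketch (the function names are invented for illustration, not taken from the kernel): the contract of an unsafe C-style entry point is written down once, and the safe Rust wrapper encodes that contract in the type system so safe callers cannot violate it.

```rust
/// Hypothetical low-level entry point, in the style of a C binding.
///
/// # Safety
/// `buf` must point to at least `len` valid, writable bytes.
unsafe fn fill(buf: *mut u8, len: usize, value: u8) {
    for i in 0..len {
        // Sound because the caller promised `len` valid bytes.
        unsafe { *buf.add(i) = value };
    }
}

/// Safe wrapper: the slice type carries the documented precondition
/// (valid pointer + length), so the contract is enforced, not just stated.
fn fill_slice(buf: &mut [u8], value: u8) {
    unsafe { fill(buf.as_mut_ptr(), buf.len(), value) }
}

fn main() {
    let mut b = [0u8; 4];
    fill_slice(&mut b, 7);
    assert_eq!(b, [7, 7, 7, 7]);
    println!("{b:?}");
}
```

The `# Safety` section is the conventional place where such an interface contract gets written down once, which is the kind of documentation being argued about here.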
Cheers,
Posted Feb 27, 2025 7:42 UTC (Thu)
by Mook (subscriber, #71173)
[Link] (1 responses)
Down thread of this week's quotes of the week: https://lwn.net/ml/all/CAHk-=wjg1PJ81E23DB1QbvPBQ04wCf7mJ...
> The most common situation is that something doesn't build for me, [snip…]
> My build testing is trying to be wide-ranging in the sense that yes, I do an allmodconfig build on x86-64 [snip…]
See also the parent of that mail.
Posted Feb 27, 2025 9:17 UTC (Thu)
by pbonzini (subscriber, #60935)
[Link]
Posted Feb 28, 2025 11:46 UTC (Fri)
by LtWorf (subscriber, #124958)
[Link] (2 responses)
True. The alternative being quitting. Which he did.
Posted Feb 28, 2025 11:49 UTC (Fri)
by amarao (guest, #87073)
[Link] (1 responses)
It's a very slippery slope when you reject people based on their preferences in language, mascot, and deity.
Posted Feb 28, 2025 12:17 UTC (Fri)
by LtWorf (subscriber, #124958)
[Link]
It is happening to C people…
Posted Feb 26, 2025 16:30 UTC (Wed)
by wkudla (guest, #116550)
[Link] (15 responses)
Could you expand on that? My understanding was that Hellwig would not be responsible for Rust code in the slightest. He was opposing Rust targeting his APIs. But I might be missing something important here.
Posted Feb 26, 2025 16:57 UTC (Wed)
by judas_iscariote (guest, #47386)
[Link]
Posted Feb 26, 2025 17:28 UTC (Wed)
by pizza (subscriber, #46)
[Link] (13 responses)
Maintainers have _always_ been responsible for all [1] in-tree users of their subsystems -- they change an API, all users need to be fixed up. Additionally, they're the point of contact of bug reports and other problems.
Saying "no, you're not responsible for _that_ class of in-tree users" directly contradicts longstanding mainline Linux policy, and leads to spider-man meme situations.
[1] And I do mean _all_. Even the "optional" [2] components such as every device driver.
Posted Feb 26, 2025 17:33 UTC (Wed)
by pbonzini (subscriber, #60935)
[Link]
I expect that Rust will not break Linus's tree except in extremely rare cases that are more mistakes than policy, because the development process is already designed to allow and coordinate tree-wide changes (which aren't that frequent anyway).
Posted Feb 26, 2025 18:42 UTC (Wed)
by koverstreet (✭ supporter ✭, #4296)
[Link]
That's not the rule for maintainers, that's the rule for everyone because we work in a monorepo.
Maintaining a critical subsystem cannot give you absolute veto power over the rest of the kernel, it has _never_ worked that way.
There's no absolute rules here, except for maybe - try to work with your fellow engineers, be reasonable, and keep the whole thing working. No one person's interests or wishes override everyone else's, we have to balance everyone's priorities.
And to keep the spider man memes going, with power comes responsibility.
Posted Feb 27, 2025 9:49 UTC (Thu)
by amarao (guest, #87073)
[Link] (10 responses)
As far as I know the story, there is a condition for Rust code: maintainers can break it and change things without thinking about Rust problems, and the Rust people will fix their code to match the breaking changes. So it's not a "usual in-tree user". But perhaps the mere existence of Rust code was unpleasant.
Posted Feb 27, 2025 11:00 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (9 responses)
If a C maintainer changes an API, it's now no longer enough to update all the (C) callers of that API; the guy now has to document the API as well!
Seriously? And they expect the lack of documentation to be acceptable?
Cheers,
Posted Feb 27, 2025 11:21 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (8 responses)
How often has our dear editor remarked that some fancy new merged behavior of the kernel is "rigorously undocumented" (or along those lines)?
Posted Feb 27, 2025 11:54 UTC (Thu)
by amarao (guest, #87073)
[Link] (7 responses)
Some things are just plainly not documented. And I'm not talking about sysctl.obscure.mode; I'm talking about big things like nftables. There is a wiki, but the kernel repo itself literally has nothing about it: not how to use it, not how to program it. There are just a few mentions of netfilter, and that's all.
Posted Feb 27, 2025 13:34 UTC (Thu)
by daroc (editor, #160859)
[Link] (6 responses)
And, if you think you can do that documentation in the form of an LWN.net article of about 1500 words, we will even pay you for it. See the "Write for us" link in the sidebar. Lots of the kernel's official documentation links to LWN.net articles, so it's not without precedent.
Posted Feb 27, 2025 14:04 UTC (Thu)
by amarao (guest, #87073)
[Link] (5 responses)
But we definitely should try. Better to have incomplete and jerky docs than serene and concise "no docs".
Posted Feb 27, 2025 14:53 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (4 responses)
Apologies for getting on my high horse here - but has anybody noticed what happened to the kernel RAID wiki? It was aimed at USERS - people running RAID, people who didn't know what RAID was, people who wanted to set up a system. It quite deliberately did NOT attempt to duplicate the official kernel documentation.
So someone came along and archived it, with a big notice saying "this is obsolete, refer to the official kernel documentation if you need anything". WTF!!!
For the target readers of the wiki, the official kernel documentation is probably written in something worse than double Dutch!
And this is a major problem with modern documentation - it usually completely ignores the user, being written BY experts, FOR experts. Which is why most user documentation is a case of the blind leading the blind. (My new TV is a case in point - it's a complicated computer, and the user documentation consists pretty much entirely of "plug the power lead here, the network cable there, and the aerial this other place". There's loads of fancy stuff we don't have a clue how to use!)
PLEASE, KERNEL GUYS - *DON'T* piss off users who are trying to teach people how to USE your software. Without users, there's no point in writing it!
Cheers,
Wol
Posted Feb 28, 2025 8:48 UTC (Fri)
by taladar (subscriber, #68407)
[Link] (3 responses)
However, having expert documentation for experts would at least be a good first step, allowing someone else with a different skill set to write the documentation for users. Having no documentation at all requires both the skill set to read undocumented code (which is much harder than writing undocumented code) and the skill set to write good documentation in the same person.
Posted Feb 28, 2025 14:06 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
Posted Feb 28, 2025 15:44 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (1 responses)
Actually no. From the user's PoV, (a) that documentation is probably written in double Dutch, and (b) it addresses completely the wrong problem anyway. If I want to learn to be a chauffeur, why on earth do I want a detailed schematic of an internal combustion engine? (Yes, knowing that schematic might be useful, but it's completely orthogonal to the problem at hand.)
The gap between the maker's understanding, and the user's understanding, of any product grows wider much faster than I suspect many of us here realise. A good maker is curious about what they're making. A user usually cares very little beyond "how do I get this to work" (because they don't have time for much more). Assuming the documentation will be cross-comprehensible between the two groups is asking for problems ...
Cheers,
Posted Feb 28, 2025 18:34 UTC (Fri)
by draco (subscriber, #1792)
[Link]
It's not saying that "experts documenting for experts, full stop" equals "user documentation, just of lower quality".
It's saying that "an expert who is bad at user documentation writing for experts who can't figure it out from raw code" leads to "experts who are good at user documentation creating decent user documentation" with higher probability than the alternative.
Now, if you're saying that documenting how it works instead of intended behavior for end users doesn't necessarily help describe the intended behavior, that's a fair statement, but I'd argue that if you don't have both, you don't have adequate expert documentation either.
Posted Feb 27, 2025 11:27 UTC (Thu)
by CChittleborough (subscriber, #60775)
[Link]
Humans have a tendency to leap to negative conclusions. We should all fight that tendency.
Posted Feb 25, 2025 21:39 UTC (Tue)
by PeeWee (guest, #175777)
[Link] (1 responses)
Posted Feb 26, 2025 13:26 UTC (Wed)
by daroc (editor, #160859)
[Link]
Posted Feb 25, 2025 23:12 UTC (Tue)
by alphyr (subscriber, #173368)
[Link]
Posted Feb 26, 2025 0:24 UTC (Wed)
by jmalcolm (subscriber, #8876)
[Link]
Posted Feb 26, 2025 1:07 UTC (Wed)
by Phantom_Hoover (subscriber, #167627)
[Link] (11 responses)
Posted Feb 28, 2025 11:58 UTC (Fri)
by LtWorf (subscriber, #124958)
[Link] (10 responses)
Posted Feb 28, 2025 13:30 UTC (Fri)
by pizza (subscriber, #46)
[Link] (6 responses)
This is my take too.
...Replace the word "Rust" with anything else, and there's a long, long, long history of Linus utterly flaming folks for it.
If those "reassurances" are true, they represent a _massive_ change to how the process of development and maintainership is supposed to work. The (now-former) processes weren't arbitrary; they were carefully honed and battle-tested in no small part to prevent "issues down the line".
Posted Feb 28, 2025 17:41 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (5 responses)
I think you're right ... (not the way you think you are)
> The (now-former) processes weren't arbitrary; they were carefully honed and battle-tested in no small part to prevent "issues down the line".
At which they failed miserably. Rust *FORCES* you to document your interfaces. The fact that this is not true of C leads to the steady stream of CVEs and exploits all C projects seem to suffer from, as people on both sides of an interface misunderstand each other (and you can see that here in this storm).
Linus has learned that not documenting things is costly. I know that writing good documentation is difficult (and costly) - I can understand people not wanting to do it. Unfortunately, the attitude of "I know what I'm doing, why do I need to tell other people" is no longer acceptable. I've spent most of today getting extremely frustrated with (a) users who can't explain what they want, and (b) Excel formulae which can't explain what they're doing! Oh for some decent documentation! Excel actively frustrates attempts at decent documentation!
At the end of the day, Christoph has paid the price for not working well with others. Linus has always been a good people manager, and maybe he's now realising that prima donnas are more trouble than they're worth (or maybe he always knew that, he may just be being forced to face up to it).
Cheers,
Posted Feb 28, 2025 19:23 UTC (Fri)
by pizza (subscriber, #46)
[Link] (4 responses)
(Funny you say that. Linux is routinely held up as the most successful software engineering project of all time, in no small part due to the development methodology and processes that you're calling a miserable failure)
Meanwhile, Rust, in and of itself, does not force anything of the sort; one can easily commit all manner of horrible sins in Rust. What gets committed depends on what the project considers acceptable. The same goes for interface documentation.
If Linus wants to change the development model/processes of Linux, then he needs to be explicit about it, and the discussion can revolve around the pros and cons of those changes, and from there the pros/cons of various approaches.
But this way is bass-ackwards and reeks of disingenuousness. Be honest about your intentions and goals up front, because the *only* logical outcome of this entire effort is Rust-in-the-core-kernel and completely deprecating (new) C. Anything less gives you the worst of both worlds.
Posted Mar 1, 2025 22:37 UTC (Sat)
by raven667 (subscriber, #5198)
[Link]
I don't think this is accurate; it's not that black and white, either/or, win/lose. I would take it at face value that introducing Rust to the kernel is a hopeful experiment: new drivers can be written in C or Rust, and while Rust is experimental, the Rust maintainers - not just the person making an interface change - are responsible for keeping the bindings in sync with other kernel interfaces as they change. I would expect both styles to coexist for a long time, and only if the predominant consensus of the kernel developer community changes would new work in C be *forbidden*. Even after Rust is no longer experimental, I think there will still be a long period where predominantly Rust developers are needed to maintain the internal bindings and wrappers for the predominantly C core; having C developers update Rust bindings depends on how many see value in spending the time to learn Rust.
Posted Mar 1, 2025 22:57 UTC (Sat)
by khim (subscriber, #9252)
[Link]
It kinda-sorta does. Lifetime markup may be perceived as part of the code, but in reality it's the documentation. Proof: mrustc ignores it yet generates valid code. What I find really strange is such an active resistance to it. Linux was doing that same thing for years with sparse. Rust just turns that same thing (that was part of Linux development process for more than 20 years!) “up to eleven”.
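As a small illustration of lifetimes-as-documentation (a hypothetical example, not kernel code): the lifetime annotation in the signature below records which input the result borrows from, and the compiler checks every caller against that claim while emitting no code for the annotation itself.

```rust
// The `'a` markup is machine-checked documentation: it states that the
// returned slice borrows from `haystack`, not from `needle`. Erasing it
// changes nothing about the generated code, only what is promised.
fn find_after<'a>(haystack: &'a str, needle: &str) -> Option<&'a str> {
    haystack
        .find(needle)
        .map(|i| &haystack[i + needle.len()..])
}

fn main() {
    let text = String::from("key=value");
    // Because the signature ties the result only to `haystack`,
    // the pattern argument may be a short-lived temporary.
    let v = find_after(&text, "=");
    assert_eq!(v, Some("value"));
    println!("{v:?}");
}
```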
Posted Mar 2, 2025 0:41 UTC (Sun)
by Wol (subscriber, #4433)
[Link] (1 responses)
If you want honesty, don't accuse others of dishonesty! There are far too many people who take the attitude "I've made up my mind, I don't care about the facts!". What's that saying? "What you see in others is a mirror of yourself"?
I'll be honest with you - I suspect that what you claim will turn out to be the future. I do NOT think that is the aim of this "Rust in the kernel" experiment, I just expect it will be what ends up happening. Because when people realise they can write a mostly-bug-free driver in Rust in two weeks, but it takes two years to write a similar driver in C, they will refuse to use C.
Which is why the Rust refuseniks are making such a fuss. They can see the writing on the wall just as clearly as you or me, and they don't want that future.
Cheers,
Posted Mar 2, 2025 0:51 UTC (Sun)
by corbet (editor, #1)
[Link]
Posted Feb 28, 2025 19:27 UTC (Fri)
by draco (subscriber, #1792)
[Link] (2 responses)
But another part of it could easily be to give Rust more time to prove its worth (or lack). For example, meaningful metrics on CVE/bug counts correlated to changes due to adding Rust (whether it's drivers being written in it or API correctness fixes/documentation to support it) would have a significant impact on the discussion. Or improvements in review bandwidth for maintainers that decide to embrace it, or the lack of that. Or coming through with better platform support and feature stabilization (or not).
It's okay to delay arguments if doing so gives everyone better information & circumstances and doesn't make things substantially worse in the meantime. It's avoidance for the sake of it while letting things get worse (the typical situation) that's bad.
Posted Feb 28, 2025 19:43 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (1 responses)
"Don't let the facts spoil a good argument". I think we already have those metrics.
The Rust video driver(s) have pretty much 0 CVEs I believe, and were written in much less time than equivalent C drivers. I expect the stats will be the same for other Rust drivers.
Cheers,
Posted Feb 28, 2025 19:59 UTC (Fri)
by corbet (editor, #1)
[Link]
Posted Feb 26, 2025 2:01 UTC (Wed)
by dowdle (subscriber, #659)
[Link] (44 responses)
Linux surpassed 40 million lines of code a while back and has consistently broken all records and norms... so those who proclaim that multi-language projects do worse may be right... or maybe not. One of the common-knowledge tenets of development is that you have to be prepared to throw the first version away and start over. That has been very common with various things in the kernel that didn't work out: they got yanked and replaced by something better. Rust may end up being an experiment that didn't work out... or maybe not. Ups and downs are to be expected. Who remembers how long it took the 2.4.x kernel to stabilize? A lot is going to change over the next 20 years, as those who manage things now will have long moved on, assuming they are even still on this earth. The next gen gotta next gen.
Posted Feb 26, 2025 3:07 UTC (Wed)
by dralley (subscriber, #143766)
[Link] (1 responses)
All this is to say that it's not a bad bet long-term, but also that it's not actually necessary to touch the core kernel to make a big dent in those 40 million lines. But if the pipeline for new kernel developers is dominated by Rust, then yes, it probably will make its way into the core kernel over time.
Posted Feb 26, 2025 9:11 UTC (Wed)
by taladar (subscriber, #68407)
[Link]
So I would really expect the core subsystems to get bindings (or Rust replacements once it is clear that Rust is necessary to build the kernel anyway) first before the majority of existing drivers are replaced by Rust ones, especially considering how the Rust community tends to lean towards thinking things through and doing things in the right order with the RFC process for language changes.
Posted Feb 26, 2025 6:58 UTC (Wed)
by Alterego (guest, #55989)
[Link] (7 responses)
It seems Rust is good for
Posted Feb 26, 2025 8:46 UTC (Wed)
by jengelh (guest, #33263)
[Link] (6 responses)
Posted Feb 26, 2025 10:13 UTC (Wed)
by danieldk (guest, #27876)
[Link] (5 responses)
Posted Feb 26, 2025 10:56 UTC (Wed)
by jengelh (guest, #33263)
[Link] (4 responses)
I guess we'll find out sooner or later, for the lovely price of one Linux project. And perhaps we can then tell everybody "I told you so" (or not).
Posted Feb 26, 2025 12:12 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (1 responses)
And how many (especially language) projects are bootstrapped in one language, and then rewritten in another?
I suspect the big impact Rust will have (and is already having) on the kernel is to force people to clearly define the interfaces. And that has to be a good thing, no?
Cheers,
Posted Feb 26, 2025 20:56 UTC (Wed)
by edomaur (subscriber, #14520)
[Link]
typically Rust, which was originally written in OCaml :-D
Posted Feb 26, 2025 12:19 UTC (Wed)
by danieldk (guest, #27876)
[Link]
Posted Feb 28, 2025 0:37 UTC (Fri)
by rgmoore (✭ supporter ✭, #75)
[Link]
Posted Feb 26, 2025 10:12 UTC (Wed)
by butlerm (subscriber, #13312)
[Link] (19 responses)
Posted Feb 26, 2025 11:13 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
As a result, the fork is either going to be neutral (no gain for the mainline, no loss either, since the people working on it wouldn't work on mainline if they had to deal with mainline's choice of languages), or beneficial (since mainline can take the improvements from them).
Posted Feb 26, 2025 12:18 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link] (17 responses)
It's a community which only wants to write the happy path. Exceptions enable dilution of responsibility. If I write C++ code which just throws in the unhappy path and you write C++ code which calls my function, both of us can claim at review that it wasn't our job to handle the error. Somebody else should do that, the happy path code I wrote was difficult enough. In Rust whoever panics gets to explain why, and code where nobody handled the error case at all doesn't compile.
That doesn't make the handling magically correct - but it's much less likely that some of the really wild effects happen when you know you're writing error handling code, than when the "handling" is the consequence of a missed check.
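A minimal sketch of that point, with invented names: in Rust the unhappy path is part of the function's type, so the caller has to write something for it - a match arm, or an explicit, attributable panic.

```rust
// A fallible operation returns Result; the error case is in the signature.
fn parse_port(s: &str) -> Result<u16, std::num::ParseIntError> {
    s.parse::<u16>()
}

fn main() {
    // Extracting the value requires visibly naming the error case;
    // silently dropping the Result trips the `unused_must_use` lint.
    match parse_port("8080") {
        Ok(p) => println!("port {p}"),
        Err(e) => println!("bad port: {e}"),
    }

    // Whoever opts into a panic has to write it out - "whoever panics
    // gets to explain why":
    let p = parse_port("8080").expect("port must be numeric");
    assert_eq!(p, 8080);
}
```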
I can believe a "C only Linux" fork could exist, particularly if the way Linux gets to 100% Rust platform support is via removing some older platforms some years in the future. If you're involved in maintaining Linux for a CPU architecture that hasn't been made since last century you might well have zero interest in Rust and plenty of reason to fork the last Linux which built correctly for your favourite machine.
Posted Feb 26, 2025 12:50 UTC (Wed)
by excors (subscriber, #95769)
[Link]
That does cause a bit of friction when some parts of the C++ standard library and language are designed around the assumption that you have exceptions, but in practice it works okay (or at least it's no more problematic than several other aspects of C++).
I can't imagine the Linux kernel actually adopting C++ though, because it would have pretty much all the same technical challenges and cultural pushback as Rust, with significantly fewer benefits to make it seem worthwhile.
Posted Feb 26, 2025 14:10 UTC (Wed)
by butlerm (subscriber, #13312)
[Link] (10 responses)
It's like "oh well, just ship this code or release it into production, because the catch-all exception handler will handle the problem, and the user can either try again or we can fix any issue we find or that someone reports after the fact, in a month or two - or maybe sooner if it is really serious." And that is if the problem ever gets fixed at all within the lifetime of the project, the product, the service, the volunteers (where applicable), the managers, the leaders, or the developers in question.
I used to write video games in C and assembly language, and in my view good code should perform according to specification and be usable a century from now if committed to ROM and sold on store shelves or shipped in products that way. Does anyone doubt that most Nintendo, Sega, or Atari 7800 games will actually work with the appropriate hardware, without major malfunctions, decades from now? What about something like NetWare (which was originally mostly written in 80386 assembly language), or the Amiga operating system (originally written in a mixture of C, BCPL, and 68K assembly language), or a number of other things, at least if deployed into a non-hostile environment?
You can see this problem in web applications written in JavaScript all the time these days, especially on the websites of banks that are not among the largest in the country, or on the websites of most non-bank credit-card issuers and lenders as well. I use websites on a regular basis where it is a fifty-fifty chance that a login with the correct credentials supplied will actually succeed. And that goes for many other actions as well, where the user is often required to do things like enter their credit-card information twice to make a payment, because of mysterious "an error occurred" problems that are cured simply by repeating the process. Or worse, where a payment will not go through at all for weeks, for other, never-documented reasons not explained to the user. There is a major funds-transfer webapp whose name you would all recognize that often behaves that way these days.
I believe that this is likely, in large part, the result of libraries and code included in many modern JavaScript applications that are so extensive that either the exceptions are undocumented or you have to be an expert to handle them properly - and often entry-level developers are not given enough time or resources to fix the problem. That was my experience for a few years, when I was in the unfortunate position of having to maintain and develop code for a moderately sophisticated web application that was originally programmed to use JavaScript only where necessary. When you have a dozen or more developers working on a project, it is that much worse.
Anyway, I am not surprised that a large team has a difficult time writing safe, correct, and decently performing C, C++, or Java code, and I don't really see any solution to that other than compilers and static-analysis tools that identify the problems and produce and optimize code better than most developers can write by hand, even after they stare at a problem for hours at a time. In a project as big as the Linux kernel, or something like a modern database or web browser, in my view it would be worth it to write static-analysis tools that are hard-coded, if necessary, to describe and enforce the constraints and rules that govern that project. A more general tool would be nice, but apparently no one has written one yet - at least not one capable or used enough to find the memory-safety, locking, and other problems that still make it into deployed production kernels and have to be corrected after the fact, in some cases after making national or international news, over problems that ought to be straightforward to analyze and detect.
Finally - although this almost certainly could not be done well, or perfectly, without heavy use of a new series of #pragmas or language extensions - my idea of a usable C or C++ compiler for a large project is one that refuses to compile code with undefined behavior at all, and requires the developer to supply machine-architecture and memory-model targeting information to make those behaviors implementation- or configuration-defined if he or she wants to write almost anything that would otherwise be undefined behavior. As things stand, the developers, vendors, and publishers of contemporary C and C++ compilers feel they have a license to do anything for any reason - such as deleting entire code sections or skipping appropriate if statements and safety checks, as we have read about here from time to time when C compiler optimizers cause serious problems. That is my two cents on this question.
Posted Feb 26, 2025 17:02 UTC (Wed)
by matthias (subscriber, #94967)
[Link] (9 responses)
C and C++ are not designed for this. Of course, many cases of UB in C and C++ can be made implementation-defined, like integer overflow. But there are certain operations that are already UB on the machine-code level:
* accesses through dangling pointers or references (use after free, corrupting the stack, and the like);
* data races between two threads that access the same memory where at least one access is a write.
The Rust way of eliminating this kind of UB is the borrow checker, which verifies at compile time that all references are sound. I really do not see any reason why this should be done in C or C++: if you add borrow checking to these languages, they are not really the same languages any more. Instead, it would be much better to use Rust directly, which has been developed with this feature in mind from the start.
Of course you can also use the good old -O0 approach of forbidding any optimizations that could result in UB. Except you also have to prevent UB on the machine code level. So all data accesses need to be atomic to prevent the CPU from doing crazy reorderings that are only sound in the absence of data races. The resulting performance would be worse than -O0.
Then there is the JVM way of doing things. Use a virtual machine and only code against the virtual machine. I do not see how this should work in the kernel. Also you need a language to write the virtual machine in.
In my opinion, Rust already is this hypothetical C++ language without UB. Maybe at some point a clever person will find better alternatives, but I do not see a way to get rid of UB without the borrow checker. And it is really the borrow checker that defines what kind of language Rust is. There are of course other differences, but the borrow checker is the most prominent one.
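To illustrate the claim (a toy example, not from the thread): the classic dangling-reference bug is rejected by the borrow checker before any code is generated, so this class of UB never reaches the machine-code level at all.

```rust
// This (hypothetical) variant does not compile: the local `s` dies when
// the function returns, so handing out `&s` would be a dangling reference.
//
//     fn broken() -> &String {          // error[E0106]: missing lifetime
//         let s = String::from("oops"); // specifier - there is nothing
//         &s                            // the result could validly borrow
//     }
//
// The accepted fix transfers ownership instead of borrowing:
fn fixed() -> String {
    let s = String::from("oops");
    s // ownership moves to the caller; no reference outlives `s`
}

fn main() {
    let v = fixed();
    assert_eq!(v, "oops");
    println!("{v}");
}
```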
Posted Feb 27, 2025 0:50 UTC (Thu)
by neggles (subscriber, #153254)
[Link]
Well that's essentially what eBPF is, a virtual machine model and runtime environment that's suitable for use in the kernel. But the limitations of eBPF (and wasm for that matter, since a number of people are of the opinion that eBPF is "just worse wasm") show why that's not a practical model for the kernel as a whole.
As an aside, it might be an interesting project to try to write a microkernel almost entirely in eBPF, where (say) each individual microkernel service is a verified eBPF program and only the base message-passing layer / helper functions aren't. Probably a Ph.D. or two to be had there.
Posted Feb 27, 2025 1:56 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (6 responses)
This is not, strictly speaking, UB on the machine code level (at least in the general case). Depending on what you mean by "dangling," it could be well-defined as having either of the following meanings:
* You access some area of memory that you did not intend to access, but it's still within your address space. That is a perfectly well-defined operation. By assumption, it is not the well-defined operation that you intended to do, but that doesn't make it UB.
* You access an address that is not mapped in your address space, and the CPU raises a fault that the OS delivers to the process (on Linux, typically as SIGSEGV). That, too, is well-defined at the architectural level.
Remember, the heap is entirely a construct of libc, and the stack is mostly a construct of libc. The notion of "corrupting" either of them does not exist at the machine code level, because at the machine code level, memory is memory and you can read or write whatever bytes you want at whatever address you want in your address space. If you write the wrong bytes to the wrong address, and confuse some other part of your program, that's your problem. It does not magically cause the CPU to believe that your program is invalid, and to start doing things other than what your machine code tells it to do (or, in the case where the instruction pointer is no longer pointing at your original machine code, whatever the new code tells it to do).
> data races between two threads that access the same memory where at least one access is a write
Most architectures do not provide the full semantics of the C abstract machine under the as-if rule. That is, most architectures are at least willing to promise that you get some sort of value when you execute a data race. It's probably the wrong value, it's probably nondeterministic-but-not-in-a-cryptographically-useful-way, and it might not look like any of the values you would "logically expect" to see (e.g. because of tearing), but it is still not quite the same thing as UB.
UB specifically means "an optimizing compiler is allowed to assume that this never happens." It cannot exist at the machine code level, because there is no compiler. The closest we can get (within the context of the C and C++ standards) is implementation-defined behavior, which roughly translates from the standardese to "if this happens, we don't know what your system will do, but you can read your compiler, CPU, and OS manuals and figure it out if you really want to."
The C and C++ standards committees could, at any time, wave a magic wand and eliminate all UB from their respective languages. The reason that nobody is seriously advocating for that is not because it would not work, but because it would necessarily involve saying something like "all UB is hereby reclassified as IB," and (this general category of) IB is almost as much of a problem as UB. It also requires more documentation that nobody is actually going to read (do *you* want to carefully study a heap diagram for your particular libc's malloc, just so you know what happens if the heap is corrupted?), since all IB must be documented by each implementation (that's the "you can read your manuals" bit). So you'd lose a lot of optimization opportunities, and waste a lot of the implementers' time, in exchange for practically nothing.
Posted Feb 27, 2025 7:31 UTC (Thu)
by matthias (subscriber, #94967)
[Link] (2 responses)
So what are the semantics if you corrupt the stack and as a consequence jump to uninitialized memory or memory that you intentionally filled with random data to construct a key or even worse, memory filled by data controlled by an attacker. By the very definition of the instruction set anything can happen. You can call the resulting behavior whatever you like, but it is essentially as undefined as it can possibly get.
And independently of what you call this behavior, it clearly has to be avoided. Corrupting the stack plainly leads to exploits, so the UB-free variant of C(++) that we are talking about has to prevent it. So we are back at square one, and we need the borrow checker to avoid this.
> UB specifically means "an optimizing compiler is allowed to assume that this never happens." It cannot exist at the machine code level, because there is no compiler.
But you have a very similar thing. An optimizing out-of-order architecture in the CPU. And this architecture makes similar assumptions on what can happen vs. what cannot happen. And again, you can call this behavior by different names, but it is essentially undefined. The CPU does not have the global sense of what is going on as the compiler, but messing up locally is enough to corrupt your data. And again, we effectively need the borrow checker to prevent data races. You can get rid of some of this behavior if you make each and every data access atomic, but this is obviously undesirable and I am not even sure that this would be enough.
> ...saying something like "all UB is hereby reclassified as IB," and (this general category of) IB is almost as much of a problem as UB.
It is essentially this: giving a new name to the same behavior. And it is not almost as much of a problem as UB, it is exactly as much of a problem as UB, as it can still lead to the same "if you do not follow the rules, I am allowed to format your hard drive" kind of behavior.
I would be absolutely in favor of the committee eliminating all of the nonsense kinds of UB, such as integer arithmetic being able to be UB. But once you try to remove the UB of dangling pointers and data races, you essentially have to construct a whole new language.
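Rust already takes this route for integer arithmetic: overflow is never UB, and the programmer chooses the semantics explicitly. A minimal sketch (plain `+` would panic in debug builds and wrap in release builds):

```rust
fn main() {
    // In Rust, overflow semantics are explicit and defined, never UB:
    assert_eq!(i32::MAX.wrapping_add(1), i32::MIN);   // two's-complement wrap, on request
    assert_eq!(i32::MAX.checked_add(1), None);        // overflow detected, reported as a value
    assert_eq!(i32::MAX.saturating_add(1), i32::MAX); // clamp to the maximum instead
    println!("ok");
}
```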
Posted Feb 28, 2025 8:24 UTC (Fri)
by anton (subscriber, #25547)
[Link] (1 responses)
Posted Feb 28, 2025 14:01 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
This has been argued, but it seems that no one has been able to show an instance of a compiler actually doing so. There are some solutions for it in the works (by saying "it's not allowed"), but it is practically a no-op as compilers have already behaved that way (though I am certainly not well-versed in the details):
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/...
Posted Feb 27, 2025 12:31 UTC (Thu)
by excors (subscriber, #95769)
[Link] (2 responses)
I don't think that's really true. x86 and Arm have a number of things that are explicitly documented as "undefined" or "unpredictable" in the architecture references, and are not documented in CPU-specific manuals (as far as I can see), so you can't figure out the behaviour even if you really want to.
E.g. on x86 there's the BSF/BSR instructions ("If the content of the source operand is 0, the content of the destination operand is undefined"). Many instructions leave flags in an undefined state. With memory accesses to I/O address space, "The exact order of bus cycles used to access unaligned ports is undefined". Running the same machine code on different CPUs can give different behaviour, in the same way that running the same C code through different compilers (or the same compiler with different optimisation flags) can give different behaviour, with no documentation of what will happen, so I think it's reasonable to equate that to C's concept of UB.
(And the C standard says UB specifically means "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements", so it's not literally dependent on there being an optimising compiler.)
In practice, all the undefined/unpredictable CPU behaviour that's accessible from userspace is probably documented internally by Intel/Arm for backward compatibility and security reasons, since the CPU is designed to run untrusted machine code (unlike C compilers, which are designed to compile only trusted code). Armv8-A has a lot of "constrained unpredictable", where it's documented that an instruction might e.g. raise an exception or be treated as NOP or set the destination register to an unknown value but it isn't allowed to have any other side effects; but there's still plenty of non-constrained "unpredictable" behaviours. They're not fully unconstrained: they are documented as obeying privilege levels, but they can have arbitrary behaviour that would be achievable by any code within that privilege level, which is the same as C's UB in practice (e.g. UB in an application is not allowed to break the kernel). So I think it's very much like C's UB.
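Language designers can paper over exactly this kind of instruction-level undefinedness. Rust's count-trailing-zeros operation, for instance, which maps onto BSF/TZCNT on x86, is simply given a defined result for a zero input (the bit width), so the compiler emits a fix-up where the hardware would leave the destination undefined:

```rust
fn main() {
    // BSF leaves the destination undefined for a zero source operand;
    // Rust defines the result of trailing_zeros()/leading_zeros() on zero
    // to be the type's bit width instead.
    assert_eq!(0u32.trailing_zeros(), 32);
    assert_eq!(1u32.trailing_zeros(), 0);
    assert_eq!(0u64.leading_zeros(), 64);
}
```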
Posted Feb 28, 2025 8:55 UTC (Fri)
by taladar (subscriber, #68407)
[Link]
Posted Feb 28, 2025 9:18 UTC (Fri)
by anton (subscriber, #25547)
[Link]
(And the C standard says UB specifically means "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements", so it's not literally dependent on there being an optimising compiler.)
And while I agree with the idea that the C standards committee originally used "undefined behaviour" for cases where different implementations produced different behaviour, and where they did not have a more specific term (such as "unspecified value"), for several decades C compiler writers have treated "undefined behaviour" as license to assume that this behaviour does not occur in the programs they support (unless the program is "relevant" for some reason), and there are people around who advocate the position that this has been the intent of "undefined behaviour" from the start.
And the latter form of "undefined behaviour" has quite different results from the former; e.g., with the latter form a loop with an out-of-bounds access can be "optimized" into an endless loop, while with the former form it will perform the memory access, either giving a result, or producing something like a SIGSEGV.
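Rust resolves the same question by defining the out-of-bounds access itself: safe indexing is bounds-checked, and a violation is a deterministic, catchable panic rather than something the optimizer may assume away. A small sketch:

```rust
fn main() {
    let v = vec![1, 2, 3];
    // Safe indexing past the end is a defined panic, not UB:
    let result = std::panic::catch_unwind(|| v[10]);
    assert!(result.is_err());
    // The checked accessor reports the failure as a value instead:
    assert_eq!(v.get(10), None);
    assert_eq!(v.get(2), Some(&3));
}
```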
Posted Feb 27, 2025 3:18 UTC (Thu)
by raof (subscriber, #57409)
[Link]
There was a very interesting research OS at Microsoft that did exactly this - Singularity. A bit of bootstrap written in assembly, then jumping into a fully managed environment written in a variant of C# (called Sing#, which was the source of a bunch of C# features over time). Being fully managed meant that one of the core weaknesses of microkernels - context switch overhead - didn't exist, because it just didn't use the process-isolation hardware.
There's a really interesting series of blog posts about Midori, the very-nearly-complete project to replace Windows with a Singularity-derived codebase.
Posted Feb 26, 2025 22:23 UTC (Wed)
by jmalcolm (subscriber, #8876)
[Link] (3 responses)
Today, where Rust is going in is the drivers. Drivers are often fairly platform specific already. You can also have competing drivers for the same hardware if it turns out that there needs to be a mainstream and a niche option. But the fact that Apple Silicon users are writing their GPU drivers in Rust is not going to threaten Linux support for my niche architecture.
Rust support is also being added to GCC (gccrs). That may take a while to bake but I expect it to mature before we start seeing Rust in core Linux systems that are non-optional across platforms. In other words, Rust in the kernel will not threaten platform support as long as your platform is supported by either GCC or Clang (LLVM).
What platforms are we worried about that cannot be targeted by GCC or Clang? Can Linux run there now?
As a final back-stop, there is mrustc. This allows Rust to target any system with a capable C++ compiler.
By the time Rust becomes non-optional in Linux, Rust will be as portable as C or C++.
Posted Mar 1, 2025 18:32 UTC (Sat)
by mfuzzey (subscriber, #57966)
[Link] (2 responses)
This applies to virtually all drivers for hardware that isn't in the SoC itself (eg chips connected to the CPU using busses like I2C / SPI / PCI / USB ).
Even when the hardware is actually inside the SoC, it's quite common for IP blocks to be reused in multiple SoCs, even ones from different manufacturers (because manufacturers often buy the IP for an ethernet controller, USB controller or whatever and integrate it in their SoC). In that case the register interface is the same, so the driver code is the same, but the registers will be at different addresses (and that's taken care of by injecting the appropriate base address via DT / ACPI).
So, in many cases, having drivers in Rust will impact Linux support for platforms that don't yet have a Rust implementation. And while it is indeed possible to have competing implementations, this is usually frowned upon in the kernel for duplication and maintenance reasons, and such duplicates usually exist only temporarily.
Posted Mar 3, 2025 10:24 UTC (Mon)
by taladar (subscriber, #68407)
[Link]
Posted Mar 5, 2025 0:59 UTC (Wed)
by edgewood (subscriber, #1123)
[Link]
Posted Feb 27, 2025 0:44 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Hm. Right now Rust is missing the following in-tree archs: sh, parisc, openrisc, nios2, microblaze, csky, arc, alpha.
Out of these architectures, only sh is still being manufactured. And maybe arc (from Synopsys). I'd be surprised if these architectures stay in-tree by the time Rust becomes mandatory. Except for Alpha, people love it for some reason.
Posted Feb 26, 2025 17:28 UTC (Wed)
by magnus (subscriber, #34778)
[Link] (13 responses)
Posted Feb 26, 2025 17:49 UTC (Wed)
by mb (subscriber, #50428)
[Link] (11 responses)
The key concept here is to create zero-cost abstraction layers just above your unsafe hardware and to implement basic primitives in partial unsafe code and use these safe primitives in your implementation.
It's easily possible to write bare-metal Rust code without a single line of unsafe code.
The PAC is an extremely simple and mostly auto-generated zero-cost abstraction of the microcontroller hardware. The HAL is one layer above that, putting together higher-level hardware abstractions with only a little unsafe code. Think of driver code for hardware primitives like I2C, SPI, etc.
Typical kernel code is *much* more high level than that. Even in the core kernel.
With these concepts it's easy to write irq-handling, scheduling, mm, traps, etc... in safe Rust code.
*PAC = Peripheral Access Crate.
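The pattern is simple enough to sketch. In a real PAC the register addresses come from the vendor's SVD file and construction is restricted (a singleton `Peripherals::take()`); here everything is illustrative, and a plain variable stands in for a memory-mapped register so the sketch is runnable:

```rust
use core::ptr::{read_volatile, write_volatile};

/// Hypothetical PAC-style register wrapper: the only unsafe code is the
/// volatile access, hidden behind a safe, zero-cost API. `Reg` and
/// `ENABLE` are made-up names, not from any real PAC.
struct Reg(*mut u32);

impl Reg {
    fn read(&self) -> u32 {
        // SAFETY: whoever constructed this Reg promised a valid MMIO address.
        unsafe { read_volatile(self.0) }
    }
    fn write(&self, val: u32) {
        // SAFETY: as above.
        unsafe { write_volatile(self.0, val) }
    }
}

fn main() {
    // Stand-in for a memory-mapped hardware register.
    let mut fake_mmio: u32 = 0;
    let ctrl = Reg(&mut fake_mmio as *mut u32);
    const ENABLE: u32 = 1 << 0;
    ctrl.write(ctrl.read() | ENABLE); // safe code, no `unsafe` at the call site
    assert_eq!(ctrl.read(), 1);
}
```

All driver code above this layer manipulates registers through safe calls only.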
Posted Feb 26, 2025 21:56 UTC (Wed)
by Wol (subscriber, #4433)
[Link]
Eggsackerly.
All you need is a properly defined API. And that's all the Rust guys were asking for. All I care about when using 3rd party code is that I have a definition of the interface I can comply with. Anything else is an opaque box I don't want to have to give a monkeys about.
Lack of such interfaces generally indicates badly designed (or implemented) spaghetti code. One only has to think back to Alan Cox and the tty drivers (okay, the younger linux kernel guys are probably too young to remember ... :-).
The more Rust makes people define their interfaces, the easier it will be to replace chunks of the kernel - with C++, PL/1, Basic, ... or Rust. Doesn't matter. The cleaner the boundaries, the easier it will be to replace bits.
Cheers,
Posted Feb 26, 2025 22:01 UTC (Wed)
by magnus (subscriber, #34778)
[Link] (9 responses)
Posted Feb 27, 2025 3:03 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (7 responses)
Rust is generally pretty good at expressing rules like the following (which your wrapper would selectively combine in such a way as to prohibit incorrect use of the underlying unsafe primitive, whatever that may happen to look like):
* Before you can [do the thing], you must [do the other thing].
These are type-safety rules, so they officially apply to objects rather than verbs. But it is easy enough to tie an object to a verb (by making it a required argument or the return type, as appropriate), and it is also easy enough to make zero-cost wrappers or zero-byte objects in Rust, so this is not a real restriction.
Rust can fully express the following, but it is painfully complicated to do so (see std::pin or core::pin, and note there are multiple people looking into ways of making this less painful):
* [The thing] must live at a fixed address, and may not be relocated (moved or copied) under any circumstances.
Rust can express the following in most cases, but there is no direct support for it, and the most common workaround (an API like std::thread::scope) is a bit more involved and restrictive than it ideally should be (search for "affine types" if you want to find the people who are looking into improving this one):
* After you [do the thing], you must [do the other thing].
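The first kind of rule is usually encoded with a zero-sized "proof token" type: the verb that must happen first returns the token, and the gated verb demands it as an argument. A minimal sketch with made-up names:

```rust
/// Zero-sized token proving that initialization has happened.
/// (All names here are illustrative.)
struct Initialized(());

fn init_hw() -> Initialized {
    // ... perform the real initialization here ...
    Initialized(())
}

/// "Before you can [do the thing], you must [do the other thing]":
/// requiring the token turns the rule into a compile-time check.
fn do_the_thing(_proof: &Initialized) -> u32 {
    42
}

fn main() {
    // do_the_thing() cannot even be called until init_hw() has run,
    // because nothing else can produce an `Initialized`.
    let token = init_hw();
    assert_eq!(do_the_thing(&token), 42);
}
```

The token occupies no memory and vanishes after monomorphization, so the check is zero-cost at runtime.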
Posted Feb 27, 2025 3:17 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link]
Posted Feb 27, 2025 4:31 UTC (Thu)
by draco (subscriber, #1792)
[Link]
I don't find the terminology terribly helpful either 😂
Posted Feb 28, 2025 17:23 UTC (Fri)
by magnus (subscriber, #34778)
[Link] (4 responses)
Posted Mar 3, 2025 11:23 UTC (Mon)
by laarmen (subscriber, #63948)
[Link]
I'm pretty sure such situations will come up, but I don't see this as a problem. Again, the goal is not to avoid `unsafe` entirely. The entire point of `unsafe` is to have a way to communicate that there are invariants that cannot be upheld by the compilers that guarantee the safety of operations that are in general not safe.
Posted Mar 3, 2025 12:02 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (2 responses)
It does tend to be true in practice that you can encapsulate your unsafe blocks behind zero-overhead safe abstractions (e.g. having a lock type that is unsafe to construct because it depends on all CPUs running kernel code, not userspace), but that's an observation about code, not a requirement for benefiting; even without that encapsulation, you benefit by reducing the scope of unsafe blocks so that it's easier to verify the remaining bits.
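A toy example of the encapsulation pattern (entirely hypothetical code): the `unsafe` block's soundness rests on an invariant that the surrounding safe code visibly enforces, so a reviewer only needs local reasoning, and callers never write `unsafe` at all:

```rust
/// Illustrative wrapper: the unchecked access inside sum() is sound
/// because the loop condition enforces the bounds invariant locally.
struct EveryOther<'a> {
    data: &'a [u32],
}

impl<'a> EveryOther<'a> {
    fn new(data: &'a [u32]) -> Self {
        EveryOther { data }
    }

    /// Sum of every other element.
    fn sum(&self) -> u32 {
        let mut total = 0;
        let mut i = 0;
        while i < self.data.len() {
            // SAFETY: the loop condition guarantees i < data.len().
            total += unsafe { *self.data.get_unchecked(i) };
            i += 2;
        }
        total
    }
}

fn main() {
    let v = [1, 2, 3, 4, 5];
    assert_eq!(EveryOther::new(&v).sum(), 9); // 1 + 3 + 5
}
```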
Posted Mar 4, 2025 3:50 UTC (Tue)
by magnus (subscriber, #34778)
[Link] (1 responses)
For example, if the ownership or concurrency management of something central, like a struct page, depends on a lot of tangled state that the Rust compiler cannot verify, you can create abstractions that modify it while also requiring that the other conditions hold. If they are called at the wrong time, they would still compile but yield unsafe behavior.
On the other hand the hard distinction between safe/unsafe and logic bug may not make much sense deep in the kernel anyway as any logic bug would be both a safety and functional issue.
Posted Mar 4, 2025 10:23 UTC (Tue)
by farnz (subscriber, #17727)
[Link]
And even in the kernel, the hard distinction makes a lot of sense; the point of unsafe is that the "unsafe superpowers" allow you to cause "spooky action at a distance" by breaking the rules of the abstract machine. The core purpose of the unsafe/safe distinction is to separate out these two classes of code, so that you can focus review efforts on ensuring that unsafe code doesn't break the rules of the machine, and hence that bugs can be found by local reasoning around the area of code with a bug.
The problem that Unsafe Rust and C both share is that a bug doesn't have to have local symptoms; for example, a bug in an in-kernel cipher used for MACsec can result in corruption of another process's VMA structures, causing the damaged process to crash for no apparent reason. That means that you have to understand the entire codebase to be certain of finding the cause of a bug; Safe Rust at least constrains the impact of a bug in the code to the things that code is supposed to be touching, so you can (e.g.) rule out all parts of the Safe Rust codebase that don't touch VMAs if what you're seeing is a corrupt VMA.
Posted Feb 27, 2025 5:50 UTC (Thu)
by mb (subscriber, #50428)
[Link]
These things already have abstraction layers in the C code today.
If you have SMP concurrency in C code today, you can't express that in plain C. You have to use the abstraction layers that the kernel provides. Their inner workings mostly come from "arch" where the hardware specific magic happens.
>Linux is a lot more complex than a basic bare metal os.
Yes. But I wasn't talking about the OS itself. I was talking about the primitives required to build an OS.
Rust is *not* about avoiding unsafe code.
Unsafe blocks are not magic. They don't let you magically write low level code and ignore all Rust safety rules.
Posted Feb 26, 2025 18:10 UTC (Wed)
by asahilina (subscriber, #166071)
[Link]
https://github.com/AsahiLinux/linux/blob/gpu/rust-wip/dri...
The idea that low-level memory management ends up being one big unsafe blob is a myth. Even for low-level mm/kernel code, you still only end up with the unsafe bits walled off into small sections. And you can build safe abstractions around them as necessary, like I did there with `with_pages()` which is used as the basic primitive for PTE walking and mutation (that's a generic function, so it gets monomorphized/optimized into a separate variant for each usage in the rest of the file, which means I only have to write the error-prone page table walking code once for every possible op).
For core IRQ handling you don't even need any unsafe code at all (in principle, other than the vectors of course but that ends up written in assembly anyway). There's nothing memory-unsafe about IRQ management.
Of course, you can introduce memory safety issues outside the unsafe blocks when you're writing core kernel code (and some driver code), such as by mapping the wrong physical memory pages into the page tables. Rust doesn't protect against that, but it does provide many more convenient tools to make things like address math less error-prone and build safer zero-cost abstractions to handle things, so you still get a lot of benefits over C.
This is not unique to kernels either. In general, soundness is defined/expected on crate/module boundaries, so the safety of unsafe blocks within a crate is allowed to be conditional on invariants maintained by "safe" code (in this case, the safety of the unsafe blocks in PT management is conditional on the incoming address inputs being correct). Kernel code is more challenging than most Rust code in this regard, but it's not a total free-for-all like C at all. You still get a lot of mileage out of not having unsafe code that can cause memory unsafety in parts of the code that aren't doing low-level things, and you also get a lot of mileage out of the powerful abstractions.
As long as your interface boundaries are as safe and sound as possible (ideally fully sound at the lowest level of abstraction you can manage), the potential for bugs goes way down and the reliability of code review goes way up. For example, in my GPU driver, the GPU MM memory unsafety is limited to the page table code and the next higher level module which handles tracking mappings at the object level (mmu.rs). Above that, excluding some rare special-case operations like mapping a raw I/O page, the rest of the driver has no ability to accidentally supply the wrong physical address or free a GPU memory page without first unmapping it and flushing the TLBs. The Rust lifetime and ownership rules guarantee that if a GPU object is being freed, it can have no active GPU mappings.
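The real `with_pages()` lives in the Asahi GPU driver; the shape of the idea can be sketched in a few lines (everything below is a simplified, hypothetical stand-in, with a `Vec` playing the role of page-table memory). The error-prone range arithmetic is written once, and each call site's closure is monomorphized into its own optimized variant:

```rust
const PAGE_WORDS: usize = 4; // toy page size, purely illustrative

/// Simplified stand-in for page-table memory.
struct PageTable {
    backing: Vec<u32>,
}

impl PageTable {
    /// `with_pages()`-style primitive: hands the closure a safe &mut view
    /// of one page; all bounds/overflow handling is confined to this one
    /// function, and the generic `op` is monomorphized per call site.
    fn with_page<R>(&mut self, idx: usize, op: impl FnOnce(&mut [u32]) -> R) -> Option<R> {
        let start = idx.checked_mul(PAGE_WORDS)?;
        let end = start.checked_add(PAGE_WORDS)?;
        let page = self.backing.get_mut(start..end)?; // checked exactly once
        Some(op(page))
    }
}

fn main() {
    let mut pt = PageTable { backing: vec![0; 16] };
    let _ = pt.with_page(1, |page| page[0] = 0xdead_beef);
    assert_eq!(pt.backing[4], 0xdead_beef);
    // An out-of-range index is reported safely instead of corrupting memory:
    assert!(pt.with_page(99, |_| ()).is_none());
}
```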
Posted Feb 26, 2025 8:59 UTC (Wed)
by jezuch (subscriber, #52988)
[Link]
That commit looks more "passive-aggressive" than quiet to me.
I hope Christoph Hellwig finds peace now :) The kernel moves on, despite the fear mongering that everything will fall apart without him.
Posted Feb 26, 2025 9:12 UTC (Wed)
by hailfinger (subscriber, #76962)
[Link] (1 responses)
Posted Feb 26, 2025 11:29 UTC (Wed)
by tlamp (subscriber, #108540)
[Link]
And while I might not agree with all his opinions and stances, especially as we already use Rust successfully at work a lot, I am also a bit appalled by the amount of (seemingly hive-mind) toxicity that one can find in the various threads and in the comment sections of platforms like this one.
I mean, sure, everyone should feel free to disagree with others' acts or opinions. But if one cannot do that without personal attacks or "schadenfreude", or without ignoring all the (past) effort that maintainers have poured into projects by basically saying "good riddance", then I'm really unsure how anybody can think that adding fuel here is better than staying silent, especially if one is not directly involved in these things anyway.
Posted Feb 26, 2025 10:15 UTC (Wed)
by sdalley (subscriber, #18550)
[Link] (1 responses)
Posted Feb 27, 2025 11:59 UTC (Thu)
by taladar (subscriber, #68407)
[Link]
rust?
> You are not forced to take any Rust code, or care about any Rust code in the DMA code. You can ignore it.
>
> But "ignore the Rust side" automatically also means that you don't have any *say* on the Rust side.
>
> You can't have it both ways. You can't say "I want to have nothing to do with Rust", and then in the very next sentence say "And that means that the Rust code that I will ignore cannot use the C interfaces I maintain".
Wol
[2] Rust is only "optional" until something you need is written in it. Such as a display driver for those extremely rare Arm Macs.
Wol
Or maybe ...
Maintainer != Contributor
Not only DMA-mapping, but configfs too
Welcome Marek
Well
> Meanwhile, Rust, in of itself, does not force anything of the sort
Wol, I have asked you this before. Please assume good faith on the part of the developers involved in these discussions. Please do not attribute such base and cowardly motives to people who have worked for years to build the kernel you use and who are concerned about its ongoing maintenance. The people who are worried about Rust may well turn out to be wrong (I suspect they will), but they are not driven by fear of developers who can write a driver faster than they can. Seriously. That kind of stuff just makes the conversation harder for no good purpose.
Assume good faith, please
Not all delay is bad
The "Rust video drivers" are not part of any kernel release at this point, how would you expect them to accumulate CVEs? Please, arguing for the sake of argument doesn't help anybody.
A prediction with no data to support it
- drivers
- get fresh blood in kernel ecosystem
- keep an open mind collectively (like described by Mr Greg K-H)
- causing a stir
In fact, the Linux kernel is already a multi-language project, written in C and assembly. It doesn't run into the multi-language problem because the assembly is restricted to places where C is impractical, and nobody is threatening to rip out functioning C code to replace it with assembly.
The limiting factor on forks is always going to be volunteer power. If there's enough people to maintain a "no Rust" (or "no C", or "no C++", or "no Zig", or "only FSF-approved licensing") fork of the kernel, it'll happen; further, the kernel's development methodology (going right back to the Alan Cox forks of the kernel) has always demonstrated a talent for merging in bits from forks wherever there's an advantage to doing so.
- dereferencing a dangling pointer
- data races between two threads that access the same memory where at least one access is a write
- probably a few more (but not many)
* You trap, and the OS does something about it (in practice, usually it kills the offending process, but page faults can use a similar or identical mechanism depending on the architecture, and a page fault is not even a real error). This is also a perfectly well-defined operation (regardless of how the OS decides to respond to it).
Architecture, microarchitecture, and undefined behaviour
So what are the semantics if you corrupt the stack and as a consequence jump to uninitialized memory or memory that you intentionally filled with random data to construct a key or even worse, memory filled by data controlled by an attacker. By the very definition of the instruction set anything can happen.
Not at all. First of all, the architectural effects of every instruction up to that point continue to hold, while, e.g., in C++ undefined behaviour is reportedly allowed to time-travel. Next, in a well-designed architecture what happens then is defined by the actual content of the memory and the architecture description, which does not contain undefined behaviour (remember, we are discussing well-designed architectures). Maybe you as programmer do not deem it worth reasoning about this case and just want to put the label "undefined behaviour" on it, but as far as the architecture is concerned, the behaviour is defined.
An optimizing out-of-order architecture in the CPU.
The architecture does not specify out-of-order execution, on the contrary, it specifies that each instruction is executed one by one. There may be a microarchitecture with out-of-order execution like the Pentium Pro below it, or a microarchitecture with in-order execution like the 486, but the end result of executing a sequence of instructions is the same (except for the few cases where the architectures differ; IIRC the CMOVcc instructions were in the Pentium Pro, but not the 486).
And this [micro]architecture makes similar assumptions on what can happen vs. what cannot happen. And again, you can call this behavior by different names, but it is essentially undefined.
Computer architects have learned what later became Hyrum's law long ago, and therefore define completely (or almost completely for not-so-well designed architectures) what happens under what circumstances. Microarchitectures implement the architectures, and they do not assume that something cannot happen when it actually can. When the microarchitects fail at implementing the architecture, as with Zenbleed, that's a bug.
The CPU does not have the global sense of what is going on as the compiler, but messing up locally is enough to corrupt your data.
Microarchitectures with out-of-order execution do not commit any changes that do not become architectural, and therefore do not corrupt data (rare architecture-implementation bugs like Zenbleed excepted).
E.g. on x86 there's the BSF/BSR instructions ("If the content of the source operand is 0, the content of the destination operand is undefined"). Many instructions leave flags in an undefined state. With memory accesses to I/O address space, "The exact order of bus cycles used to access unaligned ports is undefined". Running the same machine code on different CPUs can give different behaviour, in the same way that running the same C code through different compilers (or the same compiler with different optimisation flags) can give different behaviour, with no documentation of what will happen, so I think it's reasonable to equate that to C's concept of UB.
C language lawyers make a fine-grained difference between different forms of lack of specification in the C standard. IIRC they have "unspecified value" for cases where the result of an operation is unspecified (as in the BSF/BSR case and the unspecified flags results). I think they do not have a special name for an unspecified order.
In practice, all the undefined/unpredictable CPU behaviour that's accessible from userspace is probably documented internally by Intel/Arm for backward compatibility and security reasons
Especially backwards-compatibility; the security benefits fall out from that. As for the bad design in the ARM architectures, maybe they have had too much contact with compiler people and become infected by them. I expect that at some point the implementors of ARM architectures will find that existing programs break when they implement some of the ARM-undefined behaviour in a way different than earlier implementations of that architecture, and that behaviour then becomes an unofficial part of the architecture, as for the Intel and AMD cases mentioned above. A well-designed architecture avoids this pitfall from the start.
Then there is the JVM way of doing things. Use a virtual machine and only code against the virtual machine. I do not see how this should work in the kernel. Also you need a language to write the virtual machine in.
Rust will not reduce platforms
There are more than enough examples for this.
Almost all of the unsafe code is in the PAC* and (if you use one) the HAL*.
*HAL = Hardware Abstraction Layer.
> There are more than enough examples for this.
Wol
* After you [do the thing], you may no longer [do the other thing].
* Only one thread may [do the thing] at a time.
* If any thread can [do the thing], then no (other) thread is allowed to [do the other thing].
* [The thing] may not outlive [the other thing].
* You may only [do the thing] on the same thread that [did the other thing].
* Your callback function must [do the thing].
* You may only [do the thing] from a (specific) callback.
* Many variations of "you may only [do the thing] if I say so" (which could give rise to runtime checking, if you are so inclined).
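Several of the invariants listed above map directly onto Rust's type system: "[the thing] may not outlive [the other thing]" is a lifetime, "only one thread may [do the thing]" is `&mut` or a `Mutex`, and "after you [do the thing], you may no longer [do the other thing]" can be encoded by a method that consumes `self`. A purely illustrative sketch of that last one (the `Connection` type and its methods are made-up names):

```rust
struct Connection {
    sent: u32,
}

impl Connection {
    fn send(&mut self, _byte: u8) {
        self.sent += 1;
    }

    // Taking `self` by value encodes "after close, no further sends":
    // the compiler rejects any use of the handle after this call.
    fn close(self) -> u32 {
        self.sent
    }
}

fn main() {
    let mut c = Connection { sent: 0 };
    c.send(1);
    c.send(2);
    let total = c.close();
    assert_eq!(total, 2);
    // c.send(3); // would not compile: `c` was moved by close()
    println!("sent {total}");
}
```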
The goal isn't to remove unsafe completely when you're writing a kernel; rather, you want to constrain it to small chunks of code that are easily verified by a human reader. For example, it's completely reasonable to require unsafe when you're changing paging-related registers, since you're changing something underneath yourself that the compiler cannot check, and that can completely break all the safety promises Rust has verified.
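The "small, auditable chunk" pattern looks something like this (an illustrative userspace sketch, not kernel code): the unsafe block is one line, its precondition is checked immediately before it, and callers only ever see the safe function.

```rust
/// Safe wrapper: the only unsafe code is a single, easily audited block.
fn first_byte(s: &[u8]) -> Option<u8> {
    if s.is_empty() {
        return None;
    }
    // SAFETY: we just checked that the slice is non-empty, so reading
    // one byte through the raw pointer is in bounds.
    Some(unsafe { *s.as_ptr() })
}

fn main() {
    assert_eq!(first_byte(b"abc"), Some(b'a'));
    assert_eq!(first_byte(b""), None);
    println!("ok");
}
```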
Use of unsafe in kernel Rust code
You may have to weaken it compared to #![forbid(unsafe_code)], but that's still a lot stronger than what you get from plain C. Bubbling up unsafe to a high level is absolutely fine, though - it just tells callers that there are safety promises that Rust can't check, but relies on you to check them manually instead.
And in most cases Rust code can just use the existing abstractions (putting another zero-cost, or in some cases nearly zero-cost, safe Rust abstraction around them).
Primitives like dispatching an interrupt from the bare metal into a safe and well-defined high-level language routine.
Unsafe code is the tool to write your lowest level primitives that your whole stack builds upon.
All Rust safety rules also apply inside unsafe blocks, and they are all still enforced by the compiler there. Adding an unsafe block to existing safe code changes nothing.
But unsafe blocks give you a couple more tools (mainly raw pointers) that are needed to build safe abstractions (e.g. with safe references on the outside instead of raw pointers).
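A minimal sketch of that "raw pointers inside, safe references outside" pattern, assuming nothing beyond the standard allocator (a toy Box-like owner, purely illustrative):

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Raw pointers stay private; users only ever see safe references.
struct MyBox {
    ptr: *mut u32,
}

impl MyBox {
    fn new(v: u32) -> MyBox {
        let layout = Layout::new::<u32>();
        // SAFETY: u32 has a non-zero-sized layout; we check for
        // allocation failure before writing through the pointer.
        let ptr = unsafe { alloc(layout) as *mut u32 };
        assert!(!ptr.is_null(), "allocation failed");
        unsafe { ptr.write(v) };
        MyBox { ptr }
    }

    fn get(&self) -> &u32 {
        // SAFETY: ptr is valid and initialized for as long as self lives.
        unsafe { &*self.ptr }
    }
}

impl Drop for MyBox {
    fn drop(&mut self) {
        // SAFETY: ptr was allocated in new() with this same layout.
        unsafe { dealloc(self.ptr as *mut u8, Layout::new::<u32>()) };
    }
}

fn main() {
    let b = MyBox::new(42);
    assert_eq!(*b.get(), 42);
    println!("ok");
}
```

The unsafe blocks are confined to the constructor, accessor, and destructor, each with a stated precondition; everything outside this type deals only in safe `&u32` references.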
Quiet quitting
Thank you, Christoph.
From an outsider's point of view, some of those cleanups have probably also benefited the effort to include Rust in the kernel, because (some of) the burden of cleaning up interfaces before others could create Rust bindings was shouldered by Christoph.
A sincere thank-you to Christoph for years of hard work in a difficult and critical area
