
Whither the Apple AGX graphics driver?

By Jonathan Corbet
September 4, 2024
Much of the early Rust code for the kernel has taken the form of reimplementations of existing drivers as a proof of concept. One project, though, is entirely new: the driver for Apple GPUs written by Asahi Lina. This driver has shipped with Asahi Linux for some time and, by many accounts, is stable, usable, and a shining example of how Rust can be used in a complex kernel subsystem. That driver remains outside of the mainline kernel, though, and merging currently looks like a distant prospect. The reasons for that state of affairs highlight some of the difficulties inherent in integrating a new language (and its associated development style) into the Linux kernel.

The direct rendering manager (DRM) subsystem is a complex beast, into which a driver for a specific GPU must fit. That subsystem is also written in C, of course, meaning that any graphics driver written in Rust will have to depend on a whole stack of abstractions that allow it to interface with the rest of the DRM code. So it is not surprising that the first posting of the Apple GPU driver in March 2023 consisted mostly of DRM-level abstractions, with a preliminary version of the driver posted almost as an afterthought at the end. Perhaps more surprising — and discouraging — is that this first posting was also the last.

A pile of new abstractions is sure to draw comments, and that happened here. In a number of cases, those comments pointed out specific areas in the code that needed improvement and led to useful discussions. But two relatively small changes affecting the DRM scheduler proved to be a significant sticking point.

DRM scheduling and Rust

Graphical applications create an ongoing series of jobs, each of which is a program for the GPU to run. Many of those jobs have dependencies between them; as a simple example, consider that an image frame should not be displayed (one job) until the rendering of the graphics within it (performed, perhaps, by several other jobs) has completed. Dependencies can also extend outside of the graphics subsystem; an application generating images captured by a camera must wait for the camera to deliver each frame before rendering it. The task of a DRM scheduler is to run each job given to it, but only once any dependencies that job may have are satisfied. There can be multiple schedulers associated with any given hardware device.

Fences are the synchronization mechanism used to coordinate all of this. The camera driver will attach a fence to each frame it creates, and signal that fence when the frame is ready. Each job will wait until its dependent fences have been signaled; each job also has a fence of its own that is signaled when the job completes. Other fences are used to signal when the GPU has finished working on a specific task. The result is a complicated mesh of objects in the kernel with interdependent life cycles and a set of rules about how it all works that is not, to put it gently, thoroughly documented.

The first controversial patch added a new scheduler callback that could be used to defer the execution of a job if the hardware was not yet prepared to take it. Christian König, the maintainer of the scheduler code, responded curtly: "Well complete NAK", followed by a few words of explanation. It took a while to get to the real problem and König's suggested alternative, which was simply to provide another fence to be signaled when the hardware is ready. Asahi Lina agreed to solve the problem that way instead.

Agreement on the other patch proved more elusive. In current kernels, if a scheduler is torn down, any pending jobs can be left dangling, which can lead to all kinds of unfortunate behavior. Asahi Lina changed the scheduler to explicitly detach any pending jobs and free them independently. König rejected the change, saying that this situation should never come about; a scheduler should never be torn down if jobs still exist. A related patch, posted one month later, fixed a use-after-free bug that could come about after a scheduler is destroyed. Again, König rejected the patch, saying that this is a situation that should never actually happen. If these problems turn up with the Apple GPU driver, he said, the problem is with how the driver is using the DRM subsystem.

König is correct, in that these problems do not (hopefully) happen with existing, in-tree graphics drivers. Asahi Lina's point, however, is that this situation is fragile and incompatible with the Rust way of doing things. The life-cycle rules for schedulers and the objects they interact with are vague and not always documented well; developers have to put a lot of time into reverse-engineering the subsystem to figure out how to use its APIs properly, and nothing in the code enforces that the rules have been followed. Rust developers seek to avoid that kind of interface, and Asahi Lina subscribes to that approach:

"This should not happen in the first place" is how you end up with C APIs that have corner cases that lead to kernel oopses...

The idea with Rust abstractions is that it needs to be actually impossible to create memory safety problems for the user of the abstraction, you can't impose arbitrary constraints like "you must wait for all jobs to finish before destroying the scheduler"... it needs to be intrinsically safe. [...]

This is the power of Rust: it forces you to architect your code in a way that you don't have complex high-level dependencies that span the entire driver and are difficult to prove hold.

Without the small changes to the scheduler, she said, it would not be possible to create a suitably safe interface in Rust, at least not without the addition of a lot of other infrastructure that duplicates much of what the DRM scheduler is meant to provide.

This is where the whole discussion appears to fall apart. Unlike somebody writing a driver in C, Asahi Lina is not just trying to get the hardware working; she has set out to create an interface to the DRM scheduler that ensures that all drivers (those written in Rust, anyway) will be free of memory-safety bugs. König, instead, is worried about upsetting the structure that keeps the DRM scheduler working, and insists that the Rust code is simply using it incorrectly: "You just have implemented the references differently than they are supposed to be". He never addressed the objective of making the API intrinsically safe; that objective appears not to be on his radar.

In the end, Asahi Lina gave up on using the DRM scheduler at all; the plan now is to reimplement that functionality within the Rust code using workqueues instead.

Diverging goals

What has been shown here is a fundamental impedance mismatch between the two languages — and the development styles that they support — that is likely to make itself felt repeatedly as Rust makes inroads into the kernel. The Rust developers do not want to simply reimplement the C interfaces with their associated pitfalls; the whole point of Rust is to have the compiler ensure that certain kinds of bugs simply do not exist in the code. Achieving that goal means creating interfaces that differ from those used on the C side; it also means making changes to the C side to support the API guarantees that the Rust developers would like to have.

From your editor's reading of the discussion, there are no kernel developers who are actively trying to impede the adoption of Rust within the DRM subsystem. Indeed, many of the high-level DRM maintainers are highly supportive of Rust. But many maintainers may not fully understand what the Rust developers are trying to implement and, as a consequence, do not see why they should allow modifications to their subsystems to further those goals. Everybody is working to make the kernel better, but there is a lack of a shared understanding on how to get there.

The Rust-for-Linux developers have done their best to shield maintainers from much of the work of integrating Rust into their subsystems. This is helpful and perhaps necessary work, but it may also have the effect of making it harder for maintainers to fully understand what the Rust developers are trying to achieve. Maintainers may see a relatively straightforward project to create a set of bindings to their subsystem for a new language without understanding the full extent of what those bindings are meant to do. That will get in the way of reviewing the Rust work properly or understanding why some subsystem changes may be needed.

When developers review code against different objectives, misunderstandings may well result. If the above analysis of the situation holds (the truth almost certainly being, as is its way, more complicated), it suggests that the Rust-for-Linux developers need to make a clearer and more complete case for the kind of environment they want to create for kernel developers. Maintainers need to not only understand those goals but embrace and support them, with a willingness to accept changes to their subsystems to make the implementation of safe APIs possible. Everybody will need to work harder to make the Rust-for-Linux project a success. That may be especially true for maintainers who have, thus far, sat on the sidelines; without active support and understanding across the kernel community, it will be much harder to move Rust support beyond the "experimental" stage.

Perhaps most important, though, is getting past the perceptions (most often expressed outside of the kernel project itself) that the Rust developers are trying to force their approach onto the kernel development process or that kernel maintainers are, as a group, trying to block the adoption of Rust. One can find examples of almost anything in a community this large, but the overall story is rather more nuanced than that — and more amenable to solutions.

Meanwhile, the timeline for the merging of the Apple GPU driver is unclear at best, especially since the developer involved is, for now, not actively working toward that goal. It may turn out that the Nova driver, an effort to create a driver for NVIDIA GPUs in Rust that was announced in March (and which has resulted in a small set of abstractions of its own), will get there first — though that driver seemingly has quite a bit of ground to cover before it reaches the same level of functionality. Either way, there appears to be a future for graphics drivers written in Rust, even if many questions about just how they will get there are yet to be answered.

(See also: Asahi Lina's view on the DRM-scheduling disagreement.)

Index entries for this article
Kernel: Development tools/Rust
Kernel: Device drivers/Graphics



C Job Security

Posted Sep 4, 2024 14:15 UTC (Wed) by epilys (subscriber, #153643) [Link] (18 responses)

> ""You just have implemented the references differently than they are supposed to be"". He never addressed the objective of making the API intrinsically safe; that objective appears to not be on his radar.

This sounds extremely like protecting your territory (and most probably job security too) by making yourself irreplaceable. Maintainers should be as objective as humanly possible, yet we have people who, if they were replaced by other knowledgeable and capable individuals, would leave everyone's experience more positive overall. When that happens, there is a people problem, not a code problem.

C Job Security

Posted Sep 4, 2024 14:24 UTC (Wed) by corbet (editor, #1) [Link]

I do not know the maintainer in question, but I do believe, from my reading of the conversations, that people are not acting in bad faith here. It definitely strikes me as more of a case of misunderstanding and a lack of a set of fully shared objectives. We should be able to get past that without calling for people to be replaced.

C Job Security

Posted Sep 4, 2024 14:40 UTC (Wed) by pizza (subscriber, #46) [Link] (1 responses)

> This sounds extremely like protecting your territory (and most probably job security too) by making yourself irreplaceable.

That is a very uncharitable interpretation; passively-aggressively accusing the maintainer of working in bad faith is not going to lead to a positive outcome.

>Maintainers should be as objective as humanly possible,

Yet anything less than this platonic spherical-cow ideal of perfection (which always seems to translate to "does what *I* want them to do") is grounds for accusations of malfeasance.

> we have people who if they were replaced by other also knowledgeable and capable individuals, everyone's experiences would be positive over all

That's quite a logical leap you're making, based on very-not-supported-by-evidence assumptions.

C Job Security

Posted Sep 4, 2024 15:36 UTC (Wed) by epilys (subscriber, #153643) [Link]

I don't disagree with you; this is only the shallow optics of this exchange from someone who doesn't know the parties involved at a meaningful level. A direct NAK instead of a discussion might as well be in bad faith, at first glance, after all...

> Yet anything less than this platonic spherical-cow ideal of perfection (which always seems to translate to "does what *I* want them to do") is grounds for accusations of malfeasance.

Humanly possible is not an ideal of perfection; you misinterpreted the sentence. Your response reads to me as "uncharitable interpretation; passively-aggressively accusing [the poster] of [commenting] in bad faith" too. **I** do not want them to do something specific; perhaps I should have written "should strive to be as objective as possible". Besides, I only said "as humanly possible", which is not another way of saying "ideal".

In any case, I get what you're trying to say. But this is in response to subtext that just wasn't there, that is all.

> That's quite a logical leap you're making, based on very-not-supported-by-evidence assumptions.

Well, I'm talking about this case in particular, and the occasion it was brought up again (a maintainer resigning), where we have the evidence right here: a general air of malaise because people do not like any sort of change.

C Job Security

Posted Sep 4, 2024 15:14 UTC (Wed) by Wol (subscriber, #4433) [Link]

Unfortunately, that's probably not uncommon. The ext3/4 debacle seriously dented my faith in that maintainer ...

The move from ext2 to 3 was obviously great - all of a sudden we had a journal, and you no longer had to fsck huge disks after a crash - the system was back up in minutes, not hours.

The move from ext3 to 4 was - from the end-user perspective - a disaster. Yup, the system still came back up in minutes, maybe even faster than before. But the DEFAULTS changed - such that for the user to get their system back it went back to the ext2 days and worse. What on earth is the point of protecting the file system, when you don't protect the user data on that file system, and leave your users looking at a "mkfs; restore backup" to get a system they can trust!? What's the point of protecting the file system, when the associated cost includes the user just deleting it!

(For those who don't remember the details, the ext3 default was "journal the file data, journal the file metadata". Ext4 swapped that around, such that after a crash large parts of the file metadata could easily be left pointing at random garbage on disk that the journal system hadn't managed to save before the crash.)

I'm left with the impression that quite a few maintainers see their primary job as protecting the computer itself. It isn't! Okay, it is probably the best way of achieving the primary objective, but the primary objective is to protect the user data. Protecting the computer is the best way of achieving that, but don't lose sight of the REAL objective !!!

A lot of this fuss over Rust can probably be seen in the same light. My spats over Relational can certainly be seen in this light - I value a fast response time more highly than a guaranteed response time, especially when the difference between the typical response times is so glaringly huge! The point of a database isn't to squirrel data away, it's to provide it to the user as required, not to take a day to find it because response times have gone through the floor because you've saved everything in sight!

Cheers,
Wol

C Job Security

Posted Sep 4, 2024 15:52 UTC (Wed) by pbonzini (subscriber, #60935) [Link] (12 responses)

The question is 1) whether making the API intrinsically safe requires changes to all the drivers, and of what size 2) whether it is even worth making the whole API intrinsically safe, for example whether "modern" devices only need a small subset or the whole complexity.

Lina had no problem not using the scheduler API (initially she wanted to avoid using workqueues, because they had no Rust bindings, but when that changed she dropped the drm scheduler and used workqueues instead). Therefore, it's quite possible that the maintainer is seeing the balance differently.

In any case, it doesn't seem like the drm scheduler is a blocker for AGX.

C Job Security

Posted Sep 4, 2024 16:00 UTC (Wed) by pbonzini (subscriber, #60935) [Link]

... I added "It may even be worth writing a new, simpler, and intrinsically safe scheduler in Rust", and then deleted it because I thought I was probably talking out of my ass. But as it turns out:

https://vt.social/@airlied@fosstodon.org/113052975526246061:

> you and Christian are not going to solve this roadblock without and intermediary. As Danilo works his was through the roadblocks to upstream drivers, this is something we expect to engage with in terms of the bigger picture. We have an engineer looking into this, but I'm also not against a rust reimplantation if it's useful beyond asahi

To which Lina replied:

> the scheduler in the driver [...] will be roughly modeled after drm_sched (just much simpler and idiomatic Rust) and it shouldn't have any driver dependencies, so if Nova wants to use it it would be simple to pull out into common code.

So yeah, the job security angle is completely wrong. It's just a maintainer doing his job. At the same time there are people like Dave that handle the overall interaction between subsystems, are aware of what's going on, and are recruiting help towards removing the roadblocks.

C Job Security

Posted Sep 4, 2024 16:51 UTC (Wed) by asahilina (subscriber, #166071) [Link] (10 responses)

> The question is 1) whether making the API intrinsically safe requires changes to all the drivers, and of what size 2) whether it is even worth making the whole API intrinsically safe, for example whether "modern" devices only need a small subset or the whole complexity.

1) It doesn't.

2) This is the diffstat for what I asked for:

Asahi Lina (3):
drm/scheduler: Add more documentation
drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name
drm/scheduler: Clean up jobs when the scheduler is torn down.
drivers/gpu/drm/scheduler/sched_entity.c | 7 ++-
drivers/gpu/drm/scheduler/sched_fence.c | 4 +-
drivers/gpu/drm/scheduler/sched_main.c | 90 ++++++++++++++++++++++++++++++--
include/drm/gpu_scheduler.h | 5 ++
4 files changed, 99 insertions(+), 7 deletions(-)

58 of those 90 added lines were documentation.

Not even 50 lines of actual code delta to make things robust enough to make a Rust abstraction possible, but the maintainer would rather I add a bunch of craziness in the abstraction to work around it. I don't think that is a sensible tradeoff.

C Job Security

Posted Sep 4, 2024 17:02 UTC (Wed) by asahilina (subscriber, #166071) [Link]

Also I should mention I wasn't making "the whole" API "intrinsically safe". That's not really possible in C anyway. What I was doing is eliminating two specific convoluted lifetime requirements which are 1) plain hard to reason about, 2) impossible to encode in the Rust type system, and 3) impractical/very ugly to enforce via dynamic constructs.

There are plenty of other lifetime/safety requirements in drm_sched, but for the rest I was able to either directly encode them as Rust lifetimes and Rust types, or enforce them via an extra reference counting layer in the abstraction. It's when those approaches fail that I really have to change the C code, and so far drm_sched has been the only component with an API crazy enough to need that when abstracted in Rust, of all the abstractions I've written.

Changes to drm_sched to make it safe

Posted Sep 4, 2024 17:18 UTC (Wed) by farnz (subscriber, #17727) [Link] (6 responses)

Would you be able to link to the relevant commits somewhere? Even just a git URL and a set of commit IDs from which we can recreate that diffstat would be good enough - I basically just want a chance to read your C code and get a sense for myself of what the changes you were asking for look like, and whether they can be justified in the absence of Rust (my gut feeling is "yes, they make sense for a pure C kernel, too", but it'd be nice to be able to point to code to explain that).

Changes to drm_sched to make it safe

Posted Sep 4, 2024 17:24 UTC (Wed) by asahilina (subscriber, #166071) [Link] (4 responses)

This is the second submission (some of the discussion was in the first but this one has the self-contained scheduler changes I wanted): https://lore.kernel.org/lkml/99f9003f-d959-fff3-361a-25b2...

Changes to drm_sched to make it safe

Posted Sep 4, 2024 17:27 UTC (Wed) by farnz (subscriber, #17727) [Link]

Perfect, thank you! It makes what you were doing there nice and easy to follow, and shows that the diffstat is misleading in that virtually all the additions are documentation.

Changes to drm_sched to make it safe

Posted Sep 21, 2024 11:43 UTC (Sat) by ras (subscriber, #33059) [Link] (2 responses)

The story made it seem like a clash of C vs. Rust views on programming safety. But by the time you get to the end of that thread, it's clear the problem runs deeper than that.

Three comments stood out to me. The first is from Christian, the drm maintainer:

> I have seen at least halve a dozen approach in the last two years where people tried to signal a dma_fence from userspace or similar.
>
> Fortunately it was mostly prototyping and I could jump in early enough to stop that, but basically this is a fight against windmills.

Then this one, from an independent third party, possibly in response to the fact that panfrost had got dma_fence lifetimes badly wrong (apparently a windmill sailed past unnoticed):

> I really think this needs to be documented if nothing else out of this thread.
>
> Clearly nobody is going to get it right and hidden here in this thread, this info isn't useful.

The last comment in the thread comes from the maintainer, Christian:

> Ah shit, yeah of course. We use IRQ threads in amdgpu for the second interrupt ring as well.
>
> Ok, nail that coffin. Any other ideas how we could enforce this?

The significance of "IRQ threads" is the ban against signalling dma_fences from user space "or similar". "IRQ threads" fall under "or similar", apparently.

So it seems it's not just the Rust devs who have a problem with the current API. Everyone does. The difference looks to be that Asahi Lina's tolerance for this state of affairs is so low that she refuses to use it in its current state. Perhaps the connection to Rust is that its borrow checker has made sloppy lifetime accounting anathema to its adherents.

Changes to drm_sched to make it safe

Posted Sep 22, 2024 16:36 UTC (Sun) by Rudd-O (guest, #61155) [Link]

> Any other ideas how we could enforce this?

🤣 oh that question was so rich. I'm chuckling, sitting on my couch.

He's been presented with a giant 50kW laser with an automated turret to completely annihilate this class of API defect (the Rust type system), and when presented with this epic gift, he says NYAAAH.

Cue scroll of truth meme.

Asahi's "sin" was pointing out bad API design that a few had noticed before, and to which a few are too attached. I understand that no driver needs to be completely rewritten if this problem is fixed. But if this problem needs to be fixed and that required rewriting a bunch of drivers, then that is exactly why it should happen. Over the years of the DRM subsystem's existence, I have seen quite a few oopses and other problems caused by it. And I have little doubt this is the sort of code-quality issue that I've been bitten by.

W.r.t. the lifetime/ownership problem, rightfully raised by Asahi:

So, printk() needs to dereference the scheduler to print what, exactly, of importance? Easy solutions all around! Either cut out that printk() call altogether, or make it not deref the scheduler (e.g. print a string referring to the scheduler name), or just use a weak reference, which is easily modelable in any reasonable type system and has a sane behavioral fallback in case the scheduler has died in between.

I completely understand why in some circumstances you may need to have circularity in ownership. But this specific case is exactly one of those cases, one of those many, many cases, where circularity is completely unjustified.

Years ago when I first heard of Rust, I was turned off to it by a few unsavory personalities in that community. I have been coding for two straight years in Rust now, and I have to say: it is absolutely wonderful; all the times before where I reached for C++ or Go or Python? Oh, how badly do I wish I had actually just used Rust back then.

If I were professionally engaged in writing a kernel driver for video cards or any other such device, I would probably just rip out the DRM, rewrite it in Rust, and then provide a C interface/API that is somewhat or mostly compatible with the current API for existing drivers, so I don't have to rewrite those other drivers, at least not totally. This is possible, and (I believe) Federico Mena Quintero has basically rewritten librsvg to be safe using these techniques (the GNOME planet carried his posts detailing these efforts). But even if that were not doable, Asahi's call to just ignore the DRM subsystem and write her own workqueue-based implementation is the right call. Just route around the brain damage and keep going forward. Hopefully future drivers will base their development on that work, rather than continuing to rely on a C API that is technically fixable but (for human reasons) probably won't be fixed anytime soon.

Changes to drm_sched to make it safe

Posted Sep 22, 2024 16:39 UTC (Sun) by Rudd-O (guest, #61155) [Link]

> Perhaps the connection to Rust is that its borrow checker has made sloppy lifetime accounting anathema to its adherents.

I would wager this is exactly it, because that's exactly what has happened to my own brain.

Even now, when I must use something like Python, I follow the advice of a fascinating article (I'm not sure if it was echoed in this publication) called Write Python Like It's Rust. That article has some solid advice on Rust patterns that you can use when writing a Python application, which definitely leads to improving the quality of your Python code.

Heck, just using types in Python, accompanied with a suitable IDE built-in type analyzer, has already helped me dodge hundreds of potential bugs.

Changes to drm_sched to make it safe

Posted Sep 4, 2024 17:26 UTC (Wed) by corbet (editor, #1) [Link]

Most of those were linked in the article.

C Job Security

Posted Sep 4, 2024 17:32 UTC (Wed) by pbonzini (subscriber, #60935) [Link] (1 responses)

Hi Lina :) I don't know the code so I cannot comment on whether the maintainer rejected the patch rightly or wrongly. Sometimes 5 lines of code take longer to review than 500 and have a bigger impact.

That said, I was referring to the quote from the article ("making the API intrinsically safe [...] appears not to be on his radar") which seemed to be about more than your patch—especially considering the words of Dave Airlie, it seems to be something that requires a lot of study and planning. ¯⁠\⁠_⁠(⁠ツ⁠)⁠_⁠/⁠¯

C Job Security

Posted Sep 22, 2024 16:40 UTC (Sun) by Rudd-O (guest, #61155) [Link]

> Sometimes 5 lines of code take longer to review than 500 and have a bigger impact.

Not those 5, in my opinion.

‘C Job Security’ OMGRUSRS?

Posted Sep 4, 2024 17:53 UTC (Wed) by Deleted user 129183 (guest, #129183) [Link]

> This sounds extremely like protecting your territory (and most probably job security too) by making yourself irreplaceable.

Or maybe the Linux maintainers want things to be consistent, rather than each driver doing things its own way? I don't think it is unreasonable to expect new contributors to adhere to existing policies.

Supplicating Linus

Posted Sep 4, 2024 15:30 UTC (Wed) by dmv (subscriber, #168800) [Link] (2 responses)

People seem awfully quick to want Linus to make firm decisions on things when there are active or seemingly intractable conflicts, and one of his strengths is kinda pretending it doesn’t even exist to let it work itself out.

But allowing the Rust experiment seems like a significant enough decision that it really does need more guidance. Even high level guidance. Like, “maintainers, be flexible and allow Rustaceans to implement what they need since this is an experiment” or whatever, but it does seem manifestly unfair to the whole experiment to require them to figure out a separate approach for each individual maintainer without the maintainers having some sort of firmer direction from Linus.

Supplicating Linus

Posted Sep 4, 2024 15:35 UTC (Wed) by dmv (subscriber, #168800) [Link] (1 responses)

A point Asahi Lina made towards the end of their comment here, I just discovered: https://vt.social/@lina/113072852522845116

Supplicating Linus

Posted Sep 22, 2024 16:44 UTC (Sun) by Rudd-O (guest, #61155) [Link]

> The whole thing started with a simple proof of concept that Rust could be used to represent GPU object lifetimes to help get firmware API usage correct.

Having suffered from unclear pointer ownership bugs pretty much my entire professional life, I can only nod furiously at this comment.

.clone().unwrap() 🤣

Bad design?

Posted Sep 4, 2024 15:52 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> Well that is actually not correct. Schedulers are supposed to stay
> around until the hw they have been driving is no longer present.
> E.g. the reference was scheduler_fence->hw_fence->driver->scheduler.

To me it seems like a bad design regardless of the programming language. A scheduler should be a component of a driver, not vice versa. It doesn't make sense otherwise, what would a scheduler do without a driver to make its lifetime longer?

Bad design?

Posted Sep 22, 2024 16:47 UTC (Sun) by Rudd-O (guest, #61155) [Link]

Exactly correct. And judging from other comments posted in this thread, it looks like other people have been bitten by other API deficiencies in that subsystem.

Thoughts and clarifications

Posted Sep 4, 2024 16:22 UTC (Wed) by asahilina (subscriber, #166071) [Link] (142 responses)

I think I need to clarify a few things...

> König is correct, in that these problems do not (hopefully) happen with existing, in-tree graphics drivers.

I think you missed the email where I pointed out how the codepaths in the first in-tree C driver I looked at (panfrost) suffered from the same exact bugs. The only reason this isn't actively crashing on the daily for users is that in-tree drivers mostly use a global scheduler and only tear it down on device removal, so any potential problems can only happen when you unbind/unplug the GPU, which is only relevant in practice for eGPU users outside of testing scenarios.

In other words, (at least some of) the existing C code is just as broken as Rust would be without the fixups I sent, it just happens to hit the breakage less often due to other driver design differences. The problem is not just that the interface cannot be reasonably used safely from Rust, it's that it can't reasonably be used safely from C either (in practice), since users don't understand and don't uphold the undocumented requirements for it to be. C developers just get away with being blissfully unaware of the mistakes until the code actually crashes, because the language doesn't force them to consider these things upfront as part of its syntax, but that doesn't mean the same problems don't apply. A C API that can be sensibly abstracted in Rust is also a C API that is easier to use and reason about than one that cannot, and one that is more likely to be used correctly by C consumers.

But it's not just the API. There are actual, serious memory safety bugs in drm_sched that affect all drivers since they have nothing to do with the usage pattern of the driver [1]. I tracked that one down recently, but it's just more evidence that the design of this component is just poor, not just in terms of API, but also its internal architecture: Even the maintainers themselves can't maintain code quality and stop memory safety bugs from creeping into drm_sched, because it's so difficult to understand its architecture in practice. This points to a deeper underlying problem that has nothing to do with the existence or not of Rust abstractions. This is the reason why I'm rewriting it in Rust, not just the rejected patches. At this point I consider the entire component too brittle and buggy to be worth using, and given my interactions with the maintainer he clearly isn't willing to allow me to make robustness improvements or work towards a better design (at least not without way too much pushback to be worth attempting, for my own mental health even).

This is not a pervasive problem with all C maintainers or all C code. I haven't had any pushback to minor API changes I had to make in other parts of DRM, and I haven't seen an API as poorly designed as drm_sched anywhere else. For example, I recently refactored my driver to use the DRM GPUVM Manager component (including writing an abstraction for it), and after working out some locking subtleties (of the "it deadlocks" kind, not the "it races" kind, which is important!), it just worked. We're shipping that now and haven't gotten any reports of regressions. I do want to have a conversation about how to improve that API a bit in the future (particularly around the deadlock risks), but that doesn't block anything, and it doesn't imply fundamental design issues.

The end result is that you end up making mental lists of maintainers you will deal with and maintainers you won't. This has been a thing forever in kernel development as far as I know, it's just getting extra painful with Rust because we get extra animosity from certain maintainers who also don't want to lose their position of control in the C world over time, and who see another programming language they don't want to learn as a threat. Christian may not have been at this extreme, but he at least definitely didn't see any value in fixing the issues I brought up in the context of Rust "because drivers that work exist (in C)".

Some of the other Rust upstreaming mailing list threads are quite honestly painful to read. I don't want to point at anyone in particular since I think I've already stirred the pot enough and I'd rather go back to having some peace in my life, but I have to say I would not be able to handle that, and I'm glad other people with more perseverance than me are working on stuff. I completely understand Wedson's pain and why he left.

The reason I haven't sent anything out myself is because it's become clear that things are moving slowly, much too slowly to be worth fighting for anything right now. It took one whole year for a few lines of code to wrap a `struct device` to be upstreamed. There was a lot of initial momentum when Rust was merged, and all that fizzled out when it became clear just how drawn out and painful the abstraction upstreaming work was going to be.

Once the platform device abstractions are finally upstream, whenever that may be, I don't expect much trouble getting whatever is left of the DRM abstractions upstreamed and the drm/asahi driver. But I won't be pushing for anything major until then, because that is a hard dependency, and I don't have the patience to push for upstreaming that myself after seeing how things are working out. As for Nova, it's not going to get upstreamed first. They depend on all the same abstractions my driver does and more, and Nova as a driver doesn't really exist yet. I may be somewhat burned out but not burned out enough to completely give up upstreaming drm/asahi once the dependencies are in, which Nova depends on too anyway.

Honestly, I think giving existing kernel maintainers veto/ack power over Rust abstractions was a mistake. That only works with cooperative maintainers who are willing to learn about why Rust works the way it does (if not outright learn the language) and who accept delegating Rust design decision making to the Rust experts if they are not willing to become one themselves. That is, sadly, not what we're seeing in practice. What we're seeing is a whole spectrum from that, to maintainers who expect the Rust team to hand-hold them and effectively teach them Rust and the Rust way (very, very inefficiently and painfully, since mailing lists are not an efficient teaching platform) because they insist on micro-managing the Rust approach while not really putting in any effort themselves into understanding things, to maintainers who just don't care for anything Rust has to offer and won't be convinced by any arguments that mention that name, to maintainers who are outright hostile to Rust and are overtly or covertly trying to sabotage the project. And that just isn't working, because given the current rules, the project cannot succeed unless all maintainers cooperate at least to a reasonable extent. I really hope Linus says something about this, because he is the only person with the power to push things in the right direction at this point. If he doesn't, I see a very real possible future where RfL becomes a failed experiment due to the responses of some maintainers.

[1] https://github.com/AsahiLinux/linux/issues/309#issuecomme...

Thoughts and clarifications

Posted Sep 4, 2024 16:40 UTC (Wed) by wtarreau (subscriber, #51152) [Link] (125 responses)

> we get extra animosity from certain maintainers who also don't want to lose their position of control in the C world over time, and who see another programming language they don't want to learn as a threat.

I really think this is the type of comment that causes trouble between developers of both teams. There *might* possibly be such people but quite frankly I doubt it.

The reality is more that all kernel maintainers are just buried alive in maintenance and review work, and that adding even more stuff they have to avoid breaking with each future change will inevitably cost them more effort for every move and make them even less efficient at their tasks. We're *lacking* kernel maintainers. Many of them would probably just give up if someone could come with a *guarantee* that their work will continue to be well supported for their users. Developing is not just a matter of having a job. It's also a matter of being responsible for many users of your work and not being willing to abandon them nor send them to what you suppose to be a dead end.

No programming language is a threat to any developer. A programming language is at best a threat to another programming language, and may or may not be an opportunity for a developer, that's all.

It's really important to understand what being committed to one's users means in daily life. Not having weekends, checking e-mails during vacation, constantly getting comments from family: "you're on your laptop again?". I'm fairly certain that many would not do it all over again, but they're in it, and they still love the feeling that we all know: seeing their stuff work fine somewhere and thinking "hehe, for them it's natural that it works out of the box; I'm glad that it works fine and proud to have been part of that thing".

Really, accusing maintainers, in front of the whole community, of trying to keep full control of the parts they're responsible for, of just doing that to keep their job, is unfair and even disgusting to some extent. You're not doing yourself a good service here, by showing a possible lack of empathy and even understanding of what their responsibilities are, and the efforts they have to make to ensure everything works fine together so that you can bring your stuff, connect it, and see it work (even if for now it's not that simple).

Thoughts and clarifications

Posted Sep 4, 2024 16:46 UTC (Wed) by asahilina (subscriber, #166071) [Link] (114 responses)

> There *might* possibly be such people but quite frankly I doubt it.

Well, there was Ted Ts'o's rant at Wedson and the others that started with an accusation of wanting to "convert" people to the Rust "religion", followed by a pile of strawman arguments, followed by another person making jokes comparing Rust to Java and more strawmen.

It may not be something that is expressed in the open on the mailing lists daily, but at this point it has become quite clear to me that those people do in fact exist.

This should in fact not be surprising, because those anti-Rust people who cling to C absolutely do exist on the internet (you will meet many of them on Twitter where masks are off much of the time, or you can go to the Phoronix comment section for an extra dose of toxicity if you want). It is logical that at least some of these people would have ended up in Linux maintainer positions.

Thoughts and clarifications

Posted Sep 4, 2024 17:00 UTC (Wed) by wtarreau (subscriber, #51152) [Link] (112 responses)

But can you at least put yourself in the shoes of someone suddenly pressured to change everything that they're trying hard to keep in good working condition? Has it ever happened to you? When you apply blind changes that you don't sense well, and then for the next 10 years, once a month, you have to work several days on a bug that is a direct consequence of a rushed choice you later regret. Some parts do require deep thinking, and the thinking must be done with multiple people.

Telling people "ah, you don't want my work, so you're against me and my preferences" is horrible. It simply denies people the right to *think*. When you see people who are overly confident in their choices, who are unable to step back and think ahead about possible evolutions and consequences in 1 year, 3 years, 5 years, and you're still dealing regularly with the consequences of some 10-year-old choices, what can you think except "that person pressuring me lacks experience and refuses to listen to my concerns"?

To be honest, when I read your long message, I was a bit scared by the extreme confidence you have in the quality of your own work and in your ability to spot many other "older" areas that are wrong. I've worked with people like that in the past. They'd stay 3 years, the time it takes to transform a forest into a desert, then suddenly quit without warning because "it's impossible to work with a team that constantly rejects my art", and the ones who stay have to deal with the consequences.

It's important to be able to argue (and sometimes strongly) with others around technical arguments, concerns, and difficulties, but beliefs and personal preferences may never be used to try to disprove the other party. That just creates a bias and brings a discussion to a stop, resulting in nothing good. The only possibility that remains then is to just make a lot of noise, saying "look what they do to me", but sometimes it's quite visible and doesn't do any of the parties a good service.

And I agree with you on one point: mailing lists are not suitable for great explanations. They're fine for exchanging points of view, ideas, patches, code reviews or suggestions, but when it becomes a matter of culture or approach, discussions in person are absolutely needed. And if possible with few people, so that the discussion can heat up a bit if that helps, without anyone having to witness it.

Changing something that works is extremely difficult. Changing it without breaking it is even more difficult. Changing it in a way that guarantees that it will still be possible to change it later is the most difficult. This requires experience and honest cooperation, not bold accusations nor criticisms of everything.

Thoughts and clarifications

Posted Sep 4, 2024 17:13 UTC (Wed) by asahilina (subscriber, #166071) [Link] (32 responses)

> But can you at least put yourself in the shoes of someone suddenly pressured to change everything that they're trying hard to keep in good working condition? Has it ever happened to you? When you apply blind changes that you don't sense well, and then for the next 10 years, once a month, you have to work several days on a bug that is a direct consequence of a rushed choice you later regret. Some parts do require deep thinking, and the thinking must be done with multiple people.

Now we're back to the same strawman arguments... the Rust team aren't asking the C maintainers to "change everything" or practically anything. If anything I've been finding bugs in drm_sched and its users in C code as part of my work. Me not doing this work would have led to the C folks having to debug those things. And Wedson wasn't asking FS developers to change anything in C, he was asking FS developers to help understand the existing C API so it could be mapped to Rust properly.

Of all the Rust abstractions I've written drm_sched was the exception, the one case where the existing design was so bad I really needed to improve it. Those changes were completely non-intrusive, the patch wasn't even 50 lines of code and did not change the API for existing drivers in any way.

You're making it sound like us Rust folks are pushing through rushed, questionable changes to C code just because we "need" it and that just isn't true, at all.

> Changing something that works is extremely difficult. Changing it without breaking it is even more difficult. Changing it in a way that guarantees that it will still be possible to change it later is the most difficult. This requires experience and honest cooperation, not bold accusations nor criticisms of everything.

None of the changes I asked for negatively impacted existing correct code, that was trivial to see since they only affected conditions which supposedly were previously not allowed to happen. They definitely didn't break anything. They definitely didn't introduce some horrible maintenance burden or touch things deep in the architecture. It was just some cleanup code and cloning some debug strings so they weren't dangling pointers. That's it.

Every time people excuse the C maintainers' reaction to Rust it's always strawmen. Things that aren't true. Things they wish were true so they could discount and attack Rust, but which aren't, so they just pretend they are.

And I and the other Rust people are getting really, really, really tired of this nonsense.

> To be honest, when I read your long message, I was a bit scared by the extreme confidence you have in the quality of your own work and your ability to spot many other "older" areas that are wrong.

Well, I'm pretty confident that drm_sched is buggy because I got oops reports from users and had to debug it, multiple times. And so far I haven't had any oops reports in my own driver code from users. So there's that...

Thoughts and clarifications

Posted Sep 4, 2024 17:37 UTC (Wed) by Wol (subscriber, #4433) [Link] (8 responses)

> And I and the other Rust people are getting really, really, really tired of this nonsense.

Welcome to the club.

I'm known for being very anti-Relational. Well, why wouldn't I be, when I seem to spend most of my time waiting for Snoracle and the like to find data.

Relational is O(log(n)), which means searching 100K rows takes about 5 index probes (log base 10 of 100K). Yes, those 5 probes are guaranteed, but I can provide a five-nines guarantee that my database will be faster (it's O(1) nearly all the time, REGARDLESS of the size of the data set).

It's a tradeoff. Do you want an "almost always 80% faster response", or a "guaranteed to get a result, but all of them will be slow"?

People can't understand why you want "it's impossible to specify a faulty memory operation", people can't understand why I want a database that returns data almost faster than I can hit "return" ...

Cheers,
Wol

Slow down a little

Posted Sep 4, 2024 17:54 UTC (Wed) by corbet (editor, #1) [Link] (1 responses)

Wol, slow down please? Perhaps we don't need to drag this stuff into every unrelated discussion? Please?

Slow down a little

Posted Sep 4, 2024 19:59 UTC (Wed) by Wol (subscriber, #4433) [Link]

Sorry Jon. I think I go in phases, the problem is I'm so frustrated at the moment fighting Excel I'm probably over-compensating elsewhere.

I do feel for Asahi, though, because I do my best to explain things and feel that people just don't want to understand.

I'll do my best to back off. Part of the trouble is that being in the wrong time zone for most people seriously gets in the way of a good discussion. Maybe that's a good thing. Stops me getting really out of hand :-)

Cheers,
Wol

Thoughts and clarifications

Posted Sep 4, 2024 17:57 UTC (Wed) by wtarreau (subscriber, #51152) [Link] (1 responses)

> It's a tradeoff. Do you want an "almost always 80% faster response", or a "guaranteed to get a result, but all of them will be slow"?

I love this example because it's often a difficult choice that changes over time. For example, those who pay for the CPU time will usually prefer "almost always 80% faster responses", while those who pay strong penalties on SLAs will often prefer "guaranteed time but slower". And it happens that these choices are made in a product at one moment; then the product's area of popularity evolves over time, it does well, it moves to other areas, and suddenly the initially better choice is considered an absurd one, just because in the new setting the default preference is different.

Thoughts and clarifications

Posted Sep 22, 2024 17:18 UTC (Sun) by Rudd-O (guest, #61155) [Link]

At a certain nameless company that was very, very big and was running, at least for my team, a very, very big private GPU compute cluster, they turned off, by mandate, all of the Spectre and Meltdown protections. Why? Well, they were paying for the compute time in electricity, and it was a lot of money. Turning these off meant that workloads which would often take weeks or months finished days earlier. And sometimes days are all you have before a product launch (or a tapeout, same thing from our vantage point).

I am not entirely sure, however, how this connects to the conversation about Rust in the kernel. Rust is focused on providing what they call zero-cost abstractions. That is, all the type safety and niceties that some people like and some people hate (I myself hated them before)... they don't cost anything. You don't pay for them in compute time. You don't pay for them in electricity. At least not during execution. They're all handled at compile time.

The cost you pay for this is not measured in compute or electricity. The cost you pay is that you develop somewhat (a lot!) slower at the beginning. The compiler is constantly fighting you, because you're doing things that felt natural and normal at your experience level in whatever language you were using before Rust, where you naturally have less experience at the beginning; things that Rust's borrow checker, built into the compiler, defines as impossible.

It's an important cost; you wouldn't want to prototype something rapidly in Rust, especially if you don't have a lot of experience with it, because it's just gonna take you much longer to finish the prototype. But it is a cost that is certainly worth paying when what you're doing is going to be used by potentially millions or hundreds of millions of people, and some of those people are going to be very interested in seeing how they can exploit it to harm others.

There are definitely trade-offs to be made, but I don't think the Linux kernel as a whole should be making the trade-off of "just keep doing what we were doing, mang, it's gonna be fine", because it really is not.

To be perfectly honest, one can write subpar, suboptimal, slow, and even crashy Rust code. Cough .clone().unwrap() cough. But that is an argument that applies equally to every language. In fact, it applies more to other languages, where it's actually easier to write all of those things, because the language will help you do it.

Thoughts and clarifications

Posted Sep 22, 2024 17:08 UTC (Sun) by Rudd-O (guest, #61155) [Link] (3 responses)

> People can't understand why you want "it's impossible to specify a faulty memory operation",

Not everyone can't understand that. But some can...

Rust (its "safe" part, at least) as a language is not a magical, can't crash your car, safety system. But it does provide certain niceties like, for example, seat belts. Or, you can't shift into park or reverse while you're rolling forward in drive. Or, it won't start until you've depressed the brake pedal. Or, backup cameras. Or, LATCH baby seat anchors. Or, airbags.

C's the ratrod.

Analogies aside, I am actually quite surprised, after only two years of programming in Rust, that it is almost impossible to deliberately cause an invalid memory access bug in Rust, unless you are using "unsafe" or you are using bindings to a C library which itself has a bug. Before that, I didn't even know that such a thing was possible. I thought the only way to get a memory-safe system was to have a garbage-collected system that just tracks pointers for you so that you never have to think about it. And, of course, to never call into a C library.

I strongly suspect that the entire world of systems development, especially embedded in low-level programming, is going to experience a shift that in 20 years, or maybe less, will have most people wondering, why would anyone use C? With roughly the same amount of puzzlement as you would find yourself in, if you heard someone asking for a car with a steering wheel. Obviously every car has a steering wheel. That's just not something you ask.

The borrow checker as a significant advance

Posted Sep 23, 2024 9:03 UTC (Mon) by farnz (subscriber, #17727) [Link] (2 responses)

Arguably, the biggest advance Rust brings is the borrow checker; it's hugely intrusive, but it allows Rust to formally verify that you're paying attention to whether or not a given pointer (in Rust's case, a class of pointers called "references") points to a valid place at time of use, without runtime assistance (e.g. a garbage collector).

Combine that with "either shared, or mutable, but not both", and you get a powerful tool for making it easier to reason about your code's behaviour.

The borrow checker as a significant advance

Posted Sep 23, 2024 11:24 UTC (Mon) by intelfx (subscriber, #130118) [Link]

Another hugely impactful thing is Rust's linear/affine type system. This one has actually changed the way I think about variables, objects and resources in _any_ language.

The borrow checker as a significant advance

Posted Sep 25, 2024 9:54 UTC (Wed) by Rudd-O (guest, #61155) [Link]

Oh, 100% right! Ownership checking and the prevention of mutable pointer aliasing are awesome.

Thoughts and clarifications

Posted Sep 4, 2024 17:53 UTC (Wed) by wtarreau (subscriber, #51152) [Link] (21 responses)

> Now we're back to the same strawman arguments...

I'm starting to sense how it can be difficult to have a technical-only discussion with you...

> the Rust team aren't asking the C maintainers to "change everything" or practically anything. If anything I've been finding bugs in drm_sched and its users in C code as part of my work. Me not doing this work would have led to the C folks having to debug those things.

I simply don't know because it's not my code. What I'm trying to explain is that it's very common for a 2-line patch to reveal a deeper problem that needs more work, and *that* work can have serious long-term consequences and, as a result, requires more thinking. Then the problem is that the patch author can feel like "they're rejecting my trivial patch, it's just because they hate me", while that trivial patch is in fact a brown paper bag over a deeper problem. This has happened to me many times, and I have myself caused frustration many times to contributors who revealed such problems in my code. And believe me, a 2-line patch can end up with 6 months of work redoing lots of things differently. And in my project I don't have to deal with ABI compatibility issues or stuff like that. That's why I'm saying that I understand, to some extent, how this can happen. I'm not saying this is the case, but the way you seem to instantly map a patch rejection to "C vs Rust", I really don't like it, because this form of victimization has already hurt the Rust community a lot in my opinion. I mean, this is a perfect example that will make me more careful later about discussions around this language, for fear of entering a maelstrom of false accusations.

> And Wedson wasn't asking FS developers to change anything in C, he was asking FS developers to help understand the existing C API so it could be mapped to Rust properly.

Maybe, I honestly don't know. But similar things have happened to me in the past, where some people asked me for levels of detail about my code that I simply didn't have, because the code is the doc when it comes to APIs. So I couldn't do more than read it again, which takes a lot of time, plus I feel like I'm just wearing out my eyes so that the requester can spend his time playing Candy Crush, which is not really cool.

> Of all the Rust abstractions I've written drm_sched was the exception, the one case where the existing design was so bad I really needed to improve it. Those changes were completely non-intrusive, the patch wasn't even 50 lines of code and did not change the API for existing drivers in any way.

Possibly, I don't know. But it's not the size of the change that matters when you're getting close to the core, it's the impacts and what the change reveals. You said yourself that you've found deep problems with the current API and that it's totally unsafe even for C. Did you ever consider that the maintainer himself probably doesn't trust this code at all anymore and is not willing to permit more code to get more intimate with it?

> You're making it sound like us Rust folks are pushing through rushed, questionable changes to C code just because we "need" it and that just isn't true, at all.

No, I'm not saying that's what happens; I'm saying that you need to understand that it may sometimes be perceived that way by the person whom you're asking to accept the change, regardless of any language. It would be the same from C to C as well. The language has nothing to do with it, yet you keep bringing it to the table as the supposed reason for your work not being accepted.

> None of the changes I asked for negatively impacted existing correct code, that was trivial to see since they only affected conditions which supposedly were previously not allowed to happen. They definitely didn't break anything. They definitely didn't introduce some horrible maintenance burden or touch things deep in the architecture. It was just some cleanup code and cloning some debug strings so they weren't dangling pointers. That's it.

Possibly. At least that's your analysis and I totally trust you that it's the intent. Sometimes for a maintainer, opening the possibility that some unsafe code is easier to use means more problems in the future, especially when trying to replace it. I don't know the details, but sometimes that can be an explanation as well.

> Every time people excuse the C maintainers' reaction to Rust it's always strawmen. Things that aren't true. Things they wish were true so they could discount and attack Rust, but which aren't, so they just pretend they are.

Languages again again again... "it's not us who started first, it's them!" Please! Maybe if you tried to reach out to people to fix generic API bugs without presenting yourself as representing a language team, and without putting them by default in the supposedly opposing one, they would be more confident in your motivations? What do you have against C that you want to lump everything non-Rust into it and present it as a perpetual enemy? It looks like pure politics, with no place for facts or technical excellence. We could go on with all the other languages in the kernel at this game.

> Well, I'm pretty confident that drm_sched is buggy because I got oops reports from users and had to debug it, multiple times.

Oh, I've never questioned that, I totally trust you on that one. What I'm saying is that some maintainers might prefer to keep (for some time) bugs that they understand and that are impossible to trigger under some conditions, rather than risk more complicated ones. Many of us have had to face such choices over time. They're unpleasant, they make you look like the dirty coder in that corner over there, and you know that at some point you need to address them. Sometimes you just can't find enough time to deal with them, and you figure it's even harder to bring someone up to speed on them, so you wait to find the best person for the task, and that can take years. But such rare persons definitely don't start by placing people in boxes with a language name written on the side :-/

Thoughts and clarifications

Posted Sep 4, 2024 18:00 UTC (Wed) by daroc (editor, #160859) [Link]

> I'm starting to sense how it can be difficult to have a technical-only discussion with you...

[And several other parts on similar lines.]

Please do avoid personal attacks in the comments. I think you may be talking past each other a bit; it's easy for all the attention to end up focused on the times when things go wrong, instead of the times when things go right, and that can make it hard to be charitable. Consider leaving this thread of comments here.

Thoughts and clarifications

Posted Sep 4, 2024 19:43 UTC (Wed) by rywang014 (subscriber, #167182) [Link] (12 responses)

If a maintainer finds some seemingly trivial patch may lead to big consequences and therefore requires a second look, I guess the best response should not be "Well completely NAK" but some constructive discussions about the issue and the code the patch changes.

Thoughts and clarifications

Posted Sep 4, 2024 20:11 UTC (Wed) by wtarreau (subscriber, #51152) [Link]

I totally agree. There can be bad reasons why this was not done here (asked the same thing for the 80th time, bad mood, tired, etc.), but it would at least have deserved a follow-up later to detail the reasons.

Thoughts and clarifications

Posted Sep 4, 2024 20:26 UTC (Wed) by pbonzini (subscriber, #60935) [Link] (10 responses)

Indeed, NAK is not constructive and the maintainer acknowledged that. There was also a proposal (part tongue in cheek, part not) that replying with "NAK" would lead to complete blocking of your patches[1] until you document the reason.

[1] the patches of the guy who NAKs

Thoughts and clarifications

Posted Sep 5, 2024 0:59 UTC (Thu) by Ashton (guest, #158330) [Link] (6 responses)

Putting my management hat on, persistent NAKs are a red flag that something's going on. Either the person in question is overloaded and cannot handle more, or they're being uncooperative and need a stern talking-to. Either way, it's a clear sign that someone with more authority needs to step in and do something.

Thoughts and clarifications

Posted Sep 5, 2024 2:29 UTC (Thu) by viro (subscriber, #7872) [Link] (5 responses)

Charming. Could you explain who you are and where you work? Just to make sure I never end up anywhere near you in your managerial role...

Thoughts and clarifications

Posted Sep 5, 2024 2:44 UTC (Thu) by viro (subscriber, #7872) [Link]

PS: ... and I'd love to hear that it's _not_ anywhere near airspace. Not considering "no" even a theoretically valid answer, no matter what, can be annoying in software, but there's annoyance and then there's what this kind of attitude produced on that concall in January 1986...

Thoughts and clarifications

Posted Sep 5, 2024 7:23 UTC (Thu) by pbonzini (subscriber, #60935) [Link]

I mean, he did provide a very charitable explanation (which I agree with). Even "You're doing the same thing that has already been NAKed. Please reread my comments from before and tell me what wasn't clear" is better than nacking without a comment.

I've gotten my share of stern NAKs from you, but you've always been extremely constructive and explained what I was doing wrong. But if there's no explanation whatsoever, it is not a win for anyone.

Thoughts and clarifications

Posted Sep 5, 2024 12:08 UTC (Thu) by Ashton (guest, #158330) [Link]

If you find the idea that a manager should step in and figure out why someone is NAKing without feedback offensive, then the feeling is extremely mutual.

Code rejections should come with clear, actionable feedback, and it shouldn’t be the requester’s job to extract that feedback from the reviewer. Even “I see this and need more time to review” is far more useful than an unexplained “no”.

This kinda behavior is totally fine if it’s an occasional thing, nobody is perfect. But if it’s common then it’s a sign something is up and someone with authority needs to step in and figure out why it’s happening.

Thoughts and clarifications

Posted Sep 5, 2024 12:41 UTC (Thu) by Wol (subscriber, #4433) [Link]

So you're quite happy with more and more straws being loaded on YOUR back?

I'm bitching slightly, I know, but this is the perfect example of people not stepping back and thinking. The GP said "something is going wrong". If your boss comes to you and asks - politely - "what's happening", and you respond "look at the size of my intray!", wouldn't you appreciate your boss going through it and saying "this is important, that's not, what do you think of the other?"

If that's what the GP meant - and I'm sure it is - he's exactly the sort of boss I would like! Yes, you're going to have to piss some people off, but at least you know your boss has got your back.

Cheers,
Wol

Thoughts and clarifications

Posted Sep 5, 2024 16:50 UTC (Thu) by rsidd (subscriber, #2582) [Link]

Based on that comment (and past history) I would focus on making sure viro is never in my orbit.

Sorry if this comment violates LWN standards. But I remember Linus stepping back for a bit to examine his behaviour and learning to be better. I don't think he was by any means the worst offender, and if by publicly reassessing his interactions he was hoping to set an example that others would follow, well, nice try.

Thoughts and clarifications

Posted Sep 5, 2024 3:11 UTC (Thu) by viro (subscriber, #7872) [Link]

Depends. Usually some explanation of a NAK is called for, but if, e.g., it's a large series that keeps getting reposted with objections quietly ignored, at some point a plain NAK is the only possible reply - some kinds of persistence really should not be rewarded. And then there's Markus and similar special cases. Or a trivial and obviously _in_correct patch (which overlaps with the previous group, though).

Thoughts and clarifications

Posted Sep 12, 2024 3:39 UTC (Thu) by milesrout (subscriber, #126894) [Link] (1 responses)

He did give a reason! Just because Jon only quoted the NAK doesn't mean it is all he said.

Thoughts and clarifications

Posted Sep 13, 2024 14:32 UTC (Fri) by MrWim (subscriber, #47432) [Link]

I think this is what the GP is referring to:

https://lore.kernel.org/lkml/CAPM=9txcC9+ZePA5onJxtQr+nBe...

Quote Dave Airlie:

> The next NAK I see on the list will mean I block all patches from the
> sender until they write a documentation patch, because seriously this
> stuff is too hard for someone to just keep it in their head and expect
> everyone else to understand from reading the code.

So it's not about documenting the reasons for a given NAK, it's about adding documentation to the drm_sched code actually describing how it works and how it can be used safely.

Thoughts and clarifications

Posted Sep 5, 2024 0:22 UTC (Thu) by Ashton (guest, #158330) [Link] (3 responses)

> I'm starting to sense how it can be difficult to have a technical-only discussion with you...

No definition of politeness has ever required that someone not point out falsehoods. If having someone point out that what you’re saying is contrary to the written record is “difficult” then that’s your issue.

Thoughts and clarifications

Posted Sep 5, 2024 13:44 UTC (Thu) by Wol (subscriber, #4433) [Link] (2 responses)

I have to point out though, that "truth" often depends on your viewpoint.

Case in point - journalists are ever eager to demand "scientific proof positive". THERE IS NO SUCH THING. And when two people are talking past each other, it's pretty obvious they have two different (and they could both be right!) definitions of truth.

When two logical, rational people disagree, it's a pretty safe bet they don't share the same set of facts. That appears especially true in this case.

Cheers,
Wol

Thoughts and clarifications

Posted Sep 5, 2024 13:57 UTC (Thu) by pizza (subscriber, #46) [Link]

> When two logical, rational people disagree, it's a pretty safe bet they don't share the same set of facts. That appears especially true in this case.

I disagree (heh).

Two logical, rational people can easily share the same set of underlying facts, but disagree about how much weight each individual fact should carry in any given decision.

(This is epitomized by the expression "Fast, cheap, good -- pick two")

Thoughts and clarifications

Posted Sep 22, 2024 17:40 UTC (Sun) by Rudd-O (guest, #61155) [Link]

Oftentimes yes, and you are completely correct about that.

However, the two controversies that I am seeing clearly here do not (at least to me) seem to be matter of opinion.

The technical controversy arising from that patch that fixed the circularity issue in the ownership of objects under DRM seems quite clear-cut to me. Circularity is almost always bad design; and having some object hold a reference to another which holds a reference to the first (for printk()'s sake), which then prevents the first object from freeing the other, or the other from freeing the first, safely? Bad. Asahi's fix should have gone in without question, rather than drawing a curt NAK.

The more social or human controversy, about the animosity that Rust developers are getting from some kernel developers who do not want or do not appreciate the strictures that (they perceive) Rust is imposing on them? Well, you could argue it could go either way, but we have video. We have video of a Rust developer explaining how the type system prevents certain classes of errors to an audience of C filesystem kernel developers. And one of the kernel developers interrupts him, doesn't let him finish, and accuses him of trying to spread the Rust religion or impose it on the kernel space. This is comically easy to judge.

No one in the Rust for Linux project is trying to rewrite the kernel in Rust or trying to make everybody ditch C and learn Rust. (I would if I had a magic wand, but I don't.) What I have seen indicates to me that Rust for Linux developers are discovering deficiencies in existing subsystems, structures, algorithms, drivers, and APIs, and have been trying to fix those deficiencies with the best of intentions, while also trying to help the C developers see that there is a way to prevent such deficiencies. And not everybody, not the majority, not even many, but sadly a few kernel developers have reacted in a negative and destructive way to this effort.

A computer language that is more rigid than another has its drawbacks, and sometimes these drawbacks are very serious. But one thing that is not a drawback is when that language and its strictures help you find issues that were previously unseen. Folks who are exposed to these previously unseen issues should be thankful that they have been brought to light, because now that they can be seen, they can be fixed. Can you make bad APIs with Rust? Absolutely. Is it easier to make bad APIs with Rust than with C? `void *` and `struct {}` answer that.

Thoughts and clarifications

Posted Sep 6, 2024 5:36 UTC (Fri) by marcH (subscriber, #57642) [Link] (1 responses)

> > Of all the Rust abstractions I've written drm_sched was the exception, the one case where the existing design was so bad I really needed to improve it. Those changes were completely non-intrusive, the patch wasn't even 50 lines of code and did not change the API for existing drivers in any way.

> Possibly, I don't know. But it's not the size of the change that matters when you're getting close to the core, it's the impacts and what the change reveals. You said yourself that you've found deep problems with the current API and that it's totally unsafe even for C. Did you ever think that the maintainer himself probably doesn't trust this code at all anymore and is not willing to permit more code to get more intimate with it ?

I see this as the core issue and key contradiction here. On one hand the design is "very bad and incredibly complicated and brittle", but on the other hand you submitted a "small, non-intrusive patch that barely changes anything". Mmmmm... anyone with a little bit of experience knows how this song _usually_ ends; I won't repeat wtarreau.

You look like one of those exceptional developers who are indeed capable of playing that sort of Jenga successfully. But it's not enough to be correct: building up the corresponding _trust_ unfortunately requires a massive amount of explanations, demonstrations, demo / test code and generally: time that you said you don't have. While the Rust versus C "racisms" do not help, I also suspect it would not be so different if you removed Rust from the picture. We've seen that story many times before - in a single language.

If Linux gets a new, generic scheduler entirely in safer Rust code, then we all win? A complete rewrite is also a typical next chapter after "it's very bad and too complicated" - again, even within a single language. Because it's... faster.

Thoughts and clarifications

Posted Sep 6, 2024 12:37 UTC (Fri) by daenzer (subscriber, #7050) [Link]

Very well put!

Thoughts and clarifications

Posted Sep 22, 2024 17:24 UTC (Sun) by Rudd-O (guest, #61155) [Link]

> because the code is the doc when it comes to APIs,

Quite ironically, this is exactly what the Rust for Linux people are trying to do. And it is no exaggeration to say that exactly that - wanting the code to document the behaviors of the API - is what got Wedson interrupted by Ted.

Put the behavior of the API in the types. That is all the Rust people are asking for. Nothing more. Actually, they're asking for something even less than that: cooperation from the C part of the kernel so that they can do the work of putting the meaning and behaviors of the API into the type system, so that the system can be safe and sound.

To me this controversy does sound like a few kernel developers, not everybody, not the majority, but a few, are seriously looking a gift horse in the mouth. Maybe they just don't want the horse? And as a result, nobody else gets the pony we want.

Now I'm not a kernel developer, I'm simply a Rust programmer, a fairly new one, but I do know, having used Linux since 1996, that I want superior kernel code quality and stability and fewer security issues. We are never getting there unless we move on from C, at least for the critical parts of the system. And I frankly do not care whether it is Rust or another language, but if we are going to jump out of one puddle and not fall into another, it has to be a type-safe and sound language that protects against the sort of problems that C causes. There is a precious opportunity right now to take that chance by using Rust. I hope we don't squander the opportunity and have to wait 10 more years until either everybody is using a different operating system, or someone invents a superhuman AI that can properly check C code and its runtime properties, or Linux finally decides to incorporate sound and type-safe programming tooling beyond C plus glorified linters.

Thoughts and clarifications

Posted Sep 4, 2024 18:50 UTC (Wed) by mb (subscriber, #50428) [Link]

Lina, I just want to say thanks to you. Thanks for the work you did and hopefully still plan to do.

Be assured that there are many people on your side of the discussion.
Even if they are more silent than the verbose people who don't admit that they are wrong.

Thoughts and clarifications

Posted Sep 4, 2024 18:13 UTC (Wed) by Deleted user 129183 (guest, #129183) [Link] (20 responses)

> They'd stay 3 years for the time it takes to transform a forest into a desert, then they suddenly quit without warning because "it's impossible to work with a team that constantly rejects my art"

This observation is very on point, since this is what has recently *actually* happened in relation to the ’Rust for Linux’ project. If anyone does not remember:

https://lwn.net/Articles/987635/

Even the time frame is largely accurate, lol.

Thoughts and clarifications

Posted Sep 4, 2024 18:25 UTC (Wed) by daroc (editor, #160859) [Link] (19 responses)

I think that's somewhat unkind, given that one of more than thirty Rust-for-Linux developers (looking only at people with changes in the rust folder in the past year) chose to leave the project. Working on open source software can be hard for everyone.

Thoughts and clarifications

Posted Sep 4, 2024 18:45 UTC (Wed) by Deleted user 129183 (guest, #129183) [Link] (18 responses)

> I think that's somewhat unkind, given that one of more than thirty Rust-for-Linux developers (looking only at people with changes in the rust folder in the past year) chose to leave the project.

So far. But reading articles like the one above, I think that more people will likewise resign in the near future. Especially since it seems that the Rust culture (which likes to ‘move fast and break things’) is a poor match for the Linux culture (where even very important changes can take more than two years to be done).

Thoughts and clarifications

Posted Sep 4, 2024 18:48 UTC (Wed) by corbet (editor, #1) [Link] (12 responses)

The Rust folks have neither moved fast nor broken things. This kind of comment is not helpful.

Thoughts and clarifications

Posted Sep 4, 2024 19:45 UTC (Wed) by khim (subscriber, #9252) [Link] (10 responses)

Technically, the Rust culture is to release new, incompatible versions of crates very often.

That's what C++ folks call “move fast and break things”, but compared to Linux development it's actually “move slow”: breaking changes in Rust crates are coordinated and pre-announced, while Linux-internal APIs are sometimes broken without warning and it's not even possible to keep using code designed against the old API. In Rust, linking two incompatible versions of a crate into one binary is an allowed and supported case.

Rust developers often like to use the latest version of the Rust compiler, while the Linux kernel is extremely conservative about using new features of gcc or clang, but that's a much less pressing concern: the kernel has been known to include optional features that even require pre-release versions of gcc or clang if they are worth it.

Thoughts and clarifications

Posted Sep 4, 2024 19:58 UTC (Wed) by mb (subscriber, #50428) [Link] (8 responses)

>Technically Rust culture is releasing new, incompatible, versions of crates very often.

Well, you could not be more wrong than that.
The opposite of what you say is true.

Crate maintainers almost all care deeply about compatibility and use Semver to express that.
Breaking changes are infrequent in most crates, and when a breaking Semver release does happen, upgrading is often trivial.

Yes, there are some crates that update and break often. But saying that this is "the Rust culture" is just plain wrong and shows more about your experience with the Rust community than the Rust community itself.

This is all the complete opposite of the C/C++ universe, where a commonly agreed versioning scheme does not exist and everybody does versioning differently.
The kernel is a prime example of not having a stable internal API and breaking things all the time.

Thoughts and clarifications

Posted Sep 4, 2024 20:29 UTC (Wed) by khim (subscriber, #9252) [Link] (7 responses)

> The opposite of what you say is true.

Seriously? Even rand, a very narrow crate that you first encounter in the Rust Book, has 9 versions. Clap (which you find in many other tutorials) has 13 versions, ndarray has 16, and so on.

That's quite a lot compared to many other languages, where you may find 3-4 versions released over a decade instead of 10-15, and where every release is “a big deal”™.

> Breaking changes are not frequent in most crates and if breaking Semvers are released, it's often trivial to upgrade.

Sure, but that doesn't change the fact that the changes are breaking and support for the old version is often dropped immediately when a new version is released.

As I have said: it's still better than how in-kernel APIs are treated, but it's unusual from the POV of Java or even C++ developers.

> But saying that this is "the Rust culture" is just plain wrong and shows more about your experience with the Rust community than the Rust community itself.

Can you show me any Rust app that doesn't rely on these crates with a dozen releases or more?

> This is all the complete opposite of the C/C++ universe, where a commonly agreed versioning scheme does not exist, everybody does versioning differently.

True, but how many C++ libraries with more than a dozen incompatible releases can you name? They exist, sure, but how common are they?

Qt has had fewer incompatible releases in ⅓ of a century than rand in 10 years! And if you compare the size of the API that Qt offers to what rand offers… the difference is even more striking.

Thoughts and clarifications

Posted Sep 4, 2024 21:18 UTC (Wed) by mb (subscriber, #50428) [Link] (6 responses)

>Seriously? Even rand, a very narrow crate that you first encounter in a Rust Book have 9 versions

The latest version, 0.8.x, has been supported and compatible for more than three years.

>Clap (that you find in many other tutorials) have 13 versions,

Version 4 has been compatible for two years.

>That's quite a lot

No, it's not. The criteria for bumping the major version are completely different compared to almost all major C libraries.
Even extremely small theoretical breakages cause a major bump.

>where every release is “a big deal”™.

It's not.

>Sure, but that doesn't change the fact that changes are breaking and support for old version is, often, immediately dropped when new version is released.

So? That's exactly the same for basically every Open Source software out there.
There are only very few projects providing long term support of old versions.

And nobody stops you from supporting your favorite old "rand".

You are asking for long term support that you get nowhere else.

>Can you show me any Rust apps that doesn't rely on these crates that have dozen releases or more?

The number of times a build broke in the whole time I've used Rust is in the low single digits - I think it's two or three.
Updates are extremely smooth.

>Qt had fewer incompatible releases in ⅓ century than rand in 10 years!

Oh. So you also don't have any experience with Qt major upgrades.

Great. Let me explain it to you: most major Qt version bumps require massive changes in the project, whereas most crate major-version bumps just work with little to no change.

The number of major versions does not mean anything if you accumulate the changes until an extremely loud big-bang release.
It can even be argued that a big release every 5 years is worse than small incremental changes every year.

Thoughts and clarifications

Posted Sep 4, 2024 22:26 UTC (Wed) by khim (subscriber, #9252) [Link]

> It can even be argued that a big release every 5 years is worse than small incremental changes every year.

That's a different question, though. It's a question of whether the “move fast and break things” approach is better than the alternatives.

> The latest version 0.8.x is supported and compatible since more than three years.

While the similar C++ facility never had a breaking change - though it was extended a few times.

> You are asking for long term support that you get nowhere else.

I had it going for ten years with Python2, Java8 and many other tools, sorry.

> Great. Let me explain it to you: Most of the major Qt version bumps require massive changes in the project. Whereas most of the crate major version bumps just work with little to no change.

Sure, but that, again, is about the virtues of the “move fast and break things” approach versus the “keep it stable as long as you can, then do a massive break when you cannot anymore” approach.

I think the “move fast and break things” approach is becoming more and more popular nowadays (and, as I have pointed out and you repeated, that's how the kernel manages internal APIs, too).

But that doesn't change the fact that it's a different approach from what many other languages, especially “enterprise” ones, practise (or practised).

Lots of projects, these days, go overboard with “move fast and break things”, though - at least temporarily. Although they eventually change their approach, AFAICS: even the flagship of that approach, Abseil, these days offers a more Rust-like scheme with compatible releases every half-year or so. They proudly proclaim them LTS, which of course sounds ridiculous since they are only supported for one year, but still… it's closer to what Rust does than to either the “everyone should just live at HEAD” or the “breaking changes should happen once per half-century” extremes.

Thoughts and clarifications

Posted Sep 5, 2024 3:08 UTC (Thu) by legoktm (subscriber, #111994) [Link] (1 responses)

I agree with mb, "move fast and break things" is not at all how I would describe the attitude of the Rust community. I think people are very intentional about not breaking things and as a result, take a very literal stance with what is a breaking change (see e.g. cargo-semver-checks).

I'd also say that people care a lot about good API design, and as a result iterate (with breaking changes) until they reach 1.0 and then intend to keep it stable forever like serde. If I had to complain about something it's probably that people are, in my opinion, too perfectionist, and don't declare the 1.0 despite their crate being stable. (Of course, I'm also guilty of this in my own crates.)

Thoughts and clarifications

Posted Sep 5, 2024 7:11 UTC (Thu) by khim (subscriber, #9252) [Link]

> If I had to complain about something it's probably that people are, in my opinion, too perfectionist, and don't declare the 1.0 despite their crate being stable.

Indeed, lots of very basic crates are stuck forever at version zero - even such basic ones as libc.

> I'd also say that people care a lot about good API design, and as a result iterate (with breaking changes) until they reach 1.0 and then intend to keep it stable forever like serde.

This may very well be their intent (and in some rare cases, like the Rust compiler itself, even an actual accomplishment), but that's not what developers have to deal with. In the absence of that mythical version 1.0, people are forced to use what they have available. And what they have available is, very often, not that bad! For all practical purposes, in the Rust world, version 1.0 is just a number: for a version-zero crate, the minor number works the way the major number does for crates after 1.0. And it's not as if breaks stop after version 1.0: syn is at version 2.0, clap is at version 4.5, etc.

And cargo-semver-checks is certainly not unique; it's essentially the Rust version of abidiff.

It may even be true that radical, drastic breakages every dozen years are harder to deal with than regular, frequent, yet minor breakages, but that doesn't change the fundamental way the Rust community operates: many developers dream of replicating the Rust compiler's feat of breaking things and moving fast in the beginning while reaching eventual stability (after which development still advances, but at glacial speed), yet often they only manage the first part. That's still more honest and better than the many C libraries that claim to release compatible versions which in reality break programs, but one cannot claim to not be breaking things if one routinely releases new, incompatible versions while simultaneously dropping support for the old ones.

Thoughts and clarifications

Posted Sep 5, 2024 10:01 UTC (Thu) by LtWorf (subscriber, #124958) [Link] (2 responses)

> Most of the major Qt version bumps require massive changes in the project.

I maintain a few Qt projects. Since when are massive changes needed to bump? That is not my experience at all.

Thoughts and clarifications

Posted Sep 5, 2024 10:09 UTC (Thu) by mb (subscriber, #50428) [Link] (1 responses)

>Since when are massive changes needed to bump?

2 to 3, 3 to 4 and 5 to 6 were pretty massive changes in my projects.
That only leaves 4 to 5 as a small upgrade with small changes for me.

Qt major version upgrades

Posted Sep 8, 2024 8:53 UTC (Sun) by chris_se (subscriber, #99706) [Link]

What in 5 to 6 was a massive change that actually caused pain? 5 to 6 was extremely painless in my experience, even less so than 4 to 5 (which was already fine). 3 to 4 was a huge pain though.

Thoughts and clarifications

Posted Sep 22, 2024 17:46 UTC (Sun) by Rudd-O (guest, #61155) [Link]

> Technically Rust culture is releasing new, incompatible, versions of crates very often.

If you are a Rust programmer, you are not forced to upgrade to the latest and greatest crate. You can just keep using the old crate; it's still published, you can still download it, and unless there is a security issue, there's no problem for you. You can even use multiple versions of the same crate in the same project, and it just works. This is the opposite of move fast and break things. It is rather: move fast, keep old things working the way they were.
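The "multiple versions of the same crate in one project" point is directly visible in Cargo: dependency renaming lets a project pull in two incompatible versions side by side. A minimal sketch (the `rand_old` alias is chosen for illustration; any alias works):

```toml
[dependencies]
# Current major version, used under its normal name.
rand = "0.8"
# An older, incompatible release of the same crate,
# imported under an alias so both versions can coexist.
rand_old = { package = "rand", version = "0.7" }
```

Code can then call `rand::...` and `rand_old::...` in the same binary; Cargo compiles both versions and links them without conflict.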

Moreover, kernel developers don't gratuitously use crates like your comment would seem to imply. The vast majority of crates are simply unusable in the kernel because they depend on the standard library. And the standard library cannot be linked into the kernel because the standard library has some requirements regarding memory allocation that cannot be fulfilled by the kernel.

I find it absolutely amazing that you can add a few YAML lines to your GitHub project and there's an entire computer network that will automatically upgrade all of the crates in your Rust project on GitHub. Then your CI fires and everything is tested, so you know the upgrades didn't break anything. I use that all the time. But this is absolutely unrepresentative of how kernel development with Rust code is done. Maybe someday that will be the case. Maybe in 25 years. We're not even close to that. We need to get even a few crates going in the kernel before that's even a concern on anyone's radar.

If anything, Rust in the kernel is actually move slow. And if we are to conclude anything from the rust for Linux developers' contributions to the Linux kernel, it has been move slow and fix other people's things.

Thoughts and clarifications

Posted Sep 4, 2024 19:48 UTC (Wed) by corbet (editor, #1) [Link]

I should clarify that I was talking about the behavior of the Rust developers in the kernel project. I'm taking no position on all proponents of any language.

Thoughts and clarifications

Posted Sep 5, 2024 0:51 UTC (Thu) by Ashton (guest, #158330) [Link]

Rust culture likes to “move fast and break things”? I am genuinely baffled how you came to this conclusion, it is the exact opposite of what I see.

The most recent drama was about some C developers asserting that they will break things and not even inform the rust developers.

Thoughts and clarifications

Posted Sep 5, 2024 10:56 UTC (Thu) by agateau (subscriber, #57569) [Link] (1 responses)

> Rust culture (that likes to ‘move fast and break things’)

There is a difference between A) breaking things unannounced and B) breaking things by bumping the major version of your project.

In my experience it's much more common in the Rust ecosystem to go with B than with A. And B is usually not a problem in that dependent projects are unlikely to hit unexpected build breakages. My experience in other ecosystems is very different...

Thoughts and clarifications

Posted Sep 5, 2024 12:13 UTC (Thu) by Ashton (guest, #158330) [Link]

Also, the discussion should be about how the Rust for Linux people are behaving, not Rust developers in general. Different sub-groups of a language community can and do develop different attitudes and norms, especially around things like versioning, dependencies, and backwards compatibility.

In the abstract if someone asserted that a major, risk sensitive project in a language took a much more conservative approach to dependencies and change than the average user of the same language I would be utterly unsurprised.

Thoughts and clarifications

Posted Sep 5, 2024 19:38 UTC (Thu) by MarcB (guest, #101804) [Link]

> So far. But reading such articles like the one above, I think that more people will likewise resign in the near future. Especially since it seems that the Rust culture (that likes to ‘move fast and break things’) ...

Where is this coming from?! "Moving fast and breaking things" is basically the least fitting description of "Rust culture" (whatever that may be).

Thoughts and clarifications

Posted Sep 22, 2024 17:41 UTC (Sun) by Rudd-O (guest, #61155) [Link]

> Especially since it seems that the Rust culture (that likes to ‘move fast and break things’)

🤣 where would anyone get that opinion from? Honest question!

Thoughts and clarifications

Posted Sep 14, 2024 15:20 UTC (Sat) by sunshowers (guest, #170655) [Link] (57 responses)

> Changing something that works is extremely difficult. Changing it without breaking it is even more difficult. Changing it in a way that guarantees that it will still be possible to change it later is the most difficult. This requires experience and honest cooperation, not bold accusations nor criticisms of everything.

This is a very C mindset, one that centers fear.

Thoughts and clarifications

Posted Sep 14, 2024 15:39 UTC (Sat) by Wol (subscriber, #4433) [Link] (56 responses)

> This is a very C mindset, one that centers fear.

It's called "technical debt".

Most people don't like maintenance, for PRECISELY that reason - it's a slog.

And idiots who aren't afraid of breaking things are people who cause thousands of flights to be cancelled, credit card payments to stop working, etc etc. It's not a C mindset, it's the natural mindset of older people who've seen (and been hurt by) the consequences of young inexperienced people cocking up.

I'm lucky - I messed up very early in my career, and ever since while I'm quite happy to plough ahead and break things, I've always been conscious of the fact that breakage needs to be avoided if possible.

Cheers,
Wol

Thoughts and clarifications

Posted Sep 14, 2024 23:16 UTC (Sat) by sunshowers (guest, #170655) [Link] (55 responses)

Right. The C mindset is that "I'm afraid to make changes" is the end of the conversation. I get it, having maintained C before. With every line of C you write or modify, your hair is probably standing on end.

The Rust mindset is that "I'm afraid to make changes" is the start of the conversation. It's reasonable to be concerned about making changes, but how do you make them as easy as possible? Encoding lifetimes into the type system, having a separation between shared and mutable access, going all-in on encapsulation, etc.

Thoughts and clarifications

Posted Sep 14, 2024 23:46 UTC (Sat) by viro (subscriber, #7872) [Link] (54 responses)

>Right. The C mindset is that "I'm afraid to make changes" is the end of the conversation. I get it, having maintained C before. Every line of C you write or modify, your hairs are probably standing on end.

Not to discourage your noble efforts, but could you possibly aspire to somewhat higher quality of trolling? There are some standards to language holy wars, and your contribution is... falling short, to put it very mildly. There is a lot of examples of that genre available for study - search the comp.lang.* archives and you'll find really impressive ones. If you must use chatgpt, at least train it on good examples...

Overall: D-.

Thoughts and clarifications

Posted Sep 15, 2024 0:57 UTC (Sun) by intelfx (subscriber, #130118) [Link] (2 responses)

> could you possibly aspire to somewhat higher quality of trolling?

Well, could you?

Accusing people who happen to hold an opinion you disagree with of trolling to silence or discredit them is so <insert a timestamp well in the past>.

Thoughts and clarifications

Posted Sep 15, 2024 4:13 UTC (Sun) by viro (subscriber, #7872) [Link] (1 responses)

> Accusing people who happen to hold an opinion you disagree with of trolling to silence or discredit them [...]

Huh? Why would I want to silence them? And what does opinion being claimed (nevermind "held") have to do with anything?

Language holy war is an art form. When done right, it can be subtle and highly amusing to watch, but that kind of move is just plain wrong at this stage. Overwrought rhetoric in the first part would be about right for a retort deep in a subthread that has already devolved into a pure exchange of insults; here it's in the wrong place. And appending to that a paragraph of stock praises to $OBJECT_OF_WORSHIP is always a faux pas, especially when execution is so uninspiring - stylistic mismatch is awful.

Objections had been about the style, not the "contents"; I thought I made that very clear, but apparently that didn't come through
well enough. As for the alleged contents... do we really need to discuss that, starting with the equivalent of "I ate a fruit and I know how awful do they taste"? Not to mention the expression "$LANGUAGE mindset", which is a shining example of the same fallacy...

Language is not an identity. It's a tool. "$X is written in C" covers a huge range of styles/degrees of cleanliness/etc. So does "$X is written in Rust"; those ranges overlap a whole lot and as for the factors in cost of modifications... C vs Rust is really, really minor compared to the variability among C programs and variability among Rust ones. I will not insult the poster by assuming they are too ignorant to realize that, and that's precisely what taking the first part at the face value would imply.

I'm all for taking the piss out of self-righteous cretins who blather about immense superiority/inferiority of languages; as I said, language holy wars can be highly amusing, especially if aforementioned cretins get maneuvered into exposing their ignorance in their $OBJECT_OF_WORSHIP. As long as editors' requests to stop a subthread that goes in direction unacceptable for lwn.net get promptly honoured, I see no problem with that. But for pity sake, do that in style...

Thoughts and clarifications

Posted Sep 15, 2024 12:32 UTC (Sun) by sunshowers (guest, #170655) [Link]

> C vs Rust is really, really minor compared to the variability among C programs and variability among Rust ones.

I understand where you're coming from — folks have been burned by the promise of so many languages in the past — but this is not true in the case of Rust specifically. Using a language which directly tackles mutability xor sharing makes code just fundamentally better and more correct. Rust programs are consistently higher quality than C ones.

Thoughts and clarifications

Posted Sep 15, 2024 2:17 UTC (Sun) by sunshowers (guest, #170655) [Link] (50 responses)

I very sincerely believe in everything I said, based on many years of C and Rust experience.

Thoughts and clarifications

Posted Sep 15, 2024 4:56 UTC (Sun) by viro (subscriber, #7872) [Link] (49 responses)

In my experience the costs of modification in C codebases vary so much that any universal statements regarding those costs are flat-out unbelievable.

Thoughts and clarifications

Posted Sep 15, 2024 5:52 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (48 responses)

While the costs vary, I have yet to see even a moderately complex C codebase where refactorings are _easy_.

Thoughts and clarifications

Posted Sep 15, 2024 8:41 UTC (Sun) by Wol (subscriber, #4433) [Link] (47 responses)

Have you ever seen a moderately complex *RUST* codebase that is easy to refactor? I would have thought the phrase "moderately complex" was enough to make it clear *any* codebase would be hard to refactor.

The thing is, how easy is it to "write what you mean"? A language that makes it easy to express complex requirements in simple language just pushes a "complex codebase" further down the road before you hit it. Rust comes over as that sort of language, but I've never used so I wouldn't know.

Cheers,
Wol

Thoughts and clarifications

Posted Sep 15, 2024 11:37 UTC (Sun) by pizza (subscriber, #46) [Link] (44 responses)

> Have you ever seen a moderately complex *RUST* codebase that is easy to refactor? I would have thought the phrase "moderately complex" was enough to make it clear *any* codebase would be hard to refactor.

Rust (and codebases using it) haven't really matured (ie been around long enough) to reach this critical point.

All of this emphasis on "specifying it right" is all well and good, but... the definition of "right" changes over time (sometimes drastically) along with the requirements.

Linux has historically placed great emphasis on (and heavily leaned into) internal interfaces and structures being freely malleable, but the assertion of "just change the definition and your job is done when the compiler stops complaining" is laughably naive.

"Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth

Thoughts and clarifications

Posted Sep 15, 2024 11:54 UTC (Sun) by Wol (subscriber, #4433) [Link]

> All of this emphasis on "specifying it right" is all well and good, but... the definition of "right" changes over time (sometimes drastically) along with the requirements.

Banging on again, but with a state table you can (and should) address all possible options. Some things are hard to specify that way, some things you personally don't need to address, but if you have three possible boolean state variables, then you have eight possible states. If you only recognise five, and your solution precludes solving one of the other three, then your code will need replacing. If you can't be bothered to address the other three, but your code is designed to make it easy for someone to come along later and add it, then that's good programming.

A good logical (mathematical?) spec/proof should point out all the possible "wrong" paths so if they become a right path they're easily fixed.

> "Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth

The real world is like that :-)

Cheers,
Wol

Thoughts and clarifications

Posted Sep 15, 2024 12:19 UTC (Sun) by asahilina (subscriber, #166071) [Link] (10 responses)

> the assertion of "just change the definition and your job is done when the compiler stops complaining" is laughably naive.

My experience having gone through several major refactorings of the drm/asahi driver is that it is correct. Some of the bigger ones are:

- Going from a "demo" fully blocking implementation (the first one we shipped) to implementing queues and scheduling and asynchronous work
- Dropping in GPUVM and fully implementing VM_BIND, which along the way ended up changing how non-GPUVM kernel objects are managed including making changes to the heap allocators [1]

It is hard to explain just how liberating it is to be able to refactor and rearrange non-trivial things in the code, fix all the compiler errors, and then end up with working code more often than not. Sure, if you're unlucky you might run into a logic error or a deadlock or something... but with C it's almost impossible to escape adding a new handful of memory safety errors and new flakiness and bugs, every time you make any non-trivial change to the structure of the code.

This is true even when you're interfacing with C code, as long as its API is documented or can be easily understood. The GPUVM change involved first writing abstractions for the underlying C code. That API is reasonably nice and well documented, so it wasn't hard to get the Rust abstraction right (with some care) [2], and then when it came to dropping it into the Rust driver, everything just worked.

Most people don't believe this until they actually start working with Rust on larger projects. All the Rust evangelism isn't just zealotry. There really is something magical about it, even if it might be overstated sometimes.

[1] https://github.com/AsahiLinux/linux/commit/93b390cce8a303...
[2] https://github.com/AsahiLinux/linux/commit/e3012f87bf98c0...

Thoughts and clarifications

Posted Sep 16, 2024 9:05 UTC (Mon) by Wol (subscriber, #4433) [Link] (4 responses)

> > the assertion of "just change the definition and your job is done when the compiler stops complaining" is laughably naive.

> My experience having gone through several major refactorings of the drm/asahi driver is that it is correct. Some of the bigger ones are:

Out of curiosity, would you describe *your* codebase as complex? Or would you say "my code is simple because Rust handles the complexity for me"?

Or even "the driver problem itself is fairly simple, and Rust just makes it easy to express it"? (Put another way, "C makes the problem a lot more complicated than it should be"!)

Cheers,
Wol

Thoughts and clarifications

Posted Sep 16, 2024 9:48 UTC (Mon) by asahilina (subscriber, #166071) [Link] (3 responses)

Hmm... there are a few dimensions here.

I would say the driver has medium complexity for a GPU driver (in terms of line count it's almost 4x drm/panfrost and around the same as the GPU part of drm/msm). Rust doesn't directly reduce complexity (the driver has to do what it has to do), but it does handle a lot of error-prone boilerplate for you (for example enforced RAII) and it strongly encourages design that makes it easier to reason about the complexity (separation of concerns/encapsulation). So Rust makes it easier to maintain the complexity, understand it, and avoid bugs caused by it. I'm a lot more comfortable dealing with complex code in Rust than in C.

Then, there are some aspects where Rust is specifically a very good fit for this particular GPU driver. One of them is using Rust proc macro magic to describe multiple firmware version and GPU generation interfaces (the firmware interface is not stable) in a single implementation, as cleanly as possible. To do the same thing in C you either end up duplicating all the code, or using ugly preprocessor or build system tricks (drm/apple in our tree is a C driver that has to do this, and it's not pretty. Rust would be a good fit for a rewrite of that driver too for multiple reasons, but we need DRM KMS bindings first). The other one is (ab)using Rust lifetimes to represent GPU firmware interface lifetimes, which makes handling the firmware interface much less error-prone (and this is critical, because an error crashes the GPU firmware irrecoverably). So Rust helps with those more specific kinds of complexity.

At the end of the day it all really boils down to Rust benefiting from decades of programming experience and history in its design. C was designed at a time when programs were a tiny fraction of the size they are today. The entire 1983 UNIX kernel had around the same line count in C as my drm/asahi driver does in Rust. Linux is more than a thousand times more code today, and it shouldn't be a surprise that a programming language designed for codebases 1000x smaller might not be the best option these days. We have learned a lot since then about how to manage complexity, and Rust takes a lot of that and applies it to the kind of systems language that is suitable for writing kernels.

Thoughts and clarifications

Posted Sep 16, 2024 11:16 UTC (Mon) by Wol (subscriber, #4433) [Link] (2 responses)

> Rust doesn't directly reduce complexity (the driver has to do what it has to do), but it does handle a lot of error-prone boilerplate for you (for example enforced RAII) and it strongly encourages design that makes it easier to reason about the complexity (separation of concerns/encapsulation). So Rust makes it easier to maintain the complexity, understand it, and avoid bugs caused by it. I'm a lot more comfortable dealing with complex code in Rust than in C.

So in other words "It's a complex problem, but Rust makes it simple to express that complexity"?

I'm just trying to get a handle on where Rust lies on the problem-complexity / language-complexity graph. I'll upset Jon with this, but I hate Relational/SQL because imnsho Relational lies too far on the simplicity side of the graph, so SQL has to lie way over on the complexity side. So in terms of Einstein's "make things as simple as possible, but no simpler", Relational/SQL lies way above the local minimum. Rustaceans probably feel the same is true of C and modern hardware.

Do you feel Rust lies close to the sweet spot of minimal possible complexity? It certainly comes over you find it easy to express the complexity of the hardware.

Cheers,
Wol

Thoughts and clarifications

Posted Sep 16, 2024 11:57 UTC (Mon) by jake (editor, #205) [Link] (1 responses)

> I'll upset Jon with this

Wol, it's more than just Jon who is tired of you bringing up this stuff in every thread, often multiple times, when it is not particularly relevant. Your comments are voluminous, people have complained to you about that and the content of your posts in comments here, and you are one of the most filtered commenters we have. I think you should perhaps reduce your volume of comments and try to ensure that the drum you are banging does not come up everywhere.

just fyi,

jake

Thoughts and clarifications

Posted Sep 16, 2024 12:21 UTC (Mon) by paulj (subscriber, #341) [Link]

Perhaps putting some stats on "ranking by volume of comments (over the site in last X period, for X in {a, b, c} time|this story)" for a user to that user somewhere would help softly nudge people, where needed, on a self-educating basis?

Thoughts and clarifications

Posted Sep 22, 2024 18:05 UTC (Sun) by Rudd-O (guest, #61155) [Link] (4 responses)

I have reason to believe that you are talking to people who have never seen a match statement in their lives. And so they're used to knowing that when they make a change somewhere in the code, somewhere else very, very far away, and if or case select statement no longer matches that condition that you just added to the code, and therefore things break spectacularly at runtime.

That lack of experience is why they continue issuing the truism that refactoring is "very difficult" and you don't really know when you're changing code if something else is going to break. They haven't gotten the compiler to yell at them, "you're missing this case", because they have never experienced it. Reflectoring is super easy when the computer is doing the thinking of myriad otherwise irrelevant trivialities for you!

There really is something magical about it. And to try and explain to people that haven't seen that, quote, magic, is almost impossible. It's like trying to explain electricity to someone from the 1600s. And it is equally frustrating. In fact, it is doubly frustrating because unlike electricity in the 1600s, this is something that is very easy to witness, you just have to read a little code and push a button in a webpage and you can see it. And they just refuse. It is so oddly disconcerting.

Thoughts and clarifications

Posted Sep 22, 2024 18:29 UTC (Sun) by pizza (subscriber, #46) [Link] (3 responses)

> you just have to read a little code and push a button in a webpage and you can see it. And they just refuse. It is so oddly disconcerting.

$ sloccount `find projdir -name *.[ch]`
[...]
Total Physical Source Lines of Code (SLOC) = 2,278,858

Call me naive, but "read a little code and push a button on a web page" isn't going to cut it.

Thoughts and clarifications

Posted Sep 25, 2024 9:44 UTC (Wed) by Rudd-O (guest, #61155) [Link] (2 responses)

That's a lot of code.

To get back to the topic (common pitfalls of refactoring and how Rust helps avoid errors):

Can you articulate what a match statement does, and how it behaves when you add a new case somewhere very far away from the match statement? How is it different from, say, a chain of if/else or a select case? If your codebase was (hypothetically) Rust, what would the compiler say to such a change, versus what the C compiler says today?

My intention is to figure out if you have had a chance to compare both C and Rust in order to form an honest, informed opinion.

Thanks in advance.

Perhaps that is far enough

Posted Sep 25, 2024 9:48 UTC (Wed) by corbet (editor, #1) [Link] (1 responses)

It is my suspicion that this conversation will go nowhere useful after this point. Perhaps it's best to stop it here?

Perhaps that is far enough

Posted Sep 25, 2024 10:01 UTC (Wed) by Rudd-O (guest, #61155) [Link]

Sure. Have a nice day.

Thoughts and clarifications

Posted Sep 15, 2024 12:27 UTC (Sun) by sunshowers (guest, #170655) [Link]

> the assertion of "just change the definition and your job is done when the compiler stops complaining" is laughably naive.

It's almost completely true, though.

Think of the type system as a set of proofs you need to provide to the compiler for it to accept your code. To the extent that you can encode your program's properties into the type system, by the time your code compiles you have proven that those properties hold. Mathematically, rock-solid proven.

You can't encode 100% of your properties into the program (for example you can't totally prove linearity, since Rust has an affine system i.e. you can drop values on the floor), but you can get very far.

Thoughts and clarifications

Posted Sep 15, 2024 13:17 UTC (Sun) by mb (subscriber, #50428) [Link] (10 responses)

>but the assertion of "just change the definition and your job is done when the compiler stops complaining"
>is laughably naive.

Yes, it's extremely hard to believe.
And yes, it is an oversimplification.

But it is in fact true, to some extent.

Think of it as being the opposite of what Python does.
In Python code it is extremely difficult and sometimes practically impossible to do large scale refactorings, because almost all things are only checked at run time and almost nothing is checked at "build" time.

Rust is the exact opposite of that. And it also adds many more static checks than Python or C++ program could ever do. The language lets you express properties of the program in the type system. At the core of all this is the lifetime system, the borrow checker and move semantics.

If new Rust code is written, it sometimes has logic bugs or similar bugs in it.
But if Rust code is refactored and the algorithms are not changed, you're 99% done when the compiler stops complaining.

In Python it is extremely scary to pull out part of some code and put it somewhere else. It's extremely easy to forget something that you will only notice years after. It's no fun at all.
In Rust such things are easy, fun and the feeling of "did I forget something?" is not present, because the compiler guides the developer throughout the process.

Rust compiler messages are helpful. If you hear "Rust compiler complaining" translate that to "Rust compiler trying to help the developer".
"To fight the compiler" is not what actually happens. It's not a fight. It's a friendly helping hand.
And that really shines in refactorings.

Thoughts and clarifications

Posted Sep 15, 2024 13:52 UTC (Sun) by pizza (subscriber, #46) [Link] (9 responses)

> But if Rust code is refactored and the algorithms are not changed, you're 99% done when the compiler stops complaining.

That same argument applies to any statically-typed language, even (*gasp*) C.

Meanwhile, if you're not changing the algorithms/semantics/whatever in some way, why are you refactoring anything to begin with?

Thoughts and clarifications

Posted Sep 15, 2024 14:02 UTC (Sun) by mb (subscriber, #50428) [Link]

> That same argument applies to any statically-typed language, even (*gasp*) C.

No. That is not true.
There's a very big difference between what you can encode in the C type system and what is possible with the Rust type system.

>if you're not changing the algorithms/semantics/whatever

That is not what I said.

Thoughts and clarifications

Posted Sep 15, 2024 14:51 UTC (Sun) by asahilina (subscriber, #166071) [Link]

> That same argument applies to any statically-typed language, even (*gasp*) C.

Not at all, not to the extent it does with Rust.

In C, if you change a structure to be reference-counted, the compiler does nothing to ensure you manage the reference counts correctly. In Rust it does.

In C, if you add a mutex to protect some data, the compiler does nothing to ensure you actually hold the mutex before accessing the data. In Rust it does.

In C, if you change the width or signedness of an integer member or variable, the compiler does nothing to ensure you actually update the type of any variables it's copied to or passed into, and it will happily let you truncate or convert integers, even with -Wall. In Rust it won't compile until you change all the types or add explicit casts, and you probably won't even need to touch any code that just creates temporary bindings since Rust has type inference for that and C does not (without extensions like __auto_type or weird macros).

In C, if you need to add cleanup code to a structure that was previously just freed with free(), you need to find all the sites where it is freed and change them to call a helper that does the extra cleanup manually. In Rust none of this code exists to begin with since freeing is automatic, you just implement the `Drop` trait on the struct to add extra cleanup code and you're done, no need to refactor anything at all.

Thoughts and clarifications

Posted Sep 16, 2024 10:36 UTC (Mon) by farnz (subscriber, #17727) [Link] (5 responses)

No; the degree to which you're done when things start compiling depends critically on how much is checked at compile time versus at run time. My experience over 8 years of doing Rust, and approximately 20 years doing C, is that Rust programs tend to have much more in the way of compile time checking than C programs, which in turn means that "it compiles" is a much stronger statement than in C (although not as strong as it tends to be in Idris or Agda).

A more interesting question is whether this will continue to hold as more people write Rust code - is this current behaviour an artifact of early Rust programmers tending to write more compiler-checked guarantees, or is this something that will continue to hold when the Rust programmer pool expands?

Thoughts and clarifications

Posted Sep 16, 2024 11:41 UTC (Mon) by pizza (subscriber, #46) [Link] (4 responses)

> A more interesting question is whether this will continue to hold as more people write Rust code - is this current behaviour an artifact of early Rust programmers tending to write more compiler-checked guarantees, or is this something that will continue to hold when the Rust programmer pool expands?

Personally, I strongly suspect the latter.

Current Rust programmers are self-selecting, in the upper echelon of skill/talent, and largely using Rust for Rust's sake. That is very much non-representative of the software development world as a whole. [1]

Rust will have its Eternal September, when relatively average-to-mediocre corporate body farms start cranking it out. At that point, "Rust Culture" goes out the window as the only "culture" that matters is what falls out of coroporate metrics<->reward mappings.

[1] So is C, for that matter. If I were to pull out a bad analogy, if effectively coding in C represents the top 10th percentile, Rust is currently the top 1%.

Thoughts and clarifications

Posted Sep 16, 2024 11:50 UTC (Mon) by pizza (subscriber, #46) [Link]

> Personally, I strongly suspect the latter.

Gaah, make that 'the former'. (As I hope was clear from the rest of the post)

Thoughts and clarifications

Posted Sep 16, 2024 13:21 UTC (Mon) by farnz (subscriber, #17727) [Link]

I disagree in part; I think it'll get worse than it is today, for the reasons you outline, but that it'll still remain a lot more true of Rust than it is of C.

I have access to a decent body of code written by a contract house (one of the big names for cheap outsourcing), and the comments make it clear that they used their cheap people, not their best people, to write the code. Of the four most common causes of issues refactoring that code, three are things that are compiler-checked in Rust:

  1. Assumptions about the sizes of arrays passed as arguments; where in C, I can pass a 2 element array to a function that expects a 4 element array, Rust either makes this a compile-time error (if the argument type is an array) or a runtime panic (if the argument type is reference to slice).
  2. Assumptions about wrapping of unsigned computations. C promotes unsigned bytes to signed int for calculations, and then does the computation, but there's chunks of this code that assume that every intermediate in a long sequence without storing to a known type remains unsigned (otherwise there's UB lurking when the intermediate exceeds INT_MAX).
  3. Failure to check all possible values of an enum, in large part because it's clear that the value got added after a module was "code complete", and nobody thought to add a default: or a special handler for this value.

Those all become panics or compile failures in Rust, leaving the errors in business logic (of which there are remarkably few) to deal with during refactoring.

And more generally, the biggest issue with cheap contractor C and C++ is the amount of code they write that depends on UB and friends being interpreted a particular way by the compiler, even in cases where there's no way to check that interpretation from code; Rust does seem to reduce this, even in beginner code, since unsafe is easy to find and be scared by.

Thoughts and clarifications

Posted Sep 22, 2024 18:24 UTC (Sun) by Rudd-O (guest, #61155) [Link] (1 responses)

> Rust will have its Eternal September, when relatively average-to-mediocre corporate body farms start cranking it out. At that point, "Rust Culture" goes out the window as the only "culture" that matters is what falls out of coroporate metrics<->reward mappings.

I haven't seen so far, at least in decades of me working in the industry, that eternal September has arrived to Haskell.

And I don't think that's going to happen. At least in Haskell. Maybe in Rust will. Maybe it won't.

There is a set of conceptual difficulties associated with learning any programming language, and it is not the same, depending on the language. Learning ATARI basic is one thing, (by the way that's the first language I learned). Learning Python is another Learning assembly is yet another Learning Haskell is another.

To pull the conversation away from the realm of language and just talk about concepts, pretty much any programmer can program using a stringly typed interface (which we all know leads to shitty code). But not every programmer is capable of learning the Haskell type system (I know I can't but ikcan understand how it leads to improved type safety and thus code quality).

All of this is to say that we're not all made equal. And because we're not all made equal, we are not all able to use the same tools. Just as we are not all able to wield a sledgehammer that weighs 30 pounds and break down a wall, so we are just as unequal to wield a specific programming language with skill and produce the results that one wants. Evolution does not stop at the neck.

But what do i know? Perhaps Haskell will get its eternal September? All i know is i can't learn it. Or at least I'm humble enough to admit that.

Thoughts and clarifications

Posted Sep 22, 2024 18:30 UTC (Sun) by Rudd-O (guest, #61155) [Link]

Addendum:

In case it's interesting for the readers here, the current firm I'm working at started the system that we are developing with Haskell. We had a lot of researchers that were super talented and were able to crank out what I consider pretty high quality code at the very beginning using nothing but Haskell.

The problem is that once you need to grow past 10 engineers, or in this case computer scientists, you can't. Finding 10 Haskell programmers in a Haskell conference is fairly easy. Finding the 11th to join your team when there's no conference going on is almost impossible. Hasklers are wicked smart, and because they're wicked smart, they're wicked rare.

So what did we do after that? We switched our system to Rust. Of course, the system continues to have the same type structure to the extent that it is possible, that it had back in the era when it started as Haskell. And all the Haskell programmers quickly adapted to using Rust because the type system in Rust is less complex than the type system in Haskell, so for them it was a downgrade. But we were able to quickly triple the amount of programmers that we had developing the system.

And the system continues to grow, and it has pretty high code quality for the standards of my career — I've seen code maybe 25 years? I routinely refactor the innards of the system without fearing that I'm going to break something somewhere else, somewhere deep in the system. I don't think I've ever felt so free to actually change the code without having the dread inside of me that it's going to catastrophically explode in production. Two years, and I have yet to put a bug in the system. That is almost completely magical.

Thoughts and clarifications

Posted Sep 22, 2024 18:19 UTC (Sun) by Rudd-O (guest, #61155) [Link]

> That same argument applies to any statically-typed language, even (*gasp*) C.

No. Not even close.

You can make a change in C struct, such that you're missing a member of the struct, and maybe the compiler will complain that you missed initialization somewhere of that struct member. That is true.

But if you add to an enum somewhere, which represents a new state of the object that you are using the enum on, and that enum is used in an if, case or select statement somewhere else, C will happily ignore that you've done this, and compile just fine. Then your code doesn't work.

In Rust, when you do this, generally a selector case statement equivalent, which is called match, will not compile, because it knows that you've added a different case that your code does not handle. Only after you have fixed every site where this enum is used, will it compile.

This simple thing prevents entire classes of logic errors in your code.

There are quite a few other ergonomic changes that the language has over other languages that existed before rust, which work in a similar way. to give you just one other example:

You change a C struct to have a new member that has a pointer to another type. You painstakingly change every place where that struct needs to be initialized so that your program will compile and run. This program is multi-threaded, like the kernel is. You run, your program, and it crashes. In this particular case, that new member that refers to a pointer to this other structure was used at the wrong time. This is due behavior that wasn't there before when the first structure did not have a pointer to the second one.

This is not possible in Rust. The compiler, in fact, the borrow checker in the compiler, keeps track of where every pointer is going and where every pointer is used. And will not let you compile the program if you use a pointer or a reference when you are not supposed to, when it's supposed to be dead or not initialized, or if the lifetime of the object taking a reference to that thing is a different lifetime, incompatible with the lifetime of the object pointed to by the pointer. It even knows when you are using data that you're not supposed to be using because you will have forgotten to grab a lock. And it will tell you you need to change how this is done. Try this, try this other thing, try this other thing. It gives you options.

This is so far ahead of anything that the C language does that it could be construed as magic by Ken, Dennis, and Donald. You need to see it with your own eyes to believe it, but it is amazing.

On a personal note, this conversation on this particular thread has exposed to me the wide difference of perspective that C developers and Rust developers have. Having spent years of my life developing with both languages, I have the uncomfortable advantage of having perspective from both sides. But to me, it really does feel like we're arguing horses and carriages versus automobiles, or electric cars versus gas cars. I, too, thought Teslas were bullshit until I got into one, as a passenger, and the driver punched the pedal. Oh my god. It's a similar experience going from Python, or Go, or C, to Rust.

And I think that explains why a lot of people see what Rust developers say about the language, and then conclude, this must be a religion, or worse, a cult.

Thoughts and clarifications

Posted Sep 15, 2024 20:24 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (8 responses)

> Linux has historically placed great emphasis on (and heavily leaned into) internal interfaces and structures being freely malleable, but the assertion of "just change the definition and your job is done when the compiler stops complaining" is laughably naive.

Others chimed in with examples to the contrary; I also had a similar experience. FWIW, for me, the best feature of Rust was not the lifetimes and borrows, but pattern matching and exhaustiveness checking. I always hated writing code that encodes state machines, but Rust makes that so much better.

To be clear, other languages with pattern matching have similar properties, and even C++ might get it soon.

Thoughts and clarifications

Posted Sep 16, 2024 8:51 UTC (Mon) by adobriyan (subscriber, #30858) [Link] (6 responses)

> and exhaustiveness checking

fn main() {
    let i: u32 = 41;
    match i % 2 {
        0 => println!("0"),
        1 => println!("1"),
    }
}

error[E0004]: non-exhaustive patterns: `2_u32..=u32::MAX` not covered

But rustc also exposed an embarrassing bug in a trivial C++ calculator program of mine, so I can't complain too much.

Thoughts and clarifications

Posted Sep 16, 2024 12:30 UTC (Mon) by excors (subscriber, #95769) [Link] (5 responses)

That's easily worked around by adding a catch-all "_ => unreachable!()", ideally with a comment explaining why you believe it's unreachable (assuming the real code isn't quite this trivial), and if you were mistaken then it'll become a runtime panic (unlike C where reaching __builtin_unreachable() is undefined behaviour).
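For illustration, here is that workaround sketched out (my own example): the wildcard arm makes the match exhaustive, and if the assumption ever turns out to be wrong, you get a panic with a message rather than undefined behavior.

```rust
// Invented example: a wildcard arm satisfies the exhaustiveness check,
// and a mistaken assumption becomes a debuggable runtime panic.
fn parity(i: u32) -> &'static str {
    match i % 2 {
        0 => "even",
        1 => "odd",
        // i % 2 can only be 0 or 1 for an unsigned integer.
        _ => unreachable!("i % 2 out of range"),
    }
}

fn main() {
    println!("{}", parity(41));
}
```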

After making that change, you do lose the benefits of compile-time exhaustiveness checking for that match statement; someone might change the condition to "i % 3" and you won't notice until runtime. But you'll still get the benefits for any code that matches integers you can't guarantee are in a particular sub-range (like the inputs to any API), and for any code that matches enums (presumably what Cyberax meant with state machines). I'd guess those situations are more common in most programs, so the exhaustiveness checking is still a valuable feature even if it's not perfect.

If your code is doing lots of work on bounded integers then I guess you'd want something more like the Wuffs type system, but then you'll get the compromises that Wuffs makes to make that work, which seems to restrict it to a very small niche. And that would still be inadequate if you do "match x & 2" since Wuffs doesn't know the value can't be 1. (Though as far as I can tell, Wuffs doesn't actually support any kind of switch/match statement - you have to write an if-else chain instead.)

Thoughts and clarifications

Posted Sep 16, 2024 14:20 UTC (Mon) by andresfreund (subscriber, #69562) [Link] (4 responses)

> That's easily worked around by adding a catch-all "_ => unreachable!()", ideally with a comment explaining why you believe it's unreachable (assuming the real code isn't quite this trivial), and if you were mistaken then it'll become a runtime panic (unlike C where reaching __builtin_unreachable() is undefined behaviour).

Imo this comparison to __builtin_unreachable() is nonsensical. I dislike a lot of UB in C as well, but you'd IMO only ever use __builtin_unreachable() when you *want* the compiler to treat the case as actually unreachable, to generate better code.

Thoughts and clarifications

Posted Sep 16, 2024 14:52 UTC (Mon) by adobriyan (subscriber, #30858) [Link] (1 responses)

Rust does and doesn't do bounds checking at the same time:
without unreachable!() it is a compile error 100% of the time,
but at -O1 the code generator knows what the remainder operation does to integers.

https://godbolt.org/z/jbefszqa8

Guaranteed behaviour versus permitted optimizations

Posted Sep 17, 2024 8:49 UTC (Tue) by farnz (subscriber, #17727) [Link]

This is normal for any compiled language; the compiler is allowed but not required to remove dead code, and thus when the optimizer is able to prove that a given piece of code cannot be called, it is allowed to remove it (similar applies to unused data). However, it's never required to remove dead code, and when you're not optimizing, it'll skip the passes that look for dead code in the name of speed.

There's a neat trick that you can use to exploit this; put a unique string in panic functions that doesn't appear anywhere else in the code, and then a simple search of the binary for that unique string tells you whether or not the optimizer was able to remove the unwanted panic. It's not hard to put greps in CI that look for your unique string, and thus get a CI-time check for code that could panic at runtime - if the string is present, the optimizer has failed to see that it can remove the panic, and you need to work out whether that's a missed optimization (and if so, what you're going to do about it - make the code simpler? Improve the optimizer?). If it's absent, then you know that the optimizer saw a route to remove the panic for you.
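A sketch of that CI check (the marker string and paths here are invented; in a real pipeline the binary would come from the actual build, e.g. `cargo build --release`, with the marker embedded in the panic message). A stand-in file simulates an optimized binary from which the panic path was removed:

```shell
#!/bin/sh
# Hypothetical CI step: search the binary for a unique panic-marker string.
MARKER="PANIC_MARKER_7f3a"
BIN=./stand_in_binary

# Stand-in for the real build output; no marker means the optimizer
# proved the panic unreachable and removed it.
printf 'machine code bytes, no panic string here' > "$BIN"

if grep -q "$MARKER" "$BIN"; then
    echo "panic path survived: investigate"
else
    echo "panic path removed"
fi
rm -f "$BIN"
```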

Thoughts and clarifications

Posted Sep 16, 2024 17:40 UTC (Mon) by excors (subscriber, #95769) [Link] (1 responses)

This is getting slightly tangential, but I don't think it's that far-fetched to compare them - they have basically the same name (especially in codebases like Linux that #define it to "unreachable"), and people do use it in C for non-performance reasons, e.g.:

https://github.com/torvalds/linux/blob/v6.11/arch/mips/la... (unreachable() when the hardware returns an unexpected chip ID; that doesn't sound safe)

https://github.com/torvalds/linux/blob/v6.11/fs/ntfs3/fre... (followed by error-handling code, suggesting the programmer thought maybe this could be reached)

https://github.com/torvalds/linux/blob/v6.11/arch/mips/kv... (genuinely unreachable switch default case, explicitly to stop compiler warnings)

https://github.com/torvalds/linux/blob/v6.11/arch/mips/la... (looks like they expected unreachable() to be an infinite loop, which I think it was when that code was written, but it will misbehave with __builtin_unreachable())

https://github.com/torvalds/linux/blob/v6.10/drivers/clk/... (probably to stop missing-return-value warnings; not clear if it's genuinely unreachable, since clk_hw looks non-trivial; sensibly replaced by BUG() later (https://lore.kernel.org/all/20240704073558.117894-1-liqia...))

__builtin_unreachable() seems like an attractive nuisance (especially when renamed to unreachable()) - evidently people use it for cases where they think it shouldn't be reached, but they haven't always proved it can't be reached, and if it is then they get UB instead of a debuggable error message. It seems they usually add it to stop compiler warnings, not for better code generation. Often they should have used BUG(), which is functionally equivalent to Rust's unreachable!() though slightly less descriptive.

If you really need the code-generation hint in Rust, when the optimiser (which is a bit smarter than the compiler frontend) still can't figure out that your unreachable!() is unreachable, there's "unsafe { std::hint::unreachable_unchecked() }" which is just as dangerous but much less attractive than Linux's unreachable().
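A minimal sketch of that hint (my example, not from the thread): the unsafe block is deliberately noisy to write, and if the invariant stated in the SAFETY comment were ever wrong, reaching the arm would be undefined behavior, exactly like C's __builtin_unreachable().

```rust
use std::hint;

// Invented example: opting in to the code-generation hint. The optimizer
// may assume the wildcard arm is never reached.
fn parity(i: u32) -> &'static str {
    match i % 2 {
        0 => "even",
        1 => "odd",
        // SAFETY: i % 2 is always 0 or 1 for an unsigned integer.
        _ => unsafe { hint::unreachable_unchecked() },
    }
}

fn main() {
    println!("{}", parity(6));
}
```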

Anyway, I didn't originally mean to denigrate C, I was mainly trying to explain the Rust code to readers who might be less familiar with it. But it does also serve as an example of different attitudes to how easy it should be to invoke UB.

Thoughts and clarifications

Posted Sep 16, 2024 18:14 UTC (Mon) by mb (subscriber, #50428) [Link]

Yes, looks like you found a couple of actual soundness bugs in the C code.
I wonder if there are any uses of unreachable that actually make sense. As in: Places where the performance gain actually matters.

Thoughts and clarifications

Posted Sep 22, 2024 18:32 UTC (Sun) by Rudd-O (guest, #61155) [Link]

> but pattern matching and exhaustiveness checking

This was magical to me too. At first it felt super awkward, like an inversion of the order in which things are supposed to read. But when it clicked... oh my god. Combining that with the question mark operator, or a return inside the match, really helped simplify the structure of the happy path.
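For readers less familiar with Rust, a small invented example of that combination: `?` bails out early on the error path, and the match keeps the remaining cases flat, so the happy path reads top to bottom.

```rust
use std::num::ParseIntError;

// Invented example: `?` propagates the parse error, and the match
// handles the remaining cases exhaustively.
fn parse_port(s: &str) -> Result<u16, String> {
    let n: u32 = s.trim().parse().map_err(|e: ParseIntError| e.to_string())?;
    match n {
        0 => Err("port 0 is reserved".to_string()),
        1..=65535 => Ok(n as u16),
        _ => Err(format!("{n} is out of range")),
    }
}

fn main() {
    println!("{:?}", parse_port("8080"));
}
```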

I am so happy I learned Rust. And I'm even happier that I'm getting paid to do it.

Thoughts and clarifications

Posted Sep 22, 2024 17:58 UTC (Sun) by Rudd-O (guest, #61155) [Link] (10 responses)

> "just change the definition and your job is done when the compiler stops complaining" is laughably naive.

In C. And assembly.

> Donald Knuth

— famous C and assembly developer

Thoughts and clarifications

Posted Sep 22, 2024 18:11 UTC (Sun) by pizza (subscriber, #46) [Link] (9 responses)

> "Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth

..Your blithe dismissal of Knuth as an "assembly and C programmer" doesn't invalidate his point. "Provably correct" doesn't mean that it actually *works*.

Thoughts and clarifications

Posted Sep 23, 2024 9:15 UTC (Mon) by farnz (subscriber, #17727) [Link] (7 responses)

Knuth's point was not that it wouldn't work, but rather that his proof would only cover the things he cared to prove correct, and not necessarily cover everything that you, the reader of his code, would expect. He wrote that jibe in an era when formal methods practitioners were merrily proving all sorts of properties about code that, to a large extent, were irrelevant to users of computation systems (including those implemented using humans as the compute element), and not considering important properties (like "is this proven to terminate in finite time") because they're not always provable.

Thoughts and clarifications

Posted Sep 23, 2024 9:26 UTC (Mon) by Wol (subscriber, #4433) [Link] (6 responses)

While it may not be what Knuth was thinking of, I also think of it as pointing out that a formal proof merely proves that the mathematical model is internally consistent.

It does not prove that what the model does is what reality does! In a properly specified system, maths and science (reality) usually agree, but there's no guarantee ...

Cheers,
Wol

Thoughts and clarifications

Posted Sep 23, 2024 11:56 UTC (Mon) by pizza (subscriber, #46) [Link] (5 responses)

> It does not prove that what the model does is what reality does! In a properly specified system, maths and science(reality) usually agree, but there's no guarantee ...

This has been my experience.

...The software running on the 737MAX's MCAS was "provably correct" .. for its specifications.

(It turns out that the words "properly specified" are about as common as unicorn-riding sasquatches...)

Thoughts and clarifications

Posted Sep 23, 2024 12:01 UTC (Mon) by farnz (subscriber, #17727) [Link] (4 responses)

Reference for "the software running on the 737MAX's MCAS was "provably correct""? I can't find any evidence anywhere that the MCAS was formally verified at all - merely that it was tested correct, and Boeing asserted to the FAA that the testing covered all plausible scenarios.

Thoughts and clarifications

Posted Sep 23, 2024 14:06 UTC (Mon) by pizza (subscriber, #46) [Link] (3 responses)

> Reference for "the software running on the 737MAX's MCAS was "provably correct""?

I'm giving Boeing's software folks the benefit of the doubt, because the MCAS debacle was a failure of specification (on multiple levels), not one of implementation.

After all, one can't test/validate compliance with a requirement that doesn't exist.

> Boeing asserted to the FAA that the testing covered all plausible scenarios.

It did! Unfortunately, many of those "plausible scenarios" required pilots to be trained differently [1], but a different part of Boeing explicitly said that wasn't necessary [2].

[1] ie recognize what was going on, and flip a circuit breaker (!) to disable MCAS
[2] One of the main selling points of the MAX

737 MAX only tested, not proven correct

Posted Sep 23, 2024 14:40 UTC (Mon) by farnz (subscriber, #17727) [Link] (2 responses)

There is a requirement underpinning all avionics that the aircraft's behaviour is safe in the event of a data source failing, and that the avionics are able to detect that a data source has become unreliable and enter the failsafe behaviour mode. This is a specification item for MCAS, and Boeing asserted to the FAA that they had tested MCAS and confirmed that, in the event of an AoA sensor fault, MCAS would detect the fault and enter the failsafe behaviours.

Boeing's tests for this, however, were grossly inadequate, and at least 3 different failure conditions have been found which were not covered by the tests: first is that "AoA DISAGREE" was an optional indication, available during the tests, but not in production MCAS unless purchased (20% of the fleet). Second is that they did not test enough bit error cases, and later investigation found a 5 bit error case that was catastrophic. And third was that the procedure for handling MCAS issues assumed that the pilot would have time to follow the checklist; in practice, the training issues meant that pilots didn't even realise there was a checklist.

737 MAX only tested, not proven correct

Posted Sep 23, 2024 15:49 UTC (Mon) by pizza (subscriber, #46) [Link] (1 responses)

> first is that "AoA DISAGREE" was an optional indication,

It's worse than that -- _redundant sensors_ were optional.

..and they were optional because one set of folks had a different set of functional specifications than another, and management was disincentivized to notice.

737 MAX only tested, not proven correct

Posted Sep 23, 2024 16:23 UTC (Mon) by paulj (subscriber, #341) [Link]

Went and had a read, as it seems you and farnz don't quite agree and/or are talking about slightly different things. ICBW, but my read of the FAA summary report is:

FAA "Safety Item #1: USE OF SINGLE ANGLE OF ATTACK (AOA) SENSOR" - this refers to the use of /data/ from a single AoA sensor by MCAS.

FAA "Safety item #5: AOA DISAGREE:" - refers to the "AOA DISAGREE" warning in the cockpit FDU, which somehow was tied to the optional "AoA Indicator gauge" feature for the FDU, which airlines had to purchase.

AFAICT from the FAA summary, the change was entirely in the logic, because there was no action item to retro-fit another AoA vane to the 737 MAX. Excluding training and maintenance requirements, the aircraft changes were all logic updates: better filtering of the two AoA signals, better differential error detection, and no longer ploughing ahead with MCAS commands based on measurements from just one vane, which could be faulty. Other safety items added limits, damping, and margins to prevent runaway MCAS crashing the aircraft.

Stunning issues really.

Thoughts and clarifications

Posted Sep 25, 2024 9:52 UTC (Wed) by Rudd-O (guest, #61155) [Link]

Not sure what you're referring to with "provably correct", that's Knuth's claim, not mine.

Nevertheless, in my professional experience, the C compiler's silence generally tells you next to nothing about whether the program will run as intended (I concede things have improved over the last 30 years). Whereas the Rust compiler's silence generally does mean the program is either going to run as intended or will have substantially fewer problems than the C one (mostly logic errors introduced by the programmer).

You can choose to be dogmatic and insist that all compilers / languages give the same results (a belief I like to call "blank slatism"), or you can choose to test that theory for yourself. I know what I chose and I am quite happy. My sincere recommendation is that you test the theory for yourself.

Thoughts and clarifications

Posted Sep 15, 2024 12:18 UTC (Sun) by sunshowers (guest, #170655) [Link]

Yes, I've performed many refactorings in complex Rust codebases and it's never been scary. The type system does a great job catching mistakes, and experienced Rust developers can find effective ways to leverage it.

For example, using PhantomData to deliberately introduce a lifetime parameter.
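A sketch of that PhantomData technique (the types here are invented): the handle stores only a raw id, yet the zero-sized marker field ties it to the session's lifetime, so the compiler rejects any use of a handle after its session is dropped.

```rust
use std::marker::PhantomData;

// Invented example: Handle carries no reference at runtime, but the
// PhantomData field deliberately introduces a lifetime parameter that
// borrows the Session it came from.
struct Session {
    id: u32,
}

struct Handle<'a> {
    raw: u32,
    _session: PhantomData<&'a Session>,
}

impl Session {
    fn handle(&self) -> Handle<'_> {
        Handle { raw: self.id, _session: PhantomData }
    }
}

fn main() {
    let session = Session { id: 7 };
    let h = session.handle();
    println!("{}", h.raw);
    // drop(session); // compile error if uncommented: `h` still borrows it
}
```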

Or carefully using encapsulation to localize very complex code, testing or formally proving certain properties on it, and then using the type system to turn the local property into a global one.

If you've never used Rust, then please listen to those of us who write Rust day in and day out (coming on 8 years full-time for me). Rust really is a massive improvement over both C and C++.

Thoughts and clarifications

Posted Sep 22, 2024 17:57 UTC (Sun) by Rudd-O (guest, #61155) [Link]

> Have you ever seen a moderately complex *RUST* codebase that is easy to refactor?

Yes, I have.

Because the compiler yells at me all the way until I'm done with the refactor, which is awesome. Because I know at the end of the refactor, my refactor is almost certain to work properly. Because Rust forced all of the developers before me to encode the behavior of the APIs and the algorithms in the type system.

And the type system + the borrow checker do not forgive you or give you free passes. You cannot cast-to-void your way around it. You cannot pass some sort of shim that could have a bug itself. You cannot hold a pointer for the wrong timespan. You have to fix it correctly.

Sure, you can leave a bunch of functions empty, effectively not doing your job in not finishing the refactor, and obviously the program is going to break after that.

But if you do your job, it is substantially easier than refactoring a C code base, only to discover that the first time you run it corrupts your data, or it crashes, or it works for a good while without any problems, and you discover all those issues much later down the road.

Anecdotally, Python refactorings have the same problem as C refactorings; in fact, it might actually be worse. Python with types is substantially easier to refactor than Python without types. But Rust? Refactoring Python, with or without types, is much harder than refactoring Rust.

Thoughts and clarifications

Posted Sep 22, 2024 17:00 UTC (Sun) by Rudd-O (guest, #61155) [Link]

> Well, there was Ted Ts'o's rant at Wedson and the others that started with an accusation of wanting to "convert" people to the Rust "religion", followed by a pile of strawman arguments, followed by another person making jokes comparing Rust to Java and more strawmen.

From the short clip I saw, yes, I agree with you, that was totally inappropriate. They didn't even let him finish; it wasn't even a discussion to begin with. It was just a demonstration of how the type system works, and the guy was just trying to explain to them why the type system protects them from problems. And Ted started vaguely accusing RfL devs of converting people to the Rust religion, what the hell? I did a double take when I saw that part of the clip.

Uncalled for, in my opinion. And Ted is an extremely smart man. I wonder what prior interactions he had that led him to that comment.

Thoughts and clarifications

Posted Sep 4, 2024 17:27 UTC (Wed) by Wol (subscriber, #4433) [Link] (4 responses)

> The reality is more that all kernel maintainers are just buried alive in maintenance and review work and that adding even more stuff to have to think about not breaking with each future change will inevitably cause them more efforts for each move and make them even less efficient at their tasks.

Let's call that what it is, shall we. Fire Fighting. Which is a MAHOUSIVE waste of time.

And if we read the article, it's clear that most of the patches were (a) documentation, and (b) bug fixes. Both of which would presumably HELP them in their task.

> We're *lacking* kernel maintainers. Many of them would probably just give up if someone could come with a *guarantee* that their work will continue to be well supported for their users.

Well, somebody who wants to DOCUMENT and CLEAN UP your code should be welcomed with open arms ... okay, what was that attempt to subvert a linux subsystem by someone? But that got caught by the "many eyes" syndrome, and these things probably mostly get caught ... (and if you don't take risks, you're going to get nowhere).

> Developing is not just a matter of having a job. It's also a matter of being responsible for many users of your work and not willing to abandon them nor send them to what you suppose to be a dead end.

Which isn't helped by driving away people who want to help.

Okay, I get it. If you're fire-fighting all the time, it's hard work to actually do anything. I'm desperate to fix our little system at work (and have it grow its tentacles into everything else! Of course!). But when I'm fighting a P1 incident like I was yesterday (the next one up, P0, is "get the CxOs involved - this is a major disaster in the making") I totally understand how you don't have time to do anything important. It frustrates the hell out of me! I'm a database designer, and a huge amount of grief is caused directly by Relational's FAILURE to guarantee data integrity. Why it does that, I don't know. All I know is it's grief left right and centre.

But you need to stand back, and look at the big picture, AND LET OTHER PEOPLE HELP. And that's painful. I know.

Cheers,
Wol

Thoughts and clarifications

Posted Sep 4, 2024 17:33 UTC (Wed) by wtarreau (subscriber, #51152) [Link]

> But you need to stand back, and look at the big picture, AND LET OTHER PEOPLE HELP. And that's painful. I know.

I totally agree. That's the hardest. And I've been through this as well. But I mean that it's also important that those offering help understand this and don't instantly turn a concern into a religion-based rejection, because when you're overloaded you definitely have other things to do than enter such considerations and endless discussions that lead nowhere.

Thoughts and clarifications

Posted Sep 4, 2024 17:42 UTC (Wed) by pizza (subscriber, #46) [Link] (2 responses)

> Let's call that what it is, shall we. Fire Fighting. Which is a MAHOUSIVE waste of time.

Not if your job is an actual firefighter; then it's literally what you're being paid for.

Thoughts and clarifications

Posted Sep 4, 2024 17:56 UTC (Wed) by Wol (subscriber, #4433) [Link]

:-)

Yeah. But the world would be a lot better off without the need for firefighters.

So it's still a mahousive waste of time. I think it's called the broken window fallacy, right?

Cheers,
Wol

Thoughts and clarifications

Posted Sep 5, 2024 12:34 UTC (Thu) by Ashton (guest, #158330) [Link]

Extending this metaphor further: firefighting is a productive activity only if there are house fires. One of the things that happened in the U.S. is that house fires dropped by about half between 1980 and now, and many fire departments began picking up more paramedic responsibilities in response. Compared to their predecessors, modern firefighters do a lot less fighting of fires.

Thoughts and clarifications

Posted Sep 4, 2024 18:31 UTC (Wed) by Deleted user 129183 (guest, #129183) [Link] (2 responses)

> You're not doing you a good service doing this, by showing a possible lack of empathy

Unfortunately, there’s something about the IT field that attracts people with lack of empathy like rotten meat attracts flies. Sometimes I wish that programming had remained a (cis) women-dominated profession, then maybe the ‘programming community’ would be more bearable. But unfortunately that ship has sailed.

Thoughts and clarifications

Posted Sep 4, 2024 19:20 UTC (Wed) by pizza (subscriber, #46) [Link] (1 responses)

> Unfortunately, there’s something about the IT field that attracts people with lack of empathy like rotten meat attracts flies.

...You say that like those flies aren't providing important ecological service in their own right.

> Sometimes I wish that programming had remained a (cis) women-dominated profession, then maybe the ‘programming community’ would be more bearable. But unfortunately that ship has sailed.

Be careful what you wish for. The most toxic work environments I've ever experienced during my career were female-dominated. [1]

[1] "professional" office/administrative settings. As always, culture rots from the top down.

Let's stop here

Posted Sep 4, 2024 19:22 UTC (Wed) by corbet (editor, #1) [Link]

Honestly, I don't see this particular subthread leading to anything good. Let's stop it here, OK?

Thoughts and clarifications

Posted Sep 4, 2024 21:31 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

> There *might* possibly be such people but quite frankly I doubt it.

Experience with the C++ community would strongly indicate that there's a rich seam of such people. For them the programming language is a core part of their identity, rather than merely a tool they happen to use as programmers. They're by no means the majority, but they do really exist and in some numbers.

Thoughts and clarifications

Posted Sep 4, 2024 21:53 UTC (Wed) by mb (subscriber, #50428) [Link]

>Not having week-ends, checking e-mails during vacation,
>constantly getting comments from family "you're again on your laptop?".

I'm sorry, but the answer to wrong life decisions is not to block new technologies and to block other people.

If you are overworked, do less. Really simple.
It's a common misunderstanding that the world/kernel stops, if any maintainer does less.

Somebody new will eventually take over.
Why doesn't somebody else take over now, you ask? Well, because you do everything 24/7 already! Why would anybody take over, if you do everything already?

Yes, most people do important work and it would be a great loss short term if they didn't do it.
But the world continues to rotate without all the people who *think* they are extremely important and work all day long.

The kernel development community is in a real need for reducing its hostility to zero.
It scares away way more people than a loss of an overworked maintainer that gave up does.

Because often enough it's not the overworked-ness that causes problems.
It's the ego trip of the maintainers that does.
How else can you explain that patches that improve robustness, and thus reduce future (and present!) bugs and future maintenance burden, are rejected?

Existing maintainers and abstractions

Posted Sep 4, 2024 17:03 UTC (Wed) by corbet (editor, #1) [Link]

I'll not challenge your comments on the DRM scheduler; you know that code, I do not. But one thing you said here struck me:

Honestly, I think giving existing kernel maintainers veto/ack power over Rust abstractions was a mistake

My feeling is that any alternative would have been far worse — it would be a quick path to the "second-class citizen" status that nobody wants for the Rust code. Maintainers who have had a bunch of Rust code pushed into their subsystem against their will are not going to go out of their way to avoid breaking it — some of them, anyway.

This is why I am saying that we need to find a better way to align everybody's goals.

The maintainer summit is less than two weeks away, and Rust should be well represented there. I hope that people will show up having thought about how we can improve this relationship.

A missed invariant

Posted Sep 4, 2024 18:16 UTC (Wed) by ebiederm (subscriber, #35028) [Link] (13 responses)

I read through your description of what is going on and I have the strong feeling you are missing a strong invariant in the code.

The invariants that I see reading the discussion are:

- The driver may not be torn down while the hardware is doing something.

- The schedulable entities must exist while they have state in the gpu to manage.

Those lead to: a scheduler can only be torn down if there is nothing to schedule.

My sense is that you have paid attention to the Rust lifetimes, but have lost the real world hardware lifetimes.

I don't understand how in the world it makes any sense to tear down something, when what it is managing still needs to be managed.

That is just my sense looking at this discussion from the outside, as neither a Rust nor a graphics person. So I might be wrong. But I hope this helps move the conversation forward.

A missed invariant

Posted Sep 4, 2024 19:19 UTC (Wed) by asahilina (subscriber, #166071) [Link] (7 responses)

> I don't understand how in the world it makes any sense to tear down something, when what it is managing still needs to be managed.

I suspect the maintainer didn't either, because he wasn't paying attention to what I was saying, and he was stuck in his mental model of how other drivers work, and that's why he kept rejecting my ideas.

> The driver may not be torn down while the hardware is doing something

This is obvious. However, the scheduler is not the driver and in my driver there is not one global scheduler as there is in most drivers. In my driver, a scheduler is bound to a graphics queue and gets torn down when the queue is destroyed, for example when you kill the process that owns it. There are many schedulers running in parallel. This is because the actual global scheduler is implemented in GPU firmware and the DRM schedulers are just one layer on top of the firmware's scheduling primitives, used for dependency control and to wait for hardware resources (not to actually schedule across multiple queues since each only handles one). As strange as this might sound at first glance, this is how firmware-scheduled GPUs are supposed to be implemented in DRM, and I had long conversations with DRM people about this before jumping into the implementation, so the concept is sound.

Therefore schedulers are created and destroyed constantly, and it is not acceptable to block scheduler destruction on unnecessary things, since that would create the need for a bunch of extra tracking that would not otherwise be necessary.

> The schedulable entities must exist while they have state in the gpu to manage.

This is not true. It might be true in some drivers, perhaps drivers that have a better match between the drm_sched design and the GPU design, perhaps even the AMD driver (it is not a coincidence that the drm_sched maintainer is an AMD employee, that code came from the AMD driver).

But it is not true in my driver. My driver has to match the firmware API (which Apple defines and in fact isn't even stable and changes across versions and GPU generations) to DRM's model and the UAPI that I defined, which in turn is what the userspace code speaks to. As such, the driver's code consists of lower layers that manage firmware resources such as its idea of command queues, and higher layers that aggregate them into UAPI objects such as the UAPI command queue. In fact, in this driver, a single UAPI command queue maps to up to three firmware command queues (compute, vertex, fragment).

Since it is critical that firmware resources are always kept alive as long as is necessary (otherwise the whole GPU firmware crashes irrecoverably), the underlying firmware job objects are what hold references to the resources they need (and so on for resources that depend on others). These job objects are only cleaned up when the firmware itself signals job completion (successful or not).

This is perhaps very different from the typical C GPU driver design. C encourages you to have big objects and top-down locking and ownership management. That sort-of works but runs into all kinds of issues where everything needs to wait for everything else for cleanup, and you can easily end up with ordering issues.

Rust instead encourages fine-grained reference counting and fine-grained division of responsibility. The outcome of doing things this way is very positive, because it means that I can use Rust lifetimes to track firmware object lifetimes and ensure they meet the expectations of the firmware itself. This is in fact *the* reason why I chose Rust for this driver, because I knew getting all this right in C (and I have to get it right for the driver to not crash the whole system all the time) would be a nightmare.

Put another way, I do not trust drm_sched or any high-level code to be the ultimate decision maker of the lifetime of GPU resources. The underlying resources "track themselves" instead, and that works a lot better.

And so, when a process dies or destroys a GPU queue, I really don't care if there are jobs in flight. I just want to tear down the DRM scheduler layer and its concept of jobs. The underlying firmware resources will continue to run (there is no "cancel" command I can use to abort them, macOS never does that either), and will continue to hold references to any needed resources such as the associated GPU VM page tables, firmware command queues, kernel-allocated temporary GPU buffers, etc. until they complete, and then all the reference counts drop to zero and everything gets freed, possibly over one minute (compute jobs can be long-running) after the DRM scheduler and indeed the whole process that created this GPU job is long gone.
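The ownership pattern described above — firmware job objects holding references to everything they need, and outliving the scheduler and even the process that created them — can be sketched with plain reference counting. This is a hypothetical illustration using `std::sync::Arc` and invented type names, not the actual driver code:

```rust
use std::sync::Arc;

// Invented stand-ins for the resources a job depends on.
#[allow(dead_code)]
struct GpuVm;            // page tables the job renders into
#[allow(dead_code)]
struct FwCommandQueue;   // firmware-side queue object

// A firmware job holds strong references to everything it needs;
// those references are only dropped when the firmware signals completion.
#[allow(dead_code)]
struct FwJob {
    vm: Arc<GpuVm>,
    queue: Arc<FwCommandQueue>,
}

// The UAPI/scheduler layer holds its own reference; tearing it down
// does not free anything an in-flight job still points to.
struct UapiQueue {
    #[allow(dead_code)]
    queue: Arc<FwCommandQueue>,
}

fn refcount_demo() -> (usize, usize) {
    let vm = Arc::new(GpuVm);
    let fw_queue = Arc::new(FwCommandQueue);
    let uapi = UapiQueue { queue: Arc::clone(&fw_queue) };

    // A job is submitted and is now "in flight" in firmware.
    let job = FwJob { vm: Arc::clone(&vm), queue: Arc::clone(&fw_queue) };

    // The process dies: the UAPI queue (and with it the DRM scheduler
    // layer) is torn down immediately...
    drop(uapi);
    // ...but the firmware queue survives, because the job references it.
    let after_teardown = Arc::strong_count(&fw_queue);

    // Only when firmware signals completion is the job dropped, taking
    // the last job-held references to its resources with it.
    drop(job);
    let after_completion = Arc::strong_count(&fw_queue);

    (after_teardown, after_completion)
}

fn main() {
    // Our local handle plus the job's, then just ours.
    assert_eq!(refcount_demo(), (2, 1));
}
```

The point of the sketch is that no top-level object decides when the resources die; the reference graph does.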

Not coincidentally, this is also how macOS behaves at least at the UAPI layer (you can ^C a compute app running a long job and the process dies, but the compute job continues to run to completion or failure in the firmware and kernel driver), though there are definitely big differences in our implementations. It's how the hardware is intended to be driven.

A missed invariant

Posted Sep 5, 2024 2:56 UTC (Thu) by nksingh (subscriber, #94354) [Link] (5 responses)

I just wanted to restate my understanding of Lina's point in hopes of justifying it:

External code effectively controls the lifetime of two objects, the fences and the schedulers that process the fences. The fences are tied to some fds that the userspace controls and the scheduler(s) are tied to either a process context in Lina's case or some hardware device.

Both sides could go away at independent times driven by external forces, like a process exiting or a GPU being yanked out. Lina's design change is saying that the scheduler should be able to keep its lifetime independent by actively kicking out the fences it is tracking. This is nice because it's minimal extra complexity to vastly simplify the contract.

Oftentimes the simplified and less interdependent contracts allow components to evolve cleanly in the future and support more scenarios, so regardless of Rust this is probably something that would be righteous. Is there some C only driver code that could be simplified and deleted along with the proposed change?

A missed invariant

Posted Sep 5, 2024 7:43 UTC (Thu) by khim (subscriber, #9252) [Link] (4 responses)

> Is there some C only driver code that could be simplified and deleted along with the proposed change?

No, but as was already shown, there exists C code that hit the same issue as the Asahi driver, even if in the somewhat exotic case of an external eGPU attached to a laptop.

And even if there weren't anything like that, a flat-out rejection without explanation is not something I ever like to see.

Sometimes even the most trivial and simple-looking changes can be wrong because they reveal deeply hidden problems, and yes, I've seen the situation where a two-line patch ends up requiring six months of work to redo lots of things differently, too. That's normal! And saying that your two-line patch breaks our 10,000 lines of code that are built on the opposite assumption is not a crime.

What is not normal is rejecting things on your authority without any explanation. That's acceptable when technical arguments are being ignored, and heck, as I like to say, I started considering Rust seriously when I saw how the Rust community applied that to a person who kept rejecting technical arguments, but damn it, can anyone show us how and when any of the Rust-in-Linux developers ignored technical arguments?

We saw that many times with C maintainers doing it (accusations that Rust is a religion and that rustaceans are trying to be missionaries are as non-technical as they come), but what about the other side? Where was the ignoring of technical issues demonstrated?

A missed invariant

Posted Sep 5, 2024 9:01 UTC (Thu) by asahilina (subscriber, #166071) [Link] (3 responses)

It's actually possible that the Xe driver could be simplified with those changes (since it is also a firmware-scheduled GPU driver), but I'm not sure. As I understand it, they did architect things around the existing drm_sched requirements at least in some ways. Maybe that could be cleaned up if drm_sched is made more reasonable, or maybe it ties deeply enough into the rest of the driver that it doesn't make a difference at this point.

When valid technical arguments were made about the way I was doing things (the first patch mentioned in the article, about the new callback) I did change how my code worked. That discussion was very frustrating and could have been handled much better, but the end result *was* that the maintainer let me know of a better way of doing things.

There is, however, one more ironic and frustrating event here. This patch:

https://lore.kernel.org/lkml/20231110001638.71750-1-dakr@...

This patch to drm_sched basically implements what I wanted to implement in my first patch. It adds a new callback just the same, it just calls it update_job_credits() which is merely a slightly different way of doing the same thing. Except it does it much more intrusively, changing existing APIs so all the drivers have to be updated.

It also introduced the race condition that I then had to debug in [1], so it wasn't even a correct patch since it introduced a memory safety regression (for all drivers).

So I guess the "technical arguments" against my first patch, which I did agree with, still only apply to me somehow, and not the author of the above patch. I wonder why... maybe it's because people with an @redhat.com email address are inherently given more respect and trust? I can't help but wonder.

[1] https://github.com/AsahiLinux/linux/issues/309#issuecomme...

A missed invariant

Posted Sep 5, 2024 10:41 UTC (Thu) by khim (subscriber, #9252) [Link]

> So I guess the "technical arguments" against my first patch, which I did agree with, still only apply to me somehow, and not the author of the above patch. I wonder why... maybe it's because people with an @redhat.com email address are inherently given more respect and trust? I can't help but wonder.

Nah, more likely because this particular person has earned trust in the past.

I was in a similar position at my $DAYJOB and had to fight for a few years for a similar “privilege” of not having to argue against nonsense comments. What worked, in the end, was the way people fight software patents: instead of declaring them invalid, they accept their validity on faith but then “cut them to pieces”¹. Similarly with $TROUBLESOME_MAINTAINER: I accepted nonsense suggestions that were sure to lead to trouble and submitted them in my name, but with explicit and prominent comments like “this code handles issues that are too tricky for me to understand, but it was suggested by $TROUBLESOME_MAINTAINER, thus I assume it is correct”, and then redirected all the bugs caused by that code back.

After about a half-dozen rounds of that, I convinced $TROUBLESOME_MAINTAINER that I understood that part of the code better, and now I could probably submit genuinely nonsense code and it would be accepted. I actually even did that, once, but since it was me who realized that the code I submitted was nonsense, while $TROUBLESOME_MAINTAINER never understood the issue until I cooked up a test that broke and also created a fix, the faith remained.

I'm pretty sure there is something like this in play here.

¹) Bring prior art and, instead of saying that this prior art invalidates the patent, assert that since the patent is still there and the prior art is also there, the patent's claims have to be interpreted very narrowly to make sure they don't clash with the prior art. Repeat a dozen times and now you have an extremely crippled patent which covers some very obscure and strange thing… which is no longer even close to what you are doing.

A missed invariant

Posted Sep 5, 2024 14:48 UTC (Thu) by DemiMarie (subscriber, #164188) [Link] (1 responses)

Has this regression been fixed?

A missed invariant

Posted Sep 5, 2024 16:02 UTC (Thu) by asahilina (subscriber, #166071) [Link]

Not as far as I know. I found it recently but at this point I don't really want to spend time fixing it in drm_sched since I'll be moving away from it. I just have a quick hack in our downstream tree to avoid the issue (that just disables part of the functionality that introduced it).

I haven't seen any other reports of the crash happening with other drivers, so I guess they got lucky for whatever reason (it's a UAF so if the freed memory isn't reused yet it would "work"), or maybe they have extra locking that makes the race impossible (if for some reason queuing a job is mutexed with its execution then that would avoid the problem, though it would also likely introduce a violation of fence signaling rules, but that's a whole different can of worms).

For us the problem only started happening after a certain kernel update, plus the GPUVM changes, and with kernel builds configured with certain preemption modes (not the case on the Fedora kernel), and even then it happened quite rarely and was hard to repro. So it's conceivable that the bug is just lurking and not triggering for other drivers.

A missed invariant

Posted Sep 6, 2024 12:59 UTC (Fri) by daenzer (subscriber, #7050) [Link]

> I suspect nor did the maintainer, because he wasn't paying attention to what I was saying, [...]

And you were paying attention to what he was saying perfectly, right?

Has it occurred to you that Christian might feel the same way about your exchange, just in reverse?

In the two contentious threads referenced in the article, I see Christian asking you for clarification, explaining why things can't be done like this, and making suggestions how it could be done differently. Doesn't really look like "he just wouldn't listen" to me.

P.S. AFAICT Christian was mostly explaining issues from the C PoV, I'm not sure he even made any direct statement about his position on Rust for Linux. Comments here trying to put him in a "maintainers resisting Rust" corner seem unjustified.

A missed invariant

Posted Sep 4, 2024 19:32 UTC (Wed) by pizza (subscriber, #46) [Link] (4 responses)

> My sense is that you have paid attention to the Rust lifetimes, but have lost the real world hardware lifetimes.

To paraphrase the Red Queen, hardware has the remarkable ability of triggering six supposedly-impossible situations before breakfast, invariably wrecking your nice clean/consistent/clever abstractions in the process.

(I have no opinion on if that applies to this particular situation...)

A missed invariant

Posted Sep 5, 2024 11:22 UTC (Thu) by taladar (subscriber, #68407) [Link]

That point applies just as much whether the clever abstraction is written down or is just a set of unwritten invariants that only exist in maintainers' heads. If anything, changing the mental model as new behaviors are discovered is a lot harder than changing something that is written down, since our brain has a way of making us confuse old versions and new versions of similar things that existed over time.

A missed invariant

Posted Sep 5, 2024 18:30 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (2 responses)

Well, either that, or your abstractions are so nasty that they are actually worse than the hardware. The C++ spec says that relaxed atomics may be implemented using (effectively) a time machine, and therefore inherits all manner of temporal paradoxes from every bad sci-fi novel you've ever read (but they insist on using entirely new names for everything, so e.g. the bootstrap paradox is renamed to "out of thin air," the grandfather paradox is "read from unexecuted branch," etc.). Meanwhile, real compilers emit raw loads and stores (in the case of simple atomic reads and writes) or real atomic instructions (for read-modify-write), and real CPUs are usually not time machines, and so this is all a purely theoretical problem that never actually happens.

Rust also inherits this model for its atomics, and to my understanding, it takes exactly the same "meh, theory is hard, we'll just assume the implementation is not insane" stance as C++. I like to imagine that some aggressive compiler writer is going to blow past the spec's non-normative "please do not actually do this" note and try to optimize a time machine into C++ and/or Rust, but realistically I think the compiler writers are smart people who understand that this would be a bad idea.

To further explain the motivation: Relaxed atomics are basically the "I don't want the compiler to turn my data race into UB" annotation. They don't actually do anything other than guarantee that the individual operations on that particular variable are atomic and consistent with some sequential ordering, and that the compiler is not allowed to deduce UB from any data races that may result, but other than that, there are no rules.

This is intended to allow compilers to just emit loads and stores for simple (read-only or write-only) atomics, since the compiler may reason that any cache incoherency could also be interpreted as some reordering of the parallel loads and stores, and so you don't have to emit a fence.

Since the "some" sequential ordering is not required to be consistent with the ordering of any other relaxed atomic variable, you can have two of these variables that interact with one another in such a way that the resulting ordering is permitted to contain a loop or cycle, in which case the results are not well-defined but also are not UB, and now you have rules that permit circular arguments like "x is 37 because x is 37." In C++14, they did add a rule prohibiting some specific cases (including the simple 37-because-37 case), but to the best of my understanding, this is not a comprehensive fix and there are still variations of the behavior that are not forbidden by the spec.
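The "don't turn my data race into UB" role of relaxed atomics can be shown in a small Rust sketch (a hypothetical illustration, not tied to any kernel code): relaxed read-modify-write operations never lose updates and never produce UB, even though `Ordering::Relaxed` promises no ordering relative to other variables.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Four threads each do 1000 relaxed fetch_adds. Because each
// individual RMW is atomic, the final count is exactly 4000 --
// no increments are lost and no UB arises -- even though Relaxed
// establishes no happens-before ordering with anything else.
fn relaxed_count() -> usize {
    static COUNTER: AtomicUsize = AtomicUsize::new(0);
    COUNTER.store(0, Ordering::Relaxed);
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..1000 {
                    COUNTER.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    COUNTER.load(Ordering::Relaxed)
}

fn main() {
    assert_eq!(relaxed_count(), 4000);
}
```

The "out of thin air" paradoxes discussed above concern what the spec fails to forbid for patterns involving *multiple* relaxed variables; no real hardware produces them, which is exactly the point being made.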

A missed invariant

Posted Sep 5, 2024 19:35 UTC (Thu) by khim (subscriber, #9252) [Link] (1 responses)

> real CPUs are usually not time machines

I would say that since the Pentium Pro, “real CPUs” are time machines, and thus all these paradoxes they speak about have become real.

> so this is all a purely theoretical problem that never actually happens.

Except it does. It even happens on x86, as we all know. And x86 tries harder than most other CPUs to hide its true nature.

> to the best of my understanding, this is not a comprehensive fix and there are still variations of the behavior that are not forbidden by the spec

The problem here is that creating a memory model that makes some sense and also is compatible with all these optimizations that hardware (not compiler!) is doing… it's not easy.

From what I understand, the C++ (and thus Rust) model is not well aligned with what hardware is doing, but what Linux is doing is not compatible with some more exotic targets (e.g. GPUs), thus we just have to agree that designing a sensible memory model for atomics is just hard.

A missed invariant

Posted Sep 7, 2024 23:18 UTC (Sat) by NYKevin (subscriber, #129325) [Link]

That link is not what I mean by "a time machine." I'm referring to real temporal paradoxes (such as both variables nonsensically taking on the value 42, because each supposedly got that value from the other), not just "the CPU executed things in a different order than I thought it would," nor even "the reordering which the CPU performed cannot be reconciled with a single global order of execution." Those are both normal and expected anomalies under relaxed atomics - they're the spec working as designed.

Thoughts and clarifications

Posted Sep 5, 2024 19:57 UTC (Thu) by mohkale (guest, #159931) [Link]

I lack the experience and familiarity to comment here but just wanted to say, thank you for all the hard work. You're an inspiration :-).

It's more a matter of culture

Posted Sep 4, 2024 16:27 UTC (Wed) by wtarreau (subscriber, #51152) [Link] (5 responses)

Unfortunately it will remain difficult to satisfy everyone because the culture, hence the expectations, are totally different. And that's expected: if they were not, Rust wouldn't have been proposed as an alternative to just doing everything the same way.

The problem is that in one team's culture, the lower layers must provide all the safety belts needed so that it's safe and easy to code on top. In the other team's culture, the upper layers must respect and abide by contracts. Changing any of this means replacing almost everything. Adding protection to the lower layers changes paradigms by requiring that the people in charge of these layers are suddenly responsible for a lot more guarantees offered to the upper layers, and have to care about tons of stuff that they never imagined having to do (and possibly that would be completely outside the scope of what interests them in the project, just like a mathematician might love working on crypto algorithms but never wants to deal with input data representation and supports only raw binary). Those trying to offer safe interfaces can't imagine having to care about stuff that they consider as "implementation specificities" that would make their interfaces more difficult to use than initially designed for, and would needlessly add difficulty to their users.

Neither is right nor wrong. Both are right in their area, what is wrong is to try to assemble the two by each forcing the other to adapt, precisely because the expectations do not match.

And that's not specific to languages, the same is true in many projects where there are core parts touched by few people who find an interest in working on "simple" small and efficient things. I'm putting quotes around "simple" because actually they do something small and do it well. It often requires understanding some poorly mastered topics like memory barriers or stuff like this, but once those concepts are understood the lower parts almost never change and are 100%-proven safe. And there are many more parts acting as satellites around that small core, that are supposed to be more accessible to newcomers without having to understand the gory details of the core. Most often there are abstraction layers between the two extremities to satisfy the expectations of each side, deal with error control, safety checks, domain validity etc. It's very likely that the nascent conflict here reveals that some abstractions are still missing, and that the painful thing to implement in every code part in fact has to be placed in an abstraction layer, possibly changing a lot about how the upper one was expected to be written (e.g. use more callbacks maybe, etc). Very often, most of this is not satisfying for either party, but that may mean that there are just not enough abstraction layers between these extremities.

Often it's also important not to dismiss the work done at those layers where very few people are skilled. It's very easy to overlook some of the conceptual difficulties there, wrap everything in a way that makes it easier to use, declare that now it's the only way to use it, and lose 90% of its value. Then one day someone realizes that the person who did that 10 years ago quit the project after seeing their work constantly criticised despite never failing, and that unfortunately nobody knows how to port, update, or even redo that work because those skills are lost. There must be room for having fun in every project, and everyone's work and role needs to be respected, even if often those doing stuff nobody understands end up ignored and forgotten. This has been seen a lot in applications migrating to the web using frameworks which removed the need to understand databases, for example: one day everyone notices that the DB expert left 5 years ago and was never replaced, having been perceived as useless, while nobody can explain anymore why the application is horribly slow and cannot scale...

One last thing: it seems important to me that in any case the lower layers implement protections against misuse. That's particularly true in open-source projects, which can get many new developers from everywhere who will genuinely think that whatever they found to be possible is supported while it's not. Liberal use of assertions in certain code paths (even if just for debug builds) seems important to me so that those trying to make their way through the code figure out early that what they're doing is not expected to be done that way. This significantly reduces frustration. In the example given here, we could imagine that unregistering a scheduler causes a check for remaining tasks and oopses if any are found, so that the user of that API understands they have to find another way to do that.
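As a hypothetical sketch of that last suggestion (invented names, not any real DRM API), a destructor that debug-asserts its contract makes the misuse fail loudly in debug builds instead of silently leaving state behind:

```rust
// A toy scheduler that refuses (in debug builds) to be torn down while
// jobs are still pending, so an API consumer learns early that this
// teardown path violates the contract.
struct Scheduler {
    pending_jobs: usize,
}

impl Scheduler {
    fn new() -> Self { Scheduler { pending_jobs: 0 } }
    fn submit(&mut self) { self.pending_jobs += 1; }
    fn complete(&mut self) { self.pending_jobs -= 1; }
}

impl Drop for Scheduler {
    fn drop(&mut self) {
        // Debug-only check, like an assert in a C debug build: tearing
        // down with jobs in flight trips this immediately, instead of
        // leaving a use-after-free to be found months later.
        debug_assert_eq!(
            self.pending_jobs, 0,
            "scheduler destroyed with jobs still pending"
        );
    }
}

fn main() {
    let mut sched = Scheduler::new();
    sched.submit();
    sched.complete();
    // Dropping now is fine; dropping with pending_jobs > 0 would panic
    // in a debug build.
}
```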

It's more a matter of culture

Posted Sep 5, 2024 8:06 UTC (Thu) by ralfj (subscriber, #172874) [Link]

> the lower layers must provide all the safety belts needed so that it's safe and easy to code on top

This is only true for safety requirements that cannot be easily checked fully automatically -- in Rust's case, this means: safety requirements that cannot be expressed as Rust types. Most safety requirements *can* be expressed as Rust types, and then nothing needs to change in the lower layer.

For safety requirements that are so complicated that they cannot be expressed as Rust types, an argument can be made that they are so complicated that higher layers programmed in C are also at significant risk of getting them wrong.

So the culture difference here is more about, how much risk of accidental misuse by higher layers are you willing to accept? Even with no Rust involved, one has to pick a reasonable trade-off. So far, there is no hard line here that would clearly mark an API as "too complicated" -- it's up to negotiation between driver authors and subsystem maintainers. With Rust, a hard line is introduced here: if things cannot be put into Rust types, that is deemed a hard "too complicated". The DRM scheduler is such a case, and now the challenge is to convince the maintainer of that system that it is indeed "too complicated".
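As a hypothetical illustration of "expressing a safety requirement as Rust types" (invented names, unrelated to any real kernel API): a rule like "initialize before submitting, never use after shutdown" can be made a compile-time property with a typestate pattern, so nothing in the lower layer needs to check it at runtime.

```rust
use std::marker::PhantomData;

// Zero-sized marker types encode the queue's state in its type.
struct Uninit;
struct Ready;

struct Queue<State> {
    depth: usize,
    _state: PhantomData<State>,
}

impl Queue<Uninit> {
    fn new() -> Self {
        Queue { depth: 0, _state: PhantomData }
    }
    // Initialization consumes the uninitialized queue; the only way to
    // obtain a Queue<Ready> is to go through init().
    fn init(self) -> Queue<Ready> {
        Queue { depth: self.depth, _state: PhantomData }
    }
}

impl Queue<Ready> {
    // submit() only exists on Queue<Ready>: submitting to an
    // uninitialized queue is a compile error, not a documented rule.
    fn submit(&mut self) { self.depth += 1; }
    // shutdown() consumes the queue, so use-after-shutdown also fails
    // to compile rather than relying on the caller's discipline.
    fn shutdown(self) -> usize { self.depth }
}

fn main() {
    let q = Queue::new();   // Queue<Uninit>: submit() does not exist yet
    let mut q = q.init();   // now Queue<Ready>
    q.submit();
    assert_eq!(q.shutdown(), 1);
    // q.submit();          // would not compile: q was consumed
}
```

The contract has not gone away; it has moved from documentation (or a maintainer's head) into the type signatures, where the compiler enforces it for every caller.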

> Changing any of this means replacing almost everything.

No, that's just not true. As Asahi Lina said multiple times in this thread: the vast majority of C APIs are perfectly fine! No change is needed to expose them in Rust in a reasonable way. It is a rare exception that a C API has a safety contract that is so complicated that it cannot be expressed in Rust types. So far, at least Asahi Lina hit exactly one such case: the DRM scheduler.

So it's just not correct to claim that everything has to change. Most APIs are already robust enough for a good Rust API.

---

All that said, I wonder if it would help for some of these changes to use a different communication strategy. I noticed the argument "this is needed for a safe Rust API" is used to try and change a C API, and I can see how that can be a hard sell for a C maintainer. So maybe it would help to focus on "this makes the API easier to use for everyone, C and Rust consumers alike" -- trying to argue for the change entirely on its own merit, without involving Rust much. That argument *has* been made, but it often came as a secondary argument after "Rust needs this", and such subtle distinctions can make a big difference in how a proposed change is perceived. Of course I am an outsider here, so feel free to ignore this advice if it isn't helpful. :)

It's more a matter of culture

Posted Sep 5, 2024 9:54 UTC (Thu) by kleptog (subscriber, #1183) [Link] (1 responses)

> The problem is that in one team's culture, the lower layers must provide all the safety belts needed so that it's safe and easy to code on top. In the other team's culture, the upper layers must respect and abide by contracts. Changing any of this means replacing almost everything.

But that's just not true though. The Rust approach is also that all layers must respect and abide by contracts; that's why it produces efficient code. The difference is that the contracts are validated by the compiler. There's no need to change anything, you just need to express the contracts in a way the compiler understands (or use unsafe if it gets too hard).

So there's no need to replace everything, just a lot of figuring out what the contracts actually are and writing them down. Judging by some of the comments here, most of the contracts in the kernel APIs are reasonably ok, but some are very hairy.

I can't recall a situation where making an API more complicated was necessary for efficiency. There's always a way to write your API that is cleaner and more efficient. So if there are hairy APIs in the kernel, it seems unlikely that making them simpler is going to cost much at all. The example here was a few lines of code to simplify an API which incidentally fixed a bunch of drivers that use the API incorrectly already.

It's more a matter of culture

Posted Sep 5, 2024 14:04 UTC (Thu) by Sesse (subscriber, #53779) [Link]

> I can't recall a situation where making an API more complicated was necessary for efficiency.

The Linux kernel certainly has some rather complicated APIs precisely for efficiency reasons (e.g. RCU is much, much more subtle than a simple spinlock, slab allocators are set up because kmalloc would be too slow, workqueues exist because doing too much stuff in an interrupt handler would cause latency, etc.).

It's more a matter of culture

Posted Sep 5, 2024 12:53 UTC (Thu) by Ashton (guest, #158330) [Link] (1 responses)

> In the other team's culture, the upper layers must respect and abide by contracts.

Except in many cases today the contract isn’t even written down, it’s in the maintainers' heads *at best*. This makes even using the API correctly a process of extracting information from the maintainer via the mailing list, a frustrating experience.

At worst, nobody knows what the contract is. An entirely plausible reading of the Ts'o outburst is that nobody knew how the API should work, which is why the request for *clarification* (not change!) from the rust developer provoked such anger.

Moving to a world where such contracts are always explicitly documented is, alas, a culture change akin to the first you specified. And even if that existed that provides no guard against necessary changes if the contract needs to change or was in fact wrong from the beginning; both of which apply in this scenario.

It's more a matter of culture

Posted Sep 6, 2024 7:56 UTC (Fri) by wtarreau (subscriber, #51152) [Link]

> Except in many cases today the contract isn’t even written down, it’s in the maintainers heads *at best*.

I know, and in many projects (and I'm sure the kernel is no exception), even the maintainers don't really know, and code analysis is required to figure it out. I'm personally extremely picky about writing a function's expectations in comments above it, and when I sometimes face one that's complicated and insufficiently documented, I wonder "who's the bastard who did that?" and git blame shows that it was me... Often it's an indication that the doc needs to be reworked, and sometimes it even leads to changes. In such cases you can easily imagine how other people might have faced serious difficulties with that :-/

POV Maintainer

Posted Sep 4, 2024 20:35 UTC (Wed) by airlied (subscriber, #9104) [Link] (3 responses)

Not sure I want to wade or weigh in here,

I don't think the scheduler is what is blocking merging asahi, there are a lot of bindings between asahi and upstream. All those bindings need careful consideration and discussions with maintainers. Lina was taking some time off and wasn't available to initiate and partake in all the various discussions on bindings before we even got to the drm layers.

This is where I stepped in last year, and Danilo is doing that work, but as shown it's slow and very hard to iterate. What Lina misses is that building up trust with maintainers is actually far more important than the working code. The code is secondary in Linux to trust relationships. Once a maintainer can trust that the person working on the code will listen to their concerns, and that person spends time showing them why rust is valuable, I expect things will start to move faster.

This can mean having to help each maintainer learn enough rust concepts to make them feel comfortable, I've had many meetings with Danilo on how to approach different maintainers and how to adopt the approach with them. There is no one size fits all and there is no fast path just because code is written in rust.

I'm glad Lina is back working on asahi and hope that once we get more bindings upstream, we can work on the actual upstreaming of the driver itself. Lina's code is the reason the idea of nova even exists, and I've even taken the firmware versions handling work as the basis for nova, and rust is so much better for this.

The scheduler discussion will get resolved through a few different things.

1. The scheduler in rust idea might be workable, and I'll probably be interested in merging that.
2. I've asked some others to invest time in the scheduler to see whether we can help resolve the C-side problems in advance in existing drivers and avoid any fallout in amdgpu etc., but also so we can understand the impedance mismatches between the rust lifetime requirements and the C lifetimes, and whether they are rust lifetimes or asahi lifetimes etc. Christian was abrupt and has been asked to be less NAKy in future and more documentation-minded, and I think we've seen some changes, but there is always a journey to bring people on.

Linus has always said something along the lines of, you won't be able to work with everyone in the kernel, there will always be differences, and the trick is to find the people you can work with, and find intermediaries to work with the ones you can't. This is one of those cases where I felt intermediaries can unblock things faster than banging heads and talking past each other.

If all the other bindings get merged and we can't resolve the scheduler thing quick, I'll be surprised.

POV Maintainer

Posted Sep 5, 2024 8:50 UTC (Thu) by sima (subscriber, #160698) [Link] (2 responses)

Yeah, same from me, I don't think the drm scheduler is going to be a real hold-up. Things haven't moved forward visibly on that because there are so many other things that are needed first before you can have a device driver in rust in upstream. What we did work on is moving drm scheduler discussions away from NAK to improving the documentation, so that NAKs can be replaced by documentation links. There's unfortunately still a long road to go there, with a lot of the design assumptions nowhere to be found outside of very few, too few, maintainers' heads.

POV Maintainer

Posted Sep 5, 2024 20:30 UTC (Thu) by riking (subscriber, #95706) [Link]

Well, it's not going to be a holdup because it's just going to be ignored and bypassed for a custom scheduler that doesn't have UAF bugs.

POV Maintainer

Posted Sep 6, 2024 5:53 UTC (Fri) by Tobu (subscriber, #24111) [Link]

Asking maintainers to explain a refusal is at least a productive outcome. Seeing that most of Lina's patch was focused on documenting drm_sched assumptions, I wish her well in doing without the parts of the rendering subsystem that have been resistant to making their invariants known. She's been super productive, the driver is stable as others aren't, the upstreaming will get done because stable drivers developed this fast are too good to pass up. In order to enable this for future drivers, either an abstraction can be wrapped safely or it can be treated as legacy.

Psychology of change

Posted Sep 5, 2024 10:06 UTC (Thu) by RX14 (subscriber, #123970) [Link] (1 responses)

The psychology behind the recent discussions is quite interesting to me. It's one of those changes I believe most everyone will be happy with after it happens, but that will cause large amounts of churn until it does (or, though unlikely, fails).

A more centralised project could employ a change manager to help with the change going on here. They are a kind of psychologist who specialises in providing for the needs of people who are experiencing change. They might suggest a series of sessions to give maintainers some basic training or knowledge about how Rust might impact them, to make them feel included and supported in the change. The goal would be to let them feel in control of, and capable of reviewing, Rust changes when they affect C code, rather than feeling that changes they don't understand are being forced on them.

Unfortunately, Linux as a decentralised project would find it extraordinarily difficult to provide funding for these resources and get turnout at sessions. It's one of the unfortunate and unavoidable trade-offs in how Linux is run that these changes involve unnecessary violence and churn.

The responses of existing Linux maintainers to Rust (ranging from extreme support to distrust) are all common human responses to change, ones that you and your friends are liable to have too. Luckily there are techniques to reduce the pain. Shaming them is not one of them; it's just likely to make for an even more divisive environment, more prone to bad outcomes.

Psychology of change

Posted Sep 6, 2024 8:19 UTC (Fri) by wtarreau (subscriber, #51152) [Link]

> The responses of existing Linux maintainers to Rust (ranging from extreme support to distrust) are all common human responses to change, ones that you and your friends are liable to have too. Luckily there are techniques to reduce the pain. Shaming them is not one of them; it's just likely to make for an even more divisive environment, more prone to bad outcomes.

Well said!

Trusting each other is super important in change decision, and one must absolutely listen when the other one expresses difficulties. "Your api is too difficult for me" is a valid concern, just like "it's too difficult for me to evaluate the risk of breakage introduced by your proposed API change". The outcome should not be "change YOUR side" (from either one), but "now that we agree that it's too difficult for both of us to fully adapt to the other one, let's see what other solutions we have to go forward".

Governance

Posted Sep 5, 2024 12:58 UTC (Thu) by jsakkine (subscriber, #80603) [Link] (10 responses)

Just to offer a different viewpoint than "job security": this thing called governance could also matter to some. Rust-Linux is not GCC-compatible, and no kernel maintainer would purposely introduce a toolchain lock-in. Google and Microsoft really should put some funding into https://github.com/Rust-GCC/gccrs development. I blame the main corporate actors, rather than individual contributors or maintainers, for having an unprofessional attitude.

Still, Microsoft has cash to spend on an ext2 driver, which we don't need because we already have a stable and mature driver for that. That cash would be better spent on GCC, which would actually help more with acceptance than any imaginable kernel feature.

Governance

Posted Sep 5, 2024 13:57 UTC (Thu) by jsakkine (subscriber, #80603) [Link] (3 responses)

I was unaware of this: https://rust-for-linux.com/gccrs

"While gccrs is not yet able to compile Rust code, it is progressing fast — we are hoping we will be able to compile the Rust 1.49 standard library by the next GCC release, 14.1."

That would be huge.

Governance

Posted Sep 5, 2024 15:48 UTC (Thu) by khim (subscriber, #9252) [Link] (2 responses)

GCC 14.1 was released four months ago and these guys haven't even found time to update that page? Doesn't inspire much confidence, if you ask me.

Governance

Posted Sep 5, 2024 18:28 UTC (Thu) by acarno (subscriber, #123476) [Link] (1 responses)

From https://rust-gcc.github.io/2024/09/03/2024-08-monthly-report.html:

> Finally, we would like to be more public about gccrs’s progress. In some recent comments surrounding
> the Rust-for-Linux project, we have noted that some internauts believe that the work on the compiler
> has slowed down, despite our regular upstreaming and progress reports. We think this might be due to
> a lack of activity on social media and forums such as Reddit or HackerNews, and will start posting
> our monthly updates there again. We will also be coming out with a few blogposts in the coming weeks,
> namely around the progress achieved by our GSoC students and how it will impact users.

Governance

Posted Sep 5, 2024 20:10 UTC (Thu) by khim (subscriber, #9252) [Link]

I wonder why they think it's a lack of social-media presence that makes them so low-key and causes the “Meh” reaction.

They publish lots of words and tables and numbers… which fail to give me any useful information whatsoever.

In particular, they previously wrote these exciting words: “we are hoping we will be able to compile the Rust 1.49 standard library by the next GCC release, 14.1”. And that, obviously, hasn't happened. That's fine; Rust is a complicated language and it's hard to predict the future. But the question I want answered, if that hasn't happened, is not what they plan to deliver by September 15th, 2025 (something this document does say) but when I can expect the publicly promised compilation of the Rust 1.49 standard library!

That's something this document doesn't really say; it doesn't even give me a rough estimate: would that happen in the next couple of months, in GCC 15.1, or maybe in GCC 20.1?

Who knows? It's basically not possible to estimate from these tables! There is just nothing to be excited about if you are not part of their team!

You may point out that these are very precise tables, and that dates like January 28th, 2025 or August 15th, 2023 should inspire confidence… except I can't see any missed milestones there, and yet I still can't compile std. How can that be? Was the promise to make std compilable a lie, or was there a misunderstanding… or maybe that achievement was actually completed on time but never added to GCC 14.1? What happened?

No answer.

Governance

Posted Sep 5, 2024 14:55 UTC (Thu) by MrWim (subscriber, #47432) [Link]

rustc_codegen_gcc has been able to build Rust for Linux for about a year now:

https://blog.antoyo.xyz/rustc_codegen_gcc-progress-report-26

Governance

Posted Sep 5, 2024 15:44 UTC (Thu) by khim (subscriber, #9252) [Link]

> Rust-Linux is not GCC-compatible, and no kernel maintainer would purposely introduce a toolchain lock-in.

Care to expand on that? I mean: they were all too happy to do exactly that for years (some distributions even had to carry a special kgcc package just to compile the kernel, because the kernel was practicing “a toolchain lock-in” routinely). And even a few years ago they were all too happy to break clang with no wiggle room whatsoever.

What has changed these days to prompt such an abrupt 180-degree turnaround?

> Google and Microsoft really should put some funding in https://github.com/Rust-GCC/gccrs development.

Why should they? Both build their kernels with clang, AFAIK. If someone wants a different toolchain (because they have wedded themselves to gcc plugins or something), then those people have to fund that toolchain's development.

> That cash would be better spent on GCC, which would actually help more with the acceptance than any imaginable kernel feature.

I seriously doubt it. Most screams about how GCC support is vital for the success of the Rust-for-Linux project come from people who are not maintainers and are not even prolific kernel contributors.

If I'm wrong then it should be easy for you to give us links which tell us something different.

The most serious and sensible objection was from the M68K guys, but today both clang and Rust support M68K, so it's not clear what GCC support would give us.

Governance

Posted Sep 5, 2024 16:21 UTC (Thu) by intelfx (subscriber, #130118) [Link] (1 responses)

> Rust-Linux is not GCC-compatible, and no kernel maintainer would purposely introduce a toolchain lock-in

So let me clarify: "a toolchain lock-in" was OK for some two decades — until the very same corporate actors stepped in and made the kernel capable of being built with Clang — and now it's suddenly not OK?

Sorry, I don't buy this argument. Not at all.

Governance

Posted Sep 5, 2024 18:25 UTC (Thu) by airlied (subscriber, #9104) [Link]

Lols I posted this a day ago.

https://fosstodon.org/@airlied/113080989891431087

"the same folks that pushed back against clang will be the same ones that complain there is only one rust compiler and we should wait for gcc-rs because only having one compiler is risky 🙂"

Governance

Posted Sep 5, 2024 19:24 UTC (Thu) by atnot (subscriber, #124910) [Link] (1 responses)

> Still, Microsoft has cash to spend on an ext2 driver, which we don't need because we already have a stable and mature driver for that

The context is important here. This idea didn't just come out of nowhere. It is, like the other reimplementations, a request of the respective subsystem maintainers. Every time bindings get posted for something, the subsystem maintainers ask that a "real, complex driver" be written with those bindings to show it's even possible to write one in Rust. This is, by my count, at least the fourth such effort (gpio driver, ethernet driver, nvme driver, null block device, ext2).

Now, the first few of these were definitely necessary as proofs of concept and were very enlightening. However, after the fourth one, I do find this practice of ritual waste of engineering effort to prove one's commitment a bit of a concerning trend, especially now that complex, novel drivers like AGX already exist. It makes me wonder what will be demanded by the next subsystem.

Rust reimplementations

Posted Sep 6, 2024 10:31 UTC (Fri) by farnz (subscriber, #17727) [Link]

I do, personally, see the value in a Rust reimplementation of a representative entity in each class of code the kernel supports; it's a great way to validate that the Rust bindings cover everything you need to implement something of representative complexity levels, and that they're performant and have no significant usability bugs.

So, GPIO device shows that GPIO device drivers can be written in Rust; Ethernet driver shows that netdevs can be written in Rust; NVMe driver shows that block device drivers using PCIe can be written in Rust; ext2 shows that you can write a filesystem in Rust.

And I expect to see similar as other areas get Rust support - there will often be a component rewritten in Rust to show that the bindings are good enough to support new work, so that you can separate out "this is not good because the Rust bindings are bad" from "this is not good because the design of the new work is not good in any language".

Rust saga to end?

Posted Sep 10, 2024 12:26 UTC (Tue) by andy_shev (subscriber, #75870) [Link] (19 responses)

Pardon me, but the whole Rust saga looks like a fight against "When you go to Rome, you must do as the Romans do." This is a dead end. Why can't all this good energy simply be focused on creating a nice shiny kernel for whatever OS you call it ("Rustix"?), forgetting about workarounds (such as "unsafe" pointers) and other things alien to Rust? I really do not understand.

Rust saga to end?

Posted Sep 10, 2024 13:13 UTC (Tue) by corbet (editor, #1) [Link] (12 responses)

While not one of the principals involved, I believe they might say something like:

  • This particular instantiation of Rome has always evolved toward better processes. Once upon a time, the adoption of a source-code management system was controversial. We got over that, and changed the world while we were at it. Once upon a time, a nine-week development cycle was inconceivable, but then the processes evolved and nine-week cycles are boringly normal. Some people, seemingly, see the use of a safer language as equally inconceivable; others think that the community can continue to evolve.

I might also point out that there are a lot of native-Rust crates that use unsafe; it is there for a reason, and not just for integration with the Linux kernel.

Rust saga to end?

Posted Sep 10, 2024 14:15 UTC (Tue) by Wol (subscriber, #4433) [Link] (3 responses)

> I might also point out that there are a lot of native-Rust crates that use unsafe; it is there for a reason, and not just for integration with the Linux kernel.

I might also point out that Charles Dodgson was a very accomplished logician, and as someone else pointed out "The Red Queen would quite happily imagine ten impossible things before breakfast".

"Unsafe" is there for when reality engages in behaviour the theoreticians have declared "illogical and impossible". Which it does all the time ...

Languages which deny this reality tend to be toys, have training wheels, or just get abandoned. Cf Pascal ... (modern versions of which have ditched the "theoretical purity" to try and remain relevant).

Cheers,
Wol

Rust saga to end?

Posted Sep 10, 2024 14:51 UTC (Tue) by excors (subscriber, #95769) [Link]

> "Unsafe" is there for when reality engages in behaviour the theoreticians have declared "illogical and impossible". Which it does all the time ...

No it's not. `unsafe` is for code which follows all the correctness rules but the compiler can't prove that, so the programmer and code review process are responsible for ensuring its correctness. Theoreticians are perfectly happy with the incompleteness of proof systems.

Use of unsafe in Rust

Posted Sep 10, 2024 16:26 UTC (Tue) by farnz (subscriber, #17727) [Link] (1 responses)

Unsafe is not there for cases where reality engages in behaviour the theoreticians have declared "illogical and impossible"; rather, it's there for cases where the compiler cannot prove a given behaviour correct without unreasonable behaviour from the compiler.

For example, to dereference a pointer, you need it to not be null and to point to a valid place. In theory, the compiler could track all the places where you create and modify pointers in the entire program (not just the current file or library, but the whole program as it will exist at runtime), and be confident that pointers aren't being misused, but in practice that's beyond a reasonable compiler. So, instead, we punt that proof to the programmer, and they use "unsafe" to indicate that they've thought about how the pointer gets here, and they know it's not null, and that it points to a valid place.

Some uses of "unsafe" are easy to justify that way (e.g. the use in the implementation details of split_at_checked), others involve much more tracking of soundness requirements around the entire program. But the point of "unsafe" is not that this is "illogical and impossible" (which theoreticians abbreviate to "unsound"), but that it's too hard for the compiler to prove it sound, and therefore we're going to punt the proof job to the programmer. And that's also why the language has extra ceremony ("unsafe blocks") around calling things labelled unsafe; we want you to stick, if you can, to the parts of the language where the compiler can prove your code sound, because it's much easier to go from sound code to working code than from unsound code, to sound code, to working code.
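The pointer case described above can be sketched in a few lines. This is an illustrative example of the `unsafe` contract, not code from the kernel or from this thread (the function name is invented):

```rust
// A safe wrapper whose internals use a raw pointer. The compiler cannot
// prove the dereference valid on its own, so the programmer documents the
// proof obligation and discharges it in an `unsafe` block.
fn first_element(slice: &[i32]) -> Option<i32> {
    if slice.is_empty() {
        return None;
    }
    let ptr: *const i32 = slice.as_ptr();
    // SAFETY: `ptr` comes from a live, non-empty slice, so it is non-null,
    // properly aligned, and points to an initialized i32 for the slice's
    // lifetime.
    Some(unsafe { *ptr })
}

fn main() {
    assert_eq!(first_element(&[7, 8, 9]), Some(7));
    assert_eq!(first_element(&[]), None);
}
```

Callers of `first_element` never see the `unsafe`; the proof lives, reviewable, next to the one dereference that needs it.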

Use of unsafe in Rust

Posted Sep 22, 2024 18:58 UTC (Sun) by Rudd-O (guest, #61155) [Link]

> Unsafe is not there for cases where reality engages in behaviour the theoreticians have declared "illogical and impossible"

I am almost 100% sure that was supposed to be comedy.

Rust saga to end?

Posted Sep 10, 2024 15:01 UTC (Tue) by yeltsin (guest, #171611) [Link] (7 responses)

Linus himself says just as much: https://linuxunplugged.com/578?t=535

I'd like to provide the original video, but I'm not sure which one they extracted the audio from, and couldn't find it easily.

Another reason is to attract the younger crowd, which is less interested in writing C: https://linuxunplugged.com/578?t=316

Rust saga to end?

Posted Sep 10, 2024 20:47 UTC (Tue) by andy_shev (subscriber, #75870) [Link] (6 responses)

What Linus said is the position "why Rust is good for the Linux kernel project". What I'm questioning is the "why Rust started with Linux kernel project".

Rust saga to end?

Posted Sep 10, 2024 20:58 UTC (Tue) by farnz (subscriber, #17727) [Link] (5 responses)

IIRC, Redox started before the Rust-for-Linux project (Redox starts before 2017, Rust-for-Linux is 2020 as far as I can find), and is what "Rust" started with in terms of OSes. Rust-for-Linux comes about because Redox shows that you can build an OS in Rust, and Linus wants to get the benefits Redox gets from being written in Rust for his project.

Rust saga to end?

Posted Sep 22, 2024 20:13 UTC (Sun) by andy_shev (subscriber, #75870) [Link] (4 responses)

Showing that something is possible doesn't mean a production-ready solution. So why is Redox so inactive, if fellow Rust developers are trying hard to make Rust part of the Linux kernel instead of investing more in a pure Rust OS?

Rust saga to end?

Posted Sep 23, 2024 8:47 UTC (Mon) by farnz (subscriber, #17727) [Link] (3 responses)

Redox OS is not inactive; it's under quite heavy development. The people pushing Rust into the Linux kernel are not people who want a pure Rust OS; they're people who want to work on Linux, but who find that C is a wholly inadequate language for expressing solutions to hard problems compared to languages they've used before, and who think that Rust is appropriate for kernel use.

This is much like asking why people worked on Linux in the early days, when 386BSD was there and in much better state than Linux 0.x; it's a different group of people, with different objectives.

Rust saga to end?

Posted Sep 24, 2024 20:06 UTC (Tue) by andy_shev (subscriber, #75870) [Link] (2 responses)

Ah, that makes sense! And which of those two groups has the most influence on how Rust as a language is being developed?

Rust saga to end?

Posted Sep 24, 2024 20:13 UTC (Tue) by farnz (subscriber, #17727) [Link] (1 responses)

Neither; Rust is sufficiently mature that both groups have similar influence on the future of the language.

Note that Rust is not a particularly new language at this point; stable Rust is 9 years old, whereas the first draft of ANSI C was only 6 years old at the point Linux was first released. There's a lot of groups influencing the future of the language, with the intent that it's a usable language for any domain in which you'd consider C or C++.

Rust saga to end?

Posted Sep 24, 2024 20:16 UTC (Tue) by daroc (editor, #160859) [Link]

In practical terms, however, the Rust project has made supporting the Rust-for-Linux project one of their main goals for 2024.

Rust saga to end?

Posted Sep 14, 2024 15:29 UTC (Sat) by sunshowers (guest, #170655) [Link]

Many great engineers care about their actions having a real impact on the world.

Rust saga to end?

Posted Sep 22, 2024 18:57 UTC (Sun) by Rudd-O (guest, #61155) [Link] (4 responses)

"Unsafe" is funny.

Unsafe is a funny keyword because the meaning of an unsafe block in Rust is actually the opposite of what you might think. I have seen it described as a promise, to the code outside the unsafe block, that what you're doing inside the block is safe for everybody else. Sort of like, trust me bro.

If the code within the unsafe block is correct and does what it is supposed to do, then anyone calling or using that code will not encounter undefined behavior, unexpected crashes, or other problems.

Aside from that, you are allowed to do certain things within an unsafe block that would not be allowed outside of it. You can't just do whatever the hell you want inside those blocks, of course. There are still rules; they're just somewhat relaxed. For 99.9% of the things that you write, you absolutely do not need to resort to an unsafe block. You may unwittingly be calling code that has unsafe blocks within it, but you yourself are not actually writing unsafe code, precisely because those unsafe blocks have promised the outside world, the callers, that everything is safe. Trust me bro!

There are things that are just not possible to do in Rust without those unsafe blocks: for example, your olde C-style linked lists, certain hardware / memory twiddling bits, or (I think) thunking into C libraries. For everything else, you just use plain Rust and you get all the guarantees the language offers you.

Too long didn't read version. Unsafe code writers are putting everything on the line to make sure that everybody else around them is safe.

Rust saga to end?

Posted Sep 22, 2024 20:11 UTC (Sun) by andy_shev (subscriber, #75870) [Link] (3 responses)

Hold on, are you saying that Rust has no means to talk to hardware and always needs a C shim to do that? If that's true, how is it even possible to write an OS in Rust to begin with?!

Rust saga to end?

Posted Sep 22, 2024 20:16 UTC (Sun) by mb (subscriber, #50428) [Link]

>Rust has no means to talk to hardware and always needs a C shim to do that?

No.

https://doc.rust-lang.org/std/ptr/fn.write_volatile.html
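For illustration, a minimal sketch of how that volatile primitive is used. Here the target is an ordinary local variable so the snippet can run anywhere; a real driver would use a memory-mapped register address obtained from the platform, not a stack slot:

```rust
use std::ptr;

fn main() {
    // Stand-in for a device register; real MMIO would use a mapped
    // hardware address rather than a local variable.
    let mut reg: u32 = 0;
    let p: *mut u32 = &mut reg;
    // SAFETY: `p` points to a live, properly aligned u32 that we own.
    unsafe {
        ptr::write_volatile(p, 0xDEAD_BEEF);
        assert_eq!(ptr::read_volatile(p), 0xDEAD_BEEF);
    }
}
```

The volatile calls tell the compiler not to elide or reorder the accesses, which is exactly the property hardware registers need.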

Rust saga to end?

Posted Sep 25, 2024 9:29 UTC (Wed) by Rudd-O (guest, #61155) [Link]

> Hold on, are you saying that Rust has no means to talk to hardware and always needs a C shim to do that?

I'm curious to find out how you deduced that I had claimed this.

> If that's true, how is it even possible to write an OS in Rust to begin with?!

Redox, Hubris, Rust on Linux… Eppur si muove.

Rust saga to end?

Posted Oct 2, 2024 13:17 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

Well, if you want to be pedantic, no general purpose language has "flush the TLB" or "enter VM" instructions either. Both Rust and C use inline assembly or ABI contracts with full assembly files to access such functionality.

Doesn't seem that bad..

Posted Sep 14, 2024 8:04 UTC (Sat) by ssmith32 (subscriber, #72404) [Link] (2 responses)

It was interesting to read a more detailed analysis of this, but, in the end, I still hold to my first take on the situation: it doesn't seem so bad that Lina decided to write a Rust scheduler implementation. Sounds like there are other people interested in using it.

And, sure, Christian was a little abrupt, but that's not necessarily a job security thing. I've certainly been in situations where I maintained a common component that was widely used, and rejected changes to make it part of a larger (better) different thing, because I wanted to keep it focused. But, simultaneously, been supportive of an effort to replace said component with a ground up rewrite supporting a larger vision.

I would wait to see if he fights the Rust reimplementation before jumping to conclusions.

And even the snide, snarky comments in this thread ("I can see why you're...", etc.) are, well, a little obnoxious, but certainly we're doing a lot better at being our slightly better selves than in the systemd days...

So, yes, we all need to keep getting better, but, hey, we've still come a long way.

Thanks Lina!

Doesn't seem that bad..

Posted Sep 14, 2024 9:30 UTC (Sat) by Wol (subscriber, #4433) [Link] (1 responses)

> And, sure, Christian was a little abrupt, but that's not necessarily a job security thing. I've certainly been in situations where I maintained a common component that was widely used, and rejected changes to make it part of a larger (better) different thing, because I wanted to keep it focused. But, simultaneously, been supportive of an effort to replace said component with a ground up rewrite supporting a larger vision.

I'm also left with the clear impression that the old scheduler is a square solution, while Lina's driver is a round problem, so maybe even trying to use the old scheduler was a wrong decision by Lina.

If that's the case, she won't even be writing a replacement; she'll be writing a round scheduler, and all those jobs that require a square scheduler will just carry on using the old one. What's the betting that half the spaghetti in the old scheduler is trying to cope with round problems?

Cheers,
Wol

Doesn't seem that bad..

Posted Sep 22, 2024 19:00 UTC (Sun) by Rudd-O (guest, #61155) [Link]

All of that, combined with the fact that other people have expressed interest in using the new scheduler that might be written in Rust, seems like a win-win for everybody. I know I have had OOPSes in the DRM layer in my machines across the years. If I could benefit from Asahi's work, hell yeah let's go!


Copyright © 2024, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds