Changes to drm_sched to make it safe
Changes to drm_sched to make it safe
Posted Sep 21, 2024 11:43 UTC (Sat) by ras (subscriber, #33059)In reply to: Changes to drm_sched to make it safe by asahilina
Parent article: Whither the Apple AGX graphics driver?
Three comments stood out to me. The first is from Christian, the drm maintainer:
> I have seen at least halve a dozen approach in the last two years where people tried to signal a dma_fence from userspace or similar.
>
> Fortunately it was mostly prototyping and I could jump in early enough to stop that, but basically this is a fight against windmills.
Then this one, from an independent third party, possibly in response to the fact the panfrost had got dma_fence life times badly wrong (apparently a windmill sailed past unnoticed):
> I really think this needs to be documented if nothing else out of this thread.
>
> Clearly nobody is going to get it right and hidden here in this thread, this info isn't useful.
The last comment in the the thread comes from the maintainer, Christian:
> Ah shit, yeah of course. We use IRQ threads in amdgpu for the second interrupt ring as well.
>
> Ok, nail that coffin. Any other ideas how we could enforce this?
The significance of "IRQ threads" is the ban against signalling dma_fence's from userspace "or similar". "IRQ threads" fall under "so similar", apparently.
So it seems it's not just the Rust dev's who have a problem with the current API. Everyone does. The difference looks to be that Asahi tolerance for this state of affairs is so low she refuses to use it in it's current state. Perhaps the connection to Rust is it's borrow checker has made sloppy lifetime accounting an anathema to it's adherents.
Posted Sep 22, 2024 16:36 UTC (Sun)
by Rudd-O (guest, #61155)
[Link]
🤣 oh that question was so rich. I'm chuckling, sitting on my couch.
He's been presented with a giant 50kW laser with an automated turret to completely annihilate this class of API defect (the Rust type system), and when presented with this epic gift, he says NYAAAH.
Cue scroll of truth meme.
Asahi's "sin" was pointing out bad API design that a few noticed before, and a few are too cringely attached to. I understand that no driver needs to be completely rewritten if this problem is fixed. But if this problem needs to be fixed and that required the rewriting of a bunch of drivers, then that is exactly why it should happen. Over the years of existence of the DRM system, I have seen quite a few upses and other problems caused by it. And I have little doubt this is the sort of code quality issue that I've been bitten by.
W.r.t. the lifetime/ownership problem, rightfully raised by Asahi:
So, printk() needs to dereference the scheduler to print what, exactly, of importance? Easy solutions all around! Either cut out that printk() function call altogether, or make it not deref the scheduler (e.g. print a string referring to the scheduler name), or just use a weak reference which is easily modelable in any reasonable type system and there's a sane behavioral fallback in case the scheduler has died in between.
I completely understand why in some circumstances you may need to have circularity in ownership. But this specific case is exactly one of those cases, one of those many, many cases, where circularity is completely unjustified.
Years ago when I first heard of Rust, I was turned off to it by a few unsavory personalities in that community. I have been coding for two straight years in Rust now, and I have to say: it is absolutely wonderful; all the times before where I reached for C++ or Go or Python? Oh, how badly do I wish I had actually just used Rust back then.
If I was professionally engaged in writing a kernel driver for video cards or any other such device, I would probably just rip out the DRM, rewrite it in Rust, and then provide a C interface / API that is somewhat or mostly compatible with the current API for existing drivers, so I don't have to rewrite these other drivers, at least not totally. This is possible, and (I believe) Federico Mena Quintero has basically rewritten librsvg to be safe using these techniques (the GNOME planet carried his posts detailing these efforts). But, even if that was not doable, Asahi's call to just ignore the DRM subsystem and write her own workqueue-based implementation is the right call. Just route around the brain damage and keep going forward. Hopefully future drivers have based their own development on that work, rather than continuing to rely on a C API that is technically fixable, but (for human reasons) probably won't be anytime soon.
Posted Sep 22, 2024 16:39 UTC (Sun)
by Rudd-O (guest, #61155)
[Link]
I would wager this is exactly it, because that's exactly what has happened to my own brain.
Even now, when I must use something like Python, I follow the advice of a fascinating article (I'm not sure if it was echoed in this publication here) called Write Python Like It's Rust. That article has some solid advice of Rust patterns that you can use when you're writing a Python application, which definitely lead to improving the quality of your Python code.
Heck, just using types in Python, accompanied with a suitable IDE built-in type analyzer, has already helped me dodge hundreds of potential bugs.
Changes to drm_sched to make it safe
Changes to drm_sched to make it safe