An update on Apple M1/M2 GPU drivers
The kernel graphics driver for the Apple M1 and M2 GPUs is, rather famously, written in Rust, but it has achieved conformance with various graphics standards, which is also noteworthy. At the X.Org Developers Conference (XDC) 2024, Alyssa Rosenzweig gave an update on the status of the driver, along with some news about the kinds of games it can support (YouTube video, slides). There has been lots of progress since her talk at XDC last year (YouTube video), with, of course, still more to come.
It is something of an XDC tradition, since she began it in Montreal in 2019 (YouTube video), for Rosenzweig to give her presentations dressed like a witch. This year's edition was no exception, though this time she started her talk in French, which resulted in some nervous chuckles from attendees. After a few sentences, she switched to English, "I'm just kidding", and continued with her talk.
Updates and tessellation
Last year at XDC, she and Asahi Lina reported that the driver had reached OpenGL ES 3.1 conformance. They also talked about geometry shaders, because "that was the next step". Since then, the driver has become OpenGL 4.6 conformant. That meant she was going to turn to talking about tessellation shaders, "as I threatened to do at the end of last year's talk".
Tessellation, which is a technique that "allows detail to be dynamically added and subtracted" from a scene, is required for OpenGL 4.0, and there is a hardware tessellator on the Apple GPU, but "we can't use it". The hardware is too limited to implement any of the standards; "it is missing features that are hard required for OpenGL, Vulkan, and Direct3D". That makes it "pretty much useless to anybody who is not implementing Metal". Apple supports OpenGL 4.1, though it is not conformant, but if you use any of the features that the hardware does not support, it simply falls back to software; "we are not going to do that".
![Alyssa Rosenzweig](https://static.lwn.net/images/2024/xdc-rosenzweig-sm.png)
As far as Rosenzweig is aware, the hardware lacks point mode, where points are used instead of the usual triangles; it also lacks isoline support, but those two things can be emulated. The real problem comes with transform feedback and geometry shaders, neither of which is supported by the hardware, but the driver emulates them with compute shaders. However, the hardware tessellator cannot be used at all when those are being emulated because minute differences in the tessellation algorithms used by the hardware and the emulation would result in invariance failures. She is not sure whether that is a problem in practice or not, "but the spec says not to do it", so she is hoping not to have to go that route.
Instead, "we use software". In particular, Microsoft released a reference tessellator a decade or more ago, which was meant to show hardware vendors what they were supposed to implement when tessellation was first introduced. It is "a giant pile of 2000 lines of C++" that she does not understand, despite trying multiple times; "it is inscrutable". The code will tessellate a single patch, which gave the driver developers an idea: "if we can run that code, we can get the tessellation outputs and then we can just draw the triangles or the lines with this index buffer".
There are some problems with that approach, however, starting with the fact that the developers are writing a GPU driver; "famously, GPUs do not like running 2000 lines of C++". But, she announced, "we have conformant OpenCL 3.0 support" thanks to Karol Herbst, though it has not yet been released. OpenCL C is "pretty much the same as regular CPU C", though it has a few limitations and some extensions for GPUs. So the idea would be to turn the C++ tessellation code into OpenCL C code; "we don't have to understand any of it, we just need to not break anything when we do the port".
That works, but "tessellator.cl is the most unhinged file of my career"; doing things that way was also the most unhinged thing she has done in her career, "and I'm standing up here in a witch hat for the fifth year in a row". The character debuted in the exact same room in 2019 when she was 17 years old, she recalled.
The CPU tessellator only operates on a single patch at a time, but a scene might have 10,000 patches; doing them all serially will be a real problem. GPUs are massively parallel, though, so having multiple threads each doing tessellation is "pretty easy to arrange". There is a problem with memory allocation; the CPU tessellator just allocates for each operation sequentially, but that will not work for parallel processing. Instead, the driver uses the GPU atomic instructions to manage the allocation of output buffers.
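The allocation problem can be sketched on the CPU. The following is a minimal illustration in C11, not code from the driver: `output_heap` and `heap_alloc` are hypothetical names, and `atomic_fetch_add` stands in for whatever atomic instructions the GPU provides. Each caller reserves a disjoint slice of one shared buffer with a single atomic add, so no thread has to wait for another to finish.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared output heap; in a real driver this would be GPU memory. */
struct output_heap {
    uint8_t *base;       /* start of the backing buffer */
    size_t capacity;     /* total size in bytes */
    atomic_size_t next;  /* offset of the first unreserved byte */
};

/*
 * Reserve `size` bytes and return their offset from `base`, or SIZE_MAX
 * if the heap is full. The atomic add hands every caller a distinct
 * range, even when many threads call this at the same time.
 *
 * Simplification: a failed reservation still advances `next`, so all
 * later calls fail too. That is fine for a sketch but would need
 * handling in real code.
 */
size_t heap_alloc(struct output_heap *heap, size_t size)
{
    assert(heap != NULL);
    size_t offset = atomic_fetch_add(&heap->next, size);
    if (offset > heap->capacity || size > heap->capacity - offset)
        return SIZE_MAX;
    return offset;
}
```

Under these assumptions, a thread that has worked out how many bytes its patch will produce reserves that amount and then writes into its own slice without locking; the order of the slices simply reflects which thread reached the counter first.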
In order to draw the output of the tessellators, though, there is a need to use draw instructions with packed data structures as specified by the GPU. That is normally done from the C driver code using functions that are generated by the GenXML tool. Since the tessellators are simply C code, "thanks to OpenCL", the generated functions can be included into the code that runs on the GPU. Rosenzweig went into more detail, which fills in the holes (and likely inaccuracies) of the above description; those interested in the details should look at the presentation video and her slides.
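To give a feel for what "packed data structures" means here, the sketch below packs a draw descriptor into 32-bit words. Everything in it is an assumption made for illustration: the struct `example_index_draw`, its field widths, and the word layout are invented and do not describe the real Apple GPU format or the actual GenXML output. It only shows the general shape of a generated pack function, which is plain C and therefore the kind of code that could be shared between a driver and GPU-side code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unpacked draw descriptor; NOT the real hardware layout. */
struct example_index_draw {
    uint32_t index_count;   /* assumed 24-bit field */
    uint32_t primitive;     /* assumed 4-bit field */
    uint32_t index_size;    /* assumed 2-bit field: log2 of bytes per index */
    uint64_t index_buffer;  /* GPU address of the index buffer */
};

/* Place `value` into bits [start, start + width) of a 32-bit word. */
uint32_t pack_bits(uint32_t value, unsigned start, unsigned width)
{
    assert(width >= 1 && width < 32 && start + width <= 32);
    assert(value < (UINT32_C(1) << width));
    return value << start;
}

/* Pack the descriptor into three words, as a generated function might. */
void example_index_draw_pack(uint32_t out[3],
                             const struct example_index_draw *d)
{
    out[0] = pack_bits(d->index_count, 0, 24) |
             pack_bits(d->primitive, 24, 4) |
             pack_bits(d->index_size, 28, 2);
    out[1] = (uint32_t)(d->index_buffer & 0xffffffffu);
    out[2] = (uint32_t)(d->index_buffer >> 32);
}
```

The assertions in `pack_bits` catch values that do not fit their field, which is the sort of checking that makes generated packers preferable to hand-written shifts and masks.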
"Does it work? Yes, it does.
"
She showed an image of terrain tessellation from a Vulkan demo. It was run
on an M2 Mac with "entirely OpenCL-based tessellation
". There is also
the question of "how is the performance of this abomination?
" The
answer is that it is "okay
". On the system, software-only terrain tessellation runs at
less than one frame-per-second (fps), which "is not very fun for playing
games
"; for OpenCL, it runs at 265fps, which is "pretty good
"
and is unlikely to be the bottleneck for real games. The hardware
can do
820fps; "I did wire up the hardware tessellator just to get a number for
this talk.
" There is still room for improvement on the driver's
numbers, she said.
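Frame rates are easier to compare as time per frame. The helper below is only a small arithmetic aid (the name `frame_time_ms` is made up for this example); it converts the frame rates quoted in the talk into milliseconds per frame.

```c
#include <assert.h>

/* Convert a frame rate into the time spent on each frame, in milliseconds. */
double frame_time_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}
```

By that measure, 265fps is roughly 3.8ms per frame and 820fps is roughly 1.2ms, while a game targeting 60fps has about 16.7ms to spend on each frame. Those figures describe the demo as a whole, so they are only a rough guide to how much of a real game's frame budget tessellation would consume.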
Vulkan and gaming
She also announced Vulkan 1.3 conformance for the Honeykrisp M1/M2 GPU driver. It started by copying the NVK Vulkan driver for NVIDIA GPUs, "smashed against the [Open]GL 4.6 [driver]", which started passing the conformance test suite "in about a month". That was six months ago and, since then, she has added geometry and tessellation shaders, transform feedback, and shader objects. The driver now supports every feature needed for multiple DirectX versions.
There are a lot of problems "if we want to run triple-A (AAA) games on this system", however. A target game runs on DirectX and Windows on an x86 CPU with 4KB pages, but "our target hardware is running literally none of those things". What is needed is to somehow translate DirectX to Vulkan, Windows to Linux, x86 to Arm64, and 4KB pages to 16KB pages. The first two have a well-known solution in the form of the DXVK driver and Wine, which are "generally packaged into Proton for Steam gaming".
Going from x86 to Arm64 also has off-the-shelf solutions: FEX-Emu or Box64. She has a bias toward FEX-Emu; "when I am not committing Mesa crimes, I am committing FEX-Emulation crimes". The big problem, though, is the page-size difference.
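To make the page-size mismatch concrete, here is a small sketch of the alignment arithmetic involved. It is general background rather than anything shown in the talk, and the helper names are invented for the example. Software written for x86 can assume that any 4KB-aligned address is a valid page boundary, but a kernel using 16KB pages can only map or protect memory on 16KB boundaries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_4K  UINT64_C(4096)
#define PAGE_16K UINT64_C(16384)

/* True when `addr` is on a page boundary; `page_size` must be a power of two. */
bool is_page_aligned(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (addr & (page_size - 1)) == 0;
}

/* Round `len` up to a whole number of pages. */
uint64_t round_up_to_page(uint64_t len, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (len + page_size - 1) & ~(page_size - 1);
}
```

For example, the address `0x1000` is a page boundary with 4KB pages but falls in the middle of a page with 16KB pages, and a 4KB request grows to 16KB once it is rounded up to the larger page size. A program that places mappings at fixed 4KB-aligned addresses therefore cannot be satisfied directly by a 16KB-page kernel.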
FEX-Emu requires 4KB pages; Box64 has a "hack to use 16KB pages, but it doesn't work for Wine, so it doesn't help us here". macOS can use 4KB pages for the x86 emulation, but "this requires very invasive kernel support"; Asahi Linux already has around 1000 patches that are making their way toward the mainline kernel, but "every one of those thousand is a challenge". Making changes like "rewriting the Linux memory manager" is not a reasonable path.
It turns out that, even though Linux does not support heterogeneous page sizes between different processes, it does support them between different kernels; "what I mean by that is virtualization". A KVM guest kernel can have a different page size than the host kernel. So, "this entire mess", consisting of FEX-Emu, Wine, DXVK, Honeykrisp, Steam, and the game, "we are going to throw that into a virtual machine, which is running a 4KB guest kernel".
There is some overhead, of course, but it is hardware virtualization, so that should have low CPU overhead. The problem lies with the peripherals, she said. So, instead of having Honeykrisp in the host kernel, it runs in the guest using virtgpu native contexts; all of the work to create the final GPU command buffer is done in the guest and handed to the host, rather than making all of the Vulkan calls individually traverse the virtual-machine boundary. The VirGL renderer on the host then hands that to the GPU, which "is not 100% native speed, but definitely well above 90%", Rosenzweig said.
The good news is that the overheads for the CPU and GPU do not stack, since the two run in parallel. "So all the crap overhead we have in the CPU is actually crap that is running in parallel to the crap overhead on the GPU, so we only pay the cost once."
"'Does it work?' is the question you all want to know.
" It does,
she said, it runs
games like Portal and Portal 2. She also listed a number of
others: Castle Crashers,
The
Witcher 3, Fallout 4, Control, Ghostrunner, and Cyberpunk 2077.
All of the different pieces that she mentioned were made available on October 10, the day of the talk. For those running the Fedora Asahi Remix distribution, she suggested immediately updating to pick up the pieces that she had described. Before taking questions, she launched Steam, which took some time to come up, in part because of the virtual machine and the x86 emulation. Once it came up, she launched Control, which ran at 45fps on an M1 Max system.
There was a question about resources from someone who has a Mac with 8GB of RAM. Rosenzweig said that the high-end gaming titles are only likely to work on systems with 16GB or more. She noted that she was playing Castle Crashers on an 8GB system during the conference, so some games will play; Portal will also work on that system. She hopes that the resources required will drop over time.
Another question was about ray-tracing support, since Control can use that feature. Rosenzweig suggested that patches were welcome but that she did not see that as a high priority ("frankly, I think ray tracing is a bit of a gimmick feature"). Apple hardware only supports it with the M3 and the current driver is for M1 and M2 GPUs, though she plans to start working on M3 before long. The session concluded soon after that, though Rosenzweig played Control, admittedly poorly, as time ran down.
[ I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to Montreal for XDC. ]
Index entries for this article | |
---|---|
Conference | X.Org Developers Conference/2024 |
Very nice!
Posted Oct 30, 2024 20:59 UTC (Wed) by titaniumtown (subscriber, #163761)
Great !
Posted Oct 30, 2024 20:59 UTC (Wed) by matp75 (subscriber, #45699)
Fun and impressive
Posted Oct 30, 2024 22:49 UTC (Wed) by Paf (subscriber, #91811)
Hah!
From the outside, this looks really impressive, and it’s presented with great style.
I find it a little disturbing . . .
Posted Oct 31, 2024 3:13 UTC (Thu) by himi (subscriber, #340)
It's always amazing seeing kids doing this sort of thing. I'm old enough to remember seeing Rasterman making everyone else feel old back in the day (even though I'm a few years younger than him, he still made me feel old) - it's nice to see that in this aspect at least, the world of 2024 is as exciting and hopeful as the world of 1999.
I find it a little disturbing . . .
Posted Oct 31, 2024 17:02 UTC (Thu) by Lennie (subscriber, #49641)
I find it a little disturbing . . .
Posted Oct 31, 2024 22:19 UTC (Thu) by himi (subscriber, #340)
I find it a little disturbing . . .
Posted Nov 1, 2024 0:05 UTC (Fri) by lockecole2 (✭ supporter ✭, #63710)
https://web.archive.org/web/20210217130206/https://asahil...
I find it a little disturbing . . .
Posted Nov 2, 2024 11:41 UTC (Sat) by Lennie (subscriber, #49641)
I find it a little disturbing . . .
Posted Nov 2, 2024 22:27 UTC (Sat) by Lonjil (guest, #152573)
Very impressive
Posted Oct 31, 2024 4:29 UTC (Thu) by flewellyn (subscriber, #5047)
I admit to some curiosity: why the witch hat? Has Ms Rosenzweig ever explained this?
Very impressive
Posted Oct 31, 2024 9:12 UTC (Thu) by ballombe (subscriber, #9523)
Very impressive
Posted Oct 31, 2024 10:59 UTC (Thu) by Wol (subscriber, #4433)
Cheers,
Wol
Very impressive
Posted Oct 31, 2024 15:35 UTC (Thu) by Paf (subscriber, #91811)
Very impressive
Posted Nov 1, 2024 2:47 UTC (Fri) by indrora (guest, #167938)
in/re the hat... *snrk* he doesn't know about the witch hats (/s)
To quote her Mastodon profile: "Princesse-sorcière de Linux qui respecte la politique de l'OQLF" -- I'll let you put the parts together. A surprising number of kernel hackers working on fringe hardware (or just... different stuff) are trans folk and a *lot* of them are of the witchy persuasion.
Very impressive
Posted Nov 1, 2024 9:20 UTC (Fri) by atnot (subscriber, #124910)
You're a mysterious woman spending your days in a dim room doing mystical incantations with strange hardware people don't understand but are very impressed by, opening the news to see a new satanic panic, riots, hate crimes and politicians trying to ban your existence. You may as well get the fun hat to go with it.
libkrun
Posted Oct 31, 2024 7:08 UTC (Thu) by pbonzini (subscriber, #60935)