
Man Yue Mo: Gaining kernel code execution on an MTE-enabled Pixel 8

Man Yue Mo explains how to compromise a Pixel 8 phone even when the Arm memory-tagging extension is in use, by taking advantage of the Mali GPU.

So, by using the GPU to access physical addresses directly, I'm able to completely bypass the protection that MTE offers. Ultimately, there is no memory safe code in the code that manages memory accesses. At some point, physical addresses will have to be used directly to access memory.



Man Yue Mo: Gaining kernel code execution on an MTE-enabled Pixel 8

Posted Mar 20, 2024 2:40 UTC (Wed) by makendo (guest, #168314) [Link] (2 responses)

An alternative system design is to use separate DRAM chips for CPU and GPU memory, as the majority of video-game-oriented desktops already do. The upside is that the GPU can be made to access only its own memory, so we don't have this difficult-to-patch arbitrary-code-execution (ACE) path; the downside is the obvious overhead of copying data from host memory to VRAM, which could be significant for mobile phones.
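For a sense of what that extra copy looks like in code, here is a minimal Vulkan sketch in C; the command buffer, the host-visible `staging` buffer and the device-local `vram` buffer are assumed to have been created elsewhere, and the names are purely illustrative.

    /* Sketch of the extra host-to-VRAM hop a discrete-memory design needs.
     * Assumes `cmd` is a command buffer in the recording state and that
     * `staging` (host-visible) and `vram` (device-local) were created and
     * bound to memory elsewhere; the names are illustrative only. */
    #include <vulkan/vulkan.h>

    void record_upload(VkCommandBuffer cmd, VkBuffer staging, VkBuffer vram,
                       VkDeviceSize size)
    {
            /* The CPU first writes into the mapped staging buffer; this
             * command then asks the GPU's copy engine to move the data into
             * VRAM. On a unified-memory GPU this second step is unnecessary. */
            VkBufferCopy region = { .srcOffset = 0, .dstOffset = 0, .size = size };
            vkCmdCopyBuffer(cmd, staging, vram, 1, &region);
    }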

Man Yue Mo: Gaining kernel code execution on an MTE-enabled Pixel 8

Posted Mar 20, 2024 9:05 UTC (Wed) by excors (subscriber, #95769) [Link] (1 response)

On gaming PCs, the GPU can still access host RAM directly over PCIe. E.g. Vulkan on AMD offers three memory heaps: one device-local (VRAM), one device-local and host-visible (~256MB of VRAM that's mapped into the host's physical address space), and one non-device-local (the whole of host RAM; "Can use as a fall-back when GPU device runs out of memory") (https://gpuopen.com/learn/vulkan-device-memory/). VRAM and RAM are physically distinct and have very different performance and caching behaviour, but otherwise they can be used largely interchangeably.
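To make that heap split concrete, here is a small C sketch against the Vulkan headers (assuming a working Vulkan loader and driver; error handling is kept to a minimum) that asks the first reported GPU for its memory types and prints which are device-local and which are host-visible. On a typical discrete AMD card this shows the three-way split described above, while unified-memory GPUs tend to report types that carry both flags.

    #include <stdio.h>
    #include <vulkan/vulkan.h>

    int main(void)
    {
            /* Minimal instance; no extensions or layers are needed for this query. */
            VkInstanceCreateInfo ici = { .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
            VkInstance instance;
            if (vkCreateInstance(&ici, NULL, &instance) != VK_SUCCESS)
                    return 1;

            /* Look only at the first physical device the loader reports. */
            uint32_t count = 1;
            VkPhysicalDevice gpu;
            if (vkEnumeratePhysicalDevices(instance, &count, &gpu) < 0 || count == 0)
                    return 1;

            VkPhysicalDeviceMemoryProperties mp;
            vkGetPhysicalDeviceMemoryProperties(gpu, &mp);

            for (uint32_t i = 0; i < mp.memoryTypeCount; i++) {
                    VkMemoryPropertyFlags f = mp.memoryTypes[i].propertyFlags;
                    VkDeviceSize heap = mp.memoryHeaps[mp.memoryTypes[i].heapIndex].size;
                    printf("type %2u: heap %u (%llu MiB)%s%s\n",
                           i, mp.memoryTypes[i].heapIndex,
                           (unsigned long long)(heap >> 20),
                           (f & VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT) ? " device-local" : "",
                           (f & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT) ? " host-visible" : "");
            }

            vkDestroyInstance(instance, NULL);
            return 0;
    }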

I think that means it is no more inherently secure than the fully-shared-memory model of mobile GPUs. Both rely on IOMMUs and the like (and the drivers that configure them) to prevent one process from using the GPU to read another process's memory or to bypass other protections.

(Modern game consoles also have fully shared memory. PCs with discrete GPUs are the outlier; I guess that's largely for historical reasons: PC GPUs were originally add-on cards on a slow bus and therefore had to use their own GPU-local memory, and system designers then saw no need to optimise host RAM for GPU-like access patterns because anyone who cared about performance was already using a discrete GPU. So integrated GPUs were slow and bad, and that architecture persisted despite its major drawbacks.)

Man Yue Mo: Gaining kernel code execution on an MTE-enabled Pixel 8

Posted Mar 20, 2024 13:46 UTC (Wed) by epa (subscriber, #39769) [Link]

One quick and dirty fix might be to allow the GPU to access only a certain range of addresses, like the DMA restrictions of old. The operating system could arrange that nothing terribly important is kept in the lower two gigabytes of physical address space, which would be the only region the GPU is allowed to access. This wouldn't replace the need for an IOMMU with correctly written software, but it might be an additional line of defense.
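As a rough sketch of how that restriction could be expressed with the existing Linux DMA-mapping API (the driver below is hypothetical, and this only constrains buffers allocated through the DMA API, not whatever the GPU's own MMU is programmed to map), capping the device's DMA mask at 31 bits keeps every DMA address handed to it below 2GiB:

    /* Hypothetical platform driver showing the "only let the device touch
     * low memory" idea: a 31-bit mask means every DMA address the kernel
     * hands this device sits below 2 GiB, much like ISA-era restrictions. */
    #include <linux/dma-mapping.h>
    #include <linux/module.h>
    #include <linux/platform_device.h>

    static int lowmem_gpu_probe(struct platform_device *pdev)
    {
            /* Restrict both streaming and coherent DMA to the low 2 GiB. */
            int ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(31));
            if (ret)
                    return ret;

            /* ... normal device setup would follow ... */
            return 0;
    }

    static struct platform_driver lowmem_gpu_driver = {
            .probe = lowmem_gpu_probe,
            .driver = { .name = "lowmem-gpu" },
    };
    module_platform_driver(lowmem_gpu_driver);
    MODULE_LICENSE("GPL");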

