
Heterogeneous memory management

By Jonathan Corbet
April 27, 2016

LSFMM 2016
The processor that one thinks of as "the" CPU is not the only processor on most systems; indeed, it is often not the fastest. Attached devices, first and foremost the graphics processor (GPU), have their own processors that can speed a number of computing tasks. They often have full access to system memory, but there are obvious challenges to sharing that memory completely between the CPU and other processors. The heterogeneous memory management (HMM) subsystem aims to make that sharing possible; Jérôme Glisse led a session on HMM for the memory-management track at the 2016 Linux Storage, Filesystem, and Memory-Management Summit.

The key feature of HMM, Jérôme said, is making it possible to mirror a process's address space within the attached processor. This should happen without the need to use a special allocator in user space. On the hardware side, there are a couple of technologies out there that make this mirroring easier. One is the PowerPC CAPI interface; another is the PASID mechanism for the PCI Express bus. On the software side, options are to either mirror the CPU's page table in the attached processor, or to migrate pages back and forth between CPU and device memory. Regardless of how this is done, the hope is to present the same API to user space.
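
Mirroring builds on the kernel's MMU-notifier mechanism: a driver registers a notifier on the process's mm and is called back whenever the CPU page tables change, so it can update or invalidate the device's copy. The sketch below shows that general pattern only; it is not the HMM API itself, the callback signature follows the kernels of that era (it has changed since), and device_tlb_flush() is a hypothetical driver helper.

    #include <linux/kernel.h>
    #include <linux/mm.h>
    #include <linux/mmu_notifier.h>

    struct mirror {
            struct mmu_notifier notifier;
            /* Device-specific page-table state would live here. */
    };

    /* Hypothetical driver hook: drop the device's mappings for a range. */
    static void device_tlb_flush(struct mirror *m, unsigned long start,
                                 unsigned long end)
    {
            /* Driver-specific: tell the device MMU to forget these entries. */
    }

    static void mirror_invalidate_range_start(struct mmu_notifier *mn,
                                              struct mm_struct *mm,
                                              unsigned long start,
                                              unsigned long end)
    {
            struct mirror *m = container_of(mn, struct mirror, notifier);

            /*
             * The CPU page tables for [start, end) are about to change;
             * invalidate the device's copy so it faults and re-mirrors.
             */
            device_tlb_flush(m, start, end);
    }

    static const struct mmu_notifier_ops mirror_ops = {
            .invalidate_range_start = mirror_invalidate_range_start,
    };

    /* Attach the mirror to a process's address space. */
    static int mirror_register(struct mirror *m, struct mm_struct *mm)
    {
            m->notifier.ops = &mirror_ops;
            return mmu_notifier_register(&m->notifier, mm);
    }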

We care about this, Jérôme said, because the hardware is out there now; he mentioned products from Mellanox and NVIDIA in particular. Drivers exist for this hardware which, he said, is expensive at the moment, but which will get cheaper later this year. If we don't provide a solution in the kernel, things will run much more slowly and will require the pinning of lots of memory. It will be necessary to add more memory-management unit (MMU) notifiers to device-driver code, which few see as desirable. OpenCL support will only be possible on integrated GPUs. In general, he said, it is better to support this capability in the kernel if possible.

The solution to these ills is the HMM patch set, which provides a simple driver API for memory-management tasks. It is able to mirror CPU page tables on the attached device, and to keep those page tables synchronized as things change on the CPU side. Pages can be migrated between the CPU and the device; a page that has been migrated away from the CPU is represented by a special type of swap entry — it looks like it has been paged out, in other words. HMM also handles DMA mappings for the attached device.
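
The driver-facing side can be pictured roughly as follows. The names here are purely illustrative rather than the functions in the patch set; they only sketch the three roles just described: mirroring the CPU page tables, migrating a range into device memory (leaving special swap entries behind on the CPU side), and handling the DMA mapping along the way.

    #include <linux/device.h>
    #include <linux/mm.h>
    #include <linux/mm_types.h>

    struct hmm_like_mirror;             /* opaque per-process mirror state */

    /* Mirror the CPU page tables of @mm for device @dev (illustrative). */
    struct hmm_like_mirror *hmm_like_mirror_create(struct mm_struct *mm,
                                                   struct device *dev);

    /* Fault in device-visible, DMA-mapped entries for [start, end). */
    int hmm_like_mirror_populate(struct hmm_like_mirror *mirror,
                                 unsigned long start, unsigned long end);

    /*
     * Move the range into device memory; on the CPU side the pages are
     * replaced by special swap entries, so a later CPU touch faults and
     * triggers migration back.
     */
    int hmm_like_migrate_to_device(struct hmm_like_mirror *mirror,
                                   unsigned long start, unsigned long end);

    /* A driver's handler for a device page fault might then look like: */
    static int gpu_handle_device_fault(struct hmm_like_mirror *mirror,
                                       unsigned long addr)
    {
            unsigned long start = addr & PAGE_MASK;
            int ret;

            /* Make sure the device page table covers the faulting page... */
            ret = hmm_like_mirror_populate(mirror, start, start + PAGE_SIZE);
            if (ret)
                    return ret;

            /* ...and optionally pull it into faster device memory. */
            return hmm_like_migrate_to_device(mirror, start, start + PAGE_SIZE);
    }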

Andrew Morton noted that the patch set is "a ton of code," which always makes it harder to get merged. There was some talk of splitting the patch set into more palatable pieces; some of the code, evidently, is also useful for KVM virtualization. Andrew told Jérôme to take care to document who the potential users of this code are. Then, he said, "it's a matter of getting off our asses and reviewing the code." There might be trouble, he said, with the use of MMU notifiers, since Linus has been clear about his dislike of notifiers in the past.

Overall, though, no objections to the core model were expressed. The HMM code has been in development for some years; maybe it is finally getting closer to inclusion into the mainline kernel.

Index entries for this article
Kernel: Memory management/Heterogeneous memory management
Conference: Storage, Filesystem, and Memory-Management Summit/2016



Heterogeneous memory management

Posted Apr 28, 2016 17:29 UTC (Thu) by pbonzini (subscriber, #60935)

Can anyone who was present expand on how HMM would be useful for KVM? I am not sure of the connection, besides the fact that KVM uses the MMU notifier mechanism.

Heterogeneous memory management

Posted Apr 29, 2016 12:19 UTC (Fri) by glisse (guest, #44837)

So I talked with Andrea about that; right now there are a couple of cases where you get an mmu_notifier_invalidate for things that have no impact on KVM. For instance, when splitting a huge pmd, the memory an address points to stays the same, so there is no need to invalidate anything on the KVM side (I do not think KVM cares whether an address range is backed by a huge page or by 4k pte entries). Right now mmu_notifier does not provide enough information to distinguish between those cases; my patchset adds enough information to figure this kind of thing out. It would be a minor optimization, but it might help a bit on some workloads.

Then there is the use case of virtual device drivers that want to do this from inside a virtual host. That is something further down the road.
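
As an illustration of the huge-pmd case above: once the invalidate callback is told why the invalidation happened, a secondary-MMU user such as KVM can simply skip events that do not change the underlying physical pages. The enum values and functions below are hypothetical, meant only to convey the idea rather than the patch set's actual interface.

    enum mirror_event {
            MIRROR_EVENT_UNMAP,             /* pages are going away */
            MIRROR_EVENT_SPLIT_HUGE,        /* huge pmd split: same physical pages */
    };

    /* Hypothetical: drop the secondary MMU's stale mappings for a range. */
    static void flush_secondary_mmu(unsigned long start, unsigned long end)
    {
    }

    static void kvm_like_invalidate(unsigned long start, unsigned long end,
                                    enum mirror_event event)
    {
            /*
             * Splitting a huge pmd into 4k ptes leaves the data in the same
             * physical pages, so a secondary MMU that only cares about the
             * final translation has nothing to flush.
             */
            if (event == MIRROR_EVENT_SPLIT_HUGE)
                    return;

            flush_secondary_mmu(start, end);
    }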

Heterogeneous memory management

Posted Apr 28, 2016 21:14 UTC (Thu) by jnareb (subscriber, #46500)

I guess it would be useful for implementing Linux support for NVIDIA CUDA 8.0 managed memory (migrating automatically from CPU to discrete-GPU memory): https://devblogs.nvidia.com/parallelforall/cuda-8-feature...


Copyright © 2016, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds