
A memory allocation API for graphics devices

By Jake Edge
September 27, 2017

X.Org Developers Conference

At last year's X.Org Developers Conference (XDC), James Jones began the process of coming up with an API for allocating memory so that it is accessible to multiple different graphics devices in a system (e.g. GPUs, hardware compositors, video decoders, display hardware, cameras, etc.). At XDC 2017 in Mountain View, CA, he was back to update attendees on the progress that has been made. He has a prototype in progress, but there is plenty more to do, including working out some of the problems he has encountered along the way.

Jones has been at NVIDIA for 13 years and has been working on this problem in various forms for most of that time, he said. Allocating buffers and passing them around between multiple drivers is a complicated problem. The allocator will sit in the same place as the Generic Buffer Management (GBM) component is today; it will be used both by applications and by various user-space driver components. The allocator will support both vendor-agnostic (e.g. Android ION) and vendor-specific back-ends, as well as combinations of the two.

Design

The allocator is designed around a number of different objects. An "assertion" is an object that describes the width, height, and format of the surface desired. Many applications may simply assert what they need, without adding any extra parameters; that request will either succeed or fail directly. More complicated parameters can be specified with a "usage" object that describes how the surface will be used; that could be a single kind of use (e.g. rendering) or a multi-use surface (e.g. display and texturing).

[James Jones]

A "constraint" object is returned from the allocator to describe the limitations inherent in a given assertion and usage combination. It is a description of the limitations of the surfaces that can be provided, including things like the pitch (or stride) and address alignment. Constraints are defined for the allocator library as a whole, so if a device has a strange constraint, it must be defined in the library and cannot be hidden in the allocator back-end library or driver. The reason is that constraints are non-trivial restrictions and it is not clear how to merge them if they are device specific; that code will need to live in the common allocator library.

The other object returned is a "capability" object that describes features that the driver can support for a given assertion and usage. Most commonly, these refer to memory layouts, such as the standard pitch linear layout or a vendor-specific layout (e.g. a tiling format). They will also refer to memory placement, which is what kind of memory (e.g. system memory or device-local memory) is required. Constraints and capabilities have dependencies between them, so what will be returned is a list of "capability sets" that pair compatible constraint and capability objects into valid combinations. Capability sets can be intersected with others (perhaps returned from a different device) to find a common denominator.

Jones then stepped through the allocator workflow, based on the USAGE document in his GitHub repository. The first step is to initialize an allocator device object from a device file descriptor. It is not yet clear how that allocator object will be defined. Those trying to use a single surface with multiple devices will likely initialize multiple allocator objects.

The next step is to query capability sets from the device(s) given an assertion and list of usages. After that, capability sets can be merged to find common capabilities; it is possible that there may be no commonality, so the application will need to have its own fall-back logic. Trying to allocate a surface on the available devices is next, which might also fail, in which case the application could fall back to a different capability set. Once the allocation succeeds, the surface can be imported into graphics, mode-setting, video, or other APIs.

Prototype

His goal was to have a demo for XDC, but that didn't work out. He has parts of the prototype working, though, as detailed in his slides [PDF] (slide eight). So far, he can create devices, query and merge capabilities and constraints, and create allocations. Exporting and importing allocations to other APIs and using the allocations, either for Vulkan and OpenGL or for Direct Rendering Manager (DRM) and non-graphics devices, remain to be done.

The core of the allocator is the capability set math; it is the "value add" for the new allocator, Jones said. The idea is to take two sets, potentially from two different devices, and to create a set that works for both devices. He thought it would be straightforward to implement that, but it took several weeks to get it right. It works well for all of the NVIDIA use cases, but he would like to see more testing from others with different devices and use cases.

He gave two examples of the capability set math using three different device capability sets, two of which could not be combined because there was no overlap in the format capabilities. The other two could be combined by choosing a common format capability and intersecting the address-alignment constraints. Devices can specify a "required capability"—one that will cause the merge to fail if it must be removed.

Capabilities are effectively opaque to the allocator. They are compared using a simple memcmp(), but they are typed by vendor, so there will be no confusion when doing the comparison. For common capabilities, like pitch-linear layout, there will be a vendor-neutral type so that they can be shared by all of the back-ends. In answer to a question, Jones said that it is fairly easy to add constraints. There is a header file where an ID needs to be added and a merge/intersect function must be added to a table.

Problems encountered

There are a number of "gotchas" he has found so far. For one thing, a device file does not necessarily uniquely identify a logical device, at least for NVIDIA devices. Creating allocators from a file descriptor implies that there is one unique device file that corresponds to the logical device of interest. It would be nice if the UUIDs from the Vulkan/OpenGL APIs could be used to enumerate available devices, he said.

There are some capabilities that only apply to a particular device (device-local capabilities), such as a GPU with an on-chip cache. The capabilities could specify its use, but other devices won't be aware of or care about this cache. When intersecting capabilities with other devices' sets, the local-cache capability will end up being removed, which is not what is desired. There may be a need for another flag (similar to the required flag) that tells devices to ignore the capability if they don't know about it. That would mean there are capability sets that get handed to devices with capabilities that are not understood, which may be problematic in other ways.

The way to specify formats is still up in the air. There are "a billion ways to do it", Jones said, and he doesn't really care which is chosen. Last year, the Khronos data format specification was suggested, as was FOURCC. The prototype supports any format as long as it is RGBA 32-bit, he said with a grin. Whatever is chosen will need to support high dynamic range (HDR) formats. There is also an open question on the need for format enumeration; which formats are supported may depend on the intended usage of the surface.

An important missing piece is how to use these allocations with the Vulkan and OpenGL import APIs. Those APIs expect some metadata to be associated with the allocations that describe them, but various elements of the allocator metadata do not apply to Vulkan or OpenGL, such as the device-local capabilities. A query could be added to allow applications to retrieve the allocation metadata, but some of that metadata is opaque. Some developers are concerned that exposing opaque metadata poses a security risk; Jones does not agree that such a risk exists (the specific concern was not described), but he is unsure how else to solve the problem.

Another outstanding question is the relationship to DMA buffers (DMA-BUF); should the import and export API consume and produce DMA-BUF file descriptors? He is concerned that doing so would bake Linux-specific assumptions into the API; even file descriptors can be non-portable to other operating systems. He also wondered if there is any value in using a DMA-BUF when the allocation will only be used by a single device or driver stack.

Up next

Transitioning a surface from one usage type to another is something that Vulkan allows, which could be more widely applied. The API to do so would be complicated, however. Applications could request metadata from the allocator on what needs to be done for the transition (e.g. invalidate a cache) that could be passed to the driver.

A simpler approach would be to perform a reallocation when the usage of the surface changes. That API is already basically established, and the steady state, where usage does not change, is optimal. But allocation can be expensive, while transitions have a consistent cost; in addition, the usage may change at inconvenient times, so the allocation cost may be noticeable.

The original goal was to make memory allocation work with Wayland and other, similar compositors. That still needs to be tackled. NVIDIA introduced EGLStream to that end, and has a sample implementation that uses that mechanism. The key functionality needed to replace EGLStream with the allocator is to be able to build an EGLSurface from an allocator surface. There are multiple Wayland applications that need that ability, he said.

There is also a question of where this new allocator code should live and what it should be called. Right now, it is a standalone library called liballocator because that was easier for development. It could be moved into a new library or merged into GBM, he said. The name might be too generic if it remains as a standalone library.

He finished by putting up a slide (number 26) that listed the questions he had asked along the way. There was no immediate resolution to any of them in the talk, but it was held early on the first day of XDC. One suspects there were some hallway track discussions to try to address some or all of them.

[I would like to thank the X.Org Foundation and the Linux Foundation for travel assistance to Mountain View for XDC.]


A memory allocation API for graphics devices

Posted Sep 28, 2017 21:17 UTC (Thu) by liam (guest, #84133)

Perhaps I'm misreading between the lines but it seems as though there is not much interest in this project (outside of nvidia). From this write-up, there appeared to be little engagement from the audience (no mention of questions, I believe).
The end of the article mentions discussions in the hallway track, though it's unclear how speculative the author is being.

A memory allocation API for graphics devices

Posted Oct 2, 2017 22:46 UTC (Mon) by louai (subscriber, #58033)

There were actually quite a few questions, you can watch the talk here: https://www.youtube.com/watch?v=g5T5wSCXkH4&t=1h34m48s


Copyright © 2017, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds