Bounce buffers for untrusted devices

April 26, 2019

This article was contributed by Marta Rybczyńska

The recently discovered vulnerability in Thunderbolt has restarted discussions about protecting the kernel against untrusted, hotpluggable hardware. That vulnerability, known as Thunderclap, allows a hostile external device to exploit Input-Output Memory Management Unit (IOMMU) mapping limitations and access system memory that it was never meant to reach. Thunderclap can be exploited by USB-C-connected devices; while we have seen USB attacks in the past, this vulnerability is different in that PCI devices, often considered trusted, can be a source of attacks too. One way of stopping those attacks would be to make sure that the IOMMU is used correctly and restricts each device to the memory that was allocated for it. Lu Baolu has posted an implementation of that approach in the form of bounce buffers for untrusted devices.

PCI and untrusted devices

Since PCI devices are usually built into the system, there has not been much concern about them going rogue (though a reader did express concerns about peer-to-peer PCI accesses in the comments on an LWN article). The PCI bus does support hotplugging, but its use is limited. It is, however, possible to attach external PCI devices to a bus like Thunderbolt. That opens the door to the Thunderclap vulnerability; a rogue device can benefit from the fact that the PCI bus is, in practice, more trusted than externally accessible buses.

Devices on the PCI bus do not have uncontrolled access to memory, though, on systems where an IOMMU exists and is in use. The IOMMU allows (or denies) access by devices to specific memory regions and maps bus addresses to physical memory addresses. It works at the page level, and the remapped regions must be set up explicitly before use; each device has its own set of regions that it can access. However, not all systems have an IOMMU enabled (or even installed), either because of performance concerns or because some functionality does not work correctly with the IOMMU.
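The per-device, page-level translation described above can be illustrated with a toy model. This is only a sketch: the ToyIommu class, its method names, and the 4KB page size are assumptions made for the example and do not correspond to any kernel interface.

```python
PAGE_SIZE = 4096  # assumed IOMMU page size for this sketch


class ToyIommu:
    """Toy model: each device has its own table of bus page -> physical page."""

    def __init__(self):
        # device name -> {bus page number: physical page number}
        self.tables = {}

    def map_page(self, device, bus_addr, phys_addr):
        """Explicitly allow `device` to reach one physical page."""
        if bus_addr % PAGE_SIZE or phys_addr % PAGE_SIZE:
            raise ValueError("mappings are made in whole pages")
        table = self.tables.setdefault(device, {})
        table[bus_addr // PAGE_SIZE] = phys_addr // PAGE_SIZE

    def translate(self, device, bus_addr):
        """Return the physical address, or raise PermissionError (a DMA fault)."""
        table = self.tables.get(device, {})
        page = bus_addr // PAGE_SIZE
        if page not in table:
            raise PermissionError(f"{device}: fault at bus address {bus_addr:#x}")
        return table[page] * PAGE_SIZE + bus_addr % PAGE_SIZE
```

In this model, an access succeeds only if the page was mapped beforehand for that particular device; any other address, or the same address used by a different device, faults.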

One step toward improving the situation is to keep track of which devices are expected to behave well and which might not. The marking of trusted and untrusted PCI devices was added in December 2018. It is done with an untrusted flag added to struct pci_dev to control special handling of such devices, including full IOMMU mapping and functions like the bounce buffers. A PCI device is marked untrusted if the firmware marks its root port as external (currently only if the ExternalFacingPort ACPI property is set); that should be the case for Thunderbolt devices.
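As a rough illustration of how such a flag could follow the bus topology, the sketch below marks a device untrusted when the port above it is external-facing or is itself untrusted. The ToyPciDev class and set_untrusted() are hypothetical names for this example, and the inheritance through intermediate bridges is an assumption of the sketch rather than something stated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyPciDev:
    name: str
    parent: Optional["ToyPciDev"] = None
    external_facing: bool = False  # stands in for the ExternalFacingPort property
    untrusted: bool = False


def set_untrusted(dev):
    """Mark dev untrusted if it sits below an external-facing or untrusted port.

    Devices must be visited top-down (parents first), as bus enumeration would.
    """
    parent = dev.parent
    dev.untrusted = parent is not None and (parent.external_facing or parent.untrusted)
    return dev.untrusted
```

With this model, everything plugged in below an external-facing root port ends up untrusted, while the root port itself and devices behind internal ports stay trusted.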

IOMMU constraints

Trusted PCI devices are expected to perform their DMA operations to and from the buffers they have been given to use; they do not run out of bounds or access other memory zones. With such devices, the IOMMU configuration code can take some shortcuts and, for example, map slightly bigger zones to fit hardware limitations and optimize IOMMU usage. For untrusted devices, we cannot make the same assumptions; the correct and strict configuration of the IOMMU becomes more important. Unfortunately, the minimum granularity of the (Intel) IOMMU is 4KB. Mapping a buffer with the IOMMU means allowing access to the whole 4KB page, even if the desired zone is smaller.

One result of this limitation is that an unaware driver that allocates a small buffer for device DMA and maps it through the IOMMU exposes the whole page with all of the other data it may contain, even if it belongs to other drivers or to the kernel itself. The fact that this situation does not cause any runtime error could be considered a weak point of the DMA API. Just activating the IOMMU doesn't solve the problem — the system must also take care to not place any unrelated data in the memory mapped by the IOMMU.
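The amount of unrelated memory exposed in this way is simple arithmetic. The sketch below assumes a 4KB IOMMU page; the function names are made up for the example.

```python
PAGE_SIZE = 4096  # assumed minimum IOMMU granularity


def exposed_range(addr, size):
    """Return (start, end) of the page-aligned region that must be mapped."""
    start = addr - addr % PAGE_SIZE                    # round start down
    end = -(-(addr + size) // PAGE_SIZE) * PAGE_SIZE   # round end up
    return start, end


def collateral_bytes(addr, size):
    """Bytes visible to the device that are not part of the buffer."""
    start, end = exposed_range(addr, size)
    return (end - start) - size
```

For example, a 256-byte buffer at address 0x1100 forces the whole page from 0x1000 to 0x2000 to be mapped, so collateral_bytes(0x1100, 256) is 3840; a page-aligned, page-sized buffer exposes nothing extra.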

Bounce buffers

This is where the proposed patch set comes into play. It implements bounce buffers for the untrusted devices; a bounce buffer is simply a separate memory area that is used for DMA operations. Data is copied ("bounced") between the original buffer and the bounce buffer, which is located in isolated memory that can be mapped by the IOMMU in such a way that there is no access to the data outside the buffer in question.

If the original buffer covers a full page (or multiple full pages), nothing needs to change, as this buffer can be mapped directly without exposing any unrelated data. If, instead, the buffer is inside a page that may contain other data, bounce buffers will be used. During the mapping, unmapping, and sync operations, the code will copy the data from the original buffer to the bounce buffer or back, depending on the direction of the transfer. The IOMMU mapping for the device then points to the bounce buffer's address instead of the original buffer's.
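A minimal sketch of that logic for a small buffer contained in a single page might look like the following. It models memory as a bytearray; the names (needs_bounce, BouncedMapping, the direction constants) are hypothetical, and keeping the buffer at the same offset within the bounce page is a choice made for this sketch.

```python
PAGE_SIZE = 4096  # assumed IOMMU page size
TO_DEVICE, FROM_DEVICE, BIDIRECTIONAL = "to_device", "from_device", "bidirectional"


def needs_bounce(addr, size):
    """A buffer can be mapped directly only if it covers whole pages."""
    return addr % PAGE_SIZE != 0 or size % PAGE_SIZE != 0


class BouncedMapping:
    """Bounce a buffer that lies within one page through an isolated page."""

    def __init__(self, memory, addr, size, direction):
        self.offset = addr % PAGE_SIZE
        if self.offset + size > PAGE_SIZE:
            raise ValueError("this sketch only handles single-page buffers")
        self.memory, self.addr, self.size = memory, addr, size
        self.direction = direction
        # Isolated, zero-filled page: this is what the device would see.
        self.bounce = bytearray(PAGE_SIZE)
        if direction in (TO_DEVICE, BIDIRECTIONAL):
            # Map time: copy outgoing data into the bounce page.
            self.bounce[self.offset:self.offset + size] = memory[addr:addr + size]

    def unmap(self):
        if self.direction in (FROM_DEVICE, BIDIRECTIONAL):
            # Unmap time: copy incoming data back to the original buffer.
            data = self.bounce[self.offset:self.offset + self.size]
            self.memory[self.addr:self.addr + self.size] = data
```

The device only ever sees the bounce page, so whatever else shares the original page with the buffer is never exposed; the price is one copy per direction of the transfer.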

When an I/O operation is set up, the original I/O buffer is split into three parts: "low", "middle", and "high". The low and high parts might lie on pages that may contain other data: they are the first and last pages that contain the device buffer. The middle pages contain only the device buffer, so they do not use the bounce buffer; only the low and high pages do. This operation may thus split a single contiguous buffer into three pieces; those pieces will be reunited (from the device's point of view) in the IOMMU mapping.
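The split can be sketched as address arithmetic. The function below is an illustration only, assuming 4KB pages; split_buffer() is a made-up name, and a buffer that fits inside a single partial page is reported here as one "low" piece.

```python
PAGE_SIZE = 4096  # assumed IOMMU page size


def split_buffer(addr, size):
    """Split [addr, addr + size) into (name, start, length) pieces.

    "low" and "high" are the partial first and last pages, which would be
    bounced; "middle" is the run of whole pages that can be mapped directly.
    Pieces of length zero are omitted.
    """
    end = addr + size
    first_full = -(-addr // PAGE_SIZE) * PAGE_SIZE  # addr rounded up
    last_full = end - end % PAGE_SIZE               # end rounded down
    if first_full > last_full:
        # The buffer sits inside one page without reaching either edge.
        return [("low", addr, size)]
    pieces = [
        ("low", addr, first_full - addr),
        ("middle", first_full, last_full - first_full),
        ("high", last_full, end - last_full),
    ]
    return [p for p in pieces if p[2] > 0]
```

A 0x3000-byte buffer starting at 0x1100, for instance, yields a 0xf00-byte low piece, a 0x2000-byte middle piece starting at 0x2000, and a 0x100-byte high piece starting at 0x4000.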

The bounce-buffer patch implements another change: the IOMMU mapping is invalidated immediately after the unmap operation. If that mapping stays cached in the IOMMU, the device might still use it after the mapped page has been reallocated for some other purpose. The patch set also provides an option to deactivate the bounce buffers if the system administrator trusts the attached devices.
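The danger of a cached translation outliving its mapping can be shown with a toy model. Nothing here reflects the real IOMMU programming interface; ToyIotlb and its "strict" switch are assumptions made for the illustration.

```python
class ToyIotlb:
    """Toy page table with a translation cache in front of it."""

    def __init__(self, strict):
        self.strict = strict   # invalidate the cache immediately on unmap?
        self.page_table = {}   # bus page -> physical page
        self.cache = {}        # cached translations

    def map(self, bus_page, phys_page):
        self.page_table[bus_page] = phys_page

    def unmap(self, bus_page):
        del self.page_table[bus_page]
        if self.strict:
            self.cache.pop(bus_page, None)

    def flush(self):
        """Deferred invalidation: drop every cached translation."""
        self.cache.clear()

    def device_access(self, bus_page):
        """Return the physical page reached, or raise PermissionError."""
        if bus_page in self.cache:
            return self.cache[bus_page]  # the page table is not consulted
        if bus_page not in self.page_table:
            raise PermissionError(f"fault on bus page {bus_page}")
        self.cache[bus_page] = self.page_table[bus_page]
        return self.cache[bus_page]
```

With strict=False, a device that touched a page before the unmap can still reach the old physical page until flush() runs, even if that page has since been handed to another user; with strict=True the access faults right after the unmap.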

Similarity to swiotlb

In the discussion following the first version of the patch set, Christoph Hellwig noted that the code has similarities to the swiotlb (software input output translation lookaside buffer) subsystem. The swiotlb is a bounce-buffering mechanism used with devices that cannot access all of a system's memory. In response, Lu tried to make use of the swiotlb code, but that effort failed because the approach is somewhat different and the offsets given by the swiotlb are different from the original ones for the low pages. This is because swiotlb copies the whole buffer, rather than just the low and high segments, during the mapping operation.

Robin Murphy suggested that the implementation should be made generic for the whole IOMMU subsystem and not limited to Intel VT-d only. The discussion continued after the second version was submitted, and Lu proposed an extension to swiotlb. A new version of the patch set was posted on April 21. It includes a refactoring of the swiotlb and moves some of the driver-specific code to the generic IOMMU layer.

Next steps

The use of bounce buffers can protect a system against a class of attacks. It remains an open question whether there are more similar issues in the kernel and whether there will be a need to harden other in-kernel interfaces. This is likely, as the threat model has completely changed: the attacker now controls devices that were previously thought of as trusted. It seems certain that we are going to see more attacks from rogue devices using unexpected protocols. The kernel interfaces that were considered internal in the past may need to be reviewed and hardened.

The implementation of the IOMMU bounce buffers is complete; one remaining question is what the performance penalty is. The measurements of the impact have not yet been submitted with the patch set. According to the description, the impact is expected to be small. One may expect that it should be lower than swiotlb since less data copying takes place. Large transfers should not be affected as they are usually page-aligned already. The overhead will be more visible for small transfers, where the setup will dominate the cost of a small copy.
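The reasoning about copy volume can be made concrete with a small calculation. This sketch only counts bytes copied per mapping under an assumed 4KB page size; it is not a measurement, and it ignores the setup cost that is expected to dominate small transfers. The function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed IOMMU page size


def copied_whole_buffer(addr, size):
    """Bytes copied when the entire buffer is bounced."""
    return size


def copied_partial_pages(addr, size):
    """Bytes copied when only partial first and last pages are bounced."""
    end = addr + size
    first_full = -(-addr // PAGE_SIZE) * PAGE_SIZE  # addr rounded up
    last_full = end - end % PAGE_SIZE               # end rounded down
    if first_full > last_full:
        return size  # small buffer inside one page: all of it is copied
    return (first_full - addr) + (end - last_full)
```

A page-aligned 1MB transfer copies nothing under the partial-page scheme, an unaligned 0x3000-byte transfer at 0x1100 copies 4096 bytes instead of 12288, and a 256-byte transfer copies the same 256 bytes either way.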

The missing performance information, along with some other comments on the latest posting of the patch set, suggests that there is still some work to be done before this code is ready to go upstream. With luck, though, it shouldn't be too long before Linux systems have a higher level of protection against untrustworthy devices.


Bounce buffers for untrusted devices

Posted Apr 26, 2019 17:15 UTC (Fri) by lkundrak (subscriber, #43452)

> PCI devices are usually built into the system, there was not much concern about them going rogue

There certainly are PCIe devices running firmware of very [1] questionable [2] quality that have network access. Can the MCUs generate arbitrary PCIe bus transactions? If so, I suppose they should also be marked untrusted? Is the untrusted property exposed in sysfs? Perhaps something like a udev rule that would mark net/wwan devices with non-free firmware (or at least known bad firmware) as untrusted would make sense?

[1] https://googleprojectzero.blogspot.com/2017/04/over-air-e...
[2] https://blog.quarkslab.com/reverse-engineering-broadcom-w...

Posted Apr 26, 2019 17:25 UTC (Fri) by hkario (subscriber, #94864)

I'd say that as a defence in depth approach, this should be applied to all devices, not only ones that have non-free or known buggy firmware.

Posted Apr 26, 2019 23:59 UTC (Fri) by flussence (guest, #85566)

And not just network devices either; things like DVB decoders on TV cards would make for an interesting target.

Posted Apr 28, 2019 21:13 UTC (Sun) by remleduff (guest, #60589)

I don't understand why the kernel doesn't allocate to the granularity of the IOMMU. There doesn't seem to be much harm in wasting part of a single 4k page if the device only wants part of it.

Posted Apr 26, 2019 18:24 UTC (Fri) by cesarb (subscriber, #6266)

> A PCI device is marked untrusted if the firmware marks its root port as external (currently only if the ExternalFacingPort ACPI property is set); that should be the case for Thunderbolt devices.

I won't be surprised when we find in the wild a computer where external Thunderbolt ports are not marked as ExternalFacingPort (or the opposite, a non-removable built-in device on the motherboard marked as ExternalFacingPort).

Posted Apr 26, 2019 20:10 UTC (Fri) by josh (subscriber, #17465)

Yeah, at the end of the day *all* devices should be untrusted and use an IOMMU.

Posted Apr 30, 2019 10:08 UTC (Tue) by jezuch (subscriber, #52988)

...To protect against regular incompetence in addition to malice.

Posted Apr 26, 2019 22:55 UTC (Fri) by sbates (subscriber, #106518)

> A PCI device is marked untrusted if the firmware marks its root port as external (currently only if the ExternalFacingPort ACPI property is set); that should be the case for Thunderbolt devices.

Is this setting also done for removable NVMe SSD slots in storage servers? If not then one could envision someone with physical access to the server could remove a trusted NVMe SSD and insert something more malicious. There are already several NVMe form-factors and servers that can support hot plugging of new PCIe devices and many more are coming....

Posted Apr 27, 2019 11:58 UTC (Sat) by tao (subscriber, #17563)

Wouldn't such a scenario be kind of bye-bye anyway? Remove the root disk, replace it with one that has a replaced kernel or /etc/shadow or /bin/login or /bin/ssh or /bin/gpg or a gazillion other neat little backdoors.

Posted Apr 27, 2019 19:25 UTC (Sat) by tau (subscriber, #79651)

Not if the root disk is encrypted using a TPM-protected key

Posted Apr 27, 2019 2:16 UTC (Sat) by pabs (subscriber, #43278)

Has there been any research into how trustworthy IOMMU devices are?

Posted Apr 27, 2019 2:28 UTC (Sat) by pabs (subscriber, #43278)

While researching the answer, I came across this question, which includes some links to research related to my question:

https://security.stackexchange.com/questions/176503/dma-a...

Posted Apr 27, 2019 17:53 UTC (Sat) by hmh (subscriber, #3838)

Do read their errata sheets/hardware defect lists... :(

Posted Apr 28, 2019 5:59 UTC (Sun) by alison (subscriber, #63752)

Doesn't having a bounce buffer rather defeat the purpose of DMA? Is DMA via a bounce buffer faster than just plain copying the data in the processor's address space via normal mechanism? Perhaps the answer is that devices that perform small transfers that need sub-page allocations also perform many large transfers for which DMA makes sense.

Posted Apr 28, 2019 17:03 UTC (Sun) by mfuzzey (subscriber, #57966)

There will be a performance hit certainly but I don't think it completely negates the advantages of DMA.

As you say, for large transfers the bounce buffer will only be used for the first and last partial pages.
But even without that, if the processor reads directly from the device, the data has to be moved word by word to or from the peripheral.
Peripheral access is often slower than memory access, due to wait states, uncached access, barriers, etc.
Peripherals designed for DMA use may also have faster DMA access than CPU access (that all depends on the bus interconnects).

Furthermore even if the DMA is done to a bounce buffer that can be done quickly, potentially freeing the FIFO registers in the peripheral device for the next transfer while the CPU copies the first one.

Posted Apr 29, 2019 7:45 UTC (Mon) by jic23 (subscriber, #56049)

It's worth noting that the performance lost by not doing lazy invalidations of the TLBs may well be significant. Shall we say, that 'optimization' is there for a reason.

Posted Apr 30, 2019 12:30 UTC (Tue) by Trelane (subscriber, #56877)

How does this work where peer-to-peer transfers are a feature, e.g. Nvidia gpudirect?

Posted Apr 30, 2019 16:44 UTC (Tue) by kevincox (subscriber, #93938)

Why use a bounce-buffer instead of just ensuring that nothing shares the page of the regular buffer? It seems like you now have an "owned page" plus an additional copy.

Posted May 1, 2019 8:02 UTC (Wed) by bobot (subscriber, #64147)

It could be a first step which doesn't require any modifications of the user code. Later user code could be modified to avoid the need of bounce buffers.