Bounce buffers for untrusted devices
The recently discovered vulnerability in Thunderbolt has restarted discussions about protecting the kernel against untrusted, hotpluggable hardware. That vulnerability, known as Thunderclap, allows a hostile external device to exploit Input-Output Memory Management Unit (IOMMU) mapping limitations and access system memory that it was never meant to reach. Thunderclap can be exploited by USB-C-connected devices; while we have seen USB attacks in the past, this vulnerability is different in that PCI devices, often considered trusted, can be a source of attacks too. One way of stopping those attacks would be to make sure that the IOMMU is used correctly and restricts the device to accessing the memory that was allocated for it. Lu Baolu has posted an implementation of that approach in the form of bounce buffers for untrusted devices.
PCI and untrusted devices
Since PCI devices are usually built into the system, there has not been much concern about them going rogue (however, a reader expressed concerns in the comments on an LWN article about peer-to-peer PCI accesses). The PCI bus does support hotplugging, but its use is limited. It is, however, possible to attach external PCI devices to a bus like Thunderbolt. That opens the door to the Thunderclap vulnerability; a rogue device can benefit from the fact that the PCI bus is, in practice, more trusted than externally accessible buses.
Devices on the PCI bus do not have uncontrolled access to the system, though, when an IOMMU exists and is in use. The IOMMU allows (or denies) access by devices to specific memory regions and maps bus addresses to physical memory addresses. It works at the page level, and the remapped regions must be set up explicitly before use; each device has its own set of regions that it can access. However, not all systems have an IOMMU enabled (or even installed), either because of performance concerns or because some functionality does not work correctly with the IOMMU.
One step toward improving the situation is to keep track of which devices are expected to behave well and which might not. The marking of trusted and untrusted PCI devices was added in December 2018. It is done with an untrusted flag added to struct pci_dev to control special handling of such devices, including full IOMMU mapping and features like bounce buffers. A PCI device is marked untrusted if the firmware marks its root port as external (currently only if the ExternalFacingPort ACPI property is set); that should be the case for Thunderbolt devices.
IOMMU constraints
Trusted PCI devices are expected to perform their DMA operations to and from the buffers they have been given to use; they do not run out of bounds or access other memory zones. With such devices, the IOMMU configuration code can take some shortcuts and, for example, map slightly bigger zones to fit hardware limitations and optimize IOMMU usage. For untrusted devices, we cannot make the same assumptions; the correct and strict configuration of the IOMMU becomes more important. Unfortunately, the minimum granularity of the (Intel) IOMMU is 4KB. Mapping a buffer with the IOMMU means allowing access to the whole 4KB page, even if the desired zone is smaller.
One result of this limitation is that an unaware driver that allocates a small buffer for device DMA and maps it through the IOMMU exposes the whole page with all of the other data it may contain, even if it belongs to other drivers or to the kernel itself. The fact that this situation does not cause any runtime error could be considered a weak point of the DMA API. Just activating the IOMMU doesn't solve the problem — the system must also take care to not place any unrelated data in the memory mapped by the IOMMU.
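To make the granularity problem concrete, here is a minimal user-space sketch in C (not kernel code) that computes how many bytes outside a DMA buffer become reachable once whole pages are mapped. It assumes the 4KB granularity mentioned above; the names IOMMU_PAGE_SIZE and exposed_extra_bytes() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed IOMMU mapping granularity (4KB, as described in the text). */
#define IOMMU_PAGE_SIZE 4096u

/*
 * Hypothetical helper: given the start address and length of a DMA
 * buffer, return the number of bytes outside the buffer that the
 * device can also reach when only whole pages can be mapped.
 */
static size_t exposed_extra_bytes(uintptr_t addr, size_t len)
{
    uintptr_t mask = ~(uintptr_t)(IOMMU_PAGE_SIZE - 1);
    uintptr_t first, last;

    if (len == 0)
        return 0;

    /* Illustration only: reject ranges that wrap around the address space. */
    assert(len <= UINTPTR_MAX - addr);

    /* Round the start down and the end up to page boundaries. */
    first = addr & mask;
    last = (addr + len + IOMMU_PAGE_SIZE - 1) & mask;

    return (size_t)(last - first) - len;
}
```

Under these assumptions, a 64-byte buffer in the middle of a page exposes the remaining 4032 bytes of that page, while a page-aligned 4096-byte buffer exposes nothing extra.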
Bounce buffers
This is where the proposed patch set comes into play. It implements bounce buffers for the untrusted devices; a bounce buffer is simply a separate memory area that is used for DMA operations. Data is copied ("bounced") between the original buffer and the bounce buffer, which is located in isolated memory that can be mapped by the IOMMU in such a way that there is no access to the data outside the buffer in question.
If the original buffer covers a full page (or multiple full pages), nothing needs to change as this buffer can be directly mapped without exposing any unrelated data. If, instead, the buffer is inside a page that may contain other data, bounce buffers will be used. During the mapping, unmapping, and sync operations, the code will copy the data between the original buffer and the bounce buffer, depending on the direction of the transfer. The IOMMU mapping given to the device then points at the bounce buffer instead of the original one.
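The direction-dependent copying can be pictured with a small user-space simulation. This is only a sketch under stated assumptions: the enum, the struct bounce_map type, and the two functions are hypothetical names invented here, loosely modeled on the idea of DMA transfer directions, and do not reproduce the code in the patch set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transfer directions, for illustration only. */
enum xfer_dir { TO_DEVICE, FROM_DEVICE, BIDIRECTIONAL };

/* Hypothetical bookkeeping for one bounced mapping. */
struct bounce_map {
    void *orig;         /* the driver's original buffer */
    void *bounce;       /* isolated buffer that the device actually sees */
    size_t len;
    enum xfer_dir dir;
};

/* Map: copy outgoing data into the bounce buffer before the device reads it. */
static void bounce_map_buf(struct bounce_map *m)
{
    assert(m && m->orig && m->bounce);
    if (m->dir == TO_DEVICE || m->dir == BIDIRECTIONAL)
        memcpy(m->bounce, m->orig, m->len);
}

/* Unmap: copy incoming data back once the device has written it. */
static void bounce_unmap_buf(struct bounce_map *m)
{
    assert(m && m->orig && m->bounce);
    if (m->dir == FROM_DEVICE || m->dir == BIDIRECTIONAL)
        memcpy(m->orig, m->bounce, m->len);
}
```

In this model, a device-bound transfer only pays for a copy at map time and a device-originated transfer only pays at unmap time; a sync operation would perform the same copies without tearing down the mapping.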
When an I/O operation is set up, the original I/O buffer is split into three parts: "low", "middle", and "high". The low and high parts might lie on pages that may contain other data: they are the first and last pages that contain the device buffer. The middle pages contain only the device buffer, so they do not use the bounce buffer; only the low and high pages do. This operation may thus split a single contiguous buffer into three pieces; those pieces will be reunited (from the device's point of view) in the IOMMU mapping.
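The three-way split can be sketched as simple address arithmetic. The following user-space C fragment is an illustration, not the patch's implementation; PAGE_SZ, struct buf_split, and split_buffer() are hypothetical names, a 4KB page is assumed, and a buffer that fits inside a single page without reaching a boundary is reported entirely as "low".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u   /* assumed page size */

/* Hypothetical result type: byte lengths of the three parts. */
struct buf_split {
    size_t low;     /* partial first page: goes through the bounce buffer */
    size_t middle;  /* whole pages: mapped directly */
    size_t high;    /* partial last page: goes through the bounce buffer */
};

static struct buf_split split_buffer(uintptr_t addr, size_t len)
{
    struct buf_split s = { 0, 0, 0 };
    uintptr_t mask = ~(uintptr_t)(PAGE_SZ - 1);
    uintptr_t end, mid_start, mid_end;

    /* Illustration only: reject ranges that wrap around the address space. */
    assert(len <= UINTPTR_MAX - addr);
    end = addr + len;

    mid_start = (addr + PAGE_SZ - 1) & mask;  /* first boundary at or after start */
    mid_end = end & mask;                     /* last boundary at or before end */

    if (mid_start > mid_end) {
        /* The buffer lies strictly inside one page: all of it is bounced. */
        s.low = len;
        return s;
    }

    s.low = (size_t)(mid_start - addr);
    s.middle = (size_t)(mid_end - mid_start);
    s.high = (size_t)(end - mid_end);
    return s;
}
```

For example, under these assumptions a buffer of 8704 bytes starting 256 bytes before a page boundary splits into 256 bytes of low, 8192 bytes (two pages) of middle, and 256 bytes of high.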
The bounce-buffer patch implements another change: the IOMMU mapping is invalidated immediately after the unmap operation. If that mapping stays cached in the IOMMU, the device might still use it after the mapped page has been reallocated for some other purpose. The patch set also provides an option to deactivate the bounce buffers if the system administrator trusts the attached devices.
Similarity to swiotlb
In the discussion following the first version of the patch set, Christoph Hellwig noted that the code has similarities to the swiotlb (software input/output translation lookaside buffer) subsystem. The swiotlb is a bounce-buffering mechanism used with devices that cannot access all of a system's memory. In response, Lu tried to make use of the swiotlb code, but that effort failed because the approach is somewhat different and the offsets given by the swiotlb differ from the original ones for the low pages. This is because swiotlb copies the whole buffer, rather than just the low and high segments, during the mapping operation.
Robin Murphy suggested that the implementation should be made generic for the whole IOMMU subsystem and not limited to Intel VT-d only. The discussion continued after the second version was submitted, and Lu proposed an extension to swiotlb. A new version of the patch set was posted on April 21. It includes a refactoring of the swiotlb and moves some of the driver-specific code to the generic IOMMU layer.
Next steps
The use of bounce buffers can protect a system against a class of attacks. It remains an open question whether there are more similar issues in the kernel and whether other in-kernel interfaces will need to be hardened. This is likely, as the threat model has completely changed: the attacker now controls the devices that were previously thought of as trusted. It seems certain that we are going to see more attacks from rogue devices using unexpected protocols. The kernel interfaces that were considered internal in the past may need to be reviewed and hardened.
The implementation of the IOMMU bounce buffers is complete; one remaining question is what the performance penalty is. The measurements of the impact have not yet been submitted with the patch set. According to the description, the impact is expected to be small. One may expect that it should be lower than swiotlb since less data copying takes place. Large transfers should not be affected as they are usually page-aligned already. The overhead will be more visible for small transfers, where the setup will dominate the cost of a small copy.
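In the absence of measurements, the expected difference in copying can at least be estimated. The sketch below is a back-of-the-envelope model rather than a benchmark: it counts the bytes copied when only the partial first and last pages are bounced, and compares that with bouncing the whole buffer, which is how the article describes swiotlb. The function names and the 4KB PAGE_SZ are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u   /* assumed page size */

/* Bytes copied when only the partial first and last pages are bounced. */
static size_t partial_bounce_bytes(uintptr_t addr, size_t len)
{
    uintptr_t mask = ~(uintptr_t)(PAGE_SZ - 1);
    uintptr_t end, mid_start, mid_end;

    /* Illustration only: reject ranges that wrap around the address space. */
    assert(len <= UINTPTR_MAX - addr);
    end = addr + len;

    mid_start = (addr + PAGE_SZ - 1) & mask;  /* first boundary at or after start */
    mid_end = end & mask;                     /* last boundary at or before end */

    if (mid_start >= mid_end)
        return len;     /* no whole page inside the buffer: copy everything */

    /* Whole pages in the middle are mapped directly and never copied. */
    return len - (size_t)(mid_end - mid_start);
}

/* Bytes copied when the whole buffer is bounced. */
static size_t full_bounce_bytes(size_t len)
{
    return len;
}
```

In this model, a 1MB transfer that starts 256 bytes into a page copies 4096 bytes instead of the full megabyte, and a page-aligned 1MB transfer copies nothing, while a 64-byte transfer is copied in full either way. The model ignores the fixed per-mapping setup cost, which is the part that matters most for small transfers.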
The missing performance information, along with some other comments on the latest posting of the patch set, suggests that there is still some work to be done before this code is ready to go upstream. With luck, though, it shouldn't be too long before Linux systems have a higher level of protection against untrustworthy devices.
Index entries for this article | |
---|---|
Kernel | Direct memory access |
Kernel | Security/Kernel hardening |
GuestArticles | Rybczynska, Marta |
Posted Apr 26, 2019 17:15 UTC (Fri)
by lkundrak (subscriber, #43452)
[Link] (3 responses)
There certainly are PCIe devices running firmware of very [1] questionable [2] quality that have network access. Can the MCUs generate arbitrary PCIe bus transactions? If so, I suppose they should also be marked untrusted? Is the untrusted property exposed in sysfs? Perhaps something like an udev rule that would mark net/wwan devices with non-free firmware (or at least known bad firmware) as untrusted would make sense?
[1] https://googleprojectzero.blogspot.com/2017/04/over-air-e...
Posted Apr 26, 2019 17:25 UTC (Fri)
by hkario (subscriber, #94864)
[Link] (1 responses)
Posted Apr 26, 2019 23:59 UTC (Fri)
by flussence (guest, #85566)
[Link]
Posted Apr 28, 2019 21:13 UTC (Sun)
by remleduff (guest, #60589)
[Link]
Posted Apr 26, 2019 18:24 UTC (Fri)
by cesarb (subscriber, #6266)
[Link] (2 responses)
I won't be surprised when we find in the wild a computer where external Thunderbolt ports are not marked as ExternalFacingPort (or the opposite, a non-removable built-in device on the motherboard marked as ExternalFacingPort).
Posted Apr 26, 2019 20:10 UTC (Fri)
by josh (subscriber, #17465)
[Link] (1 responses)
Posted Apr 30, 2019 10:08 UTC (Tue)
by jezuch (subscriber, #52988)
[Link]
Posted Apr 26, 2019 22:55 UTC (Fri)
by sbates (subscriber, #106518)
[Link] (2 responses)
Is this setting also done for removable NVMe SSD slots in storage servers? If not then one could envision someone with physical access to the server could remove a trusted NVMe SSD and insert something more malicious. There are already several NVMe form-factors and servers that can support hot plugging of new PCIe devices and many more are coming....
Posted Apr 27, 2019 11:58 UTC (Sat)
by tao (subscriber, #17563)
[Link] (1 responses)
Posted Apr 27, 2019 19:25 UTC (Sat)
by tau (subscriber, #79651)
[Link]
Posted Apr 27, 2019 2:16 UTC (Sat)
by pabs (subscriber, #43278)
[Link] (2 responses)
Posted Apr 27, 2019 2:28 UTC (Sat)
by pabs (subscriber, #43278)
[Link]
https://security.stackexchange.com/questions/176503/dma-a...
Posted Apr 27, 2019 17:53 UTC (Sat)
by hmh (subscriber, #3838)
[Link]
Posted Apr 28, 2019 5:59 UTC (Sun)
by alison (subscriber, #63752)
[Link] (1 responses)
Posted Apr 28, 2019 17:03 UTC (Sun)
by mfuzzey (subscriber, #57966)
[Link]
As you say for large transfers the bounce buffer will only be used for the first and last partial pages.
Furthermore even if the DMA is done to a bounce buffer that can be done quickly, potentially freeing the FIFO registers in the peripheral device for the next transfer while the CPU copies the first one.
Posted Apr 29, 2019 7:45 UTC (Mon)
by jic23 (subscriber, #56049)
[Link]
Posted Apr 30, 2019 12:30 UTC (Tue)
by Trelane (subscriber, #56877)
[Link]
Posted Apr 30, 2019 16:44 UTC (Tue)
by kevincox (subscriber, #93938)
[Link] (1 responses)
Posted May 1, 2019 8:02 UTC (Wed)
by bobot (subscriber, #64147)
[Link]
[2] https://blog.quarkslab.com/reverse-engineering-broadcom-w...
But, even without that if you read directly from the device it has to be word by word to/from the peripheral by the processor.
Peripheral access is often slower than memory, due to wait states, uncached access, barriers etc.
Peripherals designed for DMA use may also have faster DMA access than CPU access (that all depends on the bus interconnects).