Garrett: Why ACPI?
There's an alternative universe where we decided to teach the kernel about every piece of hardware it should run on. Fortunately (or, well, unfortunately) we've seen that in the ARM world. Most device-specific code simply never reaches mainline, and most users are stuck running ancient kernels as a result. Imagine every x86 device vendor shipping their own kernel optimised for their hardware, and now imagine how well that works out given the quality of their firmware. Does that really seem better to you?
Posted Nov 1, 2023 14:06 UTC (Wed)
by larkey (guest, #104463)
Posted Nov 1, 2023 16:10 UTC (Wed)
by dxin (guest, #136611)
Posted Nov 1, 2023 16:46 UTC (Wed)
by pizza (subscriber, #46)
It turns out writing (and supporting!) a bespoke operating system for your bespoke hardware platform isn't terribly profitable, so hardware vendors turn to others to provide the OS. Similarly, application writers and end-users don't want to have to deal with a bespoke OS for each hardware platform, so they want a commodity OS they can use across a variety of hardware.
(...I should also point out that SBSA, aka "The ARM server way" mandates use of ACPI and UEFI for this very reason)
Posted Nov 2, 2023 22:51 UTC (Thu)
by Hunterprocrasinates (guest, #167806)
Posted Nov 3, 2023 15:14 UTC (Fri)
by pizza (subscriber, #46)
It doesn't change the fact that there has to be _some_ way for the hardware to perform basic initialization, enumerate storage and other I/O devices, and load the operating system. On top of that, over the past 30 years we've grown accustomed to the "hardware" providing a standard mechanism to perform operations like power management, suspend, and other things that require intimate knowledge of each unique hardware design/variation.
In ARM-land, historically there has been no such standard mechanism; a unique per-design bootloader performs low-level initialization, copies the OS image from storage, and loads it. In the olden days, the OS needed to be pre-compiled for that specific design, with what was effectively a "per-design driver" that told the OS what peripherals were present and how/where they were hooked up. The major downside to this approach is that you need explicit OS support for your hardware design before it can even _boot_.
More recent ARM designs moved to devicetree, where the per-design bootloader hands a generic OS image a detailed description of the hardware. The OS still needed a driver for each individual component, but no longer needed a "driver" unique to each design variation. This meant you could use a generic OS image, but that OS still needed device drivers for every component, including the low-level platform pieces (eg PMIC or clock generators) without which the system might not be able to boot. In practice, newer hardware designs still tended to require a relatively recent OS image.
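To make the devicetree model concrete, here is a minimal Python sketch of how a generic OS image might bind drivers to devicetree nodes by their `compatible` strings. The node names, compatible strings, and driver table are all hypothetical, and real kernels match against much richer tables; the point is only that the hardware description comes from the bootloader while the drivers must already be in the image.

```python
from dataclasses import dataclass

@dataclass
class DtNode:
    name: str
    compatible: list[str]   # most specific string first, as in a real devicetree

# Hypothetical table of drivers built into a generic OS image.
DRIVERS = {
    "fooco,x4321-pmic": "fooco_pmic",
    "vendor,generic-uart": "generic_uart",
}

def bind_drivers(nodes: list[DtNode]) -> tuple[dict[str, str], list[str]]:
    """Match each node to the first driver that claims one of its compatibles."""
    bound: dict[str, str] = {}
    missing: list[str] = []
    for node in nodes:
        for compat in node.compatible:
            if compat in DRIVERS:
                bound[node.name] = DRIVERS[compat]
                break
        else:
            # No driver in this image; fatal if the node is a PMIC or clock source.
            missing.append(node.name)
    return bound, missing
```

A node such as a newer clock generator whose `compatible` strings match nothing ends up in `missing`, which is the "needs a relatively recent OS image" problem in miniature.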
This is where UEFI+ACPI comes in; it standardizes the OS<->"hardware" API so that you don't need to update the OS (and/or its drivers) every time you tweak your hardware design. The OS can just invoke the generic ACPI "power down" API instead of needing to know that this design has a FooCo X4321-b PMIC, with eight different voltage regulators (controlled by a specific set of pins) that need to be shut down in a specific sequence or the board can't be powered back up again without physically disconnecting power for five minutes.
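A toy Python model of the split described here: the OS calls one generic power-down entry point, and only the firmware knows the board-specific sequence behind it. Everything in it (`Board`, `power_down`, `os_shutdown`, the regulator step names) is hypothetical; real ACPI expresses this through AML methods and sleep-state objects that the OS evaluates, not through Python callbacks.

```python
class Board:
    """Hypothetical firmware for one board design; the OS never sees the internals."""

    def __init__(self, name: str, off_sequence: list[str]):
        self.name = name
        self._off_sequence = off_sequence  # board-specific PMIC steps, kept in firmware
        self.log: list[str] = []

    def power_down(self) -> None:
        """Generic entry point, identical on every board."""
        for step in self._off_sequence:
            self.log.append(step)  # stands in for poking the real regulators

def os_shutdown(firmware: Board) -> None:
    # The OS only uses the generic interface; it contains no PMIC-specific code.
    firmware.power_down()

# Two different designs, one unchanged OS code path.
x4321b = Board("fooco-x4321-b", ["disable VREG8", "disable VREG3", "assert PMIC_OFF"])
other = Board("other-board", ["assert PWR_OFF"])
for board in (x4321b, other):
    os_shutdown(board)
```

Tweaking the board only changes the `Board` data the firmware ships; `os_shutdown` never needs updating.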
Posted Nov 3, 2023 16:02 UTC (Fri)
by Wol (subscriber, #4433)
I've done that. It's NOT NICE. And you know the old Unix error message? "Your printer is on fire"? It wasn't unknown for an innocent OS probe to let the magic smoke out of some peripheral or other - one only has to look back at X destroying VDUs by setting the wrong mode line...
Cheers,
Posted Nov 3, 2023 19:47 UTC (Fri)
by pizza (subscriber, #46)
Actually, that depends. When the hardware is designed to be safely pluggable & probeable (eg USB, PCI[e], and other self-describing interconnects), probing works just fine. But trying to probe for things that weren't designed with that in mind... can lead to some nastiness.
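Self-describing buses are what make blind probing safe. A minimal Python sketch of the idea for PCI, with config space simulated by a dictionary (an assumption; real code goes through port I/O or memory-mapped ECAM): a Vendor ID read for a slot with no device comes back as all ones (0xFFFF), so the OS can scan every bus/device/function without disturbing anything.

```python
def read_vendor_id(config_space: dict[tuple[int, int, int], int],
                   bus: int, dev: int, fn: int) -> int:
    # A config read to a non-existent function completes with all ones.
    return config_space.get((bus, dev, fn), 0xFFFF)

def enumerate_bus(config_space: dict[tuple[int, int, int], int],
                  bus: int = 0) -> list[tuple[int, int, int]]:
    """Return (device, function, vendor_id) for every function that responds."""
    found = []
    for dev in range(32):        # 32 devices per bus
        for fn in range(8):      # 8 functions per device
            vid = read_vendor_id(config_space, bus, dev, fn)
            if vid != 0xFFFF:
                found.append((dev, fn, vid))
    return found
```

This is simplified: real enumeration reads function 0 first and only probes functions 1-7 when the header type's multifunction bit is set. Nothing comparable exists for an arbitrary I2C or memory-mapped peripheral, which is why probing those can go wrong.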
> "Your printer is on fire"? It wasn't unknown for an innocent OS probe to let the magic smoke out of some peripheral or other
That wasn't due to probing; it was a tongue-in-cheek interpretation of (very limited) status signals that the line printer could report.
(See https://en.wikipedia.org/wiki/Lp0_on_fire )
> one only has to look back at X destroying VDUs by setting the wrong mode line...
That was due to the user supplying a display configuration that the monitor couldn't handle rather than probing, and was no different than someone manually picking a bogus resolution+refresh rate in $Other_OS. Eventually, VGA monitors gained a standard mechanism to report their capabilities (via DDC & EDID) and that remains in use today.
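DDC/EDID is what lets the OS ask the monitor what it can handle instead of trusting a hand-written mode line. A minimal Python sketch that validates an EDID base block and decodes two fields, assuming the VESA EDID 1.x base-block layout: a fixed 8-byte header, 128 bytes summing to 0 mod 256, a packed three-letter manufacturer ID in bytes 8-9, and a little-endian product code in bytes 10-11. Decoding the supported timing ranges is left out.

```python
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])

def parse_edid(block: bytes) -> dict:
    """Validate a 128-byte EDID base block and decode manufacturer and product code."""
    if len(block) != 128:
        raise ValueError("EDID base block must be 128 bytes")
    if block[:8] != EDID_HEADER:
        raise ValueError("bad EDID header")
    if sum(block) % 256 != 0:
        raise ValueError("bad EDID checksum")
    # Bytes 8-9, big-endian: three 5-bit letters where 1 = 'A'.
    mfg = int.from_bytes(block[8:10], "big")
    letters = "".join(chr(ord("A") - 1 + ((mfg >> shift) & 0x1F)) for shift in (10, 5, 0))
    product = int.from_bytes(block[10:12], "little")
    return {"manufacturer": letters, "product_code": product}
```

On Linux the raw EDID of a connected display is usually exposed as an `edid` file in the connector's directory under `/sys/class/drm/`, so the first 128 bytes of that file could be passed to `parse_edid` on real hardware.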
Posted Nov 4, 2023 17:52 UTC (Sat)
by Wol (subscriber, #4433)
> Actually, that depends. When the hardware is designed to be safely pluggable & probeable (eg USB, PCI[e], and other self-describing interconnects), probing works just fine. But trying to probe for things that weren't designed with that in mind... can lead to some nastiness.
I had it easy: I was just using a terminal emulator. With clueless lusers. So I had the wIntegrate emulator, some real physical Pr1me PT100s and PT250s, Wyse, Adds3E, ...
And as I say, clueless lusers who just refused to know what was going on. Fortunately, I discovered that <ESC>E (iirc) was the standard "ask the terminal to send an answerback" code. I don't think it's universal; I was lucky that it worked on all the physical terminals, and I could program the emulator to respond.
So I coded the login shell to send <ESC>E, listened for the response, and set the appropriate terminal type. Still broke regularly, but at least I wasn't repeatedly fielding calls from users who'd selected the wrong terminal type and wondered why their screens were all messed up.
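The login-shell trick can be sketched in Python with the terminal I/O abstracted into two callables, so it can be tested without a serial line. The `<ESC>E` query default is taken from the comment and is unverified; the answerback strings and the `TERM` names they map to are hypothetical and would depend on how each terminal and the emulator were programmed.

```python
from typing import Callable

# Hypothetical answerback strings -> terminal types.
ANSWERBACK_TO_TERM = {
    b"PT100": "pt100",
    b"PT250": "pt250",
    b"WYSE60": "wy60",
}

def detect_terminal(send: Callable[[bytes], None],
                    recv: Callable[[float], bytes],
                    query: bytes = b"\x1bE",   # <ESC>E as recalled above; unverified
                    default: str = "dumb",
                    timeout: float = 2.0) -> str:
    send(query)                       # ask the terminal for its answerback
    reply = recv(timeout).strip()     # no reply (b"") falls through to the default
    return ANSWERBACK_TO_TERM.get(reply, default)
```

Falling back to a safe default when nothing answers keeps a silent or unknown terminal usable, even if not pretty.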
(Likewise, the shell chose the user's default printer based on their department, and as much automation as I could manage to try and suppress all the grief lusers cause ... :-)
You don't want to go there, if you can avoid it ...
Cheers,
Posted Nov 1, 2023 19:48 UTC (Wed)
by jra (subscriber, #55261)
https://www.youtube.com/watch?v=36myc8wQhLo
He doesn't seem to be a big Linux fan (understandable, given his research interests), but the talk does vividly articulate a real problem with our current code.
Posted Nov 1, 2023 19:49 UTC (Wed)
by jra (subscriber, #55261)
USENIX ATC '21/OSDI '21 Joint Keynote Address-It's Time for Operating Systems to Rediscover Hardware
https://www.youtube.com/watch?v=36myc8wQhLo
Posted Nov 2, 2023 0:57 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
I think that Linux developers understand pretty well that a modern computer is a mess of multiple smaller computers, and Linux tries to do all the right stuff with the IOMMU and defensive programming. But it's almost inevitable that drivers that are tens of thousands of lines long will have vulnerabilities. Especially because it's so hard to test for them, you can't just fuzz the device-side for something like WiFi or GPUs.
Posted Nov 2, 2023 1:32 UTC (Thu)
by pizza (subscriber, #46)
It's not that the _driver_ had a vulnerability; it's that the _device itself_ reconfigured a shared peripheral out from underneath Linux's nose. There's nothing that Linux (or any other "OS" software) can do to protect itself against a peripheral that has full bus-master DMA access to the main system bus. This sort of hardware-level vulnerability requires hardware-level protection.
(Of course, in the real world, these subsystems actually consider the entire Linux-running core complex to be the non-trustworthy one, and are set up to protect _themselves_ from Linux. Because cellular radios and DRM keys are more precious than any possible user data)
> Especially because it's so hard to test for them, you can't just fuzz the device-side for something like WiFi or GPUs.
That will do f-all when a device has full access to the system bus and can arbitrarily access the entire physical address space.
Posted Nov 2, 2023 2:27 UTC (Thu)
by cypherpunks2 (guest, #152408)
Enabling the IOMMU (properly-configured VT-d2 with x2APIC and IR support[1], and ATS disabled[2]) is the very hardware-level protection that protects against this. While the peripheral does have bus master, the DMAR ACPI tables allow the IOMMU to remap any requests from the peripheral. This assumes the driver doesn't have any serious bugs[3], but will protect against arbitrary memory access, despite the peripheral having bus master.
And even if a device is not isolated with the IOMMU, while it can still toggle its own bus master bit in the PCIe command register by itself, the DMA requests won't be forwarded by the PCI bridge unless it _too_ has its bus master bit set (which is not something the peripheral can control on its own).
[1] https://invisiblethingslab.com/resources/2011/Software%20...
Posted Nov 2, 2023 11:56 UTC (Thu)
by pizza (subscriber, #46)
You're thinking x86 PC, whereas this hardware is a complex SoC with many sets of CPU cores, only one of which runs Linux. The specific co-processor in question is *directly attached to the main system/processor bus* (ie not PCIe) and has its own dedicated IOMMU under its own control.
Again, the threat model for this system is that this co-processor needs to be protected from *linux*, not the other way around. This peripheral has *higher* privileges than the Linux kernel, so any protections Linux can set up, this thing can overwrite.
And *that* is the point of the talk -- Linux isn't the "real" operating system on modern SoCs (and arguably, even x86 PCs/servers) and we need to disabuse ourselves of that delusion ASAP.
Posted Nov 2, 2023 15:08 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
No, it was the driver mistakenly allowing the peripheral to request access to all of the RAM, bypassing the SMMU (ARM's name for the IOMMU).
> This sort of hardware-level vulnerability requires hardware-level protection.
There is such protection in place; it was not used correctly. There's nothing special about modern ARM SoCs; they can all potentially offer the same guarantees as x86. Most of the devices in SoCs are connected through internal serial buses like SPI or I2C anyway. It's mostly high-bandwidth devices that need some special attention.
Posted Nov 2, 2023 4:12 UTC (Thu)
by brunowolff (guest, #71160)
Posted Nov 2, 2023 12:53 UTC (Thu)
by ballombe (subscriber, #9523)
Posted Nov 2, 2023 16:08 UTC (Thu)
by syrjala (subscriber, #47399)
Posted Nov 15, 2023 22:23 UTC (Wed)
by roblucid (guest, #48964)
Posted Nov 2, 2023 20:28 UTC (Thu)
by Lumag (subscriber, #22579)
Posted Nov 2, 2023 22:51 UTC (Thu)
by Hunterprocrasinates (guest, #167806)
The more control over the OS we grant UEFI, the worse the OS becomes. UEFI has proven multiple times that it isn't capable of handling things. Taking more control away from the OS and giving it to UEFI results in a worse OS. Back during the BIOS era, things weren't great either, as there was no standard. However, at least the OS had almost complete control over things, meaning that if something went wrong, it was the operating system's fault. Now, with UEFI, the OS has less control over the system, and hardware is managed more by UEFI than by the OS. Since UEFI firmware never fully hands control to the OS or disappears the way BIOS does, if the UEFI firmware encounters a problem or error, the entire OS comes to a halt, even if the OS is well past the boot-up or login screen. We need to return to a time when, if the OS crashed, it was purely due to the OS's inability to handle the hardware or some bug that could be fixed with a simple kernel patch.
Posted Nov 3, 2023 0:06 UTC (Fri)
by mjg59 (subscriber, #23239)
Posted Nov 3, 2023 0:41 UTC (Fri)
by Hunterprocrasinates (guest, #167806)
Posted Nov 3, 2023 4:22 UTC (Fri)
by mjg59 (subscriber, #23239)
Posted Nov 3, 2023 13:04 UTC (Fri)
by Hunterprocrasinates (guest, #167806)
Posted Nov 3, 2023 14:14 UTC (Fri)
by farnz (subscriber, #17727)
The fundamental issue is that there are details of the hardware that are unique to this system and not part of the wider platform. If you don't have something like PC BIOS, OpenFirmware or UEFI, you end up in a world where you literally have to compile your OS for the specific hardware in front of you, and you have to know exactly how the specifics of this instance of the platform are expected to behave.
This is, after all, not impossible - it's what the Commodore Amiga did, and the Acorn Archimedes range, and Old World Macs. But it means that things that we now take for granted (like the ability to replace the CPU with a newer one without replacing the OS) are no longer guaranteed to work, since the OS has to know all the details of the new hardware that didn't exist when it was written in order to be able to bring it up.
Posted Nov 4, 2023 16:29 UTC (Sat)
by raven667 (subscriber, #5198)
The reason this happened is that the ARM and Android architectures did not specify any kind of standard facility for this low-level hardware setup; vendors just stuffed it into Linux for expediency, where it's easy to modify, with no real consideration for the long-term maintainability of the hardware. They'd rather sell you new hardware to get updates anyway, regardless of how wasteful of resources that actually is.
Posted Nov 5, 2023 23:33 UTC (Sun)
by Hunterprocrasinates (guest, #167806)
Posted Nov 5, 2023 23:48 UTC (Sun)
by mjg59 (subscriber, #23239)
Posted Nov 5, 2023 23:57 UTC (Sun)
by Hunterprocrasinates (guest, #167806)
Posted Nov 6, 2023 0:30 UTC (Mon)
by mjg59 (subscriber, #23239)
Posted Nov 6, 2023 0:35 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
At least with UEFI there's a (somewhat) reference implementation that most (all?) vendors use: https://www.tianocore.org/ In particular, AMI uses it.
Posted Nov 6, 2023 0:48 UTC (Mon)
by mjg59 (subscriber, #23239)
Posted Nov 6, 2023 0:13 UTC (Mon)
by Hunterprocrasinates (guest, #167806)
https://imgur.com/a/s7klWHu (ignore the NSFW warning; there's no NSFW content on this page)
Posted Nov 6, 2023 0:28 UTC (Mon)
by mjg59 (subscriber, #23239)
Posted Nov 6, 2023 0:47 UTC (Mon)
by Hunterprocrasinates (guest, #167806)
Posted Nov 6, 2023 0:50 UTC (Mon)
by mjg59 (subscriber, #23239)
Posted Nov 6, 2023 0:53 UTC (Mon)
by Hunterprocrasinates (guest, #167806)
Posted Nov 6, 2023 4:31 UTC (Mon)
by mjg59 (subscriber, #23239)
Posted Nov 6, 2023 9:49 UTC (Mon)
by Hunterprocrasinates (guest, #167806)
Posted Nov 6, 2023 10:04 UTC (Mon)
by zdzichu (subscriber, #17118)
Posted Nov 6, 2023 10:29 UTC (Mon)
by Hunterprocrasinates (guest, #167806)
Posted Nov 6, 2023 11:08 UTC (Mon)
by farnz (subscriber, #17727)
No, I'm not - I'm saying that you need something that knows details like "you must delay 0.1 ms between enabling the DRAM voltage regulator and trying to access DRAM" that are completely specific to the hardware, and on x86-64 and SBSA ARM systems, that something is UEFI + ACPI.
The alternative is OS-specific drivers; would you prefer it if, instead of your ASUS, Dell, or Lenovo motherboard coming with a UEFI and ACPI, it came with a minimal Windows install embedded in the motherboard flash that was just functional enough to start "real" Windows and hand over hardware knowledge? That is the alternative that's been used in the past, and the result was that alternative OSes for that hardware had to somehow pick up all the details of how the hardware worked - which was different for each and every device. BIOS manufacturers are imperfect, but they're better than saying that you don't have a BIOS, you have a subset of Windows embedded in your system, and the only thing it can boot is Windows.
Posted Nov 6, 2023 12:28 UTC (Mon)
by Hunterprocrasinates (guest, #167806)
Posted Nov 6, 2023 12:31 UTC (Mon)
by Hunterprocrasinates (guest, #167806)
Posted Nov 6, 2023 14:11 UTC (Mon)
by farnz (subscriber, #17727)
What, exactly, does UEFI handle that would be better handled by the OS? Without a concrete example, it's hard to judge what you're saying; but I'd note that my system has nothing in use right now that's handled by UEFI after it loads the kernel.
Posted Nov 10, 2023 13:55 UTC (Fri)
by Hunterprocrasinates (guest, #167806)
Posted Nov 10, 2023 13:58 UTC (Fri)
by farnz (subscriber, #17727)
What about ACPI would be better handled by the OS?
ACPI is a set of data tables the OS interprets to tell it how to do useful things (like enter suspend-to-RAM state, or find the PCIe Root Complex registers). If we got rid of it, the OS would have to have all of these details hard-coded for every single platform that the OS wishes to support - and you simply wouldn't be able to boot if no-one had hard-coded the right details for your motherboard + CPU combination into the OS.
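As a concrete example of "find the PCIe Root Complex registers": a Python sketch that parses an MCFG table and computes an ECAM config-space address. It assumes the standard 36-byte ACPI table header and the MCFG layout from the PCI Firmware Specification (8 reserved bytes, then 16-byte entries of base address, segment group, start bus, end bus), and assumes each entry's base address corresponds to bus 0 of its segment. On a Linux machine the raw table can typically be read (as root) from `/sys/firmware/acpi/tables/MCFG`.

```python
import struct

ACPI_HEADER = struct.Struct("<4sIBB6s8sI4sI")  # 36-byte system description table header
MCFG_ENTRY = struct.Struct("<QHBB4x")          # base, segment, start bus, end bus, reserved

def parse_mcfg(table: bytes) -> list[tuple[int, int, int, int]]:
    """Return (base, segment, start_bus, end_bus) for each MCFG allocation entry."""
    sig, length, *_ = ACPI_HEADER.unpack_from(table, 0)
    if sig != b"MCFG":
        raise ValueError("not an MCFG table")
    if sum(table[:length]) % 256 != 0:   # all bytes of an ACPI table sum to zero
        raise ValueError("bad ACPI checksum")
    entries = []
    # Entries start after the header plus 8 reserved bytes.
    for off in range(ACPI_HEADER.size + 8, length, MCFG_ENTRY.size):
        entries.append(MCFG_ENTRY.unpack_from(table, off))
    return entries

def ecam_address(base: int, bus: int, dev: int, fn: int, reg: int = 0) -> int:
    """ECAM: 1 MiB per bus, 32 KiB per device, 4 KiB per function."""
    return base + (bus << 20) + (dev << 15) + (fn << 12) + reg
```

None of these addresses are discoverable by probing; without the table, each one would have to be hard-coded per platform.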
Posted Nov 11, 2023 2:55 UTC (Sat)
by Hunterprocrasinates (guest, #167806)
Fine, you win. I don't know who to blame anymore. I'm sick of firmware bugs on computers other than ThinkPads. I don't want to spend $7000 on the newest Linux laptop, and I can't spend my money on a cheap modern computer because they are riddled with firmware bugs that affect Linux.
Posted Nov 13, 2023 12:17 UTC (Mon)
by farnz (subscriber, #17727)
Underlying this is firmware bugs, not ACPI or UEFI issues; if we replace ACPI or UEFI with something else, you'll just get a different set of buggy firmwares.
This is clear when you look at some of the crap that got put out there for OpenFirmware systems, and for platforms without a standard firmware like the Acorn platforms; they often could only boot one OS reliably, because the firmware authors "knew" that they would only run that OS.
You're acting like they did a good job at it. UEFI vendors like AMI still use Windows ML.exe to compile everything, and their firmware is riddled with bugs. They also give vendors barely any information beyond code comments and some basic Windows help files (I forget what they're called) that are about one page long. Vendors just don't care about firmware (I'm pretty sure some 2017 Gigabyte motherboards still use Award BIOS!)
Wol
Wol
[2] https://cloud.google.com/blog/products/gcp/fuzzing-pci-ex...
[3] https://www.ndss-symposium.org/wp-content/uploads/2019/02...
I also noticed that he seemed to indicate that even the helpful companies still only gave up some info under NDA, but what they couldn't say didn't have much of an effect on what they published. If so, that could still be a problem for developing an open source operating system that runs on hardware from somewhat friendly companies.
It took years for ACPI to achieve that on Linux.
But hardware vendors only cared about how it worked under Windows, so a new Linux ACPI came out emulating Windows behaviour so buggy firmware worked.
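Much of the "emulate Windows" behaviour comes down to ACPI's `_OSI` method: firmware asks the OS whether it supports a given interface string and branches on the answer, and Linux answers "true" to the Windows strings so that firmware takes its Windows-tested paths. A toy Python model of that handshake; the firmware's selection logic is hypothetical and the string list is an illustrative subset, not the full set the OS recognises.

```python
# _OSI strings a Windows-compatible OS might answer "true" to, oldest first.
WINDOWS_OSI = ["Windows 2001", "Windows 2006", "Windows 2009", "Windows 2012", "Windows 2015"]

def osi(supported: set[str], name: str) -> bool:
    # The OS's side of _OSI: does it claim compatibility with `name`?
    return name in supported

def firmware_pick_path(supported: set[str]) -> str:
    # Hypothetical firmware logic: remember the newest Windows the OS claims
    # to be, and take the code path that was tested against that version.
    chosen = "legacy"
    for name in WINDOWS_OSI:
        if osi(supported, name):
            chosen = name
    return chosen
```

An OS that answered honestly (only "Linux") would land on the rarely tested `legacy` path, which is the buggy-firmware problem the Windows emulation works around.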
Ask ASUS for an API description of how to do that? The Internet shows this data is field-programmable (for example https://support.nextcomputing.com/hc/en-us/articles/47096...-)