From: Matthew Garrett <mjg59-AT-srcf.ucam.org>
To: Jonathan Corbet <corbet-AT-lwn.net>
Subject: Re: Two questions
Date: Wed, 29 Jun 2011 01:33:32 +0100
ASPM's an interesting technology. PCIe is clocked at a minimum of 2.5GHz, and later versions run at 5GHz. Running that clock takes a bunch of power, and a lot of the time the link will be idle. ASPM lets the link be powered down at runtime, with the only (theoretical!) cost being some additional latency in bringing the link back up. The implementation of ASPM is down to the hardware (the PCIe root and the endpoint handshake the transition between states), but policy is determined by setting bits in PCI configuration space. This can be done by either the OS or the BIOS.
The PCIe specification defines how ASPM setup should occur, provides an algorithm for determining whether the latency will be greater than the maximum permitted and gives absolutely no indication under what circumstances ASPM should be enabled. Because this is a complicated feature that will only be tested in hardware if something explicitly enables it, it obviously didn't work in the majority of early PCIe hardware.
As a result, there are several heuristics and mechanisms for indicating the presence or absence of working ASPM support. Microsoft test for the presence of one of the bits that was only added late in the PCIe 1.1 specification, and if that's not present don't enable ASPM. We implemented the same policy. Microsoft will also not touch any PCIe configuration bits (including ASPM) if the ACPI/OS handshake for PCIe control doesn't grant full control of PCIe features to the OS. We've also implemented this. Drivers can also implement blacklisting, and there's support for this in Linux in the form of pcie_disable_link_state.
The final mechanism is a bit in the ACPI tables. If this bit is set, the platform is indicating to the OS that it doesn't support ASPM. In the past we took that to mean that we simply shouldn't touch the ASPM bits.
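The checks described above can be sketched as a single decision function. This is only an illustrative model in Python, not kernel code: the type and field names are hypothetical, and the order in which the checks are applied is an assumption, since the text doesn't specify one. The handling of the ACPI bit reflects the earlier behaviour of leaving the ASPM bits alone.

```python
from dataclasses import dataclass
from enum import Enum


class AspmAction(Enum):
    LEAVE_ALONE = "do not touch the ASPM bits at all"
    DO_NOT_ENABLE = "do not enable ASPM on this link"
    DISABLE = "driver asked for ASPM to be disabled"
    MAY_ENABLE = "ASPM may be enabled, subject to the latency check"


@dataclass
class LinkInfo:
    # All field names are hypothetical, chosen for illustration only.
    has_late_pcie_1_1_bit: bool     # the bit added late in PCIe 1.1
    os_has_full_pcie_control: bool  # outcome of the ACPI/OS handshake
    driver_blacklisted: bool        # driver requested no ASPM
    platform_says_no_aspm: bool     # "ASPM unsupported" bit in ACPI tables


def aspm_action(link: LinkInfo) -> AspmAction:
    """Apply the heuristics in an assumed order (not taken from the text)."""
    # Without full control of PCIe features, no PCIe bits are touched.
    if not link.os_has_full_pcie_control:
        return AspmAction.LEAVE_ALONE
    # Earlier behaviour: an "unsupported" platform means hands off.
    if link.platform_says_no_aspm:
        return AspmAction.LEAVE_ALONE
    # Hardware lacking the late 1.1 bit is assumed to have broken ASPM.
    if not link.has_late_pcie_1_1_bit:
        return AspmAction.DO_NOT_ENABLE
    # Per-driver blacklisting of known-bad devices.
    if link.driver_blacklisted:
        return AspmAction.DISABLE
    return AspmAction.MAY_ENABLE
```

Note that in this model a platform that sets the ACPI bit keeps whatever ASPM configuration the BIOS programmed, whether or not it actually works.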
However, it turns out that there are some systems where the BIOS has enabled ASPM itself, set the "ASPM unsupported" bit and then the hardware falls over when an ASPM transition occurs. The most straightforward thing to assume was that the BIOS was stupid (which is, to be fair, my default assumption) and shouldn't have enabled ASPM. So, since that patch, we clear the ASPM state when the BIOS indicates that the platform doesn't support ASPM.
So, is this the right thing to do? We don't know. The ACPI spec doesn't say anything about whether BIOS state should be retained or cleared, and the PCIe spec doesn't refer to the ACPI spec at all. Microsoft don't document how they behave in the presence of this bit. No BIOS vendors have yet seen fit to share their knowledge with us.
The patch fixed some hardware. However, it also disabled ASPM on some other machines where it appeared to work fine. Of course, "appeared" may be the operative word here. ASPM failures in hardware may be triggered by specific timing issues and may seem like random lockups that nobody ever tracks down. Some of the machines where we're seeing increased power consumption may also now be more stable. Or they may just be chewing more power. It's hard to tell.
What alternatives are there? We could keep the status quo and add driver whitelisting for hardware setups that are known to work. The problem is that even where we have specifications for the hardware, we often don't have the errata lists. We don't know for sure whether it works or not. We could revert this patch and add more driver blacklisting. But then we need to track down every device that doesn't work. Or, it's possible that the original code was correct and Linux simply programs the hardware differently, triggering ASPM issues that aren't seen elsewhere.
Right now the kernel takes the conservative approach. Users can override this at boot time with pcie_aspm=force.
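As a rough illustration of what clearing the ASPM state amounts to, the sketch below models the conservative policy on a single Link Control register value. The ASPM Control field in the low two bits of the PCIe Link Control register comes from the PCIe specification. Everything else is an assumption made for the example: the function names are hypothetical, and the boolean `force` parameter merely stands in for booting with pcie_aspm=force. It is a simplified model, not the kernel's implementation.

```python
ASPM_CONTROL_MASK = 0x0003  # bits 1:0 of the PCIe Link Control register
ASPM_L0S = 0x1              # L0s entry enabled
ASPM_L1 = 0x2               # L1 entry enabled


def describe_aspm(link_control: int) -> str:
    """Decode the ASPM Control field of a Link Control value."""
    names = {
        0: "disabled",
        ASPM_L0S: "L0s",
        ASPM_L1: "L1",
        ASPM_L0S | ASPM_L1: "L0s and L1",
    }
    return names[link_control & ASPM_CONTROL_MASK]


def apply_platform_policy(link_control: int,
                          platform_says_no_aspm: bool,
                          force: bool = False) -> int:
    """Return the Link Control value the OS would leave in place.

    If the platform claims ASPM is unsupported, clear whatever the BIOS
    enabled, unless the user has forced ASPM (hypothetical flag).
    """
    if platform_says_no_aspm and not force:
        return link_control & ~ASPM_CONTROL_MASK
    return link_control


# A BIOS that enabled both states but also set the "unsupported" bit:
bios_value = 0x0043
cleared = apply_platform_policy(bios_value, platform_says_no_aspm=True)
print(describe_aspm(bios_value), "->", describe_aspm(cleared))
```

Only the two ASPM bits are masked off, so other Link Control settings programmed by the BIOS are left as they were.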
The only safe approach is to have more feedback from the vendors who set this bit as to what their expectations are. And, so far, that's not something we've had a great deal of luck obtaining.
-- 
Matthew Garrett | email@example.com
Copyright © 2011, Eklektix, Inc.