Matthew Garrett responds to the ASPM power regression
[Posted June 29, 2011 by corbet]
From: Matthew Garrett <mjg59-AT-srcf.ucam.org>
To: Jonathan Corbet <corbet-AT-lwn.net>
Subject: Re: Two questions
Date: Wed, 29 Jun 2011 01:33:32 +0100
Message-ID: <20110629003332.GA27357@srcf.ucam.org>
ASPM's an interesting technology. PCIe is clocked at 2.5GHz or faster,
and later versions run at 5GHz. Running that clock takes a bunch of
power, and a lot of the time the link will be idle. ASPM lets this be
powered down at runtime, with the only (theoretical!) cost being some
additional latency in bringing the link back up.
The implementation of ASPM is down to the hardware (the PCIe root and
the endpoint handshake the transition between states), but policy is
determined by setting bits in PCI configuration space. This can be done
by either the OS or the BIOS. The PCIe specification defines how ASPM
setup should occur, provides an algorithm for determining whether the
latency will be greater than the maximum permitted and gives absolutely
no indication under what circumstances ASPM should be enabled.
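
To make "setting bits in PCI configuration space" concrete, here is a
minimal sketch of the field involved. It assumes the standard layout of
the PCIe Link Control register, where the low two bits form the ASPM
Control field (bit 0 for L0s, bit 1 for L1). The helper names are
hypothetical, and the functions work on a plain integer rather than on
real hardware.

```python
# Assumed layout: ASPM Control is bits 1:0 of the 16-bit PCIe Link
# Control register. The constant and function names are illustrative.
ASPM_L0S = 0x1    # L0s entry enabled
ASPM_L1 = 0x2     # L1 entry enabled
ASPM_MASK = 0x3   # both ASPM Control bits


def decode_aspm(lnkctl: int) -> str:
    """Describe the ASPM policy encoded in a Link Control value."""
    field = lnkctl & ASPM_MASK
    names = []
    if field & ASPM_L0S:
        names.append("L0s")
    if field & ASPM_L1:
        names.append("L1")
    return "+".join(names) if names else "disabled"


def set_aspm(lnkctl: int, states: int) -> int:
    """Return lnkctl with the ASPM Control field replaced by `states`."""
    return (lnkctl & ~ASPM_MASK & 0xFFFF) | (states & ASPM_MASK)
```

Whoever writes this field last, BIOS or OS, decides the policy. The
hardware only carries out the transitions.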
Because this is a complicated feature that will only be tested in
hardware if something explicitly enables it, it obviously didn't work in
the majority of early PCIe hardware. As a result, there are several
heuristics and mechanisms for indicating the presence or absence of
working ASPM support. Microsoft test for the presence of one of the bits
that was only added late in the PCIe 1.1 specification, and if that's
not present don't enable ASPM. We implemented the same policy. Microsoft
will also not touch any PCIe configuration bits (including ASPM) if the
ACPI/OS handshake for PCIe control doesn't grant full control of PCIe
features to the OS. We've also implemented this. Drivers can also
implement blacklisting, and there's support for this in Linux in the
form of pci_disable_link_state().
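
The checks described above can be sketched as a single predicate. The
text does not name the PCIe 1.1 bit. The sketch assumes it is the
Role-Based Error Reporting bit (0x8000) in the Device Capabilities
register, which is my understanding of what the Linux check uses. The
function and parameter names are hypothetical.

```python
# Assumption: the "late PCIe 1.1" bit is Role-Based Error Reporting,
# bit 15 of the Device Capabilities register.
PCI_EXP_DEVCAP_RBER = 0x8000


def os_may_configure_aspm(devcap: int,
                          os_granted_pcie_control: bool,
                          driver_blacklisted: bool = False) -> bool:
    """Return True only if every heuristic allows the OS to enable ASPM."""
    if not os_granted_pcie_control:
        # The ACPI/OS handshake did not hand over PCIe control:
        # leave the configuration bits alone.
        return False
    if not devcap & PCI_EXP_DEVCAP_RBER:
        # Pre-1.1 device: assume ASPM was never validated on it.
        return False
    if driver_blacklisted:
        # The driver knows this device misbehaves with ASPM.
        return False
    return True
```

Every check here is a reason not to enable ASPM. None of them proves
that ASPM actually works on a given device.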
The final mechanism is a bit in the ACPI tables. If this bit is set, the
platform is indicating to the OS that it doesn't support ASPM. In the
past we took that to mean that we simply shouldn't touch the ASPM bits.
However, it turns out that there are some systems where the BIOS has
enabled ASPM itself, set the "ASPM unsupported" bit and then the
hardware falls over when an ASPM transition occurs. The most
straightforward thing to assume was that the BIOS was stupid (which
is, to be fair, my default assumption) and shouldn't have enabled ASPM.
So, since that patch, we clear the ASPM state when the BIOS indicates
that the platform doesn't support ASPM.
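
The difference between the old and new behaviour is small enough to
sketch. This assumes the "ASPM unsupported" indication is bit 4 of the
FADT boot architecture flags (ACPI_FADT_NO_ASPM in the Linux headers, as
far as I know) and that the ASPM policy sits in the low two bits of the
Link Control register. The function is hypothetical and operates on
plain integers.

```python
# Assumption: "ASPM unsupported" is bit 4 of the FADT boot
# architecture flags.
ACPI_FADT_NO_ASPM = 1 << 4
ASPM_MASK = 0x3  # ASPM Control field of the Link Control register


def lnkctl_after_boot(lnkctl: int, boot_flags: int,
                      clear_on_unsupported: bool) -> int:
    """Return the Link Control value the OS leaves behind.

    clear_on_unsupported=False models the old behaviour (do not touch
    the bits); True models the behaviour after the patch.
    """
    if not boot_flags & ACPI_FADT_NO_ASPM:
        # Platform does not object; normal policy applies elsewhere.
        return lnkctl
    if clear_on_unsupported:
        # New behaviour: undo whatever the BIOS enabled.
        return lnkctl & ~ASPM_MASK & 0xFFFF
    # Old behaviour: keep the BIOS state, enabled or not.
    return lnkctl
```

With a BIOS that enabled L1 and also set the flag, the old behaviour
leaves 0x0042 in place and the new behaviour yields 0x0040.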
So, is this the right thing to do? We don't know. The ACPI spec doesn't
say anything about whether BIOS state should be retained or cleared, and
the PCIe spec doesn't refer to the ACPI spec at all. Microsoft don't
document how they behave in the presence of this bit. No BIOS vendors
have yet seen fit to share their knowledge with us. The patch fixed some
hardware. However, it also disabled ASPM on some other machines where it
appeared to work fine.
Of course, "appeared" may be the operative word here. ASPM failures in
hardware may be triggered by specific timing issues and may seem like
random lockups that nobody ever tracks down. Some of the machines where
we're seeing increased power consumption may also now be more stable. Or
they may just be chewing more power. It's hard to tell.
What alternatives are there? We could keep the status quo and add driver
whitelisting for hardware setups that are known to work. The problem is
that even where we have specifications for the hardware, we often don't
have the errata lists. We don't know for sure whether it works or not.
We could revert this patch and add more driver blacklisting. But then we
need to track down every device that doesn't work. Or, it's possible
that the original code was correct and Linux simply programs the
hardware differently, triggering ASPM issues that aren't seen elsewhere.
Right now the kernel takes the conservative approach. Users can override
this at boot time with pcie_aspm=force. The only safe approach is to
have more feedback from the vendors who set this bit as to what their
expectations are. And, so far, that's not something we've had a great
deal of luck obtaining.
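
For anyone checking whether the override is in effect, a small sketch
that looks for the parameter on a kernel command line. The function name
is hypothetical. It parses whatever string it is given, such as the
contents of /proc/cmdline.

```python
from typing import Optional


def aspm_override(cmdline: str) -> Optional[str]:
    """Return the value of pcie_aspm= on a kernel command line, if any."""
    value = None
    for token in cmdline.split():
        if token.startswith("pcie_aspm="):
            # Keep the last occurrence if the option is repeated; this
            # is a choice made for the sketch, not documented behaviour.
            value = token.split("=", 1)[1]
    return value
```

On a running system this could be called as
aspm_override(open("/proc/cmdline").read()). A result of "force" means
the user asked the kernel to enable ASPM despite the conservative
default.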
--
Matthew Garrett | mjg59@srcf.ucam.org