PCIe, power management, and problematic BIOSes

Posted Jun 29, 2011 18:05 UTC (Wed) by flewellyn (subscriber, #5047)
Parent article: PCIe, power management, and problematic BIOSes

>In other words, sometimes the BIOS will tell the system that ASPM is not supported even though ASPM support is present; for added fun, the BIOS may enable ASPM on some devices (even though it says ASPM is not supported) before passing control to the kernel.

This raises the question of exactly why a BIOS would do such an asinine thing. Just what is it about BIOS development that creates such...um...creatively broken code?



PCIe, power management, and problematic BIOSes

Posted Jun 29, 2011 18:13 UTC (Wed) by mjg59 (subscriber, #23239) [Link]

BIOS authors typically end up with documentation from hardware vendors and a Leading Alternative OS vendor which describe their expectations of the BIOS, and so BIOSes are often written based on those assumptions. The problem is that Linux vendors (even large ones) rarely get to see any of this documentation, even if they have NDAs with the hardware vendors. We're usually left trying to determine the nature of the expected behaviour by looking at shadows cast on walls. So the answer to your question may be "Because the hardware and OS vendor told them to", or it may be "Because their crack was cut with meth that morning". We just don't know.

PCIe, power management, and problematic BIOSes

Posted Jun 29, 2011 18:16 UTC (Wed) by flewellyn (subscriber, #5047) [Link]

I see. So, is there a way for the OS to test for this ASPM support without recourse to the BIOS?

PCIe, power management, and problematic BIOSes

Posted Jun 29, 2011 18:28 UTC (Wed) by mjg59 (subscriber, #23239) [Link]

Not really. In an ideal world hardware that broke with ASPM simply wouldn't advertise ASPM support, but doing that was actually a spec violation until recently. So you're left with hardware advertising ASPM support and you have no way of knowing whether or not it's safe to touch that, and in some cases *turning off* ASPM can cause hardware to explode as well. So you either need complete knowledge of all hardware where ASPM is broken, or you need to assume that the BIOS will give you reliable information as to whether or not ASPM works on that platform.

PCIe, power management, and problematic BIOSes

Posted Jun 29, 2011 20:06 UTC (Wed) by samroberts (guest, #46749) [Link]

But how does the BIOS know whether ASPM works or not?

If it figures out from the devices, the kernel could do this, too.

Since you say the kernel can't do this reliably, I assume the BIOS authors have to hard-code this information about the devices? I guess, in theory, they should know, but it sounds pretty horrible.

PCIe, power management, and problematic BIOSes

Posted Jun 29, 2011 20:13 UTC (Wed) by mjg59 (subscriber, #23239) [Link]

Because the BIOS has been tested with that set of hardware. This is typically a laptop problem, not a desktop one.

PCIe, power management, and problematic BIOSes

Posted Jun 29, 2011 20:24 UTC (Wed) by dlang (subscriber, #313) [Link]

because the bios engineer hard-coded it to say if it works or not.

the BIOS isn't testing things and deciding it doesn't work, it's a hard-coded entry made by the BIOS programmer.

PCIe, power management, and problematic BIOSes

Posted Jun 29, 2011 20:40 UTC (Wed) by flewellyn (subscriber, #5047) [Link]

So we have a situation in which the firmware lies to you, and then penalizes you if you believe it? What a wonderful world.

PCIe, power management, and problematic BIOSes

Posted Jun 29, 2011 21:00 UTC (Wed) by mjg59 (subscriber, #23239) [Link]

Not an uncommon situation, sadly.

PCIe, power management, and problematic BIOSes

Posted Jun 29, 2011 22:17 UTC (Wed) by bronson (subscriber, #4806) [Link]

You're too kind Matthew. To engineer a BIOS, most companies copy something from last year, stab it until it boots, and ship it.

Check out the top vendor on http://smolts.org . Or http://smolts.org/reports/view_profiles?profile=Filled&...

Not much craftsmanship to be found.

PCIe, power management, and problematic BIOSes

Posted Jun 30, 2011 0:53 UTC (Thu) by AndreE (guest, #60148) [Link]

So does the Leading Alternative OS have greater access to documentation and vendor information?

I can't see how they themselves could solve this problem without having a perfect blacklist of broken motherboards.

PCIe, power management, and problematic BIOSes

Posted Jun 30, 2011 1:46 UTC (Thu) by BenHutchings (subscriber, #37955) [Link]

Microsoft generally doesn't have this specific information; it relies on OEMs to do the integration.

The OEMs buy in hardware components, which come with Windows drivers already written, and they buy firmware from other companies. Normally they get some specific hardware documentation and source code for the drivers and firmware. They also have documentation and tools for Windows (the DDK).

They need to make the combination of hardware, firmware, OS and drivers work, and quickly. Since they control both firmware and driver code (except for generic drivers provided by Microsoft) they may resort to dirty hacks in either or both of these. The result may work only as a result of undocumented behaviour of Windows.

The challenge for free OS developers is to replicate these driver hacks and undocumented behaviour of Windows where necessary.

PCIe, power management, and problematic BIOSes

Posted Jun 30, 2011 7:26 UTC (Thu) by ebirdie (guest, #512) [Link]

>Since they control both firmware and driver code (except for generic drivers provided by Microsoft) they may resort to dirty hacks in either or both of these. The result may work only as a result of undocumented behaviour of Windows.

I see! That explains the phenomenon we have wondered about for years while working as Windows reinstaller junkies: what makes decent hardware become so unusable with Windows over shorter and longer periods of time?

Now I understand that the frequent patching of Windows also ruins the underlying assumptions made when producing "nice, decent, working hardware". The common solution to the resulting problems (slow hardware, sleep states working poorly, poor battery life, a device not working after wake-up, irritating moments in prepared presentations where a reboot is often the only cure, after which the laptop is returned for "repair", and so on) is to buy new hardware, which keeps the wheels turning for everyone involved.

PCIe, power management, and problematic BIOSes

Posted Jun 30, 2011 19:29 UTC (Thu) by BenHutchings (subscriber, #37955) [Link]

Note, there are some limiting factors on OEM hacks. If the OEM installs the 64-bit edition of Windows or if they use the 'Designed for Windows' logo then all drivers must pass a fairly challenging test suite (WHQL test). So there is some disincentive to modifying drivers. I'm not sure whether there is a similar test suite for the firmware.

PCIe, power management, and problematic BIOSes

Posted Jul 6, 2011 0:59 UTC (Wed) by cortana (subscriber, #24596) [Link]

Is it true that some drivers will detect that they are being put through the WHQL process, and modify their behaviour so that they pass, but act in a way that they would never do in the real world?

PCIe, power management, and problematic BIOSes

Posted Jul 6, 2011 1:21 UTC (Wed) by BenHutchings (subscriber, #37955) [Link]

Since the WHQL test is run by the submitter, not Microsoft, there are any number of ways to cheat. I assume there are penalties for companies that get caught doing this.

PCIe, power management, and problematic BIOSes

Posted Jul 6, 2011 12:09 UTC (Wed) by nye (guest, #51576) [Link]

>Is it true that some drivers will detect that they are being put through the WHQL process, and modify their behaviour so that they pass, but act in a way that they would never do in the real world?

If they know how they're supposed to act, why would they deliberately act differently in normal circumstances?

PCIe, power management, and problematic BIOSes

Posted Jul 7, 2011 17:13 UTC (Thu) by farnz (subscriber, #17727) [Link]

As an example; DirectX has a DoesDriverSupport method that it calls to see what functionality the driver supports. It's obvious that an implementation that always returns TRUE is faster than one that returns an accurate result.

Less obvious, but still true, is that a driver that can currently support everything the platform uses can return TRUE without checking, and will be a tiny bit faster. There are similar cases throughout any significant sized API, where being wrong happens to work for today's software, and is faster because you do less work - and when those cases are on the fastpath, the driver will do them.

WHQL tries to deliberately break these sorts of things - it looks for cases where the answer can be predicted, and checks that the driver gives the right answer; if it's lying, WHQL will break things.

Hypothetically, for example, imagine that your GPU only has a single thread of execution, used for 3D commands and for putting buffers in the hands of scanout to display, but lets you access buffers from the CPU directly, bypassing the GPU execution. A driver could implement glXSwapBuffers and friends by putting the swap in the GPU's thread of execution, and returning immediately; it could then make glFinish and glFlush no-ops, and not break anything obvious. If Microsoft thought drivers were doing this sort of trick, WHQL could do a glReadPixels immediately after a glFinish, and get the wrong result - the driver's been caught lying.

In the meantime, of course, the driver is faster than the competition's driver in benchmarks people care about - because it's not doing things by the spec, and hoping that you'll never notice the lie.

PCIe, power management, and problematic BIOSes

Posted Jul 12, 2011 13:06 UTC (Tue) by nye (guest, #51576) [Link]

>As an example; DirectX has a DoesDriverSupport method that it calls to see what functionality the driver supports. It's obvious that an implementation that always returns TRUE is faster than one that returns an accurate result.

That would presumably be caught by some things visibly breaking at some point, otherwise there's no point in having it in the first place. (I wonder what the modified version does in that example when it catches the driver lying.)

>Less obvious, but still true, is that a driver that can currently support everything the platform uses can return TRUE without checking, and will be a tiny bit faster. There are similar cases throughout any significant sized API, where being wrong happens to work for today's software, and is faster because you do less work - and when those cases are on the fastpath, the driver will do them.

This does at least make more sense - if it definitely isn't causing any problems now, then I can imagine somebody saying 'we can always update it in the future' - and possibly even believing it.

>In the meantime, of course, the driver is faster than the competition's driver in benchmarks people care about - because it's not doing things by the spec, and hoping that you'll never notice the lie

One might hope that driver authors would expect people to care whether their very fast driver is unstable or has rendering glitches, and if they have a more accurate WHQL-passing driver (as posited upthread) to provide that as an option.

*Sigh*

I guess worse things happen at sea.

PCIe, power management, and problematic BIOSes

Posted Jul 12, 2011 13:56 UTC (Tue) by farnz (subscriber, #17727) [Link]

In Raymond's example, the implementation handles a detected lie by assuming that DoesDriverSupport always returns FALSE, and not using the accelerated paths. In other words, if you're ever caught lying, you're never going to be trusted to do anything sophisticated, even if you could do some acceleration.
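The "caught lying once, never trusted again" policy described above can be sketched roughly as follows. This is only an illustration: the class name `CapabilityGate`, the `query` callable and the idea of probing with a feature that is known not to exist are all assumptions made for the example, not the actual DirectX mechanism.

```python
from typing import Callable


class CapabilityGate:
    """Wrap a driver's capability query and distrust it after a detected lie.

    Hypothetical sketch: `query` stands in for something like
    DoesDriverSupport, and `impossible_feature` is a feature identifier
    that no honest driver could claim to support.
    """

    def __init__(self, query: Callable[[str], bool], impossible_feature: str):
        self._query = query
        # Probe once with a question whose correct answer is known to be False.
        # A driver answering True here is returning TRUE without checking.
        self._trusted = not query(impossible_feature)

    def supports(self, feature: str) -> bool:
        # A driver caught lying is treated as supporting nothing,
        # so only the unaccelerated paths get used.
        return self._trusted and self._query(feature)
```

With this shape, a driver that blindly answers TRUE loses every accelerated path, including the ones it could genuinely have handled.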

Unfortunately, too many people buy hardware on the basis of benchmarks - for an example, look at the QUACK.EXE incident - a GPU driver was set up to detect a specific application used as a benchmark, and cheat.

The problem for buyers of devices with complex drivers is that until you work out what the cheats are, you don't know whether the driver is fast in benchmarks because it cheats, or because it's buggy, or because it's genuinely that fast, or because your applications are buggy and relying on things not guaranteed by the API. Add in closed-source drivers, which can do things like detect the presence of WHQL certification tests on the machine, and you end up with a driver that (for example) is slow and stable when you run the WHQL test suite (thus always passes), but takes shortcuts when WHQL is not running. As benchmarkers rarely have WHQL installed, the driver author gets the "best" of both worlds - stability if you try and test it with WHQL (so you have a WHQL-compliant driver), and fast if you try and benchmark it without WHQL.

Now throw in the idea that applications don't use complex functionality at first, and you see just how painful things can get - the bit that fails on you might be something that no application today uses, at which point it can be years before anyone writes test code that shows the problem is the driver. For some classes of driver (e.g. graphics drivers), people build up a whole set of mythology around things you cannot do, and you develop a set of shared assumptions that aren't actually in the spec, but that "everyone knows" are things that don't work, because drivers traditionally cheated.

same for everybody, not just us

Posted Jul 1, 2011 9:47 UTC (Fri) by tialaramex (subscriber, #21167) [Link]

Also keep in mind that we're approaching, if not already at, the point where "weird bugs with Windows" aren't that much less common than "weird bugs with Linux". Notice I say "with" not "in", because often it's just as likely that the hardware is faulty and the "working" OS just happens not to trigger the fault, as is probably the case with this ASPM problem.

My current laptop "forgets" it has built-in speakers if suspended with the headphones plugged in while running Windows. Only a reboot _into Fedora_ fixes the problem (rebooting just Windows has no effect).

The outwardly normal USB to SATA/IDE drive cases I bought recently turn out to occasionally give erroneous read results with IDE drives on Windows. The data on disk is undamaged, but understandably that's not good enough for me. Can't reproduce in Linux. If this had been the other way around, would the retailer have taken them back as faulty? I'm glad I don't have to find out.

PCIe, power management, and problematic BIOSes

Posted Sep 28, 2016 13:15 UTC (Wed) by Hi-Angel (guest, #110915) [Link]

Are those BIOS bugs reported to manufacturers? If they aren't, the manufacturers never have a chance to produce a good BIOS; they just don't know there is a problem.

PCIe, power management, and problematic BIOSes

Posted Jun 30, 2011 17:08 UTC (Thu) by iceblink (subscriber, #19982) [Link]

In at least one BIOS that I have worked with, the ACPI spec was simply misinterpreted as requiring this flag to be set to 1 in order to enable ASPM. It seems like a silly mistake, but I suspect it is common.

The kernel should probably also check whether the FADT claims to support ACPI 4.0 before relying on this flag.

PCIe, power management, and problematic BIOSes

Posted Jun 30, 2011 17:24 UTC (Thu) by mjg59 (subscriber, #23239) [Link]

Did anyone test that ASPM actually ended up enabled as a result? You're right that the check should be made conditional on the FADT version, though.
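A check made conditional on the FADT version might look something like the sketch below. It works on a raw copy of the FADT as bytes. The offsets and the bit position follow my reading of the ACPI FADT layout (revision byte in the common table header, 16-bit IA-PC boot architecture flags, "PCIe ASPM Controls" bit) and should be verified against the spec. The `min_revision` default of 4 is an assumption, not something this thread establishes, and the function name is made up for the example.

```python
import struct

# Assumed layout; verify against the ACPI specification before relying on it.
FADT_REVISION_OFFSET = 8     # revision byte in the common ACPI table header
FADT_BOOT_ARCH_OFFSET = 109  # IA-PC boot architecture flags, 16-bit little-endian
FADT_NO_ASPM = 1 << 4        # "PCIe ASPM Controls": OS must not enable ASPM


def firmware_forbids_aspm(fadt: bytes, min_revision: int = 4) -> bool:
    """Return True only if the FADT is new enough to define the flag and sets it."""
    if len(fadt) < FADT_BOOT_ARCH_OFFSET + 2:
        return False  # table too short to contain the flags field
    if fadt[FADT_REVISION_OFFSET] < min_revision:
        return False  # older FADT: the bit has no defined meaning, so ignore it
    (boot_arch,) = struct.unpack_from("<H", fadt, FADT_BOOT_ARCH_OFFSET)
    return bool(boot_arch & FADT_NO_ASPM)
```

Note that this only decides whether to believe the flag; it says nothing about whether the BIOS has already enabled ASPM on some devices behind the kernel's back.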

PCIe, power management, and problematic BIOSes

Posted Jul 2, 2011 2:05 UTC (Sat) by iceblink (subscriber, #19982) [Link]

Yeah, we noticed a spike in power testing and ended up with a DMI workaround until BIOS updates can be rolled out: http://crosbug.com/15124
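For readers unfamiliar with the idea, a DMI workaround of this general kind can be sketched as a small quirk table matched against the machine's DMI strings. The entries, field names and function names below are hypothetical placeholders and are not taken from the linked bug.

```python
from typing import Mapping, Sequence

# Hypothetical quirk entries; a real table would name actual affected machines.
ASPM_FLAG_QUIRKS: Sequence[Mapping[str, str]] = (
    {"sys_vendor": "ExampleVendor", "product_name": "ExampleBook"},
)


def matches_quirk(dmi: Mapping[str, str],
                  quirks: Sequence[Mapping[str, str]] = ASPM_FLAG_QUIRKS) -> bool:
    """True if every field of at least one quirk entry equals the machine's DMI value."""
    return any(all(dmi.get(key) == value for key, value in quirk.items())
               for quirk in quirks)


def trust_firmware_aspm_flag(dmi: Mapping[str, str]) -> bool:
    """Ignore the firmware's ASPM flag on machines known to set it wrongly."""
    return not matches_quirk(dmi)
```

On Linux the DMI strings can typically be read from files such as /sys/class/dmi/id/sys_vendor and /sys/class/dmi/id/product_name. The obvious drawback, as discussed upthread, is that such a table only covers machines somebody has already diagnosed.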

PCIe, power management, and problematic BIOSes

Posted Jul 2, 2011 3:05 UTC (Sat) by mjg59 (subscriber, #23239) [Link]

Ah, ok, I hadn't realised this was Chrome OS. My concern is more that BIOS vendors in general have this opinion, and Windows handles it differently to us in some way.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds