PCIe, power management, and problematic BIOSes
Posted Jul 6, 2011 1:21 UTC (Wed) by BenHutchings (subscriber, #37955)
Posted Jul 6, 2011 12:09 UTC (Wed) by nye (guest, #51576)
If they know how they're supposed to act, why would they deliberately act differently in normal circumstances?
Posted Jul 7, 2011 17:13 UTC (Thu) by farnz (guest, #17727)
As an example: DirectX has a DoesDriverSupport method that it calls to see what functionality the driver supports. It's obvious that an implementation that always returns TRUE is faster than one that returns an accurate result.
Less obvious, but still true, is that a driver that can currently support everything the platform uses can return TRUE without checking, and will be a tiny bit faster. There are similar cases throughout any significant sized API, where being wrong happens to work for today's software, and is faster because you do less work - and when those cases are on the fastpath, the driver will do them.
WHQL tries to deliberately break these sorts of things - it looks for cases where the answer can be predicted, and checks that the driver gives the right answer; if it's lying, WHQL will break things.
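To make the "predictable answer" trick concrete, here is a toy model in Python. It is a sketch under stated assumptions: the driver classes, `does_driver_support()`, the made-up feature names and `probe_capabilities()` are hypothetical stand-ins, not the real DirectX or WHQL interfaces. The test asks about a feature no hardware can support, so an honest driver must say no; a driver that returns TRUE without checking gets caught.

```python
# Hypothetical model of a WHQL-style capability probe. The feature names,
# driver classes and probe_capabilities() are illustrative assumptions,
# not the real DirectX or WHQL interfaces.


class HonestDriver:
    """Answers capability queries from what the hardware really supports."""

    def __init__(self, supported):
        self.supported = set(supported)

    def does_driver_support(self, feature):
        return feature in self.supported


class AlwaysTrueDriver:
    """Skips the lookup on the fast path and claims support for everything."""

    def does_driver_support(self, feature):
        return True


def probe_capabilities(driver, known_unsupported):
    """Ask about features whose answer is predictable (the hardware cannot
    support them) and return every one the driver wrongly claims."""
    return [f for f in known_unsupported if driver.does_driver_support(f)]


# A feature name invented for the test, which no real hardware implements.
CANARY = ["FEATURE_THAT_DOES_NOT_EXIST"]

print(probe_capabilities(HonestDriver({"TEXTURE_COMPRESSION"}), CANARY))  # []
print(probe_capabilities(AlwaysTrueDriver(), CANARY))  # ['FEATURE_THAT_DOES_NOT_EXIST']
```

The always-TRUE driver passes every real application that only asks about features it does support; only a question with a known "no" answer exposes it.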
Hypothetically, for example, imagine that your GPU only has a single thread of execution, used for 3D commands and for putting buffers in the hands of scanout to display, but lets you access buffers from the CPU directly, bypassing the GPU execution. A driver could implement glXSwapBuffers and friends by putting the swap in the GPU's thread of execution, and returning immediately; it could then make glFinish and glFlush no-ops, and not break anything obvious. If Microsoft thought drivers were doing this sort of trick, WHQL could do a glReadPixels immediately after a glFinish, and get the wrong result - the driver's been caught lying.
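The hypothetical GPU above can be modelled in a few lines. This is only a simulation: `GPU`, `Driver`, `gl_finish()` and `read_pixels()` are invented stand-ins for the hardware and for glFinish/glReadPixels, not the OpenGL API. Work is queued on a single in-order "GPU thread", while the CPU reads the buffer directly, so a no-op finish leaves the readback stale.

```python
# Toy model of the hypothetical GPU: one in-order command queue, plus a
# buffer the CPU can read directly without going through that queue.
# gl_finish() and read_pixels() are stand-ins for glFinish/glReadPixels.
from collections import deque


class GPU:
    def __init__(self):
        self.queue = deque()      # commands waiting for the single GPU thread
        self.framebuffer = 0      # what the CPU sees if it reads directly

    def submit(self, value):
        self.queue.append(value)  # returns immediately, like the cheating swap

    def run_pending(self):
        while self.queue:
            self.framebuffer = self.queue.popleft()


class Driver:
    def __init__(self, honest):
        self.gpu = GPU()
        self.honest = honest

    def draw(self, value):
        self.gpu.submit(value)

    def gl_finish(self):
        # Per the spec, this blocks until all earlier commands complete;
        # the cheating driver turns it into a no-op.
        if self.honest:
            self.gpu.run_pending()

    def read_pixels(self):
        # CPU reads the buffer directly, bypassing the GPU queue.
        return self.gpu.framebuffer


def finish_then_read_test(driver):
    """Conformance-style check: after a finish, readback must see the draw."""
    driver.draw(42)
    driver.gl_finish()
    return driver.read_pixels() == 42


print(finish_then_read_test(Driver(honest=True)))   # True
print(finish_then_read_test(Driver(honest=False)))  # False: caught lying
```

Note that the cheating driver only fails when something reads back immediately after the finish; an application that never does that sees no difference.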
In the meantime, of course, the driver is faster than the competition's driver in benchmarks people care about - because it's not doing things by the spec, and hoping that you'll never notice the lie.
Posted Jul 12, 2011 13:06 UTC (Tue) by nye (guest, #51576)
That would presumably be caught by some things visibly breaking at some point; otherwise there's no point in having it in the first place. (I wonder what the modified version does in that example when it catches the driver lying.)
>Less obvious, but still true, is that a driver that can currently support everything the platform uses can return TRUE without checking, and will be a tiny bit faster. There are similar cases throughout any significant sized API, where being wrong happens to work for today's software, and is faster because you do less work - and when those cases are on the fastpath, the driver will do them.
This does at least make more sense - if it definitely isn't causing any problems now, then I can imagine somebody saying 'we can always update it in the future' - and possibly even believing it.
>In the meantime, of course, the driver is faster than the competition's driver in benchmarks people care about - because it's not doing things by the spec, and hoping that you'll never notice the lie
One might hope that driver authors would expect people to care whether their very fast driver is unstable or has rendering glitches, and, if they have a more accurate WHQL-passing driver (as posited upthread), would provide that as an option.
I guess worse things happen at sea.
Posted Jul 12, 2011 13:56 UTC (Tue) by farnz (guest, #17727)
In Raymond's example, the implementation handles a detected lie by assuming that DoesDriverSupport always returns FALSE, and not using the accelerated paths. In other words, if you're ever caught lying, you're never going to be trusted to do anything sophisticated, even if you could do some acceleration.
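A minimal sketch of that "caught once, never trusted again" policy follows. It is an illustration, not Raymond Chen's actual code: `TrustingClient`, the canary feature name and both driver classes are hypothetical. The client asks one question with a known answer up front, and if the driver lies, every later capability query is treated as FALSE.

```python
# Sketch of the "caught lying once, never trusted again" policy.
# TrustingClient, the canary feature and the driver classes are hypothetical.

CANARY = "FEATURE_THAT_DOES_NOT_EXIST"


class TrustingClient:
    def __init__(self, driver):
        self.driver = driver
        # Ask a question with a known answer before trusting anything else.
        self.trusted = not driver.does_driver_support(CANARY)

    def supports(self, feature):
        if not self.trusted:
            return False  # detected liar: never use the accelerated paths
        return self.driver.does_driver_support(feature)


class HonestDriver:
    def does_driver_support(self, feature):
        return feature == "FAST_BLIT"


class AlwaysTrueDriver:
    def does_driver_support(self, feature):
        return True


print(TrustingClient(HonestDriver()).supports("FAST_BLIT"))      # True
print(TrustingClient(AlwaysTrueDriver()).supports("FAST_BLIT"))  # False
```

The lying driver loses acceleration even for features it genuinely supports, which is exactly the penalty described above.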
Unfortunately, too many people buy hardware on the basis of benchmarks - for example, look at the QUACK.EXE incident - a GPU driver was set up to detect a specific application used as a benchmark, and cheat.
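Application detection of this kind can be exposed by running the same workload under a different executable name. The sketch below is purely illustrative: the name list, `render_cost()` and its cost numbers are invented, and a real driver's detection logic is not public. In this deterministic model, any difference between the two runs means the driver is keying on the name.

```python
# Sketch of application-specific cheating and one way to expose it.
# The executable names, render_cost() and its costs are all hypothetical.

BENCHMARK_NAMES = {"benchmark.exe"}  # names the cheating driver looks for


def render_cost(exe_name, frames):
    """Pretend cost per run: the driver takes an out-of-spec shortcut
    only when the executable name matches a known benchmark."""
    per_frame = 1 if exe_name.lower() in BENCHMARK_NAMES else 3
    return per_frame * frames


def detect_app_specific_path(exe_name, frames=100):
    """Run the same workload under its real name and under a disguised
    name; any difference means the driver keys on the name."""
    real = render_cost(exe_name, frames)
    disguised = render_cost("renamed_" + exe_name, frames)
    return real != disguised


print(detect_app_specific_path("benchmark.exe"))  # True
print(detect_app_specific_path("game.exe"))       # False
```

Real timings are noisy, so a practical harness would compare averaged runs rather than a single exact value.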
The problem for buyers of devices with complex drivers is that until you work out what the cheats are, you don't know whether the driver is fast in benchmarks because it cheats, or because it's buggy, or because it's genuinely that fast, or because your applications are buggy and relying on things not guaranteed by the API. Add in closed-source drivers, which can do things like detect the presence of WHQL certification tests on the machine, and you end up with a driver that (for example) is slow and stable when you run the WHQL test suite (and thus always passes), but takes shortcuts when WHQL is not running. As benchmarkers rarely have WHQL installed, the driver author gets the "best" of both worlds - stability if you try to test it with WHQL (so you have a WHQL-compliant driver), and speed if you try to benchmark it without WHQL.
Now throw in the idea that applications don't use complex functionality at first, and you see just how painful things can get - the bit that fails on you might be something that no application today uses, at which point it can be years before anyone writes test code that shows the problem is the driver. For some classes of driver (e.g. graphics drivers), people build up a whole set of mythology around things you cannot do, and you develop a set of shared assumptions that aren't actually in the spec, but that "everyone knows" are things that don't work, because drivers traditionally cheated.
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds