
"Can't be benchmarked" – No.

"Can't be benchmarked" – No.

Posted Sep 7, 2009 6:14 UTC (Mon) by quotemstr (subscriber, #45331)
In reply to: I understand Con's response by aorth
Parent article: BFS vs. mainline scheduler benchmarks and measurements

The explanation is less technical and more psychological: what you're seeing is observer bias. Remember the poster on LKML who claimed he was seeing sub-second improvements in lag in a 3D FPS (which is probably spinning at 100% CPU anyway)? That's precisely the kind of environment most susceptible to observer bias: a supposedly small effect in a noisy signal like game latency.

I'll believe there's something to this "can't be benchmarked" nonsense when I see a double-blind experiment that shows a statistically significant effect. As the old saying goes, "data is not the plural of anecdote".



"Can't be benchmarked" – No.

Posted Sep 7, 2009 6:46 UTC (Mon) by flewellyn (subscriber, #5047)

Perhaps there is a way to benchmark such things: write a test 3D program that plays multimedia on a rotating 3D cube and prints the frame rate and other latency data on screen. Run it, then start up other things that contend for the scheduler: some I/O (copying a large file?), some network traffic (pinging a host on the LAN?), and so on. Watch the numbers to see how the 3D app holds up under that strain.

I don't know how well this would work, but it'd be a test of some kind.

"Can't be benchmarked" – No.

Posted Sep 7, 2009 13:09 UTC (Mon) by cesarb (subscriber, #6266)

The same poster mentioned frame drops in mplayer. That would be fairly easy to turn into a benchmark: if mplayer does not print the number of dropped frames to the console, edit its source code to make it do so; then write an app that moves the mplayer window around the screen pseudo-randomly and drops it back to the desktop, and see how many frames you can make it drop.

All the other examples mentioned by that poster sound benchmarkable with some coding effort. For instance, in the Doom 3 example you would measure not the frame rate but the frame jitter: record the time of the end of the "flush" call which actually pushes each frame to the screen, subtract the time of the previous frame, and see what the highest difference is and how uniform the differences are. Even if for some reason you cannot change the source code of your game, you can change the source code of the libraries it calls to do the "flush", or interpose with LD_PRELOAD or something like it.

You could even measure the "input lag" in his sound example by building a hardware contraption which "presses a key" (by pretending to be a keyboard), listens to the analog audio output, and logs the time difference between the input and the output.

This all seems benchmarkable without the need for a double-blind test.

"Can't be benchmarked" – No.

Posted Sep 7, 2009 17:12 UTC (Mon) by cesarb (subscriber, #6266)

This is what I meant by interposing with LD_PRELOAD:

http://github.com/cesarb/glxswapbuffersmeasure/tree/master

This is a small quick-and-dirty library I just wrote which hooks into glXSwapBuffers via LD_PRELOAD and prints some statistics to stderr on exit.

An example of its output with everyone's favorite "benchmark" tool, glxgears, on an outdated distribution (thus an older kernel):

LD_PRELOAD=./glxswapbuffersmeasure.so glxgears
1142 frames in 5.0 seconds = 228.375 FPS
1035 frames in 5.0 seconds = 206.474 FPS
934 frames in 5.0 seconds = 186.540 FPS
glXSwapBuffers count: 3947, avg: 0.004757, variance: 0.000045, std dev: 0.006699, max: 0.204504

I moved some windows around to make it stutter a bit more, and the output from my test library shows it (200 ms max latency, which corresponds to around 5 FPS). Note that the average time between glXSwapBuffers calls approximately matches the FPS printout from glxgears.

It should be quite simple for someone who sees latency problems that seem to be cured by BFS to run the same 3D game with something like this library under both the mainline scheduler and BFS and see whether it shows any difference in the output. Of course, the code I posted can be enhanced to gather better statistics (like a histogram of the latencies); I put the code under Creative Commons CC0 (roughly "public domain").

"Can't be benchmarked" – No.

Posted Sep 7, 2009 13:49 UTC (Mon) by job (guest, #670)

I disagree. You could, for example, easily count the number of buffer underruns with pulseaudio when playing an mp3 at the same time as you compile the kernel. Then you could do the same thing with a movie.

These are the kinds of things normal users do when latency really counts. The problem is not measuring it, the problem is that nobody is really interested.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds