Michael Larabel and Matthew Tippett of Phoronix took the wraps off of their
latest project at SCALE
9x in late February. Named OpenBenchmarking, the project
builds on the Phoronix Test
Suite to allow everyday users to run a series of tightly controlled
hardware benchmarking tests, then upload the results to a public
database. The data can then be correlated with automatically-harvested details about the operating system and software stack. The practical upshot is that other users can browse in the collected data for a far more comprehensive look at individual system components.
The talk was entitled "Making
More Informed Linux Hardware Choices," which succinctly sums up Larabel
and Tippett's primary use-case for OpenBenchmarking: a user is shopping for
a new component, and consults the OpenBenchmarking database to find
detailed, statistically significant reports on compatibility and
performance. They introduced the topic by positing that Linux-using
hardware shoppers always begin by asking "does it work?" (a
question for which consulting Google and reading discussion forums is generally the only way to find an answer), followed by "how fast is it?" (a question for which there are even fewer solid answers).
In the past, they said, commercial distributions have published
"hardware compatibility lists," but these tended to be minimal, well behind
the cutting edge, and often only tested for bare-bones functionality. For
example, a graphics card might be published on a distribution's hardware
compatibility list if it functions on servers using the VGA driver, but
that is unlikely to help a home desktop user who really wants to
know if hardware acceleration works for 3D games and video playback. The
performance data available on the web, when it does address Linux, either
does not specify the test conditions well enough to be helpful, or it is completely anecdotal.
The Phoronix Test Suite is a test framework that can run a wide
selection of benchmarking tests, from those that test raw graphics, memory,
and disk performance, to those that test specific CPU features. Phoronix
is very GPU-centric, and thus many of the tests focus on graphics cards,
but as a whole the test suite can be used to run tightly-controlled tests
on just about any performance factor, and gather a detailed system profile
to accompany it. The profile even includes system logs, to flag extraneous
events that might have affected the test.
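As a rough sketch of that workflow, a Phoronix Test Suite run is driven from the command line; the test name below is only an example, and the available profiles vary by release:

```shell
# List the benchmark profiles available to this install
phoronix-test-suite list-available-tests

# Run a single test; the suite prompts for any test options, runs the
# benchmark, and records a detailed system profile alongside the numbers
phoronix-test-suite benchmark pts/encode-mp3

# Upload a saved result set to the public OpenBenchmarking database
phoronix-test-suite upload-result <saved-result-name>
```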
In broad strokes, OpenBenchmarking is designed to provide these missing pieces of the hardware shopping puzzle. Its creators, naturally, have built it around their own benchmarking test suite, but the same essential service could be developed around any collection of benchmarking tests. OpenBenchmarking tries to add to that basic functionality by profiling the system on which the test was run, including the full complement of hardware on the system, distribution and version, and the precise kernel, video driver, and X server options used. Site users can fine-tune their comparisons to closely match their own systems, or to compare results for a particular test across some orthogonal factor (to see, for example, whether Ubuntu or openSUSE produces the higher frame rate for a particular graphics test, or faster builds on a particular compiler test).
The site also offers some more complex features, such as linking to test results found on external sites, comparing results for multiple components in parallel, and a "product finder" search tool that whittles down a particular hardware buying decision with a sequence of questions. At the moment, the product finder can only help you search for a CPU, motherboard, or graphics card, but the parallel comparison tool has no limitations. You can select any test (MP3 encoding, Apache stress testing, raytracing, etc), then add any of the results to your custom comparison, and the site will generate graphs pitting them against each other head-to-head — and almost instantaneously, to boot.
Data, data, data
The team has also developed its own visualizations for the test results, most notably "heatmaps" of histogram data. The concept is that traditional bar-graph style histograms all begin to look alike when there are too many data points, so OpenBenchmarking displays "bird's eye" views of the graph instead, where higher numbers are rendered as darker pixels, and lower numbers as lighter pixels. I am not persuaded that heatmaps are a substantially different perspective (no pun intended), but during the demo Larabel and Tippett showed some large data sets where it was easier to pick out outliers in a heatmap than on a bar graph. Heatmaps aside, all of the graphs are easy to read: all axes are labeled, bars are drawn to scale and labeled with the precise measurement, and each graph includes a key indicating the standard error (for the statisticians) and whether higher or lower numbers are best (for the non-statisticians).
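The mapping behind the heatmap idea can be sketched in a few lines. This is an illustration of the concept only; the grayscale range and linear mapping are my assumptions, not OpenBenchmarking's actual rendering code:

```python
def scores_to_heatmap_row(scores):
    """Map benchmark scores to 0-255 grayscale intensities.

    Darker pixels (lower values) represent higher scores, so an
    outlier stands out as a dark spot in an otherwise light strip.
    """
    lo, hi = min(scores), max(scores)
    span = hi - lo or 1  # avoid division by zero when all scores match
    # 255 = white (lowest score), 0 = black (highest score)
    return [round(255 * (1 - (s - lo) / span)) for s in scores]

# One anomalously high result (97.3) becomes the darkest pixel
row = scores_to_heatmap_row([42.0, 44.1, 43.5, 97.3, 44.0])
```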
Presenting the data visually is only half of the battle, though. The
OpenBenchmarking site is designed to make it easy to "drill down" in any
direction from a test result page. Each test result that you see comes
with hyperlinked tables, so that you can (for example) click on the hard
disk in the "System Information" header, and access all tests in the
database that include that particular model, even if the test you were
looking at had nothing to do with disk performance. During the talk,
Larabel and Tippett said that the components database combines data for
re-branded products, which is a bigger issue for graphics cards than some
other product types, and should make the data sets more usable for component-hunters.
You can add additional statistical measurements to each results page or filter out some results if that better suits your purpose. Test pages allow users to leave comments (although comments are scarce for now), as well as to save a direct link to any custom comparison, alleviating the need to re-add the individual factors on a subsequent visit. Finally, if you have the Phoronix Test Suite package installed on your system, you can click a link on every page to launch the selected test on your own machine.
Direct data export and a public query API are slated to come to the service later this year, as are some additional measurements and features. The speakers said they would be adding periodic "general community performance" snapshots that summarized all recent tests, and a "virtual build-a-system" tool that lets users define a system performance profile, and find which combinations of components will let them achieve it (and at which price points).
They also mentioned a "Phoronix Certification and Qualification Suite" (PCQS) for formal performance validation, which is a peculiar choice considering that they had harsh words for the certifications provided by OEMs and software vendors. High on the list of problems they cited with other certification programs was that they are invented to further the interests of either the hardware vendor or the software maker, so perhaps the PCQS is going to be more trustworthy because Phoronix itself is neither. We will have to wait and see when the details are announced.
In the audience question-and-answer portion of the talk, the two explained some of the steps they take to prevent outliers from corrupting overall performance data, and to prevent device manufacturers from trying to "game" the system by uploading slanted results in hopes of bolstering their products' reputation. There is active monitoring of the data set, but the long-term reliability seems to come from crowdsourcing — i.e., if particular numbers seem too good to be true and are un-reproducible, site readers will notice them and raise alarms.
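The kind of outlier screening described above can be illustrated with a standard interquartile-range test. This is a generic statistical sketch, not necessarily the check that OpenBenchmarking actually runs:

```python
import statistics

def flag_outliers(results, k=1.5):
    """Return results lying more than k*IQR outside the quartiles."""
    q = statistics.quantiles(results, n=4)  # [Q1, median, Q3]
    q1, q3 = q[0], q[2]
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in results if r < low or r > high]

# A suspiciously high frame-rate upload stands out from the rest:
flag_outliers([61.2, 60.8, 59.9, 62.0, 60.5, 61.0, 60.2, 61.5, 250.0])
```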
You can sample the OpenBenchmarking service for yourself today. The site claims more than 480,000 components and more than 74,000 systems, resulting in more than 37,000 test result files. I'm not sure exactly how those numbers fit together, but there are clearly enough data points to see what OpenBenchmarking is capable of.
From a usability standpoint, OpenBenchmarking throws you right into the deep end. From the home page you can jump right into the top searches, top hardware, and top software, and from the sidebar immediately click through to the most recent tests, the most recent test suites, and the most recent test profiles.
But you might need to spend a few minutes reading through the site documentation to get a grasp of the difference between a test suite and a test profile before you do so. To be frank, I am not entirely clear I grasp the definitions; a test suite is described as "an XML file that defines tests and suites for which the Phoronix Test Suite, or other OpenBenchmarking.org schema-compliant test clients, are able to execute in a defined, pre-configured form," while a test profile "is comprised of an XML file and set of scripts that define how the Phoronix Test Suite or other OpenBenchmarking.org schema-compliant test clients interact with an individual test and provide abstraction for all relevant test information." Got it?
Fortunately, digging in to find actual performance numbers is easier.
Wherever you see a component, an operating system, or the name of a test,
you can click directly on it to bring up all of the OpenBenchmarking
database numbers related to it. In fact, almost every element on every
page seems to be a clickable link, including many that are not
distinguished from ordinary inactive text. There are some layout problems,
such as the way the "System Information" header is embedded in a
horizontally-scrollable table that doesn't fit into the main content
column, and uses comically-wide columns of its own that are sized to fit long product names onto a single line.
Some elements of the test result pages need further explanation. For example, each graph has a tiny hyperlinked "T" in the bottom right-hand corner, which when clicked turns the graph into a table, but this feature is not documented anywhere, and on a page filled with identical graphs, it is easily lost.
All of those are cosmetic design issues. But it is also frequently confusing that each user who uploads a set of test results evidently gets to assign a custom name to his or her specific system. Here's why. For an individual, the best way to differentiate between two systems being tested is often to name them according to the major piece of hardware being considered (such as two different graphics cards). Once those results are uploaded to the public database, though, the results are accessible through every dimension of the system configuration. This means you can end up looking at a CPU benchmark test where the data series are labeled with graphics card names, because that is what the original uploader assigned to their test set-ups.
It is also extremely easy to head off on a tangent when browsing through
search results, when narrowing them down might be better.
For instance, you can click on a component, such as the Intel
Core 2 Quad Q6600 processor, and get a page pulling all of the test
sets that involve it out of the database. The top of the page lists the
various operating systems, kernels, and motherboards used in the database's
Q6600 tests. But clicking on any of those links does not filter the
existing results to include just the clicked-on option: it leads you off onto the start page devoted to that option.
Benchmarks gone wild
In general, OpenBenchmarking does contain a wealth of data, but it needs work on exposing that data in an easy-to-navigate form. Maximizing hardware performance is a hobby with intense appeal to some Linux users and none whatsoever to others, so perhaps OpenBenchmarking's current usability is a good fit for the speed-obsessed. What is more interesting to the broader Linux community are the hints that Larabel and Tippett dropped about ways the total data set can be mined and mashed-up in the future.
For example, they discussed the possibility that the data could be used to automatically spot regressions in system software, so that OpenBenchmarking might detect a bug in the new release of a video card driver that slipped under the radar during testing. Or the data could track performance over a long period of time, and discover that a Linux distribution has been getting gradually slower at certain tasks, which might prompt users to pick a different distribution for their long-term deployment.
For these use cases, we will likely have to wait for additional features to roll out on the OpenBenchmarking site. System tweakers will generate and upload most of the data to OpenBenchmarking — that data is more useful to them in the short term, while it will be most enlightening to others only when mashed-up and mined in the aggregate.
If we broke every piece of code that was broken we'd have very
little working code.
-- Matthew Garrett
I love you all dearly, but I also love GNOME. I feel that it's
juvenile to beat down on other free software projects' hard
work. It really breaks my heart to see this going on. Don't you
think that there are more constructive and less personal ways to
voice your feedback, concerns, and critiques?
If you would like to participate in
critically-important activities for the fun of it, might I suggest
a more worthy cause: promoting the glorious and miraculous hot dog
that will surely be the 'weiner' of the Fedora 16 naming contest?
-- Máirín Duffy
All I can say, Jesse, is that I am very, very glad that I don't
have to go running off to CPAN just to get Unicode work done in
Perl, as apparently I must in order to get OO work done in Perl.
At least this shows we do recognize where our core competency and
true focus are, and it's not in OO.
-- Tom Christiansen
You can of course say: I don't need 3G, no Audio, D-Bus is evil
anyway, and I don't want to print, and plug'n'play isn't for me
anyway, and I just want my 80's style Unix back. Then, sure, a
separate /usr will work fine for you. But if that's really you then
you probably are not running a shiny new systemd installation, but
rather Slackware 1.0. So why are you reading this anyway?
-- Lennart Poettering
Google has announced the release of version 10 of the Chrome browser. New stuff includes better plugin blocking and sandboxing of the Flash plugin (on Windows only, alas).
The first LyX 2.0.0 release candidate is out; now would be a good time for LyX users to test things out and find the last remaining bugs. (LWN looked at LyX 2.0 previously.)
Version 2.1.0 of the PacketFence network access control system is available. Changes in this release include support for a few new routers, improved detection, better desktop Linux client support, and more.
is a "Python reverse engineering" tool which generates UML diagrams from
Python source. The 1.5 release adds Python 2.6 and 2.7 compatibility,
improved menus, and more.
Version 4.9.0 of the RPM package manager is out. Improvements include a refusal to install packages which are obsoleted by other, already-installed packages, the ability to install self-conflicting packages, more readable output, additional query options, and more; see the release notes for details.
Newsletters and articles
The H has a brief report on the status of X RandR (Resize and Rotate) 1.4 in light of it being pulled from X.Org 1.10 at the last moment due to concerns about the protocol. "Version 1.4 of Resize and Rotate promises per-CRTC pixmaps and the possibility of support from NVIDIA's proprietary Linux graphics driver. At present, Linux users with NVIDIA cards (NVIDIA is estimated to hold roughly a 30% share of the graphics market) must use the proprietary NVIDIA settings utility. With NVIDIA's driver with RandR support, screen resolution could be adjusted in a more integrated fashion from the desktop."
Page editor: Jonathan Corbet