
Development

SCALE: Phoronix launches OpenBenchmarking

March 9, 2011

This article was contributed by Nathan Willis

Michael Larabel and Matthew Tippett of Phoronix took the wraps off their latest project at SCALE 9x in late February. Named OpenBenchmarking, the project builds on the Phoronix Test Suite to allow everyday users to run a series of tightly controlled hardware benchmarking tests, then upload the results to a public database. The data can then be correlated with automatically-harvested details about the operating system and software stack. The practical upshot is that other users can browse the collected data for a far more comprehensive look at individual system components.

The talk was entitled "Making More Informed Linux Hardware Choices," which succinctly sums up Larabel and Tippett's primary use case for OpenBenchmarking: a user is shopping for a new component, and consults the OpenBenchmarking database to find detailed, statistically significant reports on compatibility and performance. They introduced the topic by positing that Linux-using hardware shoppers always begin by asking "does it work?" (a question for which consulting Google and reading discussion forums is generally the only way to find an answer), followed by "how fast is it?" (a question for which there are even fewer solid answers).

In the past, they said, commercial distributions have published "hardware compatibility lists," but these tended to be minimal, well behind the cutting edge, and often only tested for bare-bones functionality. For example, a graphics card might appear on a distribution's hardware compatibility list if it functions on servers using the VGA driver, but that is unlikely to help a home desktop user who really wants to know whether hardware acceleration works for 3D games and video playback. The performance data available on the web, when it does address Linux, either does not specify the test conditions well enough to be helpful, or is completely anecdotal.

The Phoronix Test Suite is a test framework that can run a wide selection of benchmarking tests, from those that test raw graphics, memory, and disk performance, to those that test specific CPU features. Phoronix is very GPU-centric, and thus many of the tests focus on graphics cards, but as a whole the test suite can be used to run tightly-controlled tests on just about any performance factor, and gather a detailed system profile to accompany it. The profile even includes system logs, to flag extraneous events that might have affected the test.
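For those who want to try it, a benchmark run is a single command; a minimal session might look something like the following (pts/encode-mp3 is just one example of a test profile name, and the suite will offer to download any missing dependencies):

    # List the test profiles the suite knows about
    phoronix-test-suite list-available-tests

    # Download, set up, and run a single benchmark
    phoronix-test-suite benchmark pts/encode-mp3

    # Display the hardware/software profile gathered alongside each result
    phoronix-test-suite system-info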

In broad strokes, OpenBenchmarking is designed to provide these missing pieces of the hardware shopping puzzle. Its creators, naturally, have built it around their own benchmarking test suite, but the same essential service could be developed around any collection of benchmarking tests. OpenBenchmarking tries to add to that basic functionality by profiling the system on which the test was run, including the full complement of hardware on the system, distribution and version, and the precise kernel, video driver, and X server options used. Site users can fine-tune their comparisons to closely match their own systems, or to compare results for a particular test across some orthogonal factor (to see, for example, whether Ubuntu or openSUSE produces the higher frame rate for a particular graphics test, or faster builds on a particular compiler test).

The site also offers some more complex features, such as linking to test results found on external sites, comparing results for multiple components in parallel, and a "product finder" search tool that whittles down a particular hardware buying decision with a sequence of questions. At the moment, the product finder can only help you search for a CPU, motherboard, or graphics card, but the parallel comparison tool has no such restriction. You can select any test (MP3 encoding, Apache stress testing, raytracing, etc.), add any of the results to your custom comparison, and the site will generate graphs pitting them against each other head-to-head, almost instantaneously to boot.

Data, data, data

The team has also developed its own visualizations for the test results, most notably "heatmaps" of histogram data. The concept is that traditional bar-graph-style histograms all begin to look alike when there are too many data points, so OpenBenchmarking displays "bird's eye" views of the graph instead, where higher numbers are rendered as darker pixels and lower numbers as lighter pixels. I am not persuaded that heatmaps offer a substantially different perspective (no pun intended), but during the demo Larabel and Tippett showed some large data sets where it was easier to pick out outliers in a heatmap than on a bar graph. The graphs themselves are easy to read: all axes are labeled, bars are drawn to scale but also labeled with the precise measurement, and each graph includes a key indicating the standard error (for the statisticians) and whether higher or lower numbers are better (for the non-statisticians).

Presenting the data visually is only half of the battle, though. The OpenBenchmarking site is designed to make it easy to "drill down" in any direction from a test result page. Each test result that you see comes with hyperlinked tables, so that you can (for example) click on the hard disk in the "System Information" header, and access all tests in the database that include that particular model, even if the test you were looking at had nothing to do with disk performance. During the talk, Larabel and Tippett said that the components database combines data for re-branded products, which is a bigger issue for graphics cards than for some other product types, and should make the data sets more usable for component-hunters.

You can add further statistical measurements to each results page, or filter out some results if that better suits your purpose. Test pages allow users to leave comments (although comments are scarce for now), as well as to save a direct link to any custom comparison, saving you from re-adding the individual factors on a subsequent visit. Finally, if you have the Phoronix Test Suite package installed on your system, you can click a link on every page to launch the selected test on your own machine.
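From the command line, the equivalent trick is to hand the benchmark command a result's public identifier, which re-runs the same tests locally so your numbers can be compared against the uploaded ones (the identifier below is fabricated for illustration):

    # Reproduce a published comparison on your own hardware
    # (the result ID here is a made-up example)
    phoronix-test-suite benchmark 1102263-IV-EXAMPLE01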

Direct data export and a public query API are slated to come to the service later this year, as are some additional measurements and features. The speakers said they would be adding periodic "general community performance" snapshots that summarized all recent tests, and a "virtual build-a-system" tool that lets users define a system performance profile, and find which combinations of components will let them achieve it (and at which price points).

They also mentioned a "Phoronix Certification and Qualification Suite" (PCQS) for formal performance validation, which is a peculiar choice considering that they had harsh words for the certifications provided by OEMs and software vendors. High on the list of problems they cited with other certification programs was that they are invented to further the interests of either the hardware vendor or the software maker, so perhaps the PCQS is going to be more trustworthy because Phoronix itself is neither. We will have to wait and see when the details are announced.

In the audience question-and-answer portion of the talk, the two explained some of the steps they take to prevent outliers from corrupting overall performance data, and to prevent device manufacturers from trying to "game" the system by uploading slanted results in hopes of bolstering their products' reputation. There is active monitoring of the data set, but the long-term reliability seems to come from crowdsourcing — i.e., if particular numbers seem too good to be true and are un-reproducible, site readers will notice them and raise alarms.

Drilling in

You can sample the OpenBenchmarking service for yourself today. The site claims more than 480,000 components and more than 74,000 systems, resulting in more than 37,000 test result files. I'm not sure exactly how those numbers fit together, but there are clearly enough data points to see what OpenBenchmarking is capable of.

From a usability standpoint, OpenBenchmarking throws you right into the deep end. From the home page you can jump straight into the top searches, top hardware, and top software, and from the sidebar immediately click through to the most recent tests, test suites, and test profiles.

But you might need to spend a few minutes reading through the site documentation to get a grasp of the difference between a test suite and a test profile before you do so. To be frank, I am not entirely sure I grasp the definitions myself; a test suite is described as "an XML file that defines tests and suites for which the Phoronix Test Suite, or other OpenBenchmarking.org schema-compliant test clients, are able to execute in a defined, pre-configured form", while a test profile "is comprised of an XML file and set of scripts that define how the Phoronix Test Suite or other OpenBenchmarking.org schema-compliant test clients interact with an individual test and provide abstraction for all relevant test information". Got it?
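Seen from the client side, the distinction is a little clearer: a test profile wraps one benchmark and the logic for installing and running it, while a test suite is essentially an XML list of profiles (or other suites) to execute together. The info subcommand shows the metadata for either; pts/encode-mp3 is a real test profile, but the suite name below is an assumption on my part (list-available-suites prints the real ones):

    # Inspect a single test profile
    phoronix-test-suite info pts/encode-mp3

    # Inspect a suite, which groups several profiles together
    # (pts/disk is assumed here as an example suite name)
    phoronix-test-suite info pts/disk

    # Enumerate the suites the client knows about
    phoronix-test-suite list-available-suites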

Fortunately, digging in to find actual performance numbers is easier. Wherever you see a component, an operating system, or the name of a test, you can click directly on it to bring up all of the OpenBenchmarking database numbers related to it. In fact, almost every element on every page seems to be a clickable link, including many that are not distinguished from ordinary inactive text. There are some layout problems, such as the way the "System Information" header is embedded in a horizontally-scrollable table that doesn't fit into the main content column, and uses comically-wide columns of its own that are sized to fit long product names onto a single line.

Some elements of the test result pages need further explanation. For example, each graph has a tiny hyperlinked "T" in the bottom right-hand corner, which when clicked turns the graph into a table, but this feature is not documented anywhere, and on a page filled with identical graphs, it is easily lost.

All of those are cosmetic design issues. But it is also frequently confusing that each user who uploads a set of test results evidently gets to assign a custom name to his or her specific system. Here's why. For an individual, the best way to differentiate between two systems being tested is often to name them according to the major piece of hardware being considered (such as two different graphics cards). Once those results are uploaded to the public database, though, the results are accessible through every dimension of the system configuration. This means you can end up looking at a CPU benchmark test where the data series are labeled with graphics card names, because that is what the original uploader assigned to their test set-ups.

It is also extremely easy to head off on a tangent when browsing through search results, when narrowing them down might be better. For instance, you can click on a component, such as the Intel Core 2 Quad Q6600 processor, and get a page pulling all of the test sets that involve it out of the database. The top of the page lists the various operating systems, kernels, and motherboards used in the database's Q6600 tests. But clicking on any of those links does not filter the existing results down to just the clicked-on option; it leads you off to the start page devoted to that option instead.

Benchmarks gone wild

In general, OpenBenchmarking does contain a wealth of data, but it needs work on exposing that data in an easy-to-navigate form. Maximizing hardware performance is a hobby with intense appeal to some Linux users and none whatsoever to others, so perhaps OpenBenchmarking's current usability is a good fit for the speed-obsessed. More interesting to the broader Linux community are the hints that Larabel and Tippett dropped about ways the total data set can be mined and mashed up in the future.

For example, they discussed the possibility that the data could be used to automatically spot regressions in system software, so that OpenBenchmarking might detect a bug in a new video card driver release that slipped under the radar during testing. Or performance could be tracked over a long period of time, revealing that a Linux distribution has been getting gradually slower at certain tasks, which might prompt users to pick a different distribution for their long-term deployment.

For these use cases, we will likely have to wait for additional features to roll out on the OpenBenchmarking site. System tweakers will generate and upload most of the data to OpenBenchmarking, since it is most useful to them in the short term, but it will be most enlightening to the wider community only once it is mashed up and transformed.


Brief items

Quotes of the week

If we broke every piece of code that was broken we'd have very little working code.
-- Matthew Garrett

I love you all dearly, but I also love GNOME. I feel that it's juvenile to beat down on other free software projects' hard work. It really breaks my heart to see this going on. Don't you think that there are more constructive and less personal ways to voice your feedback, concerns, and critiques?

If you would like to participate in juvenile critically-important activities for the fun of it, might I suggest a more worthy cause: promoting the glorious and miraculous hot dog that will surely be the 'weiner' of the Fedora 16 naming contest?

-- Máirín Duffy

All I can say, Jesse, is that I am very, very glad that I don't have to go running off to CPAN just to get Unicode work done in Perl, as apparently I must in order to get OO work done in Perl. At least this shows we do recognize where our core competency and true focus are, and it's not in OO.
-- Tom Christiansen

You can of course say: I don't need 3G, no Audio, D-Bus is evil anyway, and I don't want to print, and plug'n'play isn't for me anyway, and I just want my 80's style Unix back. Then, sure, a separate /usr will work fine for you. But if that's really you then you probably are not running a shiny new systemd installation, but rather Slackware 1.0. So why are you reading this anyway?
-- Lennart Poettering


Chrome 10 released

Google has announced the release of version 10 of the Chrome browser. New stuff includes better JavaScript performance, GPU-accelerated video, and a number of security features including better plugin blocking and sandboxing of the Flash plugin (on Windows only, alas).


LyX version 2.0.0 (release candidate 1)

The first LyX 2.0.0 release candidate is out; now would be a good time for LyX users to test things out and find the last remaining bugs. (LWN looked at LyX 2.0 last November).


PacketFence 2.1.0 released

Version 2.1.0 of the PacketFence network access control system is available. Changes in this release include support for a few new routers, a new configuration validation interface, JavaScript-based network access detection, improved desktop Linux client support, and more.


PyNSource 1.5 released

PyNSource is a "Python reverse engineering" tool which generates UML diagrams from Python source. The 1.5 release adds Python 2.6 and 2.7 compatibility, improved menus, and more.


RPM 4.9.0 released

Version 4.9.0 of the RPM package manager is out. Improvements include a refusal to install packages which are obsoleted by other, already installed packages, the ability to install self-conflicting packages, more readable -i output, additional query options, and more; see the release notes for details.
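As a generic illustration of the query interface whose output this release cleans up (these are long-standing rpm options, not the new 4.9.0 additions):

    # Human-readable package information; the -i output is what got
    # more readable in 4.9.0
    rpm -qi bash

    # Custom, scriptable output via --queryformat
    rpm -q --queryformat '%{NAME}-%{VERSION}\n' bash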


Newsletters and articles


A restart for RandR 1.4 (The H)

The H has a brief report on the status of X RandR (Resize and Rotate) 1.4 in light of it being pulled from X.Org 1.10 at the last moment due to concerns about the protocol. "Version 1.4 of Resize and Rotate promises per-CRTC pixmaps and the possibility of support from NVIDIA's proprietary Linux graphics driver. At present, Linux users with NVIDIA cards (NVIDIA is estimated to hold roughly a 30% share of the graphics market) must use the proprietary NVIDIA settings utility. With NVIDIA's driver with RandR support, screen resolution could be adjusted in a more integrated fashion from the desktop."


Page editor: Jonathan Corbet


Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds