Building a High-Performance Cluster with Gentoo

Posted Apr 11, 2007 4:09 UTC (Wed) by gdt (subscriber, #6284)
Parent article: Building a High-Performance Cluster with Gentoo

The main advantage of using Gentoo seems to be Portage. The incremental upgrade approach of Portage might well be worth the effort.

The "big upgrade" approach of Red Hat doesn't seem to cut it. My experience [1] with pushing around Large Hadron Collider datasets is that there are still a lot of clusters running RH9, whose kernel cannot push big fat network pipes to capacity. Whenever I ask an HPC administrator to upgrade to a recent kernel I get a look of horror.

So if you argue that system libraries and kernels don't matter to HPC performance, then why are HPC administrators so reluctant to change something so irrelevant? Perhaps it is because the packaging solution they use makes the risk of a change too great. It would be well worth a look at Gentoo to see if its packaging system lowers the risk of changes.

[1] As a network engineer at an academic and research network with responsibility for end-to-end performance.
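
As a rough illustration of why a kernel can fail to fill a "big fat pipe": a single TCP connection can carry at most one window of data per round trip, so the window has to be at least the bandwidth-delay product of the path. The sketch below works through that arithmetic in Python. The link speed, round-trip time, and 64 KiB window are illustrative assumptions, not figures from the comment above, and small windows are only one possible cause of the behaviour described.

```python
def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bits_per_s * rtt_s / 8


def window_limited_throughput(window_bytes: float, rtt_s: float) -> float:
    """Upper bound (bits/s) when at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    # Hypothetical path: 10 Gbit/s link, 100 ms round-trip time.
    link_bits_per_s = 10e9
    rtt_s = 0.100

    needed = bdp_bytes(link_bits_per_s, rtt_s)
    capped = window_limited_throughput(64 * 1024, rtt_s)

    print(f"window needed to fill the path: {needed / 1e6:.0f} MB")
    print(f"throughput with a 64 KiB window: {capped / 1e6:.1f} Mbit/s")
```

Under these assumed numbers the path needs roughly 125 MB in flight, while a 64 KiB window tops out at a little over 5 Mbit/s.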



Building a High-Performance Cluster with Gentoo

Posted Apr 11, 2007 4:34 UTC (Wed) by njs (guest, #40338)

There are now plenty of distributions that have excellent incremental upgrade support -- Debian is the classic leader here (and my preference), but it's not unique. AFAIK Red Hat does pretty well these days too. So portage might well beat out RH9 (which is what, 4 years old at this point?), but that's not really saying much.

And, if your criterion is minimizing the risk of upgrades, then a source-based distribution like Gentoo will necessarily be worse than a binary-based one. With a binary-based distribution, everyone is running exactly the same executables, and the chance that you will be the first person to trip over some bug is minimized. With a source-based distribution, it's entirely possible that you are the only person in the world to have packages built with your exact combination of header files, compiler version, and USE and compiler flags -- so even if the bug tracker says that some piece of software has been out for 6 months with no reported problems, that's no guarantee that it'll work for *you*. Of course, you can minimize this by sticking to well-known compiler versions and declining to fiddle with compile flags, but if you're doing that then why bother with a source-based distro at all?
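
To get a feel for how quickly the space of possible builds grows, here is a small Python sketch that counts the variants of a single package. The flag names, compiler versions, and CFLAGS sets are hypothetical examples chosen for illustration. Real packages constrain some combinations, so treat the result as an upper bound rather than an exact count.

```python
from itertools import product


def count_build_variants(use_flags, compilers, cflag_sets):
    """Each USE flag is on or off; compiler and CFLAGS are picked independently."""
    return (2 ** len(use_flags)) * len(compilers) * len(cflag_sets)


def enumerate_build_variants(use_flags, compilers, cflag_sets):
    """Yield every (enabled_flags, compiler, cflags) combination explicitly."""
    for switches in product([False, True], repeat=len(use_flags)):
        enabled = tuple(f for f, on in zip(use_flags, switches) if on)
        for compiler, cflags in product(compilers, cflag_sets):
            yield enabled, compiler, cflags


if __name__ == "__main__":
    # Hypothetical inputs, not taken from any real package.
    use_flags = ["ssl", "ipv6", "X", "threads"]
    compilers = ["gcc-3.4", "gcc-4.1"]
    cflag_sets = ["-O2", "-O2 -march=k8", "-O3"]

    total = count_build_variants(use_flags, compilers, cflag_sets)
    print(f"distinct builds of one package: {total}")
```

With only four flags, two compilers, and three CFLAGS sets there are already 96 distinct builds of one package, and the count multiplies again across every library that package links against.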

Building a High-Performance Cluster with Gentoo

Posted Apr 11, 2007 7:14 UTC (Wed) by amacater (subscriber, #790)

The classic answer on the Beowulf list: It depends. It depends on whether
you admin. your own server or have to rely on central admin. It depends on
the size of your cluster and, more importantly, who your hardware vendors
are. If you buy a 2048-node cluster from IBM, to some extent it's easier to
take the hardware vendor's choice of distro and cluster admin tools. HP's
choice may be different from Penguin's. Two further considerations: fast
interconnect hardware (Quadrics/Mellanox ...) which is an essential for
some classes of problem needs drivers. The companies are relatively small
in terms of staff size and are operating on tight margins in a small
market. It may be that they haven't time to sort out a Debian/Gentoo/Yellow
Dog ... hardware card driver. Lastly, there's the high performance compiler
writers and high-end proprietary software types: they want to debug a known
kernel/memory combination when they get oops reports. You can run highly
successful infrastructures on whichever distribution you like - as ever,
your problem set, resources, time and effort will differ from everyone
else's and, sometimes, it's easier to buy a system off the shelf so that
your users can concentrate on coding and running jobs. Read the Beowulf
list archives for this discussion and minor variants - many times :)

Building a High-Performance Cluster with Gentoo

Posted Apr 11, 2007 16:21 UTC (Wed) by dlang (✭ supporter ✭, #313)

Using the approach described above, you don't have different systems running different versions of things (unless you want them to). With the binary package server, one box compiles the code with the optimizations that you want and then makes the results available to all the other systems (assuming that they are identical).

I haven't done head-to-head performance comparisons with Gentoo, but in the past (back in the 1GHz Athlon days) I have seen cases where optimizing the kernel gave 20-30% performance improvements. On modern 64-bit hardware it's less of an issue because there's less variability between hardware, and therefore less difference between the optimized versions and the generic versions.

Where I actually see the benefit of Gentoo where I use it (my home server) is in the ability to configure packages with the options and dependencies that I want them to have. This means turning on some options that other distros would leave off, but mostly turning off options that other distros turn on and that I don't care about.

Building a High-Performance Cluster with Gentoo

Posted Apr 12, 2007 6:29 UTC (Thu) by njs (guest, #40338)

>using the approach described above you don't have different systems running different versions of things (unless you want them to).

You misunderstand -- the point is that all your systems might be the same, but they'll be different from everyone else's systems. For instance, they will be different from the systems of the people who you let upgrade to the cool new version of Foobar2000 first, so that they could trip over the nasty bugs and get them fixed before you hit them. (Plus the maintainers tasked with fixing those bugs have a huge combinatorial space of configurations they are trying to support.)

Building a High-Performance Cluster with Gentoo

Posted Apr 12, 2007 13:06 UTC (Thu) by nix (subscriber, #2304)

Indeed. This is one of the reasons *why* I run bleeding-edge software on all my systems for which stability is relatively unimportant: specifically so that I can find niggling portability bugs before other people. I find a few a month, typically (sometimes a few a week, sometimes none for a month or two, but the trickle never stops completely).

Building a High-Performance Cluster with Gentoo

Posted Apr 19, 2007 12:35 UTC (Thu) by piggy (subscriber, #18693)

I would question the claim that a source-based distro necessarily sees a higher risk of encountering obscure and subtle bugs than a binary-based distro. Your reasoning is sound, but my empirical experience suggests that the reverse may be true.

My experience as a developer for a vendor of a binary-only commercial Unix clone demonstrates that the range of strange PC hardware out there is more than sufficient to exercise plenty of unique corner cases.

My other stint of experience comes from working for an embedded Linux vendor. We saw a LOT more trouble from people trying to piece together tiny distributions from prebuilt binaries (even all from the same source) than from people willing to build everything from source. A very common problem we saw with people who tried to do all of their system work with binaries only was subtle version dependencies among libraries as people upgraded individual packages over time. These problems simply do not occur if every library is built successively against the existing set of binaries on the system.
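
The kind of drift described here can be sketched as a simple consistency check: record which library versions each binary was built against, then compare that record with what is installed now. The package names, library names, and data layout below are hypothetical, and a version mismatch does not always mean an incompatibility. This is only a minimal Python illustration of the idea, not how any particular package manager works.

```python
def stale_packages(built_against, installed):
    """Return {package: [(lib, built_version, installed_version), ...]}
    for every package whose recorded build-time library version differs
    from the version currently installed (None if the library is missing)."""
    stale = {}
    for pkg, deps in built_against.items():
        mismatches = [
            (lib, built_ver, installed.get(lib))
            for lib, built_ver in deps.items()
            if installed.get(lib) != built_ver
        ]
        if mismatches:
            stale[pkg] = mismatches
    return stale


if __name__ == "__main__":
    # Hypothetical system state after libfoo alone was upgraded.
    installed = {"libfoo": "2.0", "libbar": "1.4"}
    built_against = {
        "app-a": {"libfoo": "1.9", "libbar": "1.4"},  # prebuilt before the upgrade
        "app-b": {"libfoo": "2.0"},                   # rebuilt after the upgrade
    }

    for pkg, problems in stale_packages(built_against, installed).items():
        for lib, built, current in problems:
            print(f"{pkg}: built against {lib} {built}, system has {current}")
```

When every package is rebuilt in sequence against what is already on the system, the recorded and installed versions agree by construction, which is the property the comment attributes to building from source.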

Building a High-Performance Cluster with Gentoo

Posted Apr 11, 2007 14:35 UTC (Wed) by ajross (subscriber, #4563)

>So if you argue that system libraries and kernels don't matter to HPC performance, then why are HPC administrators so reluctant to change something so irrelevant?

Because, like any IT administrator queried about a big configuration change, they are afraid of breaking things. It doesn't matter how fast that kernel is; an inoperative cluster is still infinitely slower. I assure you that they would respond to "can I install Gentoo?!" with the same horror.

Hell, even if they were running Gentoo, they would probably refuse to upgrade. At least they would if I were paying them.

Leave the distro wars for the kiddies. Real clusters need to be doing real work, and futzing with the installed software doesn't qualify.

Building a High-Performance Cluster with Gentoo

Posted Apr 12, 2007 10:36 UTC (Thu) by jschrod (subscriber, #1646)

Probably they are afraid that the (most often flaky) interconnect driver doesn't work any more after the upgrade.

As long as you're using Gbit Ethernet for interconnects, upgrades are easy. If you're using Myrinet or Infiniband, that's a whole other story.

(My experience is from HPC work for automotive and aerospace companies.)

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds