
Interview: Eigen Developers on 2.0 Release (KDEDot)

Jonathan Riddell interviews the developers of Eigen. "Recently Eigen 2.0 was released. You might already have heard about Eigen, it is a small but very high performance maths library which has its roots in KDE. Below, the two core developers are interviewed about it."


vs. VSIPL++?

Posted Feb 18, 2009 3:20 UTC (Wed) by ncm (guest, #165) [Link] (12 responses)

I would like to see comparisons to VSIPL++, a library with similar structure, range, and goals.

One difference I note is that VSIPL++ is dual-licensed, GPL2 and "proprietary". Another is that VSIPL++ automatically parallelizes matrix operations to as many cores as are available. A third is that it may be compiled to use (e.g.) Intel's MKL underneath.

(Disclosure: I worked on an early version of VSIPL++.)

vs. VSIPL++?

Posted Feb 18, 2009 6:04 UTC (Wed) by csamuel (✭ supporter ✭, #2624) [Link] (9 responses)

Hmm, as an HPC cluster admin I don't like automatic parallelisation to multiple cores on our systems.

Take our current dual quad-core systems: if two users end up running 4-CPU jobs on the same node, you don't want them both trying to use all 8 cores. Fortunately our queuing system (Torque) puts each job into a CPU set, so the only thing they'll affect is themselves, but in general it's probably not what you want.

vs. VSIPL++?

Posted Feb 18, 2009 8:12 UTC (Wed) by ncm (guest, #165) [Link] (1 response)

As I recall, the number of cores to use is a runtime initialization parameter. The point is that the source and binary are the same regardless of the number of cores used, whether 1, 4, or six dozen. In principle a library could optimize at compile time for the number of cores to use. VSIPL++ doesn't, deliberately, so that a program can still run when (e.g.) one of the cores is unavailable.
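
(Purely to illustrate the pattern, not VSIPL++'s actual API: the worker count is a startup decision rather than a compile-time one. In this hypothetical C++ sketch, WORKER_COUNT is an invented knob:)

    // Hypothetical sketch only: WORKER_COUNT is an invented name, not part
    // of VSIPL++. The point is that the core count is chosen at run time.
    #include <algorithm>
    #include <cstdlib>
    #include <thread>

    unsigned worker_count()
    {
        if (const char* s = std::getenv("WORKER_COUNT"))
            return std::max(1, std::atoi(s));              // explicit runtime choice
        unsigned n = std::thread::hardware_concurrency();  // whatever is online now
        return n ? n : 1;
    }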

vs. VSIPL++?

Posted Feb 18, 2009 10:35 UTC (Wed) by csamuel (✭ supporter ✭, #2624) [Link]

Ah, now that makes a lot more sense: it's like the OMP_NUM_THREADS environment variable for OpenMP applications.
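
(For anyone unfamiliar, a minimal OpenMP sketch; the team size honours whatever OMP_NUM_THREADS is set to at launch:)

    // Compile: g++ -fopenmp demo.cc
    // Run:     OMP_NUM_THREADS=4 ./a.out
    #include <omp.h>
    #include <cstdio>

    int main()
    {
        #pragma omp parallel
        {
            // Each thread announces itself; the team size obeys OMP_NUM_THREADS.
            std::printf("thread %d of %d\n",
                        omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }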

Thanks for the clarification!

vs. VSIPL++?

Posted Feb 18, 2009 9:55 UTC (Wed) by epa (subscriber, #39769) [Link] (1 response)

> Take our current dual quad-core systems, if two users end up running 4 CPU jobs on the same node you don't want them to both try and use all 8 cores.

The best policy might be for each job to use all 8 cores, but to run the two jobs in sequence. Then the first user gets his results after an hour and the second user after another hour, rather than both waiting two hours, as would happen if they ran in parallel with 4 cores each. (Assuming perfect scalability from 4 to 8 cores.)

vs. VSIPL++?

Posted Feb 18, 2009 10:36 UTC (Wed) by csamuel (✭ supporter ✭, #2624) [Link]

The number of cores a job uses is set by the user when they submit it, so if they ask for 4 cores they are likely not expecting any more than that.

vs. VSIPL++?

Posted Feb 18, 2009 11:41 UTC (Wed) by endecotp (guest, #36428) [Link] (4 responses)

My multi-threaded code calls sysconf(_SC_NPROCESSORS_ONLN) to determine how many processors are available, and uses all of them. (There is also _SC_NPROCESSORS_CONF, which returns the number of "configured" processors rather than those "online"; I'm not really sure of the difference. Is it for systems with hot-swap CPUs?)
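
(Roughly what that looks like; both are standard sysconf() queries on Linux and most Unixes:)

    #include <unistd.h>
    #include <cstdio>

    int main()
    {
        // Processors currently usable vs. processors physically configured;
        // the two can differ on systems where CPUs are taken offline.
        long online = sysconf(_SC_NPROCESSORS_ONLN);
        long conf   = sysconf(_SC_NPROCESSORS_CONF);
        std::printf("online: %ld, configured: %ld\n", online, conf);
        return 0;
    }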

I think this is the right thing to do for single-user systems. When you run something with a CPU set [you're talking about the cgroups feature, right?], does sysconf(_SC_NPROCESSORS_ONLN) give a different answer? I suspect not. Maybe it should. Or is there some other API that a process can use to determine how many processors are available to it, taking into account the cgroups stuff?
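
(The closest thing I know of, on Linux at least, is sched_getaffinity(): it returns the CPU mask the process is allowed to run on, and my understanding is that cpusets do restrict that mask, unlike the sysconf() counts. A sketch, using the glibc CPU_COUNT extension:)

    #include <sched.h>   // sched_getaffinity, CPU_* macros (glibc, _GNU_SOURCE)
    #include <cstdio>

    int main()
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        // The mask reflects cpuset/affinity restrictions, so CPU_COUNT()
        // gives the number of CPUs this process may actually run on.
        if (sched_getaffinity(0, sizeof set, &set) == 0)
            std::printf("usable cpus: %d\n", CPU_COUNT(&set));
        return 0;
    }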

We don't want to end up with each library or application using its own environment variable that the user is expected to set.

vs. VSIPL++?

Posted Feb 18, 2009 13:04 UTC (Wed) by csamuel (✭ supporter ✭, #2624) [Link] (3 responses)

If you're only *ever* going to run on a single-user system then you're OK with what you're doing, but if it is ever going to be given to anyone else, or if you yourself ever want to run it on a shared system, I'd suggest rethinking it.

If you're writing multi-threaded HPC codes then you should look at using MPI (which lets you span multiple nodes rather than being tied to the cores of a single SMP system) or an existing SMP framework like OpenMP. Both MPI and OpenMP are open standards with multiple implementations (and it is OpenMP that implements the OMP_NUM_THREADS limit for you).
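
(For anyone who hasn't seen it, the canonical MPI skeleton; every process runs main() and the process count is fixed when the job is launched:)

    // Build with an MPI wrapper, e.g.: mpicxx hello.cc
    // Launch with, e.g.:              mpirun -np 4 ./a.out
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // this process's id
        MPI_Comm_size(MPI_COMM_WORLD, &size);  // total processes in the job
        std::printf("rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }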

We prefer people to use MPI, as the explicit parallelisation it requires seems to make people produce more efficient code; we've certainly seen commercial codes that come in both MPI and SMP variants where the MPI version scales better on an SMP system than the SMP version!

We're using the cpusets virtual filesystem (which is part of cgroups now, though you can't mount both views at the same time), and I don't believe your code will see a different number of cores available; it'll just try to run all its threads on whatever cores have been allocated. It would be nice if it could figure that out automatically!

vs. VSIPL++?

Posted Feb 18, 2009 13:59 UTC (Wed) by endecotp (guest, #36428) [Link]

I looked at OpenMP a while ago and found it rather over-the-top for my modest needs. The Intel Threading Building Blocks (TBB) library seems more appropriate in my case. But I don't think it's appropriate to expect everyone to use the same library, or a small set of libraries: there is nothing wrong with diversity. We just need to ensure that in all cases an appropriate level of parallelism is used. I can't think of a better way of achieving this than making sysconf(_SC_NPROCESSORS_ONLN) return an appropriate answer.
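
(For the curious, the sort of thing TBB makes easy; parallel_for splits the range across however many workers the scheduler decides to use:)

    #include <tbb/parallel_for.h>
    #include <tbb/blocked_range.h>
    #include <cstddef>
    #include <vector>

    // Double every element in parallel; TBB chooses the worker count itself.
    void scale(std::vector<double>& v)
    {
        tbb::parallel_for(tbb::blocked_range<std::size_t>(0, v.size()),
                          [&](const tbb::blocked_range<std::size_t>& r) {
                              for (std::size_t i = r.begin(); i != r.end(); ++i)
                                  v[i] *= 2.0;
                          });
    }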

vs. VSIPL++?

Posted Feb 19, 2009 1:30 UTC (Thu) by ncm (guest, #165) [Link] (1 response)

MPI strikes me as markedly better in every way. The experience of VSIPL++ demonstrates that practically nothing of the "explicit parallelization" that csamuel mentions need leak out to the library API, at least for a matrix operations library. I would guess that the performance advantage mentioned is a consequence of better cache behavior.

vs. VSIPL++?

Posted Feb 21, 2009 20:30 UTC (Sat) by jedbrown (subscriber, #49919) [Link]

I love MPI, but it definitely leaks out. You generally have to run the program with a special launcher, and every process runs main(). Cache coherency is a big killer for SMP; MPI deals with that at the expense of needing to move more memory around (for example, copying between different processes' address spaces). Explicit threads with affinity and awareness of the cache behavior of their neighbors can be better yet, but that model is really only possible for a few very special kernels (sparse matrix-vector products, for instance) and even then isn't widely available.

vs. VSIPL++?

Posted Feb 18, 2009 9:10 UTC (Wed) by halla (subscriber, #14185) [Link] (1 response)

Well, if you've worked on that library, you are probably in the best position to give such a comparison.

Just judging from the website, I'd say that VSIPL++ focuses on signal processing, while Eigen2 is a versatile library for linear algebra: vectors, matrices, and related algorithms.

And, as you say, VSIPL++ is a commercial, dual-licensed product, while Eigen2 is available under both the GPL and the LGPLv3 (to avoid all the header-include licensing rannygazoo).

For me, as a free software developer, there is an even bigger difference: the VSIPL++ people have never given me patches to make my application (Krita) use VSIPL++, while the Eigen people have done just that.

The Eigen manual is rather more extensive and gives useful examples; the VSIPL++ tutorial has nice examples for high-level things like convolution. The Eigen API seems a lot nicer to me, but that may be because I am used to it, or because even I, who had hardly any maths at school to speak of, can understand what the expressions mean. Sometimes.

But as for performance and functionality... That's something you are much better placed to compare. And I'm sure the Eigen people would like to see it, too :-)

vs. VSIPL++?

Posted Feb 19, 2009 1:56 UTC (Thu) by ncm (guest, #165) [Link]

All these libraries do the generic linear algebra stuff; that's their bread and butter. VSIPL++, because of its heritage, also does signal and image processing. The list of features (and all of the more unfortunate function names) traces back to an ancient C library from SGI. No C library can compete for performance with assembly-language cores, but C++ solved that problem.

Without looking, I would guess that Eigen's API is probably nicer than VSIPL++'s. Besides history, the latter suffers from committee design. A fork to replace the API while retaining the underlying implementation (allowing plug-in parallelization libraries and assembly-language cores) could make it nicer to use.

It ought to be pretty easy to port Krita over to use VSIPL++, and benchmarks would be instructive. It's a heady thing to see a program pick up and use extra CPUs without changing source or even recompiling.

Eigen vs. uBLAS

Posted Feb 18, 2009 17:11 UTC (Wed) by cry_regarder (subscriber, #50545) [Link] (3 responses)

Has anyone compared Eigen with uBLAS (part of boost)? Was uBLAS considered during the initial stages of Eigen?

http://www.boost.org/doc/libs/1_38_0/libs/numeric/ublas/d...

"uBLAS is a C++ template class library that provides BLAS level 1, 2, 3 functionality for dense, packed and sparse matrices. The design and implementation unify mathematical notation via operator overloading and efficient code generation via expression templates."

Thanks,

Cry

Eigen vs. uBLAS

Posted Feb 18, 2009 17:32 UTC (Wed) by bo (guest, #56215) [Link] (2 responses)

The Eigen developers have a comparison benchmark on their website which includes (among others) uBLAS. It may be found here:
http://eigen.tuxfamily.org/index.php?title=Benchmark

Eigen vs. uBLAS

Posted Feb 18, 2009 18:11 UTC (Wed) by cry_regarder (subscriber, #50545) [Link] (1 response)

Thanks,

Interesting chart. There are MKL <-> uBLAS bindings, so one should be able to get MKL performance as well.

Cry

Eigen vs. uBLAS

Posted Feb 18, 2009 19:52 UTC (Wed) by halla (subscriber, #14185) [Link]

Perhaps. But that's not the whole story, of course. As noted in the article, when asked to explain what is different about Eigen:

"Generality: we need many different kinds of matrices: fixed-size,
dynamic-size dense, sparse. For example, BLAS and LAPACK handle only
dynamic-size dense matrices. Even MKL and vecLib have only limited support
for fixed-size matrices."
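
(Concretely, in Eigen the matrix size can be a compile-time template parameter or a runtime value; a minimal sketch:)

    #include <Eigen/Core>

    int main()
    {
        Eigen::Matrix3f a = Eigen::Matrix3f::Identity();  // fixed 3x3, lives on the stack
        Eigen::MatrixXf b(100, 100);                      // dynamic size, heap-allocated
        b.setIdentity();
        Eigen::Matrix3f c = a * a;   // small fixed-size product, unrolled at compile time
        Eigen::MatrixXf d = b * b;   // runtime-sized product
        (void)c; (void)d;
        return 0;
    }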

