vs. VSIPL++?

Posted Feb 18, 2009 13:04 UTC (Wed) by csamuel (✭ supporter ✭, #2624)
In reply to: vs. VSIPL++? by endecotp
Parent article: Interview: Eigen Developers on 2.0 Release (KDEDot)

If you're only *ever* going to run on a single-user system then you're OK with what
you're doing, but if it is ever going to be given to anyone else, or if you yourself ever
want to run it on a shared system, I'd suggest rethinking it.

If you're writing multi-threaded HPC codes then you should look at using MPI (which
lets you span multiple nodes rather than being tied to the cores of a single SMP
system) or an existing SMP framework like OpenMP. Both MPI and OpenMP are
open standards with multiple implementations (and it is OpenMP that implements
the OMP_NUM_THREADS limit for you).
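
For readers who haven't used OpenMP, a minimal sketch of what that looks like in practice (not from the original thread): the thread count is never hard-coded, so OMP_NUM_THREADS, or a batch system setting it, caps the parallelism without any code changes.

    #include <omp.h>
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 1 << 20;
        std::vector<double> a(n, 1.0), b(n, 2.0);
        double sum = 0.0;

        // The runtime picks the thread count from OMP_NUM_THREADS (or its
        // own default); nothing in the source fixes it.
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; ++i)
            sum += a[i] * b[i];

        std::printf("dot = %g (up to %d threads)\n", sum, omp_get_max_threads());
        return 0;
    }

Compiled with -fopenmp and run as, say, OMP_NUM_THREADS=4 ./dot, the loop uses four threads; a scheduler can set the variable on the user's behalf.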

We prefer people to use MPI, as the explicit parallelisation it requires seems to
make people produce more efficient code; we've certainly seen commercial codes
that ship as both MPI and SMP variants where the MPI version scales better on an
SMP system than the SMP version!

We're using the cpusets virtual filesystem (which is part of cgroups now, though you
can't mount both views at the same time), and I don't believe your code will see a
different number of cores available; it'll just try to run all its threads on whatever
cores have been allocated. It would be nice if it could figure that out automatically!



vs. VSIPL++?

Posted Feb 18, 2009 13:59 UTC (Wed) by endecotp (guest, #36428)

I looked at OpenMP a while ago and found it rather over-the-top for my modest needs. The Intel Threading Building Blocks (TBB) library seems more appropriate in my case. But I don't think it's appropriate to expect everyone to use the same library, or small set of libraries: there is nothing wrong with diversity. We just need to ensure that in all cases an appropriate level of parallelism is used. I can't think of a better way of achieving this than making sysconf(_SC_NPROCESSORS_ONLN) return an appropriate answer.
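
To illustrate, a minimal TBB sketch (using the task_scheduler_init interface from TBB of that era; later oneTBB replaced it with task_arena). The TBB_NUM_THREADS variable here is an invented convention for the example, not something TBB itself defines.

    #include <tbb/task_scheduler_init.h>
    #include <tbb/parallel_reduce.h>
    #include <tbb/blocked_range.h>
    #include <cstdlib>
    #include <cstdio>
    #include <vector>

    int main() {
        // Hypothetical knob: fall back to TBB's own default if unset.
        const char* env = std::getenv("TBB_NUM_THREADS");
        int nthreads = env ? std::atoi(env)
                           : tbb::task_scheduler_init::default_num_threads();
        tbb::task_scheduler_init init(nthreads);

        std::vector<double> v(1 << 20, 0.5);
        double sum = tbb::parallel_reduce(
            tbb::blocked_range<size_t>(0, v.size()), 0.0,
            [&](const tbb::blocked_range<size_t>& r, double acc) {
                for (size_t i = r.begin(); i != r.end(); ++i)
                    acc += v[i];
                return acc;
            },
            [](double a, double b) { return a + b; });

        std::printf("sum = %g on %d threads\n", sum, nthreads);
        return 0;
    }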

vs. VSIPL++?

Posted Feb 19, 2009 1:30 UTC (Thu) by ncm (guest, #165)

MPI strikes me as markedly better in every way. The experience of VSIPL++ demonstrates that practically nothing of the "explicit parallelization" that csamuel mentions need leak out to the library API, at least for a matrix operations library. I would guess that the performance advantage mentioned is a consequence of better cache behavior.
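
As a sketch of that claim (illustrative only, not VSIPL++'s actual API): a distributed dot product in which the MPI machinery stays entirely behind an ordinary-looking function. The caller passes its local slice and never touches a communicator.

    #include <mpi.h>
    #include <numeric>
    #include <vector>

    // Looks like a plain dot product to the caller; the reduction across
    // ranks is an internal detail. (Invented for illustration.)
    double dot(const std::vector<double>& x_local,
               const std::vector<double>& y_local) {
        double local = std::inner_product(x_local.begin(), x_local.end(),
                                          y_local.begin(), 0.0);
        double global = 0.0;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
        return global;
    }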

vs. VSIPL++?

Posted Feb 21, 2009 20:30 UTC (Sat) by jedbrown (subscriber, #49919)

I love MPI, but it definitely leaks out. You generally have to run the program with a special launcher, and every process runs main. Cache coherency is a big killer for SMP; MPI deals with that at the expense of needing to move more memory around (for example, copying between different processes' address spaces). Explicit threads with affinity and awareness of the cache behavior of their neighbors can be better still, but that model is really only possible for a few very special kernels (sparse matrix-vector products, for instance) and even then isn't widely available.
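
Concretely, the part that does leak out, as a minimal sketch: every rank executes main(), and the program is normally started with a launcher rather than run directly.

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);   // mandatory boilerplate in every rank
        int rank = 0, size = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        std::printf("rank %d of %d entered main()\n", rank, size);
        MPI_Finalize();
        return 0;
    }

Launched as something like "mpirun -np 4 ./a.out"; with most implementations, running the binary directly typically degenerates to a single rank.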

