An Introduction to GCC Compiler Intrinsics in Vector Processing (Linux Journal)

Posted Oct 9, 2012 18:35 UTC (Tue) by elanthis (guest, #6227)
In reply to: An Introduction to GCC Compiler Intrinsics in Vector Processing (Linux Journal) by khim
Parent article: An Introduction to GCC Compiler Intrinsics in Vector Processing (Linux Journal)

I'm not convinced it's useful at all.

There are largely three classes of applications that really make use of these features:

1) HPC apps, where portability of a binary is not necessary since the whole system is generally compiled for a very specific set of hardware.

2) Media processing apps, which generally offload to the GPU these days, because even the best CPU ops for media processing are slow compared to a 300-2000-core GPU (which might even have specialized circuitry for certain media processing tasks, like video encode/decode).

3) Games (the parts that can't be offloaded to the GPU, or games where the graphics alone consume the GPU's processing bandwidth), which often mix bits and pieces of vector code into non-vector code, so the overhead of dispatching to another routine completely negates the advantage of using the vector instructions in the first place.

In the last case, some games simply compile the core engine several times, once for each common end-user CPU architecture, into separate shared libraries, and use a small loader executable to select and load the proper one. This lets the vector math be fully inlined while still taking advantage of newer instruction sets like SSE4.1, and the game still runs on older, baseline SSE3 hardware. Note that we don't generally bother supporting folks without SSE3 even on x86, since nobody who plays high-end games has a CPU old enough to lack SSE3. (Steam Hardware Survey: 99.18% of PC gaming users have SSE3, but only 57.41% have SSE4.1, so SSE3 is the supported baseline but SSE4.1 must still be optional in apps.)
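The loader approach described above can be sketched roughly as follows. This is a minimal illustration, not any particular game's code: it assumes x86/x86-64 with GCC or Clang (for `__builtin_cpu_supports`) and POSIX `dlopen`, and the library file names and the `engine_main` entry point are hypothetical.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical file names: the same engine sources built once per
   instruction-set level, so vector math can be inlined inside each build. */
static const char *pick_engine_library(void)
{
    __builtin_cpu_init(); /* harmless here; required only before constructors run */
    if (__builtin_cpu_supports("sse4.1"))
        return "./libengine_sse41.so";
    if (__builtin_cpu_supports("sse3"))
        return "./libengine_sse3.so";
    return NULL; /* below the SSE3 baseline: not supported */
}

/* Load the chosen build and hand control to its (hypothetical) engine_main(). */
int launch_engine(int argc, char **argv)
{
    const char *path = pick_engine_library();
    if (path == NULL) {
        fputs("This game requires a CPU with SSE3.\n", stderr);
        return 1;
    }

    void *lib = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (lib == NULL) {
        fprintf(stderr, "cannot load %s: %s\n", path, dlerror());
        return 1;
    }

    /* POSIX guarantees dlsym results can be converted to function pointers. */
    int (*engine_main)(int, char **) =
        (int (*)(int, char **))dlsym(lib, "engine_main");
    if (engine_main == NULL) {
        fprintf(stderr, "missing engine_main: %s\n", dlerror());
        dlclose(lib);
        return 1;
    }

    int rc = engine_main(argc, argv);
    dlclose(lib);
    return rc;
}
```

Each library would be built from the same sources with different target flags (for example `-msse3` versus `-msse4.1`, plus `-fPIC -shared`), and the launcher's `main` would just `return launch_engine(argc, argv);`. On glibc older than 2.34 the launcher must also be linked with `-ldl`.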

tl;dr: the dispatch has overhead and the folks who need vector math either don't care about CPU portability or refuse to accept that overhead.



An Introduction to GCC Compiler Intrinsics in Vector Processing (Linux Journal)

Posted Oct 10, 2012 5:19 UTC (Wed) by khim (subscriber, #9252)

> tl;dr: the dispatch has overhead and the folks who need vector math either don't care about CPU portability or refuse to accept that overhead.

GCC dispatching is tied to shared libraries and has no overhead on top of that. Nothing. Exactly zero. Not a single cycle, not a single byte (except for the slow path, which is obviously not a big concern: it's called the slow path for a reason). Sure, shared libraries are slower for various reasons, yet games somehow use them; they just don't use dispatch. Where is the logic in your statement? Why would you build a fully separate engine when you could build specialized versions of just some core parts?
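To illustrate the kind of dispatch khim refers to, here is a minimal sketch using GCC's `ifunc` attribute together with `__builtin_cpu_supports`. It assumes x86-64 Linux with glibc and GCC or Clang; the function names are illustrative. The resolver runs when the dynamic loader binds `dot`, and callers then reach the selected implementation through the same GOT/PLT indirection used for any call into a shared library. GCC's later `target_clones` attribute generates this kind of resolver automatically.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h> /* x86 SIMD intrinsics, including SSE4.1's _mm_dp_ps */

/* Signature shared by every implementation and by the public symbol. */
typedef float (*dot_fn)(const float *a, const float *b, size_t n);

/* Portable fallback: plain scalar loop. */
static float dot_generic(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* SSE4.1 build of the same routine. DPPS with mask 0xF1 multiplies all four
   lanes and writes their sum to lane 0; the loop handles 4 floats per step. */
static __attribute__((target("sse4.1")))
float dot_sse41(const float *a, const float *b, size_t n)
{
    assert(n % 4 == 0); /* this sketch does not handle a leftover tail */
    float sum = 0.0f;
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        sum += _mm_cvtss_f32(_mm_dp_ps(va, vb, 0xF1));
    }
    return sum;
}

/* Resolver: chooses an implementation when the symbol is bound.
   __builtin_cpu_init() is required because resolvers can run before
   constructors. */
static dot_fn resolve_dot(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.1") ? dot_sse41 : dot_generic;
}

/* Public entry point: no body of its own, bound via resolve_dot(). */
float dot(const float *a, const float *b, size_t n)
    __attribute__((ifunc("resolve_dot")));
```

Like any out-of-line call, `dot` cannot be inlined into its callers, so this pattern fits routines that do a meaningful amount of work per call.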

IMNSHO it's not "refuse to accept", it's "refuse to consider because of ignorance". Ignorance is most definitely not bliss in this particular case.


Copyright © 2018, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds