Amdahl's law, 55 years later

Posted Nov 1, 2025 6:05 UTC (Sat) by WolfWings (subscriber, #56790)
In reply to: Amdahl's law, 55 years later by jreiser
Parent article: Ubuntu introduces architecture variants

That's a common misconception from the days when all the SIMD stuff was just basic parallel-math functions.

The BMI sub-extensions around AVX2 added a TON of fine-grained data-manipulation instructions down to the bit-level (thus the name), and AVX512 added more advanced masking features and selective packing on write with VPCOMPRESS to get variable-length memory writes from non-contiguous sequential bytes out of the 512-bit register.

So even just dealing with 32-byte blocks of data on something as simple as adding escape backslashes to a string or colorspace conversion can benefit almost fully.

AVX512 really straddles the line with what you'd expect more from GPU compute shaders.

Amdahl's law, 55 years later

Posted Nov 1, 2025 7:27 UTC (Sat) by epa (subscriber, #39769) [Link] (2 responses)

Adding escape backslashes to a string… it would take a diabolically cunning compiler to vectorize that code. Or should we write assembly language for it? Is there a better, more expressive language than C that can be compiled to efficient vector code, yet is safer than assembly language?

An example of vectorisation helping string operation

Posted Nov 1, 2025 16:53 UTC (Sat) by fishface60 (subscriber, #88700) [Link]

I misremembered reading https://purplesyringa.moe/blog/i-sped-up-serde-json-strin... as doing manual vectorisation because it does some similar tricks within 32-bit registers, so it's not relevant for specifically how to do it, but may be of interest for how vectorisation speeds up string encoding and decoding.

Amdahl's law, 55 years later

Posted Nov 1, 2025 22:16 UTC (Sat) by WolfWings (subscriber, #56790) [Link]

This is where the 'intrinsic' pseudo-functions that Intel created for compilers greatly simplifies the code so you don't need to break out raw assembly and can let the compiler still deal with register usage and inter-mix its code with yours.

https://www.intel.com/content/www/us/en/docs/intrinsics-g...

For a simple but sufficient example of the escaping-strings idea, and how you can POPCNT the mask used for VPCOMPRESS to get the byte-count written https://lemire.me/blog/2022/09/14/escaping-strings-faster... is a pretty decent point of reference.

Amdahl's law, 55 years later

Posted Nov 1, 2025 14:32 UTC (Sat) by khim (subscriber, #9252) [Link] (1 responses)

AVX512 would have been great if Intel wouldn't have bombed its introduction so badly. Today you may expect AVX512 from AMD in consistent fashion, but not from Intel.

This is extremely stupid, but hey, that's Intel for you.

Amdahl's law, 55 years later

Posted Nov 1, 2025 22:18 UTC (Sat) by WolfWings (subscriber, #56790) [Link]

I mean... they also introduced a niche instruction in AVX-512 that they implemented SO BADLY that other instructions could reproduce the same effect even faster, to the point Intel has deprecated the instruction.

AMD's implementation? 1 VP2INTERSECT per clock cycle as of Zen5, where Intel is was over 25 clock cycles.