Batch processing of network packets
Posted Aug 22, 2018 15:55 UTC (Wed) by excors (subscriber, #95769) In reply to: Batch processing of network packets by mm7323
Parent article: Batch processing of network packets
Even outside of proper DSPs, I think this is a fairly natural pattern when using SIMD CPUs or GPUs. You want to process multiple data items concurrently, so you think about your data as a single array (or maybe a structure-of-arrays) rather than as a large collection of independent OOP-style objects, and you write your processing code as array transformations. But the processing code has limited resources (registers, constants, instruction cache, local memory in GPUs, etc.), and the compiler will either complain or generate dreadfully slow code (which should be obvious in a profiler) if you exceed the limits. Or maybe some of the processing works best on vectors of 8 elements while some only works on vectors of 4; or maybe it's just hard to understand the whole algorithm at once. So it's natural to split the processing into multiple simpler passes that can be optimised individually. You might run each pass over the entire data set in sequence, or (if memory bandwidth is a concern) you might split the data into chunks that fit within the data cache, then run all passes over one chunk before moving on to the next chunk.
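To make the "all passes over one chunk, then the next chunk" shape concrete, here is a minimal sketch in plain Python. Everything in it is invented for illustration (the `PacketBatch` fields, the two passes, the 1500-byte limit), and Python lists will not show the cache or SIMD benefits themselves; it only demonstrates the structure-of-arrays layout and the loop ordering.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PacketBatch:
    """Structure-of-arrays: one list per field instead of one object per packet.

    The field names are hypothetical and chosen only for this example.
    """
    lengths: List[int]
    ttls: List[int]
    drop: List[bool]


def pass_decrement_ttl(batch: PacketBatch, start: int, end: int) -> None:
    # Pass 1: a simple transformation that touches only the ttls array.
    for i in range(start, end):
        batch.ttls[i] -= 1


def pass_mark_drops(batch: PacketBatch, start: int, end: int, max_len: int) -> None:
    # Pass 2: reads ttls and lengths, writes the drop flags.
    for i in range(start, end):
        batch.drop[i] = batch.ttls[i] <= 0 or batch.lengths[i] > max_len


def process_chunked(batch: PacketBatch, chunk_size: int, max_len: int = 1500) -> None:
    # Run every pass over one chunk before moving on to the next chunk.
    # In native code, chunk_size would be picked so a chunk fits in the data cache.
    n = len(batch.lengths)
    for start in range(0, n, chunk_size):
        end = min(start + chunk_size, n)
        pass_decrement_ttl(batch, start, end)
        pass_mark_drops(batch, start, end, max_len)


batch = PacketBatch(
    lengths=[64, 1500, 9000, 128, 512],
    ttls=[1, 64, 64, 2, 0],
    drop=[False] * 5,
)
process_chunked(batch, chunk_size=2)
print(batch.ttls)
print(batch.drop)
```

Setting `chunk_size` to the full batch length gives the other variant mentioned above, where each pass runs over the entire data set in sequence.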
(This seems kind of like the data-oriented design thing that some game developers have been interested in, as a reaction against the cache-inefficiency of common OOP designs. But I imagine it's much easier to recognise and apply in a game engine that you're writing from scratch, and that you know is going to have to process huge amounts of data, than in a large existing OOP-ish codebase like Linux that was designed with very different performance requirements.)
Posted Aug 22, 2018 20:32 UTC (Wed) by mm7323 (subscriber, #87386)
One thing that springs to mind, in an object oriented context, is that inheritance really breaks (instruction) cache efficiency. You may have a collection of objects and wish to call the same method on each one, but due to inheritance each object can take very different code paths and foul the instruction caches.
The data-oriented approach the link describes mitigates this. But sorting collections by object type (before any other comparison criteria) may also yield better performance in this vein, I guess.
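A rough sketch of the sort-by-type idea, again in Python and with made-up handler classes (`TcpHandler`, `UdpHandler`, `IcmpHandler` are hypothetical, not anything from the kernel). The interpreter will not show the instruction-cache effect the way compiled code with virtual calls would; the point is only the ordering, so that each override runs as one contiguous burst rather than interleaved with the others.

```python
class Handler:
    def handle(self) -> str:
        raise NotImplementedError


class TcpHandler(Handler):
    def handle(self) -> str:
        return "tcp"


class UdpHandler(Handler):
    def handle(self) -> str:
        return "udp"


class IcmpHandler(Handler):
    def handle(self) -> str:
        return "icmp"


def handle_grouped(handlers):
    # Sort by concrete type first, so all objects sharing an override are
    # adjacent. sorted() is stable, so objects of the same type keep their
    # original relative order (any secondary sort key would slot in after
    # the type).
    ordered = sorted(handlers, key=lambda h: type(h).__name__)
    return [h.handle() for h in ordered]


mixed = [TcpHandler(), UdpHandler(), TcpHandler(), IcmpHandler(), UdpHandler()]
results = handle_grouped(mixed)
print(results)
```

Whether the sort pays for itself would depend on how large the collection is and how different the code paths really are, so it would need measuring.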