> DCA doesn't really reduce the memory bandwidth requirements since the
> data still has to be fetched by the cache from the main memory (the
> device doesn't write into the cache, it just tells the cache that data
> should be fetched). The whole point of the approach is that this fetch
> is done in advance, so you don't have to wait for it when the host starts
> processing the packet.
So it doesn't seem that good yet.
With integrated memory controllers I would think it would be easy to do this automatically, by having the device write directly into the cache (at least the L3 cache) instead of going through RAM first.
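
To make the distinction concrete, here is a toy model of the three cases. It is only a sketch: the latency numbers are made-up placeholders, `packet_cost` and the mode names are hypothetical, and the "direct to cache" case is the idea I'm speculating about, not something DCA is described as doing above. It also ignores any eventual write-back of the line to RAM.

```python
# Toy model only: both latencies are assumed placeholder values, not measurements.
DRAM_LATENCY_NS = 100   # assumed CPU stall when the data must come from main memory
CACHE_LATENCY_NS = 10   # assumed CPU stall when the data is already in the cache


def packet_cost(mode):
    """Return (dram_transfers, cpu_stall_ns) for one cache line of packet data."""
    if mode == "plain_dma":
        # Device writes to RAM, then the CPU misses and fetches on first touch.
        return 2, DRAM_LATENCY_NS
    if mode == "dca_hint":
        # Device writes to RAM and hints the cache, which fetches in advance.
        # Same RAM traffic as plain DMA, but the CPU no longer waits for it.
        return 2, CACHE_LATENCY_NS
    if mode == "direct_to_cache":
        # Hypothetical: device writes straight into the cache, RAM is skipped.
        return 0, CACHE_LATENCY_NS
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("plain_dma", "dca_hint", "direct_to_cache"):
        transfers, stall = packet_cost(mode)
        print(f"{mode}: {transfers} DRAM transfers, {stall} ns CPU stall")
```

In this model the hint only changes the stall, not the number of DRAM transfers, which is the point made in the quote; only the direct write would cut the memory traffic as well.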