|
|
Subscribe / Log in / New account

Prefetching considered harmful

By Jonathan Corbet
September 8, 2010
As anybody who has read What every programmer should know about memory knows, performance on contemporary systems is often dominated by cache behavior. A single cache miss can cause a processor stall lasting for hundreds of cycles. The kernel employs many tricks and techniques to optimize cache behavior, but, as is often the case with low-level optimization, it turns out that some of those tricks are not as helpful as had been thought.

The kernel's linked list macros include a set of operators for iterating through a list. At the top of a list-processing loop, the macros will issue a prefetch operation for the next entry in the list. The hope is that, by the time one entry has been processed, the CPU will have fetched the following entry into its cache, avoiding a stall at the beginning of the next trip through the loop. It seems like the sort of micro-optimization which can only help, and nobody has looked closely at these prefetch operations for a long time - until now. Andi Kleen has just posted a patch removing most of those prefetches.

Andi's contention is that, on contemporary processors, the prefetch operations are actually making things worse. These processors already prefetch everything they can get their hands on, so the explicit prefetch is unlikely to help. Even if that prefetch does start a memory cycle earlier than it would have otherwise happened, list processing loops tend to be so short that the amount of additional parallelism gained is quite small. Meanwhile the prefetch operations bloat the kernel image, increase register use, and cause the compiler to generate worse code. So, he says, we are better off without them.

With the prefetch operations removed, Andi's kernel image ends up being 10KB smaller. It also shows no performance regressions over mainline kernels. Unless somebody else gets different results, that seems like enough to justify putting this patch into the mainline.

Index entries for this article
KernelPrefetch


to post comments

Prefetching considered harmful

Posted Sep 9, 2010 8:53 UTC (Thu) by sandmann (subscriber, #473) [Link] (1 responses)

We have had similar results in pixman:

http://lists.cairographics.org/archives/pixman/2010-June/...

although in that case, the memory access is very linear, so the software prefetch basically competes with the hardware prefetcher.

For more random access, such as accessing a linked list, prefetching seems more likely to be beneficial.

Prefetching considered harmful

Posted Sep 9, 2010 13:48 UTC (Thu) by ejr (subscriber, #51652) [Link]

Prefetching random accesses requires 100s of cycles (and growing!) between the prefetch and the access. If you're running a single 'task' that long, you're likely hitting memory and/or polluting the cache in the meantime. Plus, current systems have very few pipes out to memory and cannot support many outstanding loads.

Explicit prefetch seems useful in a narrow set of conditions: You have many light-weight tasks with tiny memory footprint that require >10s of cycles and don't pollute the cache. If the tasks are too short, the prefetches will stomp on each other. If they're too long, you often end up needing other data in the meantime. Graph analysis algorithms benefit, but not much else seems to benefit.

(What may be more useful architecturally is the ability to stop a HW prefetch engine and retarget it. Consider repeatedly processing the same image/audio frame in memory. The prefetch engine is happily continuing to fetch past the end... It might be nice to say "hey, no, restart prefetching from the beginning" when you're on the last scanline, etc. I don't know if any HW supports it, or if that would even have a place in an OS kernel.)

Prefetching considered harmful

Posted Sep 9, 2010 12:04 UTC (Thu) by Trou.fr (subscriber, #26289) [Link] (1 responses)

Just a thought, I may be completely wrong but what a about embedded processors which aren't as complicated as modern "big" CPUs ?

Prefetching considered harmful

Posted Sep 9, 2010 22:41 UTC (Thu) by smoogen (subscriber, #97) [Link]

Not sure what is a simple embedded chip these days. ARM and low power Intel chips look as complicated as their big brothers in handling memory vs power issues. I would guess that other processors may do the same.

Prefetching considered harmful

Posted Sep 9, 2010 17:06 UTC (Thu) by pr1268 (guest, #24648) [Link] (1 responses)

<Wishlist request>

You don't suppose that Andi could make a kernel config option whether the prefetch is included so that I could do my own comparison? Thanks!

</Wishlist request>

Prefetching considered harmful

Posted Sep 9, 2010 17:35 UTC (Thu) by dtlin (subscriber, #36537) [Link]

That's close to what he did first, see
http://lkml.org/lkml/2010/9/8/77

Prefetching considered harmful

Posted Sep 10, 2010 12:22 UTC (Fri) by nwatkins (guest, #61119) [Link] (2 responses)

In the linked list example used in the article, the author seems to be implying that modern processors "already prefetch everything they can get their hands on", and that the prefetch annotation isn't necessary.

Is there an underlying assumption for the linked list case that the items in the list are near each other in memory, or is there additional help from the compiler that marks memory for the next node? I'm not coming up with a mechanism which would allow the processor to understand the structure of the list.

Prefetching considered harmful

Posted Sep 10, 2010 16:20 UTC (Fri) by jzbiciak (guest, #5246) [Link]

It's more likely that a modern CPU will speculate ahead to the next iteration of the loop, which will start to bring in the next structure in the list.

Prefetching considered harmful

Posted Sep 10, 2010 22:51 UTC (Fri) by daglwn (guest, #65432) [Link]

Soon the data items will not even need to be that close to each other. Google "region prefetching" for some articles.

We have to use those billions of transistors for something

Prefetching considered harmful

Posted Sep 10, 2010 21:04 UTC (Fri) by rilder (guest, #59804) [Link] (1 responses)

Not sure what counts as modern CPU here. Either this should be a kernel config or it should have runtime detection of h/w prefetching. Otherwise perf may actually suffer.

Prefetching considered harmful

Posted Sep 18, 2010 6:45 UTC (Sat) by efexis (guest, #26355) [Link]

I don't know whether it matters much, I think the "modern cpu's" definition would be cpu's that are so fast that memory's not even pretending to keep up and so cache sizes are shooting up to try compensate. This is more of a problem now because the disparity between cpu and memory speeds is higher than ever before so you have more speed available to waste on cache stalls, so with less modern cpu's it's not so much of a problem to begin with.


Copyright © 2010, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds