
Memory part 2: CPU caches

Posted Oct 2, 2007 21:41 UTC (Tue) by ikm (subscriber, #493)
In reply to: Memory part 2: CPU caches by pr1268
Parent article: Memory part 2: CPU caches

Actually, I don't quite get it. What is the major difference between two threads time-slicing on one core under the OS scheduler and two threads running simultaneously as hyperthreads on that same core? Only the scheduling quantum, which gives each thread some time to fill the caches and use them before the other thread kicks in to ruin everything?


Memory part 2: CPU caches

Posted Oct 3, 2007 1:01 UTC (Wed) by ncm (subscriber, #165) [Link]

There are lots of differences, but extra L1 cache pressure is an important one. Another is competition for memory bus bandwidth.

Hyperthreading treats the ALUs as the scarce resource, and sacrifices cache capacity and memory bandwidth to grant more access to them. For the (much more common) workloads already limited by cache size and memory bandwidth, this seems like a really bad idea, but there are a few workloads where it's not. To cater to those workloads, the extra cost is just a bit of extra scheduling logic and a bunch of extra hidden registers.

If it could be turned on and off automatically according to whether it helps, we wouldn't need to pay it any attention. That it can't is a problem, because we don't have any good place to put the logic to turn it on and off.

Memory part 2: CPU caches

Posted Oct 4, 2007 22:10 UTC (Thu) by jzbiciak (subscriber, #5246) [Link]

Actually, hyperthreading treats ALUs as an underutilized resource, and task scheduling latency as the benchmark. That is, one task might be busy chasing pointer chains and taking branches and cache misses, and not totally making use of the ALUs. (Think "most UI type code.") Another task might be streaming data in a tight compute kernel, scheduling its data ahead of time with prefetch instructions. It will have reasonable cache performance (due to the prefetches), and will happily use the bulk of the ALU bandwidth.

In this scenario, the CPU hog will tend to use its entire timeslice. The other task, which is perhaps interacting with the user, may block, sleep and wake up to go handle minor things like moving the mouse pointer around, blinking the cursor, handling keystrokes, etc. In a single-threaded machine, that interactive task would need to preempt the CPU hog directly, go do its thing, and then let the hog back onto the CPU. In a hyperthreaded environment, there's potentially a shorter latency to waking the interactive task, and both can proceed in parallel.

That's at least one of the "ideal" cases. Another is when one CPU thread is blocked talking to slow hardware (e.g. direct CPU accesses to I/O registers and the like). The other can continue to make progress.

Granted, there are many workloads that don't look like these. Those that thrash the cache until it falls apart definitely look worse on an HT machine.

Memory part 2: CPU caches

Posted Oct 10, 2007 0:48 UTC (Wed) by ncm (subscriber, #165) [Link]

I stand corrected. Now, if only processes could be rated according to how much of each fragment of the CPU they tend to use, they might be paired with others that favor using the corresponding other fragments.

Unfortunately the mix changes radically from one millisecond to the next. For example, slack UI code may get very busy rendering outline fonts.

Still, I am now inspired to try turning on HT on my boxes and see how it goes.

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds