
Memory part 2: CPU caches

Posted Oct 2, 2007 10:51 UTC (Tue) by pr1268 (subscriber, #24648)
Parent article: Memory part 2: CPU caches

The concept of disabling hyperthreading (SMT) in the BIOS as a way to reduce cache misses and possibly increase performance is interesting (and pertinent to me as I run a system with such a CPU and motherboard). After all, my CPU seems to utilize this feature about 10% of the time, and even then it's usually (99.99% of the time) with two distinct, non-threaded applications.

It does seem logical that, if the hyperthreaded CPU shows as two CPUs to the OS (I get two penguins at boot time plus cat /proc/cpuinfo shows two processors), but each virtual CPU is sharing the same 512K of L2 cache, then maybe my PC is sucking rocks in performance due to the cache miss rate alone.
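One way to check which of the processors listed in /proc/cpuinfo are really hyperthreads of the same core is to group the entries by their "physical id" and "core id" fields. The following is a minimal sketch, not a definitive tool: the function name smt_siblings and the SAMPLE text are made up for illustration, and it assumes an x86 Linux kernel that reports those two fields (when "core id" is missing, each processor is treated as its own core).

```python
from collections import defaultdict

def smt_siblings(cpuinfo_text):
    """Group logical CPU numbers by (physical id, core id).

    Any group with more than one entry is a set of hyperthreads
    sharing one physical core (and therefore its caches).
    """
    cores = defaultdict(list)
    # /proc/cpuinfo separates logical CPUs with a blank line.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        # Assumption: without a "core id" field, treat the CPU as its own core.
        core = (fields.get("physical id", "0"),
                fields.get("core id", fields["processor"]))
        cores[core].append(int(fields["processor"]))
    return dict(cores)

# Hypothetical excerpt for a single-core CPU with hyperthreading enabled.
SAMPLE = """processor\t: 0
physical id\t: 0
core id\t: 0

processor\t: 1
physical id\t: 0
core id\t: 0
"""

if __name__ == "__main__":
    for core, cpus in smt_siblings(SAMPLE).items():
        print(core, cpus)
```

On a real machine the same function could be fed the contents of /proc/cpuinfo instead of SAMPLE.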

And to think of how aggressively Intel had marketed HT as a panacea for multi-tasking OSes. One thing's for sure: Intel is great at marketing charisma. Any Intel engineers out there wish to comment on the virtues of HT? Or, I could wait for § 6.3.4 when Ulrich discusses an advantage to using hyperthreading...


Memory part 2: CPU caches

Posted Oct 2, 2007 21:41 UTC (Tue) by ikm (subscriber, #493)

Actually, I don't quite get it. What is the major difference between two threads executing simultaneously when the OS schedules them on one core and two threads executing simultaneously as two hyperthreads on the same core? Only the scheduling delay, which gives each thread some time to fill in caches and use them before another thread kicks in to ruin everything?

Memory part 2: CPU caches

Posted Oct 3, 2007 1:01 UTC (Wed) by ncm (subscriber, #165)

There are lots of differences, but extra L1 cache pressure is an important one. Another is competition for memory bus bandwidth.
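The cache pressure point can be illustrated with a toy simulation. This is only a sketch under strong assumptions: a fully associative LRU cache (real caches are set-associative), two threads that each loop over their own small working set, and hyperthreads that alternate strictly access by access. All names and parameters here (count_misses, time_sliced, smt_interleaved, the sizes) are invented for the example.

```python
from collections import OrderedDict

def count_misses(accesses, capacity):
    """Count misses in a fully associative LRU cache of `capacity` lines."""
    cache = OrderedDict()
    misses = 0
    for line in accesses:
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return misses

def thread_trace(tid, working_set, passes):
    """One thread looping `passes` times over its own working set."""
    return [(tid, i) for _ in range(passes) for i in range(working_set)]

def time_sliced(working_set, passes_per_slice, slices):
    """The OS alternates the two threads; each runs a whole slice alone."""
    trace = []
    for s in range(slices):
        trace += thread_trace(s % 2, working_set, passes_per_slice)
    return trace

def smt_interleaved(working_set, passes):
    """Two hyperthreads alternate access by access on the same cache."""
    a = thread_trace(0, working_set, passes)
    b = thread_trace(1, working_set, passes)
    trace = []
    for x, y in zip(a, b):
        trace += [x, y]
    return trace

if __name__ == "__main__":
    capacity, working_set = 8, 6   # one working set fits, two do not
    print("time-sliced:", count_misses(time_sliced(working_set, 10, 4), capacity))
    print("interleaved:", count_misses(smt_interleaved(working_set, 20), capacity))
```

Both traces contain the same 240 accesses. With these parameters the time-sliced run misses only while refilling the cache at the start of each slice (24 misses), while the interleaved run misses on every access (240), because the combined working set never fits. Real workloads fall somewhere in between.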

Hyperthreading treats the ALUs as the scarce resource, and sacrifices cache capacity and memory bandwidth to grant more of such access. For those (much more common) workloads already limited by cache size and memory bandwidth, this seems like a really bad idea, but there are a few workloads where it's not. To cater to those workloads, the extra cost is just a bit of extra scheduling logic and a bunch of extra hidden registers.

If it could be turned on and off automatically according to whether it helps, we wouldn't need to pay it any attention. That it can't is a problem, because we don't have any good place to put the logic to turn it on and off.

Memory part 2: CPU caches

Posted Oct 4, 2007 22:10 UTC (Thu) by jzbiciak (✭ supporter ✭, #5246)

Actually, hyperthreading treats ALUs as an underutilized resource, and task scheduling latency as the benchmark. That is, one task might be busy chasing pointer chains and taking branches and cache misses, and not totally making use of the ALUs. (Think "most UI type code.") Another task might be streaming data in a tight compute kernel, scheduling its data ahead of time with prefetch instructions. It will have reasonable cache performance (due to the prefetches), and will happily use the bulk of the ALU bandwidth.

In this scenario, the CPU hog will tend to use its entire timeslice. The other task, which is perhaps interacting with the user, may block, sleep and wake up to go handle minor things like moving the mouse pointer around, blinking the cursor, handling keystrokes, etc. In a single-threaded machine, that interactive task would need to preempt the CPU hog directly, go do its thing, and then let the hog back onto the CPU. In a hyperthreaded environment, there's potentially a shorter latency to waking the interactive task, and both can proceed in parallel.

That's at least one of the "ideal" cases. Another is when one CPU thread is blocked talking to slow hardware (e.g. direct CPU accesses to I/O registers and the like). The other can continue to make progress.

Granted, there are many workloads that don't look like these. Those which cause the cache to fall apart by thrashing it definitely look worse on an HT machine.

Memory part 2: CPU caches

Posted Oct 10, 2007 0:48 UTC (Wed) by ncm (subscriber, #165)

I stand corrected. Now, if only processes could be rated according to how much of each fragment of the CPU they tend to use, they might be paired with others that favor using the corresponding other fragments.
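As a rough illustration of that pairing idea, here is a hypothetical sketch: it assumes each process has somehow been given a single rating (the fraction of ALU capacity it tends to use) and greedily pairs the lightest user with the heaviest. The ratings, the process names and the function pair_complementary are all invented; no real scheduler interface is implied.

```python
def pair_complementary(alu_use):
    """Pair processes with complementary ALU usage.

    alu_use maps a process name to the (assumed) fraction of ALU
    capacity it tends to use, between 0.0 and 1.0.
    Returns (pairs, leftover), where leftover holds at most one
    process that found no partner.
    """
    ranked = sorted(alu_use, key=alu_use.get)  # lightest user first
    pairs = []
    while len(ranked) >= 2:
        light = ranked.pop(0)   # least ALU-hungry remaining process
        heavy = ranked.pop()    # most ALU-hungry remaining process
        pairs.append((light, heavy))
    return pairs, ranked

if __name__ == "__main__":
    # Hypothetical ratings, for illustration only.
    ratings = {"ui": 0.1, "kernel": 0.9, "editor": 0.2, "encoder": 0.8}
    print(pair_complementary(ratings))
```

A static rating like this is the weak point of the sketch, since a single number per process cannot follow a mix that changes over time.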

Unfortunately the mix changes radically from one millisecond to the next. For example, slack UI code may get very busy rendering outline fonts.

Still, I am now inspired to try turning on HT on my boxes and see how it goes.

Memory part 2: CPU caches

Posted Oct 10, 2007 11:40 UTC (Wed) by dlang (✭ supporter ✭, #313)

One of the huge advantages of hyperthreading comes from its treatment of CPU registers. If you schedule two threads on one core, you have to save and restore all the registers on every switch (possibly as far as main memory, depending on what your second thread does to your cache).

But with hyperthreading each virtual core has its own set of registers, which provides a drastic speedup when switching from one task to another (under ideal conditions).

In the real world it all depends on how the different threads compete for cache space and memory I/O. If you are building a dedicated compute cluster (and some high-end graphics workstations fit in this category), you can tune for this and get really good speedups; if you are running a mixed hodgepodge of stuff, you are far more likely to hit the problem cases.
