Is hyperthreading dangerous?
One of the resources shared by hyperthreaded processor sets is the memory cache. This sharing has its advantages: if processes running on the two processors are sharing memory, that memory need only be fetched into the cache once. That kind of sharing happens often; shared libraries are one obvious example. The shared cache also makes moving processes between hyperthreaded processors an inexpensive operation, so keeping loads balanced across the system is easier.
The sharing of caches between hyperthreaded processors is also, however, the cause of a vulnerability identified in a heavily publicized report by Colin Percival. The core of the problem is that, by measuring the latency of specific memory accesses, a process can tell whether a given memory location was represented in the processor cache or not. A hostile process can load the cache with its own memory, wait a bit, then run tests to see which locations have been evicted from the cache. From that information, it can make inferences about which memory locations were accessed by the sibling processor in the hyperthreaded set.
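As a rough sketch (not code from the report), the "prime and probe" measurement might look like the following on x86 with GCC. The probe buffer, the 16KB cache size, the 64-byte line size, and the cycle threshold are illustrative assumptions that would need tuning on a real machine; real code would also add serializing instructions around the timestamp reads.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CACHE_LINE  64               /* assumed cache line size          */
#define PROBE_SIZE  (16 * 1024)      /* assumed L1 data cache size       */
#define THRESHOLD   120              /* cycles; must be tuned per machine */

static unsigned char probe[PROBE_SIZE];

/* Read the x86 timestamp counter (GCC inline assembly). */
static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    /* Prime: touch every line so the whole buffer is resident in cache. */
    memset(probe, 1, sizeof probe);

    /* ... the sibling thread runs here, evicting some of our lines ... */

    /* Probe: time one load per cache line; a slow load suggests eviction. */
    for (size_t off = 0; off < PROBE_SIZE; off += CACHE_LINE) {
        uint64_t start = rdtsc();
        (void)*(volatile unsigned char *)(probe + off);
        uint64_t delta = rdtsc() - start;

        if (delta > THRESHOLD)
            printf("offset %zu: evicted (%llu cycles)\n",
                   off, (unsigned long long)delta);
    }
    return 0;
}
```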
Two cooperating processes, running at different privilege levels, could make use of the cache to set up a covert channel for communication. In a highly secured system, these two processes might not be able to talk to each other at all normally. With a covert channel in place, information can be leaked from a privileged level to one less privileged, leading to all kinds of dreadful consequences - for somebody. Most systems, however, are not overly concerned about this sort of covert channel; there are easier ways to deliberately leak information.
Mr. Percival, however, also shows how the vulnerability can be exploited to obtain information from processes which are not cooperating. In particular, he claims that it can be used to steal keys from cryptographic applications. A number of crypto algorithms have data-dependent memory access patterns; an attacker who can watch memory accesses can, for some algorithms, derive the key which was being used. The exploit discussed in the report attacks OpenSSL's RSA signing operation in this way.
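To see how an access pattern can depend on the key, consider a toy windowed modular exponentiation. This is an illustration only, not OpenSSL's code; the function name, window size, and test values are made up for the example, and the modulus is assumed to fit in 32 bits so the products cannot overflow.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Toy 4-bit fixed-window modular exponentiation.  The precomputed table is
 * indexed by four secret exponent bits at a time, so which entries (and
 * which cache lines) get touched depends directly on the key -- the kind
 * of data-dependent access pattern the attack observes.
 */
static uint64_t modexp_window(uint64_t base, uint64_t exp, uint64_t mod)
{
    uint64_t table[16];

    base %= mod;
    table[0] = 1 % mod;
    for (int i = 1; i < 16; i++)
        table[i] = (table[i - 1] * base) % mod;

    uint64_t result = 1 % mod;
    for (int shift = 60; shift >= 0; shift -= 4) {
        for (int s = 0; s < 4; s++)
            result = (result * result) % mod;        /* four squarings       */
        uint64_t window = (exp >> shift) & 0xF;      /* secret window bits   */
        result = (result * table[window]) % mod;     /* key-dependent lookup */
    }
    return result;
}

int main(void)
{
    /* 0xC0FFEE stands in for a secret exponent. */
    printf("%llu\n", (unsigned long long)modexp_window(5, 0xC0FFEE, 1000003));
    return 0;
}
```

Each group of four secret bits selects one of the sixteen table entries, so an observer who can tell which entries' cache lines were touched learns those bits.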
The paper makes a number of recommendations on steps which can be taken to mitigate this problem. The simplest is to disable hyperthreading outright; on Linux systems, that is just a matter of configuring out hyperthreading support or booting with the noht option. Alternatively, the kernel could take care not to schedule potentially unfriendly processes on the same hyperthreaded set. Removing access to a high-resolution clock would make the necessary timing information unavailable, thus defeating such attacks. Cryptographic algorithms could be rewritten to avoid data-dependent memory access patterns. Processors could be redesigned to not share caches between hyperthreaded siblings, or to use a cache eviction algorithm which makes it harder to determine which cache lines have been removed.
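To illustrate what avoiding data-dependent access patterns can look like in practice (this is one common technique, not necessarily the one any particular library has adopted), a table lookup can be made oblivious to its secret index by reading every entry and selecting the wanted one with masks, so the same cache lines are touched no matter what the index is. The table size, names, and dummy contents below are again invented for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Read every table entry and keep only the wanted one via masks, so the
 * access pattern is independent of the secret index. */
static uint64_t oblivious_lookup(const uint64_t table[16], uint64_t secret_index)
{
    uint64_t result = 0;

    for (uint64_t i = 0; i < 16; i++) {
        /* mask is all ones when i == secret_index, all zeros otherwise */
        uint64_t mask = 0 - (uint64_t)(i == secret_index);
        result |= table[i] & mask;
    }
    return result;
}

int main(void)
{
    uint64_t table[16];

    for (uint64_t i = 0; i < 16; i++)
        table[i] = i * i;                 /* dummy table contents */
    printf("%llu\n", (unsigned long long)oblivious_lookup(table, 7));
    return 0;
}
```

The cost is reading the whole table for every lookup, which is one reason such rewrites are not free.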
The Linux scheduler could certainly be changed to defeat attempted cache-based attacks on hyperthreaded processors, but the chances of that happening are small. There are numerous obstacles to any sort of real-world exploit of this vulnerability. The attacker must be able to run a CPU-intensive program on the target system - without being noticed - and ensure that it remains on the same hyperthreaded processor as the cryptographic process. The data channel is noisy at best, and it will be made much more so by any other processes running on the system. Timing the attack (knowing when the target process is performing cryptographic calculations, rather than doing something else) is tricky. Getting past all these roadblocks is likely to keep a would-be key thief busy for some time.
In other words, there are almost certainly more effective ways of attacking
cryptographic applications. Closing this particular hole is unlikely to be
worth the trouble, extra complexity in the kernel, and performance impact
it would require. So this vulnerability, despite all the press it has
obtained, will probably not lead to any changes to the kernel in the near
future. Anybody who is truly worried about this problem will be best off
simply turning off hyperthreading for now. In the longer term, authors of
cryptographic code may find that they need to add avoidance of
data-dependent memory access patterns to their arsenal of techniques.
Index entries for this article:
- Kernel: Hyperthreading
- Kernel: Security/Vulnerabilities
Posted May 19, 2005 9:42 UTC (Thu)
by filipjoelsson (guest, #2622)
[Link] (2 responses)
In any case, I have the answer:
Posted May 19, 2005 15:13 UTC (Thu)
by khim (subscriber, #9252)
[Link] (1 responses)
Does this very theoretical exploit affect multicore systems too? Yes and no. Yes - it's possible to use the same technique on dual-core systems, but in reality it would require far too much brute force. The problem with Hyper-Threading is the size of the L1 cache - it's too small. Multicore systems share only the L2 cache (if even that), and it's much bigger, so the problem becomes purely theoretical.
Posted May 26, 2005 13:30 UTC (Thu)
by MarkWilliamson (guest, #30166)
[Link]
Posted May 19, 2005 12:19 UTC (Thu)
by copsewood (subscriber, #199)
[Link] (2 responses)
Posted May 19, 2005 17:35 UTC (Thu)
by hamjudo (guest, #363)
[Link] (1 responses)
If the cryptographic keys are mission critical, they must be backed up. The backups also have to be protected, even when stored off site.
I think that the most significant benefit of a dedicated processor is that you can have a very different scheme for backing up the crypto hardware than for your other servers.
Posted May 27, 2005 20:38 UTC (Fri)
by zakaelri (guest, #17928)
[Link]
Any time a key goes out, it's encrypted. So, you can send in an encrypted key, a datastream, and an encrypt/decrypt/sign/verify instruction... the chip does the magic required, and spits out whichever data was asked for.
Note: While Palladium is evil, I actually like TPM. (My ThinkPad has it, but I haven't had the opportunity to set it up yet.)
To make a desperate attempt at on-topic-ness: I disliked HTT from the beginning. It didn't seem worth paying for... It makes more sense (to me) to fine-tune cache performance.
Posted May 19, 2005 12:56 UTC (Thu)
by ballombe (subscriber, #9523)
[Link] (1 responses)
Avoidance of data-dependent memory access patterns is already in use in smartcard devices (it is much easier to exploit once you have stolen the smartcard, though).
Posted May 29, 2005 11:15 UTC (Sun)
by anton (subscriber, #25547)
[Link]
Posted May 20, 2005 17:08 UTC (Fri)
by ksmathers (guest, #2353)
[Link] (1 responses)
Posted May 26, 2005 13:34 UTC (Thu)
by MarkWilliamson (guest, #30166)
[Link]
Posted May 25, 2005 9:53 UTC (Wed)
by mjc@redhat.com (guest, #2303)
[Link]
At least one affected cryptographic library has been evaluating ways to mitigate this issue. OpenSSL is adding constant-time exponentiation:
http://marc.theaimsgroup.com/?l=openssl-cvs&m=1116207...
Posted May 26, 2005 13:23 UTC (Thu)
by MarkWilliamson (guest, #30166)
[Link]
Does this very theoretical exploit affect multicore systems too? I'm not sure I have this right, but I think they share cache too. Or is the (still very theoretical) exploit dependent on the feature that the hostile thread and the cryptographic one do not run at the same time?
<TIC>This is the perfect exploit to close with security through obscurity. We have to lessen the risk that the two processes happen to be in the same processor cache at the same time. This is best (depending on whether you share my definition of 'best') done by getting an 8-way SMP system, preferably dual-core. (Well, at least _I_ would prefer that, wouldn't you?) Thus the chance would be slim, at best, that the two processes would schedule to the same processor at the same time. (Now, all I need is to persuade my wife that this is necessary in the name of security.)</TIC>
Dual cores...
IIRC, the initial dual-cores from AMD and Intel will not share any cache.
Intel's (initial) implementation in particular is basically two
independent CPUs on the same die. Even when shared L2 caches are
implemented, it seems that sharing an L2 cache will provide lower
bandwidth to this kind of exploit, which benefits hugely from the shared
L1 cache of the two hyperthreads.
Note that IBM's POWER4 shares caches across multiple cores and the POWER5
also has SMT...
I would question whether it is even possible to make cryptographic computations optimally secure on a highly complex system shared with potentially hostile processes. This kind of attack highlights the kind of difficulties involved. If cryptography is only one of very many things done on a highly complex system, it seems to me unlikely that the security of this cryptography will be done very well. I think a simpler and likely more effective approach, once appropriate hardware becomes more generally available, is for the cryptography to occur on dedicated processors, designed to make obtaining access to embedded private keys very difficult and expensive.
... is for the cryptography to occur on dedicated processors, designed to make obtaining access to embedded private keys very difficult and expensive.
Somewhat OT: Is hyperthreading dangerous?
Albeit I may pay for it later, this is a great place where TPM (Trusted Computing) could be used. Within the TPM system, the chip responsible for key management is the only one that can ever "see" the keys being used.
Interestingly, when the mergemem patch (http://mergemem.ist.org/) was proposed in 1998, it was accompanied by a note stating a similar security concern. (Hostile programs could generate pages and see whether memory usage goes up or not, and deduce they were merged).
mergemem and timing vulnerabilities
If one could only check whether full pages are merged, that would be pretty hard to exploit (one would have to guess all of the page correctly, i.e., usually the complete key or password).
Another potential attack path was using a timing attack based on how long the merge attempt takes, or how much of the merge-attempted pages is in the cache afterwards. That would bring the granularity down to a word or a cache line, which makes guessing much more practical.
IMO, even that attack path could be blocked relatively easily (e.g., allow only merging corresponding pages from processes that run the same binary, and were not tainted with ptrace or somesuch).
My impression was that too much emphasis was given to the vulnerabilities in the mergemem announcements, and that may be one reason why there was not much interest in it.
5 to 30% Speed Improvement
If what I've read is correct, hyperthreading increases processor efficiency in the event of a stalled pipeline (such as for a mispredicted branch, or a subroutine entry/exit), but doesn't provide any extra execution units for parallelism. A 30% upper end to the speed improvement is about right, but the bottom end is more like -5%. For two CPU-bound tasks, both of which are contending for cache lines and each with relatively long spans of instructions between function calls (typically C code instead of C++), hyperthreading can cause thrashing in the L1 and L2 caches, for a net drop in performance.
HT doesn't provide extra functional units, but it may enable better use of
the functional units available, as well as allowing the CPU to continue
working in the event one thread stalls.
Unfortunately, with cache contention (and the different mix of functional
units required by different threads), the performance benefits can be
highly variable - scheduling-wise it's a whole new can of worms!!!
> In the longer term, authors of cryptographic code may find that they need
> to add avoidance of data-dependent memory access patterns
Threads actually do run simultaneously
A little quibble about the first paragraph: hyperthreads actually do run
truly simultaneously, which is what distinguishes Simultaneous
MultiThreading from Fine Grained MultiThreading. This gives the
additional benefit of being able to use issue slots in a superscalar CPU
that would otherwise be wasted. Of course, if one stalls, you still get
the benefit that the other thread carries on keeping the processor busy.
Since modern x86 CPUs internally use register renaming and figure out
instruction dependencies dynamically, I've always suspected that adding HT
didn't require many (or perhaps any) changes to the middle part of the
pipeline.