
Zen and the Art of Microcode Hacking (Google Bug Hunters)

The Google Bug Hunters blog has a detailed description of how a vulnerability in AMD's microcode-patching functionality was discovered and exploited; the authors have also released a set of tools to assist with this kind of research in the future.

Secure hash functions are designed in such a way that there is no secret key, and there is no way to use knowledge of the intermediate state in order to generate a collision. However, CMAC was not designed as a hash function, and therefore it is a weak hash function against an adversary who has the key. Remember that every AMD Zen CPU has to have the same AES-CMAC key in order to successfully calculate the hash of the AMD public key and the microcode patch contents. Therefore, the key only needs to be revealed from a single CPU in order to compromise all other CPUs using the same key. This opens up the potential for hardware attacks (e.g., reading the key from ROM with a scanning electron microscope), side-channel attacks (e.g., using Correlation Power Analysis to leak the key during validation), or other software or hardware attacks that can somehow reveal the key. In summary, it is a safe assumption that such a key will not remain secret forever.
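To make the quoted point concrete, here is a minimal sketch of why a CMAC-shaped construction offers no collision resistance once the key is known. It is an illustration under stated assumptions, not the researchers' tooling: the block cipher is a toy Feistel permutation built from SHA-256 standing in for AES (the Python standard library has no AES), and the helper names (`encrypt`, `chain`, `collide`) are made up for this example. The key is the NIST example value quoted in the comments below.

```python
import hashlib

BLOCK = 16  # bytes; AES also uses 16-byte blocks
# Example key quoted in the thread (NIST SP 800-38B, Appendix D.1).
KEY = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    # Round function of the toy cipher (assumption: stands in for AES internals).
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:BLOCK // 2]

def encrypt(key: bytes, block: bytes) -> bytes:
    # Toy 4-round Feistel permutation, NOT AES.
    l, r = block[:8], block[8:]
    for rnd in range(4):
        l, r = r, xor(l, _f(key, rnd, r))
    return l + r

def decrypt(key: bytes, block: bytes) -> bytes:
    l, r = block[:8], block[8:]
    for rnd in reversed(range(4)):
        l, r = xor(r, _f(key, rnd, l)), l
    return l + r

def double(block: bytes) -> bytes:
    # Doubling in GF(2^128), used to derive the CMAC subkey K1.
    n = int.from_bytes(block, "big") << 1
    if n >> 128:
        n = (n ^ 0x87) & ((1 << 128) - 1)
    return n.to_bytes(BLOCK, "big")

def chain(key: bytes, data: bytes) -> bytes:
    # CBC-MAC chaining state after absorbing whole blocks of data.
    state = bytes(BLOCK)
    for i in range(0, len(data), BLOCK):
        state = encrypt(key, xor(state, data[i:i + BLOCK]))
    return state

def cmac(key: bytes, msg: bytes) -> bytes:
    # CMAC structure for messages that are a whole number of blocks.
    assert msg and len(msg) % BLOCK == 0
    k1 = double(encrypt(key, bytes(BLOCK)))
    last = xor(xor(chain(key, msg[:-BLOCK]), msg[-BLOCK:]), k1)
    return encrypt(key, last)

def collide(key: bytes, target: bytes, prefix: bytes) -> bytes:
    # Build prefix || fix || last_block(target) with the same MAC as target.
    # Knowing the key lets us run the cipher backwards to compute `fix`.
    assert len(prefix) % BLOCK == 0
    want = chain(key, target[:-BLOCK])
    fix = xor(decrypt(key, want), chain(key, prefix))
    return prefix + fix + target[-BLOCK:]

original = b"genuine patch contents, 32 bytes"
forged = collide(KEY, original, b"attacker payload")
```

The forged message starts with attacker-chosen bytes, contains one computed "fix" block, and yields the same tag as the original. A real collision-resistant hash gives no analogous way to steer the internal state.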



Why the same key (both for CMAC and for RSA) across *all* processor families?

Posted Mar 6, 2025 11:41 UTC (Thu) by hmh (subscriber, #3838) [Link] (18 responses)

Defense in depth might have helped a bit, here... although when one uses weak CMAC instead of, say, HMAC, it is just a question of time.

Even so, there is previous knowledge that one should do just that. Intel changes the microcode keys often, and when a team managed to utterly break one processor family (Red Unlock errata), it limited the damage a great deal...

I am also under the impression that (nowadays?) Intel actually has at least *two* keys, and can permanently fuse-off one of them through a microcode update, switching to the backup key...

Hopefully AMD will get its act together now, and change the key rotation policy, if not redesign the whole thing to be a lot more secure...

Why the same key (both for CMAC and for RSA) across *all* processor families?

Posted Mar 6, 2025 16:57 UTC (Thu) by HenrikH (subscriber, #31152) [Link] (12 responses)

Well it shouldn't have had a key in the first place, instead the hash should have been signed using an asymmetric key, then the private key could not have been extracted from the cpu.

Why the same key (both for CMAC and for RSA) across *all* processor families?

Posted Mar 6, 2025 18:52 UTC (Thu) by bmenrigh (subscriber, #63018) [Link] (11 responses)

It *is* signed with an asymmetric key and they didn’t extract the private key from the CPU (because the private key isn’t there).

Instead they tricked the CPU into accepting a completely new RSA key that they constructed in a way that collides with the (weak) hash the CPU stored of the AMD public key. The CPU thinks the code is signed with AMD’s RSA key but it’s actually signed with Google’s carefully made fake. This is possible because AMD didn’t hardcode the full RSA public key in the CPU, only a hash of the full public key. But oops, they used a weak hash, and now anyone can make a colliding RSA key that the CPU thinks is genuine.

I’m glossing over the fact that there is a hardcoded AES key in the CPU because the hash used is AES-CMAC which needs a shared symmetric key.

Why the same key (both for CMAC and for RSA) across *all* processor families?

Posted Mar 7, 2025 11:58 UTC (Fri) by HenrikH (subscriber, #31152) [Link]

so basically they implemented it even worse than I thought... Wow.

Why the same key (both for CMAC and for RSA) across *all* processor families?

Posted Mar 7, 2025 16:02 UTC (Fri) by paulj (subscriber, #341) [Link] (9 responses)

They *did* have the key. Not extracted from the CPU, but simply obtained from a NIST document! From the blog:

> We noticed that the key from an old Zen 1 CPU was the example key of the NIST SP 800-38B publication (Appendix D.1 2b7e1516 28aed2a6 abf71588 09cf4f3c) and was reused until at least Zen 4 CPUs. Using this key we could break the two usages of AES-CMAC: the RSA public key and the microcode patch contents.

The CMAC wasn't weak. The key was compromised!

Why the same key (both for CMAC and for RSA) across *all* processor families?

Posted Mar 7, 2025 16:38 UTC (Fri) by mjg59 (subscriber, #23239) [Link] (8 responses)

No. The microcode was signed with an asymmetric key, and that asymmetric key remains unknown (the private half isn't in the CPU for obvious reasons). But the public half of the key wasn't directly stored in the CPU either - a "hash" of it generated using the same CMAC algorithm was. Because CMAC isn't a secure hashing algorithm it was possible to generate a new RSA keypair with a public key that hashed to the same value, and they now had a private key under their control which could be used to sign the modified microcode. They could instead also have generated new microcodes that hashed to the same value as an existing signature and so wouldn't have required generating their own keypair, but doing so in a way that gave you working microcode was difficult.
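The key-substitution step described above can be sketched as follows. Everything here is an assumption made for illustration: the cipher is a toy Feistel permutation standing in for AES, the "vendor modulus" is an arbitrary 384-bit stand-in whose factors nobody knows, the key blob is just the raw modulus bytes, and the trick of searching for a colliding modulus that happens to be prime (so that the private exponent is trivially computable) is one simple way to get a usable key, not necessarily what the researchers did.

```python
import hashlib

BLOCK = 16
KEY = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")  # key quoted in the thread
E = 65537

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key, rnd, half):
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:8]

def enc(key, blk):  # toy Feistel permutation, NOT AES
    l, r = blk[:8], blk[8:]
    for rnd in range(4):
        l, r = r, xor(l, _f(key, rnd, r))
    return l + r

def dec(key, blk):
    l, r = blk[:8], blk[8:]
    for rnd in reversed(range(4)):
        l, r = xor(r, _f(key, rnd, l)), l
    return l + r

def chain(key, data):
    state = bytes(BLOCK)
    for i in range(0, len(data), BLOCK):
        state = enc(key, xor(state, data[i:i + BLOCK]))
    return state

def mac(key, data):
    # CMAC-shaped tag over whole blocks; K1 derived by GF(2^128) doubling.
    n = int.from_bytes(enc(key, bytes(BLOCK)), "big") << 1
    if n >> 128:
        n = (n ^ 0x87) & ((1 << 128) - 1)
    k1 = n.to_bytes(BLOCK, "big")
    return enc(key, xor(xor(chain(key, data[:-BLOCK]), data[-BLOCK:]), k1))

def is_probable_prime(n):
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    if n < 2:
        return False
    for p in bases:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for a in bases:  # Miller-Rabin rounds
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def forge_key(key, blob):
    # Try moduli of the form first || fix || last_block(blob); each has the
    # same tag as blob. Stop at one that is prime, so phi(n) = n - 1 is known.
    want = chain(key, blob[:-BLOCK])
    counter = 0
    while True:
        counter += 1
        first = b"\x80" + counter.to_bytes(BLOCK - 1, "big")
        fix = xor(dec(key, want), chain(key, first))
        n = int.from_bytes(first + fix + blob[-BLOCK:], "big")
        if is_probable_prime(n) and (n - 1) % E != 0:
            return n, pow(E, -1, n - 1)

# Stand-in for the vendor's modulus: odd, 384 bits, factors unknown to us.
vendor_n = int.from_bytes(hashlib.sha384(b"stand-in vendor modulus").digest(), "big") | (1 << 383) | 1
vendor_blob = vendor_n.to_bytes(48, "big")
pinned = mac(KEY, vendor_blob)  # what the CPU would store

forged_n, forged_d = forge_key(KEY, vendor_blob)
```

A verifier that only compares `mac(KEY, key_blob)` against `pinned` cannot tell `forged_n` from `vendor_n`, yet the attacker holds `forged_d` and can produce signatures that verify under `forged_n`.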

Why the same key (both for CMAC and for RSA) across *all* processor families?

Posted Mar 7, 2025 17:18 UTC (Fri) by paulj (subscriber, #341) [Link] (7 responses)

I think either I am confused somewhere, or else you have confused/conflated the RSA key-pair which is used to encrypt the update (the private half of which of course remains unknown) with the AES symmetric key (embedded into the CPU, known to AMD, not /meant/ to be known to anyone else), which is used to authenticate the RSA public key.

They obtained the AES key for the CPU from NIST documents. Once the key to a CMAC is known, the CMAC is exploitable. The CMAC based on the embedded AES shared key was used to authenticate the public key and to authenticate the patch contents. The next line from what I quoted before says it:

> Using this key we could break the two usages of AES-CMAC: the RSA public key and the microcode patch contents.

That key being the _AES_ key embedded in the CPU, having not been changed across CPU generations and published previously in a NIST document. With that, they could generate collisions for the public key and have their own RSA key-pair, which the CPU will accept to verify and decrypt the actual patch.

Essentially, the micro-code security depended on the shared, symmetric AES key - *not* the RSA key-pair. As they could calculate collisions for the embedded CMAC hash of the RSA public key to generate their own RSA key pair... game over.

CPU stored: hash of RSA pub key AND AES key used to authenticate the actual RSA pub key delivered in the patch.
AES key not extractable from CPU, but was in NIST docs.

Is what I read in the blog.

Why the same key (both for CMAC and for RSA) across *all* processor families?

Posted Mar 7, 2025 19:40 UTC (Fri) by mjg59 (subscriber, #23239) [Link] (6 responses)

RSA is used to *sign* the microcode, not encrypt it. Verifying that signature requires that you have the public key - but it's annoying to have to store the entire public key, so it's pretty typical to instead store a hash of the public key and then ship the actual public key with the signed material. As long as your hash algorithm is strong, it's unrealistic for anyone else to create something else that hashes to the same value, so even if someone replaces the public key in the update it won't hash to the value that's stored in the CPU and the CPU will reject it.
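The pinning scheme described here can be sketched in a few lines. This is a toy under loud assumptions: textbook RSA with no padding, tiny moduli built from well-known Mersenne primes, SHA-256 as the strong hash, and made-up names (`cpu_accepts`, `key_blob`); it only shows the control flow of "store a hash of the public key, ship the key with the update".

```python
import hashlib

E = 65537

def make_key(p, q):
    # Textbook RSA key from two primes (toy sizes, illustration only).
    n = p * q
    return (n, E), pow(E, -1, (p - 1) * (q - 1))

def key_blob(pub):
    n, e = pub
    return n.to_bytes((n.bit_length() + 7) // 8, "big") + e.to_bytes(4, "big")

def digest_int(data):
    # 128-bit digest so it fits below the toy moduli used here.
    return int.from_bytes(hashlib.sha256(data).digest()[:16], "big")

def sign(pub, priv, body):
    return pow(digest_int(body), priv, pub[0])

def cpu_accepts(pinned, pub, body, sig):
    # Step 1: the shipped public key must hash to the value stored in the CPU.
    if hashlib.sha256(key_blob(pub)).digest() != pinned:
        return False
    # Step 2: the signature must verify under that public key.
    n, e = pub
    return pow(sig, e, n) == digest_int(body)

vendor_pub, vendor_priv = make_key(2**61 - 1, 2**89 - 1)
attacker_pub, attacker_priv = make_key(2**107 - 1, 2**127 - 1)
PINNED = hashlib.sha256(key_blob(vendor_pub)).digest()

update = b"legitimate microcode update"
evil = b"modified microcode update"
ok = cpu_accepts(PINNED, vendor_pub, update, sign(vendor_pub, vendor_priv, update))
swapped = cpu_accepts(PINNED, attacker_pub, evil, sign(attacker_pub, attacker_priv, evil))
```

With a collision-resistant hash in step 1, the attacker's correctly self-signed update is rejected because the substituted key does not match the pinned digest. The attack discussed in this thread works because step 1 used a keyed MAC whose key was known.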

But in this case they *didn't* use a strong hash algorithm. They used CMAC, which is a message authentication algorithm. Critically, if you have possession of the secret used, it becomes easy to generate collisions. So, yes, the fact that AMD used a publicly known "secret" was an error, but so was the use of CMAC - since it's based on AES it's using symmetric cryptography, which means the secret has to be embedded in every single CPU, which means it's secret only in a nominal sense.

What the Google team did was take advantage of that lack of collision resistance and generate a new RSA keypair whose public key hashed to the same value as the legitimate public key, as you say. But fundamentally the problem was the use of an entirely inappropriate algorithm as a hash, and it's almost certainly the case that extracting the secret from the CPU would not have been much more of a speed bump for a team of this calibre. CMAC is a strong message authentication algorithm and a weak hash algorithm.

Why the same key (both for CMAC and for RSA) across *all* processor families?

Posted Mar 10, 2025 12:24 UTC (Mon) by paulj (subscriber, #341) [Link] (5 responses)

I think there's confusion here on RSA signatures as a primitive and PKCS construction RSA-SHA signatures as used in the AMD ucode updates, which is understandable as it was difficult to understand it from the blog.

> RSA is used to *sign* the microcode, not encrypt it.

I'm just using the language from the blog: "The RSA PKCS #1 signature is decrypted using the RSA public key and supplied Montgomery modular inverse. The result is a padded AES CMAC hash of the patch contents." It sounds like the "RSA signature" is actually an encrypted blob, and they're using RSA decryption rather than RSA signing. I'm not quite sure of the construction, as there are not enough details (the blog mentions that the PKCS1-v1_5 header is meant to use a SHA hash for the hashing of the signature - but that's not what AMD did). Computationally it makes little difference, though: RSA signing is encryption/decryption. It does seem like AMD wanted to avoid implementing actual full RSA in the hardware, hence why they didn't use the normal PKCS construction for signatures, where the hash has to be actually RSA signed. They used RSA for decryption, with the Montgomery mod multiplication shortcut - presumably much easier to do in the hardware available at ucode update time, which it seems must have ruled out standard PKCS signing based on RSA signature (which would require an RSA /encryption/ operation), or they'd have used that.

Regardless, the actual thing that is being checked to authenticate the patch is the hash value using the AES CMAC.

> But in this case they *didn't* use a strong hash algorithm. They used CMAC, which is a message authentication algorithm.

CMAC is exactly what you need to authenticate something, and indeed most (all? modern?) authenticated-encryption algorithms now use a CMAC for the authentication part, even if public-key encryption is involved too. You could NOT have used /just/ a strong cryptographic hash here (given the apparent constraint that AMD did /not/ want to use RSA for signing the key). AES is as strong as any cryptographic hash - indeed, symmetric encryption ciphers tend to have a longer "shelf life" (staying secure) than cryptographic hash algorithms.

> so was the use of CMAC - since it's based on AES it's using symmetric cryptography, which means the secret has to be embedded in every single CPU, which means it's secret only in a nominal sense.

The key was NOT extracted from a CPU. It was from a NIST document. Using a crypto-hash instead of AES CMAC would NOT have fixed this. How would you *authenticate* the hash, given you do NOT have full RSA available? Given the constraint that you can NOT do RSA encryption (needed to verify an RSA signature, which would be needed to perform a full PKCS signature verification using a cryptographic hash), how would you use a cryptographic hash to authenticate the update? You can't.

So, under that constraint, I guess they were limited to AES. AES is perfectly secure, and AES CMAC is also perfectly secure. As long as the shared key is kept secure. And for whatever reason, it had been published by NIST! That's the root cause of the problem here.

Now, you could say a shared key embedded in every CPU was also insecure and would one day be compromised. Sure, perhaps - but that isn't the fault of AES-CMAC. Also, the practicality of shaving a 14nm (or smaller), 4.8B+ transistor, CPU and being able to find and read the microfuses that correspond to the shared AES key, I don't know... Maybe others here with relevant EEE experience can speak to that.

Why the same key (both for CMAC and for RSA) across *all* processor families?

Posted Mar 10, 2025 13:45 UTC (Mon) by paulj (subscriber, #341) [Link] (2 responses)

Just to summarise, cause what AMD has done is a bit confusing, and it's taken a series of back and forth comments here and re-reading the blog to double-check and get my head around it, my understanding is this (as most likely, from the ambiguity there is in the blog text):

- AMD needed to authenticate ucode patches
- For whatever reason, they could not implement full modular exponentiation needed for RSA encryption (which is needed for RSA signatures)
- They could implement decryption, via Montgomery modular multiplication (for whatever reason, not applicable to RSA encryption / verifying RSA signature)
- This meant AMD could NOT use an RSA signature by AMD's private key to authenticate some part of the update, and then use some standard construction (i.e. PKCS and a SHA-x hash authenticated by RSA signature) to verify the remainder as authentic.
- AMD instead rolled their own construction, where the patch is essentially authenticated by an AES CMAC, using a shared secret key embedded in the CPU
- AES CMAC is perfectly secure of itself. However, relying on a shared key is intrinsically less secure than public-key cryptography, from an operational perspective.
-- I don't know how secure keys embedded in CPUs are, but they don't get leaked that often (??), maybe others know
-- But the key was "leaked", by publication by NIST

If you lose secrecy of the key to your CMAC, it's game over.

There is 1 line in the blog that kind of suggests it may be an RSA signature, worded at odds with step 7 of the description:

> AMD signs the new microcode patch using their RSA private key which corresponds to the public key embedded in the patch.

v

> 7. The RSA PKCS #1 signature is decrypted using the RSA public key and supplied Montgomery modular inverse

However, I think the former line surely is just confusingly referring to the PKCS RSA header, rather than an actual RSA signature (?). Cause if they could actually do RSA signature verification on the CPU, then they wouldn't have needed to roll their own shared-secret-key CMAC scheme, they could have just used the normal PKCS RSA-SHAx construction - or rolled their own verification construction rooted in the authenticated RSA sig.

If they /can/ do actual RSA sigs, then the use of the shared, secret key and the AES CMAC is just very very strange. Why do that? Was it just cause AES was in the hardware and cheaper than SHA-x? If they had actually transferred a value that was authenticated with a very secure RSA signature, why on earth then use that value just as an input to a CMAC with a far less secure key? Why downgrade like that? If they really had RSA-sig authenticated material, why not transfer a SHA-x hash?

The AES CMAC construction for the authentication is what strongly suggests to me that there is no real RSA signature in the ucode process, the use of RSA limited to a stripped-down decryption process (necessitating the transfer of the Montgomery inverse). Why even bother with RSA PKCS format "signatures" and the RSA decryption in that case? I don't know... probably some combination of box-ticking to be able to say "RSA encryption and signatures", and wanting to use existing PKCS tooling (???).

Why the same key (both for CMAC and for RSA) across *all* processor families?

Posted Mar 10, 2025 16:29 UTC (Mon) by Nspace (subscriber, #139097) [Link] (1 responses)

Hi, I'm one of the authors of the research.

The blog post is slightly ambiguous, where it says that "the signature is decrypted" it just means that it's raised to the power of the public exponent modulo the modulus. The standard calls this "decrypting the signature" and I guess it's technically correct but also misleading. It has nothing to do with encryption in this case. It's a standard PKCS#1 v1.5 signature using CMAC as a hash function.
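For readers following along, the operation described here can be sketched as below: "decrypting" the signature is just `pow(sig, e, n)`, and the result is compared against a padded digest. This is a simplified illustration, not AMD's format: the modulus is a toy built from two Mersenne primes, the 16-byte digest is a truncated SHA-256 standing in for the AES-CMAC value, and the DigestInfo structure that standard PKCS#1 v1.5 places before the digest is omitted.

```python
import hashlib

# Toy key (assumption): two Mersenne primes, far too small for real use.
P, Q, E = 2**107 - 1, 2**127 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))
K = (N.bit_length() + 7) // 8  # modulus length in bytes

def toy_digest(data: bytes) -> bytes:
    # Stand-in for the 16-byte AES-CMAC of the patch contents.
    return hashlib.sha256(data).digest()[:16]

def pad(digest: bytes) -> bytes:
    # PKCS#1 v1.5 style block: 00 01 FF..FF 00 || digest (DigestInfo omitted).
    ps = b"\xff" * (K - 3 - len(digest))
    assert len(ps) >= 8
    return b"\x00\x01" + ps + b"\x00" + digest

def sign(digest: bytes) -> int:
    # Done by the key owner with the private exponent.
    return pow(int.from_bytes(pad(digest), "big"), D, N)

def recover(sig: int) -> bytes:
    # The step the blog calls "decrypting the signature":
    # raise to the public exponent modulo the modulus.
    return pow(sig, E, N).to_bytes(K, "big")

def verify(sig: int, digest: bytes) -> bool:
    return recover(sig) == pad(digest)

patch = b"microcode patch contents"
sig = sign(toy_digest(patch))
recovered = recover(sig)
```

Nothing is kept confidential by this operation; anyone with the public key can run `recover`. Its only job is to prove that the holder of the private exponent produced the padded digest.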

Hope this clears it up.

Why the same key (both for CMAC and for RSA) across *all* processor families?

Posted Mar 10, 2025 16:44 UTC (Mon) by paulj (subscriber, #341) [Link]

Thanks very much for the clarification.

I had assumed AMD knew what they were doing, which could only lead one to think the reason for the authentication resting on the pre-shared, CPU, secret key was that there was a hardware limitation preventing them using an actual RSA signature to root the veracity in a (not shared) private key of an RSA key-pair.

So now I'm completely baffled why AMD did it this way, when they've already done the computationally difficult work of producing material that is verified by an RSA sig. Staggering.

Why the same key (both for CMAC and for RSA) across *all* processor families?

Posted Mar 12, 2025 23:35 UTC (Wed) by riking (subscriber, #95706) [Link] (1 responses)

mjg is more correct here: the use of a published example key is a *minor* mistake. It's a mistake, yes, but the symmetric key being in every shipped CPU means you only need to tear one apart to get the key.

The misappropriation of CMAC as a hash is the major mistake.

Why the same key (both for CMAC and for RSA) across *all* processor families?

Posted Mar 12, 2025 23:52 UTC (Wed) by paulj (subscriber, #341) [Link]

Using a CMAC makes sense if you do not have asymmetric crypto available (e.g., cause of hw restrictions), and the only thing you can root trust in is a shared, secret key.

Using a CMAC when you have asymmetric crypto, when you actually /have/ authenticated a piece of material using said asymmetric crypto, only to then switch over to a CMAC with a key that is compromised or liable to be compromised is indeed utterly daft.

I _completely agree_ on that latter point. I thought it /had to/ be the former case because a) the blog suggested "encryption" in some places and b) the latter case is just utterly daft, and AMD engineers surely aren't that - so surely it had to be former case.

Why the same key (both for CMAC and for RSA) across *all* processor families?

Posted Mar 6, 2025 21:30 UTC (Thu) by ballombe (subscriber, #9523) [Link] (4 responses)

On the other hand, the availability of recent off-the-shelf CPUs where you can change the microcode opens lots of avenues for experiments.

Why the same key (both for CMAC and for RSA) across *all* processor families?

Posted Mar 6, 2025 22:58 UTC (Thu) by gutschke (subscriber, #27910) [Link] (3 responses)

The tinkerer in me would like to believe that. In practice, I doubt that this is really true. Way too little of the microcode ISA has been reverse-engineered to be useful for more than fun little one-offs. Making RDRAND return 4 (or maybe 5, if the carry happens to be set) is cute, but won't lead to major breakthroughs in software engineering.

A sufficiently well-funded adversary might just be able to exploit microcode to compromise things like SEV. But for almost any other type of attack, there already are easier approaches that can be tried by an attacker who has gained sufficient privileges to load microcode. I don't want to rule out microcode attacks on high-value targets, as they are likely much harder to detect. But that's a very specific edge case. 99% of users won't have to worry about that possibility.

As for tinkerers who hope to teach their CPU new tricks, I suspect that they'll be disappointed. Making these types of major changes depends on the microcode rewiring major subsystems of the CPU. And I doubt that's possible. More likely than not, all you can do is intercept an instruction and replace it with a microcoded version; or maybe tweak some internal MSR registers. That's great for patching implementation flaws (i.e. if your IMUL instruction was broken, you could replace it with a slower but correct implementation), but if your goal was to teach AMD CPUs to natively execute RISC-V or to gain CHERI capabilities, then even with full knowledge of the microcode ISA, you wouldn't be able to pull this off. The hardware simply isn't that flexible.

In other words, I love that this research is being done in public, but I don't expect it to have immediate practical consequences.

x86 pointer authentication via Intel microcode patch

Posted Mar 7, 2025 2:13 UTC (Fri) by jbosboom (guest, #176381) [Link]

The end of the blog post being discussed links to slides from a project that implemented (among other things) ARM-style pointer authentication code instructions with a microcode patch for Intel Goldmont cores at a claimed cost of 54 uops/25 cycles per PAC instruction. If you wanted to know if/how a hypothetical x86 PAC implementation would integrate with the x86 legacy software base, this seems like a viable way to do that experiment on a full system with real workloads. It wouldn't have the overheads of Intel's software simulator or Valgrind, and it might even be easier than hacking those up, given you have compatible hardware.

That's hardly a breakthrough, and I don't think anyone would use it in production, but it could make it easier and/or faster to "get here from there".

Why the same key (both for CMAC and for RSA) across *all* processor families?

Posted Mar 7, 2025 3:44 UTC (Fri) by bmenrigh (subscriber, #63018) [Link]

Unfortunately the way the ROM patching works is that only existing instructions can be patched. New instructions can’t be introduced with a microcode update.

I think this will have a huge impact on CPU vulnerability research but probably won’t have much reach beyond that.

Why the same key (both for CMAC and for RSA) across *all* processor families?

Posted Mar 7, 2025 9:34 UTC (Fri) by ballombe (subscriber, #9523) [Link]

I am quite sure one could improve divq on Intel CPUs by calling mulq 3 times if one could fix the microcode.
<https://gmplib.org/~tege/division-paper.pdf>
As it stands, we need to write different code for AMD and Intel CPU to work around the divq slowness.

Firmware is just unfixable non-free code.
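For context on the divq remark, here is a sketch of a two-word-by-one-word division done with multiplications and a precomputed reciprocal, in the spirit of the reciprocal-based algorithms in the linked paper. Whether this is the exact three-multiply sequence the commenter has in mind is an assumption; the function names and the `bits` parameter are made up for this example, and the divisor is required to be normalized (top bit set), which real code would arrange by shifting.

```python
def reciprocal(d: int, bits: int = 64) -> int:
    # v = floor((beta^2 - 1) / d) - beta, for a normalized divisor d.
    beta = 1 << bits
    assert beta // 2 <= d < beta
    return (beta * beta - 1) // d - beta

def div2by1(u1: int, u0: int, d: int, v: int, bits: int = 64):
    # Divide the two-word number (u1, u0) by d, returning (quotient, remainder).
    # Requires u1 < d so that the quotient fits in one word.
    mask = (1 << bits) - 1
    assert u1 < d
    q = v * u1 + ((u1 << bits) | u0)      # first multiply: quotient estimate
    q1, q0 = (q >> bits) & mask, q & mask
    q1 = (q1 + 1) & mask
    r = (u0 - q1 * d) & mask              # second multiply: candidate remainder
    if r > q0:                            # estimate was one too large
        q1 = (q1 - 1) & mask
        r = (r + d) & mask
    if r >= d:                            # rare final adjustment
        q1 += 1
        r -= d
    return q1, r

d = 0xFEDCBA9876543211
v = reciprocal(d)
quotient, remainder = div2by1(0x0123456789ABCDEF, 0xFFFFFFFFFFFFFFFF, d, v)
```

The reciprocal `v` is computed once per divisor with an ordinary division here; the payoff comes when the same divisor is reused, or when the reciprocal itself is obtained by multiplication-based iteration.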

Peter Gutmann say: "Bollocks"

Posted Mar 7, 2025 13:54 UTC (Fri) by PeeWee (subscriber, #175777) [Link] (4 responses)

I cannot help but feel reminded of this set of presentation slides [PDF] by Peter Gutmann. I mean, what even is the supposed attack vector here?
Google say:
In theory, after we have extracted the hash by <method>, we will be able to craft a colliding hash by <method> for our malicious microcode update, which we also need to craft in a way that it produces a colliding hash, after having reverse engineered AMD's proprietary microcode (RISC) architecture by <method>. All that's left then is to inject it somewhere in the supply chain without anybody noticing, Bob's your uncle and boom goes the Dynamite!!1!

Granted, I am not intimately familiar with the details, but usually it is not sufficient to be able to produce a hash collision, but one also needs to be able to produce something useful (like an actual working microcode update) that results in said hash, i.e. a chosen text collision. Unless CMAC is really crappy, that is nigh on impossible during the lifetime of these processors, if ever. Last I checked, even the "broken" SHA-1 is still rather safe, given these constraints.

Peter Gutmann say: "Bollocks"

Posted Mar 7, 2025 14:09 UTC (Fri) by jtaylor (subscriber, #91739) [Link] (2 responses)

They didn't need to create a hash collision on the microcode patch, that indeed would be hard and is also discussed in the article.

Instead they created a new key which hashes to the same value AMD's public key hashes to in the signature verification process, and then they can sign any data with that new key and the CPU accepts it.

Peter Gutmann say: "Bollocks"

Posted Mar 7, 2025 14:38 UTC (Fri) by PeeWee (subscriber, #175777) [Link] (1 responses)

Ahh, shoot! Missed that part. Should have read more carefully instead of just skimming.

But that then still leaves the reverse engineering of the microcode architecture. BTW, their example which makes RDRAND return 4 or 5 should just make Linux systems running systemd crash, because it relies on it to produce unique IDs. Thanks to all the systemd haters raising a stink back then, I can still remember this one. ;)

Peter Gutmann say: "Bollocks"

Posted Mar 11, 2025 21:35 UTC (Tue) by JoeBuck (subscriber, #2330) [Link]

One way to use an attack like this would be to undo a microcode security fix: sign the old broken version with modified time stamps and version numbers to look new. Then a fixed exploit would become a working exploit.

Peter Gutmann say: "Bollocks"

Posted Mar 7, 2025 14:52 UTC (Fri) by farnz (subscriber, #17727) [Link]

A big piece of this is that the theoretical state of the art in various applied mathematics fields is way beyond the practical state of the art. There's no research credit in repeating what we learnt in the 1980s all over again, but much of the practical state of the art hasn't yet caught up with the theoretical state of the art of the 1980s, let alone the 1990s.

As a consequence, actually improving the practical state of things is a hard slog of "do the thing that theory fully exhausted as a field of investigation somewhere over 20 years ago again, and again, and again"; but that's not exciting, nor does it help if you're aiming to get a PhD. And this applies not just to cryptography, but to other areas of theory; type systems, language design, communications, computational algebra and more.

Now, there are extremely good reasons for the gap between theory and practice; I remember one case where someone in communications happened to recall hearing about an error correcting code from the 1960s that had been dismissed the last time he encountered it as "impractical - you'd need the equivalent of a whole IBM System/360 per person!", and had the insight to go "but that was over 30 years ago, and compute is much cheaper now". It does, however, mean that the things that are hyped are new capabilities that literally didn't exist 30 years ago, and not the endless repeats of "we fully understand this attack and how to fix it - we're just making the same mistakes year-in, year-out".

backdoor

Posted Mar 15, 2025 15:25 UTC (Sat) by andrejp (guest, #47396) [Link]

This is probably deliberate. This level of incompetence is rarely incidental - a weak hash selected, and the key is also "accidentally" leaked/selected (more like made publicly available to anyone/anywhere). Difficult to believe that AMD's crypto/security guys are this incompetent (and that they somehow passed all the HR/QA screening). Much more likely that this is very much deliberate, with "plausible deniability" baked in ("gee we messed up").

It guarantees both that the key can be relatively easily broken (for all AMD's cpus) as well as that a powerful ring -2 vector of attack is provided to anyone in the know (basically a pure backdoor baked into the cpu that can't be accidentally discovered and/or misused - it requires a very skilled and knowledgeable researcher to find it, but is relatively trivial to use if you know what and where to look for and have the compute resources to brute force it).


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds