Huang: IRIS (Infra-Red, in situ) Project Updates

Posted Mar 10, 2024 23:24 UTC (Sun) by dskoll (subscriber, #1630)
In reply to: Huang: IRIS (Infra-Red, in situ) Project Updates by roc
Parent article: Huang: IRIS (Infra-Red, in situ) Project Updates

I find the hybrid detection mechanism using a scan chain that the article described unconvincing. As I wrote earlier, an attacker can hide malicious circuitry in "unused" sections of the chip and just not include it in the scan chain.

Posted Mar 11, 2024 21:55 UTC (Mon) by smoogen (subscriber, #97)

I really don't think that is the most common attack. The most common one is going to be someone selling a smaller chip as a bigger one. Yes, someone can really screw you over and put out a malicious chip, but if you are really worried about that, you need to do much more thorough testing than this anyway. This is meant to cover the current 99.99% case. Once that is cleared out and the 0.01% case grows to 0.1% or 10%, more work can be done.

Posted Mar 12, 2024 1:26 UTC (Tue) by himi (subscriber, #340)

It wasn't discussed in the article itself, but this was raised in the comments - the proposed fix was to include those unused sections of the chip in the scan chains. Combined with explicitly setting up the scan chains to ensure they constrain the minimum size of any trojan logic to a level that could be detected using IRIS, that seems like it should work?

This is way outside my area of expertise, so I have no idea about the implications for cost, performance, etc., but if the goal is to achieve an independently verifiable root of trust, then implementing this kind of extra validation support doesn't seem unreasonable.

Posted Mar 12, 2024 3:37 UTC (Tue) by dskoll (subscriber, #1630)

But the scan chain is set up by the chip manufacturer, and it can't be changed after that.

I get that this is useful for detecting blatant fraud like selling a 1GB flash drive that really only holds 512MB, but that kind of blatant fraud is pretty easy to detect anyway.
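For the blatant-capacity-fraud case, the usual check is to write a distinct pattern to every block the drive claims to have and then read them all back. A minimal Python sketch, assuming a hypothetical `FakeDrive` that silently wraps addresses past its real capacity (real counterfeits may behave differently):

```python
import hashlib


class FakeDrive:
    """Hypothetical drive that advertises claimed_blocks but stores only real_blocks."""

    def __init__(self, claimed_blocks: int, real_blocks: int, block_size: int = 64):
        self.claimed_blocks = claimed_blocks
        self.real_blocks = real_blocks
        self.block_size = block_size
        self._storage = [bytes(block_size)] * real_blocks

    def write(self, lba: int, data: bytes) -> None:
        # Assumed behaviour: addresses beyond the real capacity wrap around.
        self._storage[lba % self.real_blocks] = data

    def read(self, lba: int) -> bytes:
        return self._storage[lba % self.real_blocks]


def pattern_for(lba: int, block_size: int) -> bytes:
    """A block-unique pattern derived from the address, so aliased blocks can't pass."""
    seed = hashlib.sha256(lba.to_bytes(8, "little")).digest()
    return (seed * (block_size // len(seed) + 1))[:block_size]


def verified_blocks(drive: FakeDrive) -> int:
    """Write every claimed block, then count how many read back intact."""
    for lba in range(drive.claimed_blocks):
        drive.write(lba, pattern_for(lba, drive.block_size))
    return sum(
        drive.read(lba) == pattern_for(lba, drive.block_size)
        for lba in range(drive.claimed_blocks)
    )
```

With `FakeDrive(claimed_blocks=2048, real_blocks=1024)`, `verified_blocks` returns 1024, half of what the drive claims. Deriving each pattern from its block address is what exposes the aliasing; a constant pattern would read back "correctly" from a wrapped address.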

Most complicated digital chips are essentially software... they're synthesized from a hardware description language (typically VHDL or Verilog) input. If someone hides a trojan in that input, I don't think anyone can tell by looking at images of the chip.

Posted Mar 12, 2024 3:39 UTC (Tue) by dskoll (subscriber, #1630)

Ah, OK, so the manufacturer puts the unused sections in scan chains and documents them. But the case I mentioned was spare memory rows, which are switched in to replace defective rows by severing fusible links with a laser. You can't really include them in the scan chain, because some of them may be defective, and failing a chip over a bad spare would defeat the purpose of having spares, which is to increase yield.
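The spare-row mechanism amounts to a one-time address remap: test finds the bad rows, and blown fuses redirect those addresses to spares. A toy Python model, with hypothetical class and method names and a deliberately simplified repair policy (each bad row takes the next free spare; real repair logic, and whether spares are themselves tested, varies by design):

```python
from typing import Dict, Set


class RedundantMemory:
    """Toy memory array with spare rows swapped in by a one-time 'fuse' remap table."""

    def __init__(self, rows: int, spare_rows: int, defective: Set[int]):
        self.rows = rows
        self.spare_rows = spare_rows
        self.defective = set(defective)     # rows found bad at wafer test (hypothetical input)
        self.fuse_map: Dict[int, int] = {}  # logical row -> spare index, fixed once blown

    def blow_fuses(self) -> bool:
        """Assign a spare to each defective row; False means the die can't be repaired."""
        bad = sorted(self.defective)
        if len(bad) > self.spare_rows:
            return False
        self.fuse_map = {row: i for i, row in enumerate(bad)}
        return True

    def physical_row(self, row: int) -> str:
        """Where a logical row address actually lands after repair."""
        if not 0 <= row < self.rows:
            raise IndexError(row)
        if row in self.fuse_map:
            return f"spare[{self.fuse_map[row]}]"
        return f"main[{row}]"
```

With `RedundantMemory(rows=1024, spare_rows=4, defective={17, 300})`, rows 17 and 300 land in `spare[0]` and `spare[1]`, while `spare[2]` and `spare[3]` stay present but unused: exactly the kind of silicon the comment is worried about.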

Posted Mar 12, 2024 12:44 UTC (Tue) by pizza (subscriber, #46)

> But the scan chain is set up by the chip manufacturer, and it can't be changed after that.

No; it's set up by the chip *designer* when they synthesize the RTL (Verilog etc.) into a sea of gates. This is where scan chains are generated and inserted into the design, and it must be done before you can do the timing analysis needed to find out whether your design will actually *work* at the desired clock speeds.
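To make the scan-chain idea concrete: once the tools have stitched the flip-flops into a chain, a tester can shift in any state, clock the real logic once, and shift the result back out for comparison with what the netlist predicts. A toy Python model, assuming a single chain, one capture cycle, and a hypothetical 4-bit incrementer standing in for the design's logic (production DFT flows with ATPG and scan compression are far more involved):

```python
from typing import Callable, List

Bits = List[int]


class ScanChain:
    """Toy model of a design whose flip-flops are all stitched into one scan chain."""

    def __init__(self, next_state: Callable[[Bits], Bits], width: int):
        self.next_state = next_state  # the combinational logic between the flops
        self.flops: Bits = [0] * width

    def shift(self, scan_in: int) -> int:
        """One clock with scan_enable=1: the flops act as a shift register."""
        scan_out = self.flops[-1]
        self.flops = [scan_in] + self.flops[:-1]
        return scan_out

    def capture(self) -> None:
        """One clock with scan_enable=0: the flops load the logic's outputs."""
        self.flops = self.next_state(self.flops)

    def test(self, pattern: Bits) -> Bits:
        """Shift a pattern in, capture once, and shift the response back out."""
        for bit in reversed(pattern):
            self.shift(bit)
        self.capture()
        return [self.shift(0) for _ in self.flops][::-1]


def increment(bits: Bits) -> Bits:
    """Hypothetical design logic: add 1 to the state (bits[0] is the LSB)."""
    width = len(bits)
    value = sum(bit << i for i, bit in enumerate(bits))
    value = (value + 1) % (1 << width)
    return [(value >> i) & 1 for i in range(width)]
```

`ScanChain(increment, width=4).test([1, 1, 0, 0])` shifts in 3 and reads back 4 as `[0, 0, 1, 0]`. A change to the logic between chained flops would produce a response that disagrees with the netlist's prediction; logic that never touches a chained flop is invisible to this test, which is the gap discussed elsewhere in this thread.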

(Granted, the designer has to trust the (usually highly proprietary) tools that synthesize the RTL into gates, but their output can be verified offline, before anything gets sent to the fab.)

The sea of gates is then translated into a physical design through another synthesis pass that adds in the various analog components (e.g., RAMs, I/O cells, level shifters, amplifiers, and individual transistors) and spits out a set of GDS files corresponding to the various mask layers in your design. These analog components are usually highly tuned for the specific fabrication process you're using.

You hand those GDS files to the fab, which may further modify them to add any remaining analog blocks (or pre-synthesized 3rd-party digital blocks). They then copy-and-paste your design until it fills an entire wafer, and that is used to create the production mask set.

So the tl;dr of this is that unless you're intentionally plonking a 3rd-party pre-synthesized digital block into your chip design, the chip designer has full control over the scan chains, and they can't be altered after the fact by the fab.

Posted Mar 12, 2024 13:42 UTC (Tue) by somlo (subscriber, #92421)

> and they can't be altered after the fact by the fab

The fab can choose to alter anything they want, they have full control of the *physical* layer underneath!

Their challenge consists of understanding (reverse engineering) enough of your masks to alter things in a way that is 1. hard for you to detect (i.e., leaves the design in apparently working order) and 2. useful to them. E.g.:

https://web.eecs.umich.edu/~taustin/papers/OAKLAND16-a2at...

Posted Mar 13, 2024 14:19 UTC (Wed) by pizza (subscriber, #46)

> The fab can choose to alter anything they want, they have full control of the *physical* layer underneath!

Sure, but doing so in a way that preserves the designer-supplied scan chains (and continues to meet timing constraints) is a _far_ more challenging proposition.

...The difficulty is not unlike that of inserting arbitrary code into a binary while maintaining its sha256sum and not inducing any failures in an exhaustive, timing-dependent test suite.
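The first half of that analogy is easy to picture: any spliced-in bytes change the digest, so matching it would take a SHA-256 second preimage. A minimal Python illustration using the standard `hashlib` module, with made-up data standing in for a released binary:

```python
import hashlib


def matches_published(blob: bytes, published_hex: str) -> bool:
    """True if the blob's SHA-256 digest equals the published one."""
    return hashlib.sha256(blob).hexdigest() == published_hex


# Hypothetical stand-in for a released binary and its published digest.
release = bytes(range(256)) * 4
published = hashlib.sha256(release).hexdigest()

# An attacker splices three extra bytes into the middle of the file.
tampered = release[:512] + b"\x90\x90\x90" + release[512:]

print(matches_published(release, published))   # True
print(matches_published(tampered, published))  # False
```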

(You can design the chip such that each logical block has its own scan chains, and you have to mux the external test bus to select which module you want to check. The attack vector here is that someone inserts _additional_ logic in unused/wasted silicon, accessed via some OOB mechanism. It wouldn't get caught by the original scan chains because it sits behind an additional test mux setting, but its ability to interact with the original logic would be quite limited, since any interaction necessarily affects signal routing (and thus timing). Realistically, the main thing you'd have to worry about is something that can monitor your external I/O buses, but since those components are usually provided as opaque 3rd-party IP anyway, you're already in a position where you have to trust them...)

Posted Mar 13, 2024 23:01 UTC (Wed) by himi (subscriber, #340)

And the point that Huang is aiming for with IRIS is where you can physically verify that there /aren't/ any of those additional logic blocks outside those that can be mapped to something verifiable by the designed-in scan chains.

From what I can tell, it may not be doable on a modern process, but it's /definitely/ doable on an old 130nm process, and it /may/ be doable on a 22nm process that's still in very common use.

Posted Mar 14, 2024 0:55 UTC (Thu) by pizza (subscriber, #46)

> From what I can tell, it may not be doable on a modern process, but it's /definitely/ doable on an old 130nm process, and it /may/ be doable on a 22nm process that's still in very common use.

It's definitely doable on 65/55nm processes too -- 8-ish years ago, we did some die imaging to aid in planning a series of FIB experiments [1] [2] and you could clearly see the analog components, along with relative transistor/metal density in the sea of gates and how it aligned with the EDA tools' view of the final placed-and-routed design.

(When the team was disbanded I used those die photos to make a really cool poster for everyone)

[1] https://en.wikipedia.org/wiki/Focused_ion_beam
[2] We were able to experiment with some design changes by modifying existing silicon instead of waiting ~3 months to find out if our theories of the failure were correct.

Posted Mar 12, 2024 14:01 UTC (Tue) by farnz (subscriber, #17727)

And to bring this back round to Huang's original concern; he wants to mass-produce an ASIC from an open-source design such that it is practical for purchasers of that ASIC to confirm that the source code that Huang provides matches the ASIC they've bought.

The parts of this problem under Huang's control have known solutions - he can provide all the files he has to show that the design he sent to the fab and the source code match up, and I can verify that those files match up. However, there's currently no good way to show that the fab actually manufactured the design they were sent; they could, at least in theory, change the design to one that functions the same way as the one they received, but has malicious components added either in empty space in the design, or next to the original design when they repeat it across the wafer. IRIS addresses this by providing a way to inspect the chip and confirm that it doesn't have "surprise" blocks not on the original design.

Importantly, Huang's hypothesis that drives IRIS forwards is not "this can detect any modification to any chip". It's "if you design your chip with supply-chain tampering in mind, you can design a chip such that modifications made at the fab either show up under IRIS inspection or show up on the scan chain".

Posted Mar 12, 2024 15:34 UTC (Tue) by paulj (subscriber, #341)

Great, succinct, summary of the use-case and threat-model. Thanks.

In-situ verifiable, open-source-design chips seem like a significant step forward, even if it means a design has to stick to process nodes resolvable with IR light and accept some inefficiencies from design constraints (filling up unused space; more scan chains so that small, recognisable blocks are verifiable; etc.).

A self-made "home" fab seems like the other, perhaps longer-term, prong towards verifiable hardware. LWN has had some articles on someone plugging away at that.

Posted Mar 12, 2024 15:37 UTC (Tue) by somlo (subscriber, #92421)

> Importantly, Huang's hypothesis that drives IRIS forwards is [...] "if you design your chip with supply-chain tampering in mind, you can design a chip such that modifications made at the fab either show up under IRIS inspection or show up on the scan chain".

Trouble with that (as also pointed out by other commenters) is that modifications made at the fab can be too small (see the UMich paper linked above) for IRIS's resolution to pick up. There are also other attacks (see https://pdfs.semanticscholar.org/6407/ebd0a24026e4dad84bc...) where even if one could obtain a die shot with perfect resolution, it would still be impossible to tell that the fab did, in fact, compromise the die (e.g., by replacing selected transistors with identically sized ones, but of the wrong doping polarity).

AFAICT, the value proposition of IRIS is to visually confirm that the number of cores, SRAM blocks, and other *major* sub-components of a die are more or less present and accounted for, and nothing much deeper beyond that...

Posted Mar 13, 2024 14:36 UTC (Wed) by pizza (subscriber, #46)

> AFAICT, the value proposition of IRIS is to visually confirm that the number of cores, SRAM blocks, and other *major* sub-components of a die are more or less present and accounted for, and nothing much deeper beyond that...

...And to make sure no other major/noticeable chunks were added in the deadspace.

Realistically though, there's not a lot of deadspace in a typical chip -- the main reason for deadspace is if you're constrained by pincount/packaging requirements; you need a minimum amount of die area to support a given number of pins.
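The pin-count constraint can be put in rough numbers. If the bond pads sit in a single row around the edge of a square die at pitch p, N pads need a side of at least N * p / 4, and any core area below that side squared becomes deadspace. A simplified Python sketch (it ignores pad-ring depth, corner cells and scribe lines; the 200-pad, 60 µm pitch figures are illustrative, not from the thread):

```python
import math


def pad_limited_side_um(pins: int, pad_pitch_um: float) -> float:
    """Minimum side of a square die whose pads sit in one row around the perimeter."""
    return pins * pad_pitch_um / 4


def die_budget(core_area_mm2: float, pins: int, pad_pitch_um: float):
    """Return (die side in mm, unused core-area in mm^2) for a square die.

    Simplified model: ignores pad-ring depth, corner cells and scribe lines.
    """
    core_side_mm = math.sqrt(core_area_mm2)
    pad_side_mm = pad_limited_side_um(pins, pad_pitch_um) / 1000
    side_mm = max(core_side_mm, pad_side_mm)
    return side_mm, side_mm ** 2 - core_area_mm2
```

For a 4 mm² core with 200 pads at 60 µm pitch, `die_budget(4.0, 200, 60.0)` gives a 3 mm side and 5 mm² of unused area; with a 16 mm² core the die is core-limited and the unused area drops to zero.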

Part of your chip design includes a floorplan for major components, so you'll know where all your stuff is supposed to be, and that will closely correspond to what's "visible" on the die. Deadspace stands out quite clearly, so if something is there, it'll be _very_ obvious.
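That comparison between the floorplan and the die image is straightforward to mechanize once the image has been reduced to a coarse density map. A minimal Python sketch, assuming a hypothetical preprocessing step has already turned the IR image into a grid of per-cell density values aligned with the floorplan's coordinates (image registration and choosing the threshold are the hard parts, and are not shown):

```python
from typing import List, NamedTuple, Set, Tuple

Cell = Tuple[int, int]


class Block(NamedTuple):
    """A floorplan rectangle in grid cells; x1 and y1 are exclusive."""
    name: str
    x0: int
    y0: int
    x1: int
    y1: int

    def contains(self, cell: Cell) -> bool:
        x, y = cell
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def occupied_cells(density: List[List[float]], threshold: float) -> Set[Cell]:
    """Cells whose imaged density suggests there is circuitry there."""
    return {
        (x, y)
        for y, row in enumerate(density)
        for x, value in enumerate(row)
        if value >= threshold
    }


def unexpected_cells(floorplan: List[Block], occupied: Set[Cell]) -> Set[Cell]:
    """Occupied cells that fall outside every floorplan block (i.e. in deadspace)."""
    return {c for c in occupied if not any(b.contains(c) for b in floorplan)}


def missing_blocks(floorplan: List[Block], occupied: Set[Cell]) -> List[str]:
    """Floorplan blocks that show no occupied cells at all."""
    return [b.name for b in floorplan if not any(b.contains(c) for c in occupied)]


floorplan = [Block("cpu", 0, 0, 3, 4), Block("sram", 3, 0, 5, 2)]
density = [
    [0.8, 0.8, 0.8, 0.9, 0.9, 0.0],
    [0.8, 0.8, 0.8, 0.9, 0.9, 0.0],
    [0.8, 0.8, 0.8, 0.0, 0.0, 0.0],
    [0.8, 0.8, 0.8, 0.0, 0.0, 0.7],  # something at (5, 3), outside both blocks
]
seen = occupied_cells(density, threshold=0.5)
print(unexpected_cells(floorplan, seen))  # {(5, 3)}
print(missing_blocks(floorplan, seen))    # []
```

Structure appearing in deadspace, or a block that images as empty, would both be flags for a closer look; subtler changes inside a block, like the ones somlo describes, would not show up at this level.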