|
|
Log in / Subscribe / Register

Constant-time instructions and processor optimizations

By Jonathan Corbet
February 3, 2023
Of all the attacks on cryptographic code, timing attacks may be among the most insidious. An algorithm that appears to be coded correctly, perhaps even with a formal proof of its correctness, may be undermined by information leaked as the result of data-dependent timing differences. Both Arm and Intel have introduced modes that are intended to help defend against timing attacks, but the extent to which those modes should be used in the kernel is still under discussion.

Timing attacks

Timing attacks work by observing how much time is required to carry out an operation; if that time varies according to the data being operated on, it can be used to reconstruct that data. As a simplistic example, imagine a password-checking function that simply compares a provided string against a password stored in a (presumably) secure location. A logical implementation would be to start at the beginning, comparing characters, and return a failure status as soon as an incorrect character is found. That algorithm could be naively coded as:

    nchars = max(strlen(attempt), strlen(password));
    for (i = 0; i < nchars; i++)
        if (attempt[i] != password[i])
	    return false;
    return true;

The time this check takes is thus a function of the number of correct characters at the beginning of the string. An attacker could use this information to reconstruct the password, one character at a time. Real-world timing attacks that can, for example, extract cryptographic keys have been demonstrated many times.

In response, security-oriented developers have learned to avoid data-dependent timing variations in their code. In the example above, for example, the entire password string would be compared regardless of where the first wrong character is found. All of this careful work can be undermined, though, if the CPU this code runs on has timing artifacts of its own. It will surely come as a shock to LWN readers to learn that, in fact, CPUs do exhibit such behavior.

The CPU vendors are not unaware of this problem or its importance. But they are also not unaware of how beneficial some optimizations that introduce timing differences can be for certain benchmark results. The usual tension between security and performance objectives comes into play here, and the CPU vendors have taken the usual way out: make the users figure out which they want to sacrifice to gain the other.

Constant-time processor modes

Both Arm and Intel have thus introduced modes that guarantee that some instructions, at least, will execute in constant time regardless of what the operands are. In Arm's case, the mode is called Data Independent Timing (DIT); Intel calls its mode Data Operand Independent Timing (DOIT) mode, often called DOITM. Neither mode is enabled by default.

Back in August 2022, Eric Biggers asked whether these modes should be enabled by kernel. As a result of that discussion, a patch by Ard Biesheuvel was merged to enable DIT for the arm64 architecture — but only when running in the kernel. DIT remains off for user space by default, but enabling it is an unprivileged (and cheap) operation on that architecture, so user-space developers can enable it easily when they feel the need. This feature will be part of the 6.2 kernel release; it has not, as of this writing, been backported to any stable updates.

The story for x86 is less clear. DOITM is controlled by a model-specific register (MSR) and cannot be changed by user space. The August discussion wound down despite an attempt by Biggers to restart it in October. He returned in January, asking once again whether DOITM should be enabled by default for x86 — and advocating that it should:

Cryptography algorithms require constant-time instructions to prevent side-channel attacks that recover cryptographic keys based on execution times. Therefore, without this CPU vulnerability mitigated, it's generally impossible to safely do cryptography on the latest Intel CPUs.

Dave Hansen was less enthusiastic about the idea, even though he concluded that it was "generally the right thing to do". He pointed to language in Intel's documentation stating that DOITM only adds value for code that was specifically written with timing differences in mind:

Translating from Intel-speak: Intel thinks that DOITM purely a way to make the CPU run slower if you haven't already written code specifically to mitigate timing side channels. All pain, no gain.

The kernel as a whole is not written that way.

DOITM, he said, is only going to be useful for a small amount of carefully written cryptographic code in user space, and will only be a performance loss for everything else. He also noted that Intel explicitly warns that the performance impact of DOITM may be "significantly higher on future processors". The "Ice Lake" generation of processors is the first where DOITM makes any difference at all; constant-time operations are evidently the norm on earlier generations.

Biesheuvel argued that the value of DOITM extends beyond user-space cryptographic libraries, and that it is even more relevant to the kernel:

But for privileged execution, this should really be the other way around: the scope for optimizations relying on data dependent timing is exceedingly narrow in the kernel, because any data it processes must be assumed to be confidential by default (wrt user space), and it will probably be rather tricky to identify CPU bound workloads in the kernel where data dependent optimizations are guaranteed to be safe and result in a significant speedup.

Biggers questioned Hansen's focus on performance, saying that the kernel operations that benefit most from data-dependent optimizations are almost certainly the ones that most need protection, since they are the ones that will show the strongest timing differences. He concluded:

I think the real takeaway here is that the optimizations that Intel is apparently trying to introduce are a bad idea and not safe at all. To the extent that they exist at all, they should be an opt-in thing, not out-opt. The CPU gets that wrong, but Linux can flip that and do it right.

Hansen answered that the community has "looked at how bad the cure is compared to the disease for *every* one of these issues", referring to other types of hardware vulnerabilities. DOITM is different, he said, because it looks like the cure is likely to get worse over time rather than better; that makes it hard to come up with a reasonable policy in the kernel. He later added that, after discussions within Intel, he feels the kernel community should not jump to enable DOITM: "Suffice to say that DOITM was not designed to be turned on all the time. If software turns it on all the time, it won't accomplish what it was designed to do." Doing that, he said, would deprive systems of the "fancy new optimizations" that are coming in the future.

No conclusions have been reached this time either — at least, not yet. It has not helped that, so far, nobody has posted any benchmarks showing what the performance impact of DOITM is. Assuming that cost is not huge, though, it would be surprising if DOITM does not end up being enabled by default in the kernel, at least, with the ability for user space to enable it on demand. "Insecure by default" is rarely a way to impress users, after all.

Index entries for this article
KernelArchitectures
KernelCryptography


to post comments

Constant-time instructions and processor optimizations

Posted Feb 3, 2023 16:02 UTC (Fri) by mb (subscriber, #50428) [Link] (46 responses)

> "Insecure by default" is rarely a way to impress users, after all.

Well, for almost all code it's actually better to be "insecure by default", if that means it's faster.
The code where timing attacks matter is only an extremely tiny amount.

Code where timing matters has to be carefully designed today *already*.
Therefore, I think "insecure by default" is the right choice. Code can simply flip the switch and enable this constant timing feature just where it matters.

Constant-time instructions and processor optimizations

Posted Feb 3, 2023 16:18 UTC (Fri) by wsy (subscriber, #121706) [Link] (44 responses)

Problem is flipping an MSR is expensive and requires privilege. I think a safe defafult with an option to be unsafe is the way to go.

Constant-time instructions and processor optimizations

Posted Feb 3, 2023 18:39 UTC (Fri) by Wol (subscriber, #4433) [Link] (43 responses)

> I think a safe defafult with an option to be unsafe is the way to go.

And not just for flipping an MSR. What if someone takes sensitive code, written on a system BEFORE this stuff, and runs it on a processor that came after? How is it supposed to switch on a defense that didn't exist when the program was built?

Cheers,
Wol

Constant-time instructions and processor optimizations

Posted Feb 3, 2023 18:55 UTC (Fri) by pizza (subscriber, #46) [Link] (42 responses)

> How is it supposed to switch on a defense that didn't exist when the program was built?

It's supposed to behave the same way any existing binaries are supposed to behave when deployed on newer hardware -- no difference, because it can't optimize for what it doesn't know exists.

Software that cares about constant time is already going to have to be very carefully written, perhaps even hand-coded in assembler for the specific processor being used. If a new future processor has additional features to make this easier, that's still going to require opting in to those features. Similarly, it may perform existing functions differently (eg faster, which is usually the goal for newer processors) which may require re-visiting that carefully-tuned software.

Any other answer basically amounts to "backwards compatibility doesn't matter".

Constant-time instructions and processor optimizations

Posted Feb 3, 2023 19:41 UTC (Fri) by Wol (subscriber, #4433) [Link] (15 responses)

Yup but isn't the point of what they're saying here, that these optimisations (which weaken security) will be switched on, by default, by the processor?

So your code, which was perfectly secure before, is now insecure because although its inputs and outputs don't change, the way the processor runs it does. And the only way to get the original "timing profile" back is to turn off the options that didn't used to exist.

Cheers,
Wol

Constant-time instructions and processor optimizations

Posted Feb 3, 2023 19:53 UTC (Fri) by pizza (subscriber, #46) [Link] (14 responses)

Sure, but what I'm saying is that this has nearly always been the case -- different processors execute the same code in different amounts of time, because execution time has never been guaranteed at an architectural level.

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 0:12 UTC (Sat) by developer122 (guest, #152928) [Link] (13 responses)

From the recent gihub incident, Hyrum's Law:
>With a sufficient number of users of an API,
>it does not matter what you promise in the contract:
>all observable behaviors of your system
>will be depended on by somebody.

And indeed, this constant-time behavior has been heavily depended on by EVERYBODY for at least the last 30 years.

Just because you did not specify it in a contract does not mean you are not responsible for it. The linux kernel is responsible for every bit of it's behavior that a userspace application comes to depend on, for example.

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 3:54 UTC (Sat) by pizza (subscriber, #46) [Link] (12 responses)

> Just because you did not specify it in a contract does not mean you are not responsible for it. The linux kernel is responsible for every bit of it's behavior that a userspace application comes to depend on, for example.

Is "if you change the hardware the behaviour will not be identical" really such a hard thing to grasp?

Am I the only one that remembers the "turbo" buttons on old PCs? When off, it would down-clock it to the 4.77MHz of the original PC, because there was (badly-written!) software that had hard-coded timing that broke on newer (==faster) systems. Should that software have held up the progression of the entire industry?

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 13:18 UTC (Sat) by jhoblitt (subscriber, #77733) [Link] (9 responses)

#bringbackturbo

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 18:57 UTC (Sun) by barryascott (subscriber, #80640) [Link] (8 responses)

CPUs have dynamic clock speed already, aka turbo, based on the thermal overhead available on the chip.

So this should already be an issue right?

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 19:12 UTC (Sun) by jhoblitt (subscriber, #77733) [Link] (7 responses)

I was referring to the little red button that inevitably need to be in the up/off state to make game's timing loop run at the intended rate.

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 19:20 UTC (Sun) by barryascott (subscriber, #80640) [Link] (6 responses)

Yes i know the button you mean, mandatory to make games work at the right speed.

Just pointing out its a built in feature now.

Also that it will change the time a crypto algorithm runs in.

Oh and we already have efficiency cores that run slower then performance cores on recent CPUs.

Another reason a CPU cannot be relied on to execute in constant wall clock time.

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 21:11 UTC (Sun) by Wol (subscriber, #4433) [Link] (5 responses)

> Another reason a CPU cannot be relied on to execute in constant wall clock time.

Except you're missing the point ... although this too actually might be a help.

We don't care whether wall time is constant, what we care about is that time is *independent* of the operation. If they all run on a slow core, fine. If they all run on a fast core, fine. If some run on the slow core and some on the fast, AT RANDOM, brilliant!

But if time depends on the operator/operands, then anything capable of observing time will be able to leach information.

Cheers,
Wol

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 22:04 UTC (Sun) by barryascott (subscriber, #80640) [Link] (2 responses)

I hope i have not missed the point.

What happens if the CPU turns turbo on or off while the algorithm is running?
What happens if the thread is moved between cores with different performance characteristics as it runs?

Do these events help or hinder security already?

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 23:54 UTC (Sun) by Wol (subscriber, #4433) [Link] (1 responses)

What happens if turbo changes? What happens if it moves cores? That will IMPROVE security.

The attacker is wanting to watch (and analyse) differences in times caused by the operands.

Therefore the more you can introduce *noise* into the timings (eg by turning turbo on and off, or by core-switching), you're making an attacker's job harder, because they are getting statistically significant timing changes, that from their point of view is unwanted noise. It's interfering with their analysis.

That's what you're missing. The less the time differences are caused by the operands, the more the time differences are caused by random events that have nothing to do with the operands, the better. You WANT to introduce noise, to hide the signal ...

Cheers,
Wol

Constant-time instructions and processor optimizations

Posted Feb 7, 2023 7:45 UTC (Tue) by JanC_ (guest, #34940) [Link]

> What happens if turbo changes? What happens if it moves cores? That will IMPROVE security.

Depends on whether the turbo changes & core moves depend on the operands or not, of course… :)

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 20:19 UTC (Mon) by draco (subscriber, #1792) [Link] (1 responses)

Randomly changing clock rate won't help, it just increases the number of samples required. The variation due to CPU core will average out and the data driven variation will persist.

cf. https://www.google.com/search?q=timing%20attacks%20random...

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 20:33 UTC (Mon) by farnz (subscriber, #17727) [Link]

It doesn't hinder it, either - unless the choice of CPU clock rate is correlated with the input, it's just noise, and not signal. The problem only kicks in when the samples contain some signal correlated with the input data - at which point, there's no way to add noise that will prevent you ever finding the signal, given that you can get as many samples as you need.

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 3:30 UTC (Sun) by developer122 (guest, #152928) [Link] (1 responses)

The turbo button is irrelevant. It changed the clock speed but every instruction always took the same number of cycles regardless of it's setting and never, ever did any of those instructions' execution times change depending on their data.

The possibility that different hardware may behave differently isn't in question.

Intel decided to break a fundamental requirement of all security-critical code that has been well-documented for decades, from one generation fo hardware to the next. That's a colossally stupid design decision any way you slice it.

Constant-time instructions and processor optimizations

Posted Feb 7, 2023 8:07 UTC (Tue) by JanC_ (guest, #34940) [Link]

Instructions like MUL & DIV definitely had different timings depending on their operands on the original 8086 & 8088.

See e.g. https://www.oocities.org/mc_introtocomputers/Instruction_...

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 0:08 UTC (Sat) by developer122 (guest, #152928) [Link] (20 responses)

This hits the nail on the head.

It would take more human effort than every programmer on earth could muster to review all the code out there for every spot where this optimization would have to be turned off and back on again. It is not a trivial change to go back over 30+ years of code and perform this kind of sensitive analysis.

The code of the last 15+ years MUST continue to work correctly on the latest processors. Honestly, inertia is the only thing that's kept x86 alive for most of it's life. It was something AMD understood well when they came out with amd64.

Leave these dangerous tricks disabled in the general case, and let people who need the speed opt-in as they currently do with mitigations=off. If the hardware allows control on a per-core or per-thread level, it should be fine to allow opting in on that basis too.

Then there is the problem of ongoing burden, because the very last thing we need is *YET ANOTHER* way to forget and screw up security.

P.S. And before anyone asks, yes the effects will be observable. This is not an accidental cache effect as caused as with Spectre. The whole point of these optimizations is to create as large and observable as possible a difference in execution time. It's how you know whether they work.

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 11:33 UTC (Sat) by mb (subscriber, #50428) [Link] (19 responses)

>It would take more human effort than every programmer on earth could muster to review all the code >out there for every spot where this optimization would have to be turned off and back on again. It is not >a trivial change to go back over 30+ years of code and perform this kind of sensitive analysis.

That is not true.
Such an analysis is not needed, because all of the code sections which are coded with constant-time-execution are already known today. Otherwise they wouldn't be constant-time (guaranteed).

Only code sections which are *already* carefully coded with constant-time-execution today have to be revisited.

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 3:43 UTC (Sun) by developer122 (guest, #152928) [Link] (15 responses)

>constant-time-execution are already known today.

Unbelievably, colossally, wrong.

Firstoff, there is no central database of everywhere a bit of security code has been written. There are some obvious suspects, like sshd, or curl, or a browser. But would you have guessed that there was security-sensitive code in a text renderer?

Second, take the example of the browser, or the ssh server, or whatever. Obviously there's security sensitive code in there. But having been written with constant time isn't the same as being somehow "tagged" for it. There's no universal way to find such code, being as at best you might have some vague comment next to it.

You can't just grep -i "constant time" ./*.c across each codebase. Human people need to search through the code and apply human reasoning to find each and every instance. Across millions upon millions of lines of code.

If you actually read the article carefully, you'd also know that all the data the kernel handles must remain secret in some way or another, since it belongs to a particular userspace process and shouldn't be snooped on. Intel's mistake was thinking that the security sensitive stuff was limited to a couple of cipher implementations, and here you've fallen into the same line of thinking. As one of the developers in the discussion pointed out, the places in the general kernel code where this would make the most difference are also the places most likely to be sensitive to disclosure. This is in many ways a potential arbitrary information disclosure vulnerability.

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 13:08 UTC (Sun) by Wol (subscriber, #4433) [Link]

This is a TEMPEST in a teacup :-)

For those who don't remember the big old computer screens - you know, the ones that were almost as deep as they were wide - there was a snooping method called "tempest". By pointing a radio receiver or whatever it was at the building over the road, you could actually recreate the picture on the screen.

At the end of the day, EVERYTHING can be a security vulnerability, and adding new stuff by default that breaks old security guarantees should be a no-no.

If we have a new instruction set that enables this stuff, great. We can use it as appropriate. Compilers can bake it into new code, fine. BUT DON'T BREAK OLD CODE. (And yes, it doesn't change the output the *good* guys are looking for. But if it changes the output the *BAD* guys are looking for, in a manner they can take advantage of, then that's BAAAAD!).

Take advantage of this to design a new, backwards-compatible, instruction set and it'll be a good thing. Otherwise, no I don't want my personal details leaked. And while me, personally, is not worth attacking, someone like Rishi Sunak or Keir Starmer or or or ... they are a juicy target ...

Cheers,
Wol

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 15:19 UTC (Sun) by mb (subscriber, #50428) [Link] (13 responses)

If there is code which needs constant time semantics and that code is not already written and documented with constant time in mind, then that piece of code is *already* broken.
It might work. But only out of pure luck. And it might break at any time. Even today. Think of compiler update.

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 16:25 UTC (Sun) by Wol (subscriber, #4433) [Link] (12 responses)

And even if it's written and documented (which according to your definition is NOT broken), if Intel suddenly change the x86 instruction code behaviour then it will still break. And THIS is the problem here - code that works fine - SECURITY code that works fine - will be broken by Intel changing the behaviour of existing op-codes.

So ALL security-sensitive code has suddenly had the assumptions it relies on invalidated. Which means pretty much any company processing sensitive data will simply have to ban these processors from their datacentres ...

Just one more poor decision in a long list of Intel decisions that looks good until you actually think through the consequences ...

Cheers,
Wol

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 16:35 UTC (Sun) by mb (subscriber, #50428) [Link] (11 responses)

> And even if it's written and documented (which according to your definition is NOT broken),

Can you please stop putting words into my mouth.

> if Intel suddenly change the x86 instruction code behaviour then it will still break.

Yes. But that's no different from today, where some compiler upgrade may break you timing assumptions.
There's no such thing as timing assumptions in the C standard.

And the same is true for processor instructions today.
Yes, you get rough timing numbers. But you usually don't get detailed information about internal behaviour. (Pipelining, caching, microcode). And such internal behaviour did change in the past and will change in the future.

Therefore, you will have to revisit and analyze your timing critical code on a regular basis anyway.
Intel introducing this new non-constant execution timing does not fundamentally change that requirement.

Having constant-timing code and just assuming that it will be constant timing on new toolchains and new processors simply is a false assumption and has always been.

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 16:52 UTC (Sun) by Wol (subscriber, #4433) [Link] (10 responses)

> Having constant-timing code and just assuming that it will be constant timing on new toolchains and new processors simply is a false assumption and has always been.

And therein lies the problem. What that leads to is that every time you deploy a system, you need to do a full security audit. Okay, a lot of that audit could just be "oh these assumptions are still valid", but you can't even replace a failed motherboard without having to revisit everything.

That's the problem with these changes. If they're on by default, it breaks pretty much every pre-existing security audit. Do you really want to be forced to rewrite a large chunk of your code, simply to be able to replace a failed motherboard safely?

And why is it always Intel that seems to be at the centre of these storms? Okay, it could be biased reporting, but my memory goes back a long way and I don't remember the other x86 vendors being caught up in storms over bad engineering design decisions. The list of bad Intel decsions just goes on and on ...

> > And even if it's written and documented (which according to your definition is NOT broken),

> Can you please stop putting words into my mouth.

Fair enough. But in what other heavily regulated (and yes, computing - when handling PII - is heavily regulated) industry would a supplier expect to get away with just throwing all their customers' expectations in the dustbin? Would a steel manufacturer suddenly change the specification of structural girders and not expect massive kickback?

If I have to comply with legal requirements, that place ME in jeopardy, I don't expect my suppliers to move the goalposts behind my back ...

Cheers,
Wol

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 17:14 UTC (Sun) by mb (subscriber, #50428) [Link] (8 responses)

> That's the problem with these changes. If they're on by default, it breaks pretty much every
> pre-existing security audit. Do you really want to be forced to rewrite a large chunk of your
> code, simply to be able to replace a failed motherboard safely?

That's essentially true. However,

a) This has not fundamentally changed with the HW change from the article.
b) I think you're greatly over estimating the mount of affected code and the amount of work.

The affected code is basically only the well known parts of some types of crypto (e.g. remote authentication) and maybe some parts of the operating system which check credentials/permissions or does some IPC or something like that.

Almost all code running on any machine is not affected security wise. (In contrast: If this performace measure is not enabled by default, then all code running is affected).

And if there's code hidden somewhere which has constant timing requirements and we don't know about that fact, then we should develop methods to find and identify such code. Regardless of the HW change from the article, because such code is a security problem *already*.

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 9:34 UTC (Mon) by leromarinvit (subscriber, #56850) [Link] (7 responses)

> Almost all code running on any machine is not affected security wise. (In contrast: If this performace measure is not enabled by default, then all code running is affected).

I doubt that. Spectre attacks work, AFAIU, by exploiting the tiny timing differences created by speculative execution caching data (or not doing so). This, OTOH, deliberately introduces data-dependent timing differences that are designed to be as large as possible (since they're performance optimizations). So it seems like Spectre on steroids to me.

How do you know which code needs to be protected? I'd question the assumption that it only matters for cryptographic code. I'd even say that the programmer CAN NOT know in advance if the information that's processed is sensitive, since that obviously depends on the information itself.

A text renderer displaying this comment is probably irrelevant - LWN will happily serve a copy to anyone who asks. An image decoder decompressing cat pictures is probably not all that interesting a target either. But what if the same text renderer was displaying your draft of a classified report? What if the image being decoded was a construction drawing of your fancy new alien invasion shelter?

It seems to me that the only person who can decide if such optimizations are acceptable is the user, and I'm leaning towards "off by default" for security. Make it easy to switch on, by all means, but the default configuration should be the one with the least "interesting" surprises.

The only category of applications where this could probably be switched on by default are games, since it's highly unlikely that anything they process could be sensitive. But for everything else, we (as programmers) just don't know.

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 10:01 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

Spectre is only a vulnerability under the assumption that the speculatively-executed code path has access to data that is not accessible under the intended code path, the speculative code can trigger timing variations that are visible to the intended code, *and* an attacker can directly or indirectly control the intended code (to the extent necessary to exfiltrate the data). In practice, this is quite common, especially in the kernel or in VMs.

Spectre is fundamentally a privilege escalation attack, where we regard the speculative code as having a higher level of privilege than the intended code (because it has access to data that is not meant to be accessed). So, when we consider a case without this "speculative/intended" distinction, where there's just one code path (as in the case where Intel merely causes some instructions to execute faster), a Spectre-like attack cannot exist, because there is no privilege boundary to be crossed in the first place. In other words, you can't (usefully) attack yourself.

Of course, this does not mean that there are no attacks, period. It just means that we won't see the same level of far-reaching "everything is on fire" problems we saw with Spectre. Vulnerabilities will be limited to cases where an attacker can cause victim code to execute a known algorithm on unknown-to-the-attacker data, and the attacker can also observe precise timing information. In practice, that's *mostly* crypto code, to my understanding, but I suppose there might be other edge cases where this comes up.

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 11:15 UTC (Mon) by mb (subscriber, #50428) [Link] (4 responses)

> Spectre attacks work,

Spectre has nothing to do with this, at all.
It's a completely different thing.
The core of Spectre is that it opens a side channel via cache. That's not possible with the mechanism from the article.

As I already said, I would maybe agree with enabling const-time in code such as the kernel, which handles data across permission boundaries. I still would like to see a real world threat scenario first, though.
But I don't see any reason at all to enable it for all user code.

> How do you know which code needs to be protected?

You'd better know already. Otherwise your algorithm probably is broken already.

> I'd even say that the programmer CAN NOT know in advance if the information that's processed is sensitive,
> since that obviously depends on the information itself.

This is not about having "sensitive" information in the first place.
It's about the algorithm.
If your algorithm can leak information via timing to a third party that doesn't normally have access already and if the programmer can not rule out that such data is "sensitive", then it probably should be protected with constant timing.
But that's *not* new. That doesn't change with the technology from the article.
Your compiler can already compile your algorithm into data-dependent non-const-time.
Therefore you must already protect against that today.

> But what if the same text renderer was displaying your draft of a classified report?

Ok. And what kind of timing attack do you imagine?
You are making up a lot of what-ifs, but no actual scenario where (non-)constant-timing comes into play.

>It seems to me that the only person who can decide if such optimizations are acceptable is the user,

No, that's wrong. The only person who can decide is the algorithm developer.
The user has no information about how the data is handled and how the algorithm works.

There's a lot of hand waving going on here in the comments about how security would magically break everywhere. But nobody can really explain how that would actually work. Except for the already well known algorithms like password checks and credential hashing etc.

I don't buy the argument that we must slow down the world, because we think there might be some vulnerability that we can't even explain how it would work.

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 11:41 UTC (Mon) by Wol (subscriber, #4433) [Link] (2 responses)

> If your algorithm can leak information via timing to a third party that doesn't normally have access already and if the programmer can not rule out that such data is "sensitive", then it probably should be protected with constant timing.

In other words, that's EVERY SINGLE DATABASE MANAGEMENT SYSTEM. There is your problem. The programmer (for the most part) has no clue whether the data is sensitive or not, and if he doesn't know, he must assume that it is.

Cheers,
Wol

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 11:53 UTC (Mon) by mb (subscriber, #50428) [Link]

>In other words, that's EVERY SINGLE DATABASE MANAGEMENT SYSTEM. There is your problem.
>The programmer (for the most part) has no clue whether the data is sensitive or not, and if he doesn't
>know, he must assume that it is.

How do you extract data that you don't normally have access to from your database management system with this timing attack?
Can you please explain that to me?

The mere presence of "sensitive" data has nothing to do with this feature.

If you can explain how you would do that with the feature from the article, then I will immediately stop arguing.

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 13:29 UTC (Mon) by kleptog (subscriber, #1183) [Link]

Databases don't even pretend to offer any kind of security against side-channel attacks in their data. If row-level security is offered, inserting into a table with a primary key will reveal the existence of another row with that primary key, whether or not you have permission to see it. The database will make no effort to hide whether lookups in an index succeed or not, so timing queries can reveal all sorts of information.

This is all documented [1][2]. If you allow a user to execute arbitrary queries on a database, you're just asking for it. The answer is to not let users execute arbitrary queries on the database, not slow down everything to the slowest possible.

Constant-time is really very hard, since the layers of CPU cache will make almost any operation involving memory dependant on the state of the system. If your algorithm doesn't take this into account, you're screwed already. Fortunately, it makes absolutely no difference in the vast majority of cases.

[1] https://learn.microsoft.com/en-us/sql/relational-database...

[2] https://www.postgresql.org/docs/current/ddl-rowsecurity.html

Constant-time instructions and processor optimizations

Posted Feb 16, 2023 10:20 UTC (Thu) by demfloro (guest, #106936) [Link]

>I don't buy the argument that we must slow down the world, because we think there might be some vulnerability that we can't even explain how it would work.

Intel is trying to speed up the world at the cost of security. Processors before Ice Lake work as if DOITM is always enabled, Ice Lake and later processors will default to non-constant timing unless kernel enables DOITM.

"For Intel® Core™ family processors based on microarchitectures before Ice Lake and Intel Atom® family processors based on microarchitectures before Gracemont that do not enumerate IA32_UARCH_MISC_CTL, developers may assume that the instructions listed here operate as if DOITM is enabled." from https://www.intel.com/content/www/us/en/developer/article...

Furthermore Intel tries to scare not to enable DOITM as they clearly plan to design future microarchitecture around it, so enabling DOITM will become more expensive. I don't see how all this is "slowing down the world".

And on x86 currently there is no way for userspace to express to kernel need to enable DOITM at all. Hence kernel should keep DOITM enabled all the time to preserve past CPU behaviour for kernel itself and for userspace.

Constant-time instructions and processor optimizations

Posted Feb 7, 2023 4:45 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

> The only category of applications where this could probably be switched on by default are games, since it's highly unlikely that anything they process could be sensitive.

Their DRM engines probably want to avoid being attacked…

Constant-time instructions and processor optimizations

Posted Feb 8, 2023 22:31 UTC (Wed) by anton (subscriber, #25547) [Link]

And why is it always Intel that seems to be at the centre of these storms?
In this particular case it seems that they caused the storm by their own documentation and the option to turn the operand-dependent timing of the affected instructions off. Someone at Intel actually worried about constant-time routines, and did not want to break the constant-time property without giving the programmer a way to reassert it. And if I understand the Intel documentation correctly (it is not that well written), the MSR setting exists since Skylake (2015), but only in Ice Lake (2019) and later they changed some previously constant-time instructions to have operand-dependent timing.

I would not be surprised if some other CPU design house had just made the instructions have operand-dependent timing without announcing it and without such an MSR setting.

OTOH, it's now 61 months since Spectre was published and 68 months since it was revealed to Intel and AMD, and AFAIK they all have failed to produce a Spectre-immune CPU, so Intel doesn't always take security so seriously. Is this just whataboutism? No: Assume you have a secret (e.g., a private key) that is architecturally accessed only with a constant-time algorithm to avoid classical side-channel attacks. As long as Spectre is not fixed in hardware, that secret can be revealed through a successful Spectre attack on all the code (not just the tiny piece of code that performs architectural accesses to it) that has the secret in it's accessible memory; so unless you are sure that you mitigated all potential Spectre vulnerabilities (everywhere in the program), does the constant-time property of your architectural secret-handling code really matter?

Constant-time instructions and processor optimizations

Posted Feb 9, 2023 10:19 UTC (Thu) by nim-nim (subscriber, #34454) [Link] (2 responses)

Once upon a time someone added compression to http (deflate, did wonders to make websites work fast on something else than the webdev lan). And then people figured how to break https by taking advantage of the way compression strips out the fat that makes observing and timing traffic hard.

So now newer http specs explicitly deprecate compression. Even though it was a wonderful optimization, and the malvertizers continue to destroy site responsiveness with unwanted bulk.

Constant-time instructions and processor optimizations

Posted Feb 13, 2023 7:31 UTC (Mon) by jwilk (subscriber, #63328) [Link] (1 responses)

[citation needed]

Constant-time instructions and processor optimizations

Posted Feb 13, 2023 10:19 UTC (Mon) by excors (subscriber, #95769) [Link]

I don't think it's "deprecated" in general, but HTTP/2 says compression is forbidden in many cases (https://www.rfc-editor.org/rfc/rfc9113#name-use-of-compre...):

> Compression can allow an attacker to recover secret data when it is compressed in the same context as data under attacker control. HTTP/2 enables compression of field lines (Section 4.3); the following concerns also apply to the use of HTTP compressed content-codings (Section 8.4.1 of [HTTP]).
>
> There are demonstrable attacks on compression that exploit the characteristics of the Web (e.g., [BREACH]). The attacker induces multiple requests containing varying plaintext, observing the length of the resulting ciphertext in each, which reveals a shorter length when a guess about the secret is correct.
>
> Implementations communicating on a secure channel MUST NOT compress content that includes both confidential and attacker-controlled data unless separate compression dictionaries are used for each source of data. Compression MUST NOT be used if the source of data cannot be reliably determined. Generic stream compression, such as that provided by TLS, MUST NOT be used with HTTP/2 (see Section 9.2).

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 2:44 UTC (Sun) by wujj123456 (guest, #84680) [Link] (4 responses)

> Any other answer basically amounts to "backwards compatibility doesn't matter".

This is quite a bit of a stretch to me. This is not incompatibility. It still runs and produces the same results. Imaging any timing change in new processors that expose previously unobservable race conditions becomes an issue of backward compatibility. One can equally argue the correctness of that program was dependent on the specific timing of each instructions it used. In that case, even results break, but it would be pretty wrong to view it as incompatibility.

The carefully-tuned secure software made assumptions about unspecified behavior and thus more bugs are exposed on the new architecture. This needs to be fixed by the software owner and validated again on the new architecture. That would be my take. Of course, unless x86 ISA itself specified constant time data independent operation for certain instructions that Intel is going to break, then that's a different story. I haven't looked this up.

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 8:37 UTC (Sun) by matthias (subscriber, #94967) [Link] (3 responses)

> The carefully-tuned secure software made assumptions about unspecified behavior and thus more bugs are exposed on the new architecture. This needs to be fixed by the software owner and validated again on the new architecture. That would be my take. Of course, unless x86 ISA itself specified constant time data independent operation for certain instructions that Intel is going to break, then that's a different story. I haven't looked this up.

Why do you think this is unspecified behavior? The duration of most instruction (and the fact that the duration is independent of the operands) is definitely specified for the existing processors. This is necessary to produce efficient code, as the compiler needs to know when the results are ready and can be used without causing the processor to wait on the data. And this constant time behavior is absolutely vital for all cryptographic (and quite a bit of other) code in order to be secure.

What was not specified that this will never change in future processors, just like it was never specified that all processors will support x86 indefinitely. If Intel now changes such a core design aspect, then this is clearly a backwards incompatible change that requires all software relying on the old behavior to be adapted.

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 14:55 UTC (Sun) by pizza (subscriber, #46) [Link] (2 responses)

> Why do you think this is unspecified behavior? The duration of most instruction (and the fact that the duration is independent of the operands) is definitely specified for the existing processors

Yes, it is defined for *existing processors*. But each of those existing processor family/generations can (and does) have *different* instruction durations. Usually faster, but sometimes slower (eg for relatively uncommon legacy x86 instructions) No CPU manufacturer has ever, EVER, guaranteed that relative instruction timing will never change in future processor families. [1]

You cannot write one piece of software and expect it to behave *identically* (ie across the time domain) across every x86-64 (or x86, or ARM, or SPARC, or whatever) CPU ever produced to date, much less everything that will ever be produced. And that's purely at the individual instruction level! When you factor in different cache sizes/strategies, instruction decode/issue width, execution units, reorder buffers, branch predictors, uOP caches, and other microarchitectural state that varies with *every* processor family/generation, expecting identical, consistent behaviour (without careful adjustments for each processor type/family/whatever) is even more naive. (And then you have the timing variations caused by other hardware, virtualization, and the likes of multitasking OSes)

[1] And you know what? 99.99999% of software makers (and buyers) are more than happy with that, because they have come to expect each generation to be "faster" than the prior generation, even at the same clock speed.

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 16:41 UTC (Mon) by neggles (subscriber, #153254) [Link]

How long a given instruction takes on a given CPU isn't the problem; the problem is that on chips with this feature enabled, how many cycles some instructions take will change depending on what the input data is. That's exactly what constant-time crypto is trying to avoid; you can't leak data though timing changes if your timing doesn't change.

This is... potentially a Very Bad Idea. ARM's implementation is per-task, has an unprivileged userspace method to enable/disable, and the state is saved/restored when the stack is pushed/popped. If intel adjust their implementation to match, or at least made it opt in, that would be great, but...

Constant-time instructions are not constant between processors, but rather constant regardless of data

Posted Feb 6, 2023 18:20 UTC (Mon) by farnz (subscriber, #17727) [Link]

Constant time code isn't about the code taking the same time (in any unit) as on past processors; rather, it's about adapting your code so that the time taken to execute code is not dependent on the inputs to that code. In other words, if I have a function f(x), the time taken to compute the value of f(x) does not depend on x, only on f. This is already not true of all instructions on all CPUs; for example, branch instructions take a variable amount of time depending on whether the branch predictor gets it right or wrong. But it is true of a useful subset - for example, MUL r32, r32 takes 9 clock cycles on the original Intel Pentium, and 3 on AMD Zen 4, which means that MUL takes constant time regardless of the register contents.

The change is going to break that assumption for many more instructions than it's currently broken for. For example, taking MUL again, a future CPU might say that multiply by 0 or 1 is done by the register renamer and is "free" (subject to instruction decode limits) while multiply by 2 through 15 takes 1 clock cycle, multiply by 16 through 65535 takes 2 clock cycles, and multiply by 65536 through 232 - 1 takes 3 clock cycles.

This is a problem for cryptography code; if each instruction takes time depending on the input data, then there's a timing side-channel opened up, where the exact values of the input data (plaintext, key, encrypted bytes) affect the runtime of the cryptographic algorithm. It's not a problem for most code, since we don't usually go to considerable effort to avoid any timing variance in running the code - we let branch predictors, caches etc all change the runtime based on the input data.

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 0:17 UTC (Sat) by wujj123456 (guest, #84680) [Link]

"Insecure by default" kinda has one's opinion baked in already. "Slow by default" likely wouldn't impress users either. At the end of the day, it really comes down to a balance and I supposed this one is particularly hard given the unknown future regressions. On the other hand, I believe arch specific defaults aren't new. Perhaps the answer would be clear once we have benchmark results.

Constant-time instructions and processor optimizations

Posted Feb 3, 2023 18:22 UTC (Fri) by Sesse (subscriber, #53779) [Link] (10 responses)

This discussion feels strange; my code (which is not cryptographic) does not become more secure by having x * 7 be just as slow as x * 777777777. That's really what's being discussed here, no?

Constant-time instructions and processor optimizations

Posted Feb 3, 2023 18:55 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (1 responses)

It already is today (apparently). It's more like future processors may now get a different timing because they're going to see `x * 7` and instead do `(x << 3) - x` at the hardware level (or whatever).

Constant-time instructions and processor optimizations

Posted Feb 3, 2023 20:11 UTC (Fri) by wahern (subscriber, #37304) [Link]

A relatively recent survey of the topic is https://www.bearssl.org/constanttime.html Regarding multiplication they say,

Multiplications might be a problem in older systems. This one is very irksome: in almost all recent CPU designs, multiplications are constant-time, but some older CPU may have shortcuts that make multiplications faster when operands are small. In the Intel x86 line, the 80486 had a variable-time multiplication, while the Pentium has constant-time multiplication. One of the most recent yet significant CPU designs with variable-time multiplication is the ARM9.

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 0:17 UTC (Sat) by developer122 (guest, #152928) [Link] (7 responses)

If your password was 777777777 and this was an important step in checking it, then the difference in execution time would definitely help an attacker guess it.

The classical example is in password string comparison. Bailing out when you hit the first wrong character allows an attacker to guess your password one character at a time, which is very easy and fast to do. The same logic applies to virtually all secret-handling code in one way or another. You cannot allow the values being computed to affect the timing, or observing the timing will tell you the secret values.

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 7:44 UTC (Sat) by Sesse (subscriber, #53779) [Link] (6 responses)

That's fine, but 99% of my time isn't spent doing password checking of cryptography. And many places _even_ when doing crypto, it doesn't even matter; e.g. dm-crypt or my local password manager can leak as much timing it wants when decrypting my passwords, the only observer is me anyway (an external party cannot trigger testing of a given password, nor really observe how long it takes to test it).

If you want to write code that doesn't leak timing information, you need to first make sure it doesn't branch, then that it doesn't reference memory, and only _then_ can you start worrying about things like non-constant multiplication. (Insert “…depending on secret information” if you want to nitpick.) The places that have these kinds of demands are already pretty closely marked up, and that's where you should enable DOITM. Not universally across your entire machine; that makes no sense, as most code isn't written to that kind of standard in the first place.

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 9:58 UTC (Sat) by matthias (subscriber, #94967) [Link] (5 responses)

> That's fine, but 99% of my time isn't spent doing password checking of cryptography. And many places _even_ when doing crypto, it doesn't even matter; e.g. dm-crypt or my local password manager can leak as much timing it wants when decrypting my passwords, the only observer is me anyway (an external party cannot trigger testing of a given password, nor really observe how long it takes to test it).

Are you really sure with the only observer thing? The timing of dm-crypt (and the AES especially) can be observed by anyone (and any piece of code) that is able to trigger writes to the filesystem. I would not really be surprised if this could be even observed from javascript when using client local storage. Ok, the noise will be horrible, but spectre like attacks have already been shown to work in principal from javascript.

And the timing it takes to check a private key for ssh authentication can be easily observed from anyone on a local network.

And 99% of the time, the computer is probably waiting for the slow user to hit the next key. It is also only a small fraction of software that really depends on the last bit of processing power, just as it is only a small fraction of code that depends on constant timing behaviour.

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 11:42 UTC (Sat) by Sesse (subscriber, #53779) [Link] (4 responses)

> And 99% of the time, the computer is probably waiting for the slow user to hit the next key. It is also only a small fraction of software that really depends on the last bit of processing power, just as it is only a small fraction of code that depends on constant timing behaviour.

As someone who spends most of my day waiting for computers, I beg to differ. :-)

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 17:46 UTC (Sat) by Wol (subscriber, #4433) [Link] (3 responses)

:-)

The problem is, is the computer working (a lot of the time it is), is it waiting for the user (not often), or is it waiting for the Snoracle server (in my case, yes a lot of the time).

Cheers,
Wol

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 18:47 UTC (Sat) by Sesse (subscriber, #53779) [Link] (2 responses)

So do you think the Oracle server you are waiting for should be default DOITM on or off?

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 19:15 UTC (Sat) by Wol (subscriber, #4433) [Link] (1 responses)

Scrapped, and replaced with a real database like MultiValue :-)

Cheers,
Wol

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 13:56 UTC (Mon) by LtWorf (subscriber, #124958) [Link]

It's not nice to avoid a question like this.

You should have either responded or not responded.

Quick hash

Posted Feb 3, 2023 18:51 UTC (Fri) by Wol (subscriber, #4433) [Link] (22 responses)

Half of this is probably just being aware of these problems ...

If I didn't have constant-time instructions, I'd probably do something like sticking a guard hash in front of it. That's roughly constant time, requires an exact match to get past, and only then does the actual comparison.

I doubt that's perfect, but it would certainly make attacks harder / take longer.

Cheers,
Wol

Quick hash

Posted Feb 3, 2023 21:42 UTC (Fri) by runekock (subscriber, #50229) [Link] (19 responses)

I don't know what a guard hash is, but I would think that a good solution often is to use a timer to delay to the worst-case time. Then you don't have to make your algorithms constant-time, which seem like a hard problem.

Quick hash

Posted Feb 4, 2023 0:35 UTC (Sat) by wahern (subscriber, #37304) [Link] (1 responses)

I think in practice it's the opposite: dynamic padding and delay loops are brittle and invariably broken without comprehensive compiler and OS support and augmentation. They don't and can't address exposure to, e.g., cache or power side-channels. So in the absence of constant-time ops you still need mitigations like page coloring, which require extensive support from the environment. See, e.g., https://arxiv.org/pdf/1506.00189.pdf

By contrast, constant-time operations not only solve the immediate problem of pacing your algorithm's logical operations, they implicitly address some of the trickier aspects of other side channels even when they don't make promises regarding, e.g., cache bypassing. That still leaves you with some work to do (e.g. removing conditional data dependencies), but now the problem is far more tractable *and* can be solved locally, without depending on the rest of the software stack and environment to implement complex mechanisms that for the most part still only exist in the literature and a few research systems.

Quick hash

Posted Feb 4, 2023 4:00 UTC (Sat) by pizza (subscriber, #46) [Link]

> I think in practice it's the opposite: dynamic padding and delay loops are brittle and invariably broken without comprehensive compiler and OS support and augmentation.

And also disabling most of the IPC-improving advancements of the past few decades. Like, oh, use of caches or branch prediction of any sort.

That's not something you'd ever want to do globally or even by default, but it makes sense to disable it for the tiny handful of specific applications where true constant time actually matters. Or just farm those specific operations off to a dedicated coprocessor.

Quick hash

Posted Feb 4, 2023 10:04 UTC (Sat) by Wol (subscriber, #4433) [Link] (16 responses)

> I don't know what a guard hash is, but I would think that a good solution often is to use a timer to delay to the worst-case time. Then you don't have to make your algorithms constant-time, which seem like a hard problem.

A guard hash is approximately constant time, which eliminates 99% of wrong attempts.

Let's say your password is 16 chars. That's two int-32s. XOR the password and the password attempt together to give you two int-32s. Add them together. If the result isn't 0, the password is wrong (if I've got my maths right. It might be -1. Whatever. All correct passwords will give the same value). That's pretty much constant time, they've got a 1 in 4 billion chance of getting past it, and THEN you do the real password comparison.

Chances are, if they get past the guard, they've already cracked the password. I'm sure someone will spot weaknesses in this, it's not meant to be foolproof, but it is meant to jack the cost up sharply.

Cheers,
Wol

Quick hash

Posted Feb 4, 2023 11:14 UTC (Sat) by excors (subscriber, #95769) [Link] (15 responses)

A sufficiently clever compiler or CPU could recognise that your function will return exactly the same result if it deletes your XOR checks and only does the 'real' comparison. If you can trust that your compiler/CPU *won't* do that, then you can simply implement the entire comparison using XOR and no data-dependent branches, which is the standard way of implementing a constant-time comparison function - e.g. https://github.com/openssl/openssl/blob/323c47532ea7fc79d... (though they also have an assembly version to reduce the risk of compiler trickery: https://github.com/openssl/openssl/blob/323c47532ea7fc79d...). There's no need to reinvent the wheel but worse.

Constant-time comparison is the most trivial case anyway - it's far more interesting when you need e.g. constant-time elliptic curve scalar multiplication (https://minerva.crocs.fi.muni.cz/). And even more interesting when you need protection against physical attacks (on smart cards etc), where an attacker can precisely measure the current going into the chip, so it has to be not just constant time but also constant power.

Quick hash

Posted Feb 4, 2023 16:27 UTC (Sat) by Wol (subscriber, #4433) [Link] (14 responses)

> it's far more interesting when you need e.g. constant-time elliptic curve scalar multiplication

That's well out of my depth, but could you put a "constant time" guard in front of that? Anything that can eliminate most incorrect passwords in constant time before the real checks, instantly adds a hash-collision problem to the list of obstructions facing an attacker.

Even storing the int_16 sum of all the bytes of a pass-phrase, and then rejecting if that fails, should surely increase the protection pretty heavily?

Cheers,
Wol

Quick hash

Posted Feb 5, 2023 2:35 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (13 responses)

If we're talking about "elliptic curve" anything, we're *well* out of the realm of "compare two passwords to see if they are the same." That's more along the line of generalized cryptographic primitives. There is no password. Instead, we have keying material, ciphertext, and/or plaintext, and we might also have key-exchange logic (like Diffie-Hellman, but with newer math), or the like. You absolutely cannot "just" fiddle with the protocol without consulting a team of expert cryptographers, who will promptly tell you that an attacker with a bit of string and a few bitwise XORs can trivially crack your entire implementation, because cryptography is hard and unforgiving like that.

Quick hash

Posted Feb 5, 2023 9:57 UTC (Sun) by Wol (subscriber, #4433) [Link] (12 responses)

Except I'm NOT fiddling with the protocol AT ALL.

All I'm asking is "is their a simple guard that will reject a large proportion of wrong attempts in constant time". Or am I wrong in assuming the attacker is in control of the "password" being fed in?

The whole point of such a guard would be to tamper with any timing attacks, adding noise to fool an attacker into thinking a successful step has failed (or vice versa). Especially if the guard added jitter approximately equal to the last ten or twenty attacks that passed the guard ...

So an attacker now has to worry about whether the attack should have succeeded in retrieving another character if it weren't for the guard, or whether the guard has made a failed attack look like it did successfully retrieve a character.

Cheers,
Wol

Quick hash

Posted Feb 5, 2023 10:10 UTC (Sun) by excors (subscriber, #95769) [Link] (1 responses)

There is no password. https://minerva.crocs.fi.muni.cz/ is about signature generation algorithms leaking information which can be used to recover their private key.

Quick hash

Posted Feb 5, 2023 11:16 UTC (Sun) by Wol (subscriber, #4433) [Link]

Thanks. Way over my head, but interesting ...

Cheers,
Wol

Quick hash

Posted Feb 6, 2023 0:34 UTC (Mon) by wahern (subscriber, #37304) [Link]

You would use a constant-time version of memcmp. OpenBSD added timingsafe_bcmp in 2009, and then timingsafe_memcmp a few years later. (https://man.openbsd.org/timingsafe_memcmp.3) NetBSD has consttime_memequal. (https://man.netbsd.org/consttime_memequal.3) Apple and FreeBSD adopted the OpenBSD routines.

I don't think either glibc or musl libc have adopted a similar interface. So on Linux or for portable software you'd probably want to use CRYPTO_memcmp from OpenSSL.

You should of course be hashing the passwords with salts, and only comparing those hashes. In which case using a constant-time compare isn't that important as the attacker can't work backward from the short-circuiting compare to decipher the plaintext input. The hashing itself should be constant-time, assuming modern digests like SHA-256 or SHA-3, though it's possible the *length* of the input password would leak. But there are a gazillion ways for the length to leak, and when it comes to password-based authentication schemes that's the least of your worries.

Quick hash

Posted Feb 6, 2023 7:29 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

> All I'm asking is "is their a simple guard that will reject a large proportion of wrong attempts in constant time". Or am I wrong in assuming the attacker is in control of the "password" being fed in?

You've just added another method to leak out information: attempts which get past this "guard" are "similar" to the real thing in some way (depending on the constant time algorithm involved). I feel like this is the case of layering invalidating cryptographic analysis of code (e.g., CRIME being one example of extracting information by figuring out how some piece of attacker-controlled data in the out affects compression ratios: more compression -> something is more compressible -> more redundancy between the input and the surrounding context).

Quick hash

Posted Feb 6, 2023 18:01 UTC (Mon) by stressinduktion (subscriber, #46452) [Link] (7 responses)

> The whole point of such a guard would be to tamper with any timing attacks, adding noise to fool an attacker into thinking a successful step has failed (or vice versa). Especially if the guard added jitter approximately equal to the last ten or twenty attacks that passed the guard ...

I don't think adding noise will help.

Maybe this particular construction helps, haven't thought too deeply about it, but I find it curiously interesting:

sleep(hmac(attacker-supplied-passwd, const-hmac-secret-per-app) % max-time)
// check password...

... making the delay depending on the attacker supplied input/password value, which should defeat statistical analysis? The sleep could be simply spinning in a tight loop for a while, not a full-blown syscall.

I don't know the source of it anymore, would be happy if someone has pointers or maybe an analysis of this construct. Regarding key generation, I am not sure if this idea could be expanded to hash the generated keys.

Quick hash

Posted Feb 6, 2023 18:19 UTC (Mon) by mb (subscriber, #50428) [Link] (6 responses)

No. This is just a pseudo random sleep, which is effectively as bad as random sleep.
Adding noise does not remove the signal, if you have enough tries. It will always eventually filter out, because real noise is just a constant value on average.
Your hash-algorithm is indistinguishable from real noise.

The only way to remove a signal is to actually remove it. e.g. adding an amount of time so that the overall execution is always the same. Which effectively is just a constant-time algorithm.
You could also make your algorithm always take one second by use of a monotonic timer that is started before your algorithm runs and waited for after the algorithm finished. But that's just over complicated and fragile (what if the attacker can make the algorithm runtime be >1s?) compared to the usual constant time practices.

Quick hash

Posted Feb 6, 2023 19:50 UTC (Mon) by stressinduktion (subscriber, #46452) [Link] (5 responses)

> Adding noise does not remove the signal, if you have enough tries. It will always eventually filter out, because real noise is just a constant value on average.

I actually found a reference [1] (section 4.3) from 2011.

I think the argumentation was something like, that an attacker is not able to probe for the worst case execution time so easily and thus rendering the attack more difficult in practice.

[1] https://www.researchgate.net/publication/336209882_An_Eff...

Quick hash

Posted Feb 6, 2023 20:20 UTC (Mon) by excors (subscriber, #95769) [Link] (4 responses)

I think adding an f(input) delay doesn't help when the attacker knows there will be correlation between many different inputs. In the password-comparison example, an attacker could try inputs from "a#000000" to "a#999999" and record the average time; every input is distinct so the delays are effectively random and are as vulnerable to statistical analysis as usual. Then try "b#000000" to "b#999999", etc up to "z", and whichever letter gives the lowest average is the first letter in the password. So the mitigation in that paper seems pretty useless.

Quick hash

Posted Feb 7, 2023 11:29 UTC (Tue) by anselm (subscriber, #2796) [Link] (3 responses)

Presumably a fairly obvious additional mitigation would be to prevent attackers from cheaply trying a million or more passwords in a row, e.g., by increasing the delay exponentially depending on the number of attempts.

Quick hash

Posted Feb 7, 2023 11:41 UTC (Tue) by farnz (subscriber, #17727) [Link]

If you can prevent attackers from cheaply trying a very large number of passwords, then statistical attacks don't work. But you can't rely on that - if I steal a copy of the underlying data store that your password validator depends upon, then in the extreme, I can set things up so that I run each attempt in parallel on a separate unique VM, so that I'm trying 1 million passwords on independent validators, each of which thinks this is my only password attempt ever.

Quick hash

Posted Feb 9, 2023 15:09 UTC (Thu) by excors (subscriber, #95769) [Link] (1 responses)

That may help in some cases, but it introduces the problem of securely storing the attempt counter. E.g. the iPhone 5C has a passcode retry counter which triggers long delays after 5 failed attempts, and optionally wipes all data after 10 failed attempts, so that you can't brute force even a 4-digit passcode. The counter is reportedly stored in external NAND flash (presumably encrypted). It's possible to unsolder the NAND chip and connect it to a custom circuit, back up the contents, enter 6 invalid passcodes, then restore the contents and retry unlimited times. (https://arstechnica.com/information-technology/2016/09/ip...)

That could be fixed by storing the counter in internal flash in a tamper-resistant security chip (which I presume is what Apple does nowadays), but that makes the hardware more complex and expensive (which I presume is why the iPhone 5C didn't do that).

(The iPhone thing isn't a timing attack, the passcodes are short enough that you can brute force even without any side channels; it's just relevant here because it depends on a counter and delay for security.)

Even in cases where you don't have to worry about attackers with physical access, you might need to worry that an attacker can e.g. trigger an otherwise-mostly-harmless crash bug to restart your service that's storing the attempt counter in RAM, and it resets to 0 and lets them defeat your timing attack mitigation. Of course you could implement a fix for that too; my point is that you're making your security dependent on an increasingly large and complex part of the whole system (hardware and software and policy). It's also dependent on your statistical analysis of the random jitter vs the cost of the algorithm you're trying to protect (though because you're trying to avoid using a constant-time algorithm, you don't even know its worst-/best-case costs, and it may change radically on newer CPUs and invalidate your assumptions), and on your choice of threshold for the probability of an attacker being able to break it.

On the other hand, if you (or some library developers) implement a constant-time comparison function, and carefully verify the assembly code and the timing guarantees provided by your CPU architecture so you're confident it's really constant-time, then you're done. No counters to store securely, no statistical analysis, no risk of your boss complaining that he keeps getting locked out when he mistypes his password and someone removing the limit without noticing the security implications; and no timing attacks. That seems to me like such an obviously better and reasonably straightforward solution that I'm honestly quite confused why people here keep proposing using variable-time algorithms and adding random delays instead.

Quick hash

Posted Feb 12, 2023 19:09 UTC (Sun) by stressinduktion (subscriber, #46452) [Link]

> That seems to me like such an obviously better and reasonably straightforward solution that I'm honestly quite confused why people here keep proposing using variable-time algorithms and adding random delays instead.

In my case (and because this is an answer to my comment), there is absolutely no need to convince myself about using proper constant time implementations in crypto code. I just remembered above paper from somewhere and found it just to be a gimmick maybe interesting enough to post (and discuss) here.

That said, I am still not convinced this idea can so easily be beaten compared to just a random delay *in the case of passwords*, because it is not just noise. I would myself not deploy it anywhere, given the clearly better options. For all other operations on cryptographic material it is obvious to me, that it absolutely should not be deployed.

Quick hash

Posted Feb 4, 2023 0:19 UTC (Sat) by developer122 (guest, #152928) [Link] (1 responses)

Doesn't work. An attacker only needs to subtract the constant time of the hash guard from their total observed time.

Quick hash

Posted Feb 4, 2023 10:09 UTC (Sat) by Wol (subscriber, #4433) [Link]

And then, for 3,999,999,999 attempts out of four billion, they get a time of 0. You can't get a meaningful statistical analysis analysing a string of zeros.

Cheers,
Wol

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 0:33 UTC (Sat) by developer122 (guest, #152928) [Link] (10 responses)

You would think that nobody at intel is thinking anything through when they are coming up with optimizations like these.

This optimization must *obviously* be opt-in, for all the reasons stated above. It it simply not feasible to identify every place that should be protected, for any of several reasons. And yet, they push it as if it is everyone else's problem to pick up the slack and to that analysis work for 30+ years of software.

The obvious course of events is as follows:
- The kernel enables constant-time for both itself and userspace unconditionally, allowing opt-in to the optimizations either at boot or per-thread
- Intel either
A) reads the writing on the wall and makes it cheaper to *opt into* the optimization where safe (accepting safety as the default) or
B) continues the current path, with everyone seeing slower performance growth in intel processors, while AMD etc focus on other areas

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 1:00 UTC (Sat) by developer122 (guest, #152928) [Link]

If it were up to me, I'd land a patch unconditionally turning these optimizations off (except maybe with mitigations=off) and backport it to all the currently supported kernels so that the intel CPUs *already in the field* that do this can get the security fix.

Then look at whether this can be enabled on a per-thread basis, and backport that work if it makes sense to do so.

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 4:19 UTC (Sat) by pizza (subscriber, #46) [Link] (8 responses)

> And yet, they push it as if it is everyone else's problem to pick up the slack and to that analysis work for 30+ years of software.

Yet somehow you feel the correct outcome is to punish everyone else's software instead. (which outnumbers the former by several orders of magnitude!)

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 9:43 UTC (Sat) by matthias (subscriber, #94967) [Link] (6 responses)

> Yet somehow you feel the correct outcome is to punish everyone else's software instead. (which outnumbers the former by several orders of magnitude!)

You cannot punish software. Software is not self-aware (at least not yet). So you should not speak about software, but about users. So given the choice, how many users prefer a few extra percent of speed if they have to compromise the security of their home banking? And if you really prefer speed, nobody will stop you from disabling this switch.

Also I strongly disbelieve in the several orders of magnitude. That would be at least a factor of 100 (only two orders of magnitude). In fact, for most existing software, there will not be a difference at all, as they neither need constant time execution nor the absolute maximal processing power of the newest hardware designs.

However a change that makes existing software vulnerable to timing attacks is a clear regression, while existing software will most probably be fine with the processing power of future processor generations, even with the constant timing enabled. They will be much faster than the hardware the software runs on now, anyway.

And for future software, this should be a non-issue, as the programmers will be aware of the switch and should be able to actively choose the mode.

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 11:40 UTC (Sat) by intelfx (subscriber, #130118) [Link] (5 responses)

> So given the choice, how many users prefer a few extra percent of speed if they have to compromise the security of their home banking?

Prior to making such heavy-handed analogies, could you please show that home banking (which happens inside of heavily sandboxed Javascript VMs within browsers) would _actually_ be affected?

Otherwise this just reeks of "think of the children".

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 12:11 UTC (Sat) by matthias (subscriber, #94967) [Link] (4 responses)

Since when does SSL encryption happen inside a Javascript VM?

Also timing side channels are the things that usually go through all layers of virtualization. Spectre attacks against the kernel have been demonstrated even from within a Javascript VM. Not really the easiest thing to exploit. But using some correlation between several runs this is possible. Sandboxing does not help against timing attacks. It can make exploitation a bit harder, but that is all.

The constant timing mode is designed especially to allow secure cryptography and mitigate possible attacks against cryptography. I cannot predict the future ans say which algorithm/software will be attacked this way. But home banking is sure a worthwhile target for the attackers. And it is secured by exactly those algorithms that rely on constant timing behavior of processors in order to preserve the secrecy of the keys.

Constant-time instructions and processor optimizations

Posted Feb 4, 2023 19:06 UTC (Sat) by Sesse (subscriber, #53779) [Link] (3 responses)

> Since when does SSL encryption happen inside a Javascript VM?

TLS encryption happens in browsers, of which there are two relevant browser engines on Linux, both having security teams perfectly capable of flipping such a mode when relevant, and whose users run them almost exclusively on the newest stable version.

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 3:21 UTC (Sun) by stressinduktion (subscriber, #46452) [Link] (2 responses)

> TLS encryption happens in browsers, of which there are two relevant browser engines on Linux, both having security teams perfectly capable of flipping such a mode when relevant, and whose users run them almost exclusively on the newest stable version.

From reading the article, I understand Intel's design as a switch per process/address-space (either syscall or prctl) to toggle the privileged MSR of the process. Given the open ended coded nature and "leaking" of cryptographic primitives into the code which makes use of it (create crypto object; <other code;> update; <other code;> update; <o.c.> final), this seems hard to do in practice for a cryptographic library without performance tanking due to the additional syscalls. If you try to pair the "DOITM syscalls" to cover longer runs of crypto code, one must implement some kind of counting of how many overlapping cryptographic operations are currently going on...

My guess is that browsers will enable DOITM on Intel anyway (different story for ARM, which feels more sensible designed in this regard).

Also, I would prefer to plug a disk with an older distro with apache+openssl into a new server and have it not be vulnerable immediately. It sounds doable, maybe a flag being used during ELF image activation or whatever?

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 5:45 UTC (Sun) by josh (subscriber, #17465) [Link] (1 responses)

Browsers don't seem like the best use case for "give me more *numeric operation* performance"; HPC is. Video decoding happens in dedicated hardware, graphics happens in dedicated hardware, a bit more *numeric* performance won't make or break your browser.

But if you can toggle a setting and your HPC applications get 5% more performance, or maybe a lot more than that in the future? That seems like a selling point.

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 23:09 UTC (Sun) by Sesse (subscriber, #53779) [Link]

Browsers (including their associated JavaScript engines) use hash tables intensely, and those frequently use a lot of multiplications and divisions/modulo operations. DOITM is not about floating-point operations, which is what HPC is dealing with, it's about integer muls and divs. (And probably eventually also other instructions.)

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 3:46 UTC (Sun) by developer122 (guest, #152928) [Link]

The punishment at this point is essentially non-existent...

...while the damage that can occur from failing to write constant-time code is extensively demonstrated, documented, and painfully remembered.

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 1:48 UTC (Sun) by kunitz (subscriber, #3965) [Link]

I just read the article about timing attacks on Wikipedia. One attack was against RSA and the other against crypt providing info about available user names. Nothing there convinces me to enable DOITM for the whole kernel. You might want to to enable DOITM in kernel crypto code and provide a facility for enabling/disabling it in user space, but why does it need to be enabled for the whole kernel? What is the threat model behind the requirement?

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 7:48 UTC (Sun) by Tov (subscriber, #61080) [Link] (6 responses)

nchars = max(strlen(attempt), strlen(password));

That is probably a bit too naively coded. You loop for the number of characters in the _longest_ string and then beyond the length of the shorter...

C is just hard to get right :-)

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 8:18 UTC (Sun) by zaitseff (subscriber, #851) [Link] (2 responses)

As in, shouldn't it be nchars = min(strlen(attempt), strlen(password)) + 1? :-).

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 9:08 UTC (Sun) by Tov (subscriber, #61080) [Link] (1 responses)

Yes, that would ensure the lengths of the strings are equal, since you then also test the terminating 0. However, that is a bit subtle and a comment would be in place :-)

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 12:06 UTC (Sun) by mpr22 (subscriber, #60784) [Link]

And thus we are reminded why no serious string handling should be done using libc's string functions :)

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 10:00 UTC (Sun) by excors (subscriber, #95769) [Link] (2 responses)

I think the code is fine because the strings are null-terminated. If the strings are different lengths, the loop will eventually read '\0' from the shorter and non-'\0' from the longer and will return false; it won't continue to read past the end of the shorter string.

(Well, "fine" to the extent that any C string handling code can be fine, i.e. you have to think about it quite carefully to make sure there aren't terrible bugs in even the simplest operations.)

Constant-time instructions and processor optimizations

Posted Feb 5, 2023 12:45 UTC (Sun) by Tov (subscriber, #61080) [Link] (1 responses)

Hah! Good point :-)

However, it still leaves me wondering: Is the code subtly clever or just correct by accident? A comment would have been in order.

Terminating at NUL

Posted Feb 5, 2023 16:58 UTC (Sun) by corbet (editor, #1) [Link]

The loop obviously depends on the NUL at the end of the shorter string terminating the loop; that was intentional behavior and, I thought, not particularly subtle. Perhaps I'm showing that I learned C in the early 80's, when such code was commonplace.

Looks like a bad hardware design

Posted Feb 5, 2023 14:22 UTC (Sun) by david.a.wheeler (subscriber, #72896) [Link] (1 responses)

This looks like a bad hardware design.

Timing sensitivity should be part of the thread state that is quietly saved and restored when you switch to a thread, and easily enabled or disabled by a normal thread. If a thread turns it on to go slow, that is the thread's problem... it could also just run a slow algorithm, right?

What am I missing?

Looks like a bad hardware design

Posted Feb 5, 2023 15:06 UTC (Sun) by pizza (subscriber, #46) [Link]

> This looks like a bad hardware design.
> What am I missing?

My understanding is that Intel's implementation is intended to work in the way you described; ie a thread declares itself to be timing sensitive (eg via a one-off syscall that sets an appropriate flag in the thread state) and the kernel manages switching in and out when that thread is scheduled.

It also seems that the intent is for these timing-sensitive threads be as focused/isolated as possible, which is the direction that security-concious software has been taking for some time now as to shrink its attack surface.

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 8:30 UTC (Mon) by ale2018 (subscriber, #128727) [Link] (22 responses)

But why do we counter timing attacks by slowing CPUs to constant time execution? It is obvious from the password comparison example that, as many observed, if the algorithm is not designed to avoid these attacks, slowing the CPU is useless.

Rather than modifying old code, it seems much better to add a random tarpit whenever the authentication fails. Adding a line after the algorithm call is easy and safe.

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 9:56 UTC (Mon) by Wol (subscriber, #4433) [Link]

> But why do we counter timing attacks by slowing CPUs to constant time execution?

Because, as others have noted, constant time is a pre-requisite for security.

> if the algorithm is not designed to avoid these attacks, slowing the CPU is useless.

And constant-time is a pre-requisite for ANY successful defence against timing attacks. Adding random (or not so random) jitter just makes the attacker's life harder, it can't stop a timing attack.

That was the idea behind my "hash guard" - it hides the signal behind a massive amount of noise. But constant time is the only way to destroy the signal.

Cheers,
Wol

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 9:57 UTC (Mon) by marcel.oliver (subscriber, #5441) [Link] (20 responses)

I was just about to post a similar comment... I am not at all an expert in this area, but reading throught the article+comments, it appears that trying to achieve constant-time execution is a game of whack-a-mole. It involves assumptions about the microarchitecture that are ultimately outside of the control of the programmer and may change arbitrarily with any new piece of hardware.

Why not think about this in a fundamentally probabilistic manner? The cryptographic code just needs to produce enough random jitter so the the noise/signal ratio is large enough it does not matter. E.g. by weaving some unrelated computation, such as mixing an uncorrelated stream of data into a RNG entropy pool, into or close to the inner loop of the cryptographic code. This could be measured at startup time, where some timing tests would determine the timing jitter of repeated invocations with known data (good) vs. the timing differences between different data streams (bad), then compute a noise/signal ratio large enough to mask the latter. If it's carefully tuned, it should not be worse than forcing constant (thus slowest-case) timing for all operations, but could also adapt to the level of paranoia (e.g. is this a scenario where an attacker controls the data, or do I only need to defend against passive observers, how valuable an asset am I trying to protect, etc.).

I understand that this does not solve the problem with legacy code, but moving into the future this appears more safe and more flexible, especially given that hardware optimizations of this kind are likely the primary strategy to squeeze more performance out of existing architectures.

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 11:31 UTC (Mon) by matthias (subscriber, #94967) [Link] (13 responses)

This totally ignores that cryptographic code is performance critical. There is the trend to encrypt all storage (disk encryption) and all communications (tls). Making this much slower in order to defeat a possible attack vector that just exists because of some optimizations that make hardware a bit faster seems kind of backwards.

It is not just the login procedure that would be slowed down by a fraction of a second. It is every disk access, all network communication and possibly a lot more that would be impacted. And to defeat the possible correlation of several tries, you need to add quite a lot of noise.

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 11:47 UTC (Mon) by mb (subscriber, #50428) [Link] (12 responses)

>This totally ignores that cryptographic code is performance critical. There is the trend to encrypt all storage
>(disk encryption) and all communications (tls). Making this much slower in order to defeat a possible attack
>vector that just exists because of some optimizations that make hardware a bit faster seems kind of backwards.

Disabling the technology from the article is not about making crypto slower.
It's about not making certain parts of crypto faster (by making them non-const time).

>It is every disk access, all network communication and possibly a lot more that would be impacted.

What attack vector do you see for disk encryption?
Yes, maybe you could extract a tiny part of the cipher's lowest level key stream by injecting data patterns and measuring the encryption times.
But where do you go from there?
How do you compromise the master key? How do you decrypt more data then what already have access to?

Could we please first have a plausible explanation about wide ranging or universal attack vectors, before we degrade performance for everybody by disabling this feature?

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 12:14 UTC (Mon) by matthias (subscriber, #94967) [Link] (11 responses)

> Disabling the technology from the article is not about making crypto slower. It's about not making certain parts of crypto faster (by making them non-const time).

Have you even looked at the comment, which I have replied to? This was about making crypto much slower in order to mitigate the security implications of the technology described in the article.

> Could we please first have a plausible explanation about wide ranging or universal attack vectors, before we degrade performance for everybody by disabling this feature?

It might be too late if we wait for exploits. The possible attack vectors are clear. Whether they can actually be exploited is not so clear. However, the bad guys are usually quite creative when it comes to exploiting weaknesses.

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 12:20 UTC (Mon) by mb (subscriber, #50428) [Link]

>This was about making crypto much slower in order to mitigate the security implications of the
>technology described in the article.

Ok.
I misunderstood then.
Making it even slower than it is without this optimization obviously is a stupid idea.

>It might be too late if we wait for exploits.

You should immediately shut down your PC, because there are most likely open security vulnerabilities.
It might be too late, if we wait for exploits.

>The possible attack vectors are clear.

Not to me. Could you explain?

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 18:11 UTC (Mon) by lurk546 (guest, #17438) [Link] (9 responses)

As some one else commented, it's about a lot more than just crypto. The devil's always in the details, and in my opinion no one knows all the details about what all needs to be protected, or how bad the upcoming vulnerabilities are going to be. We just know that the timing of some operations are going to be more dependent on the data being processed.

Browser session cookies are likely one of many examples showing it's not just about the crypto. You can argue that one really shouldn't try to do anything secure in a browser that runs code from external websites, but it's commonly done.

I think it's much easier to identify and mark areas where things don't need to be secure (such as HPC), than where things might need to be secure.

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 19:00 UTC (Mon) by mb (subscriber, #50428) [Link]

>Browser session cookies are likely one of many examples showing it's not just about the crypto

Can you please explain how such an attack using the cpu feature from the article would look like?

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 21:03 UTC (Mon) by farnz (subscriber, #17727) [Link]

It's pretty much only about crypto, and not about anything else. In particular, nothing else has taken the time and effort to identify a timing side channel and close it on any assumptions whatsoever.

The crypto case is special because the cryptography community has gone to some effort to ensure that the time taken to perform primitive cryptography operations is independent of the key - and thus that an attacker in possession of a machine that will convert an attacker-supplied plaintext into a ciphertext cannot use that machine to learn the key - the best you can do with the machine is brute-force to find a matching plaintext for a supplied ciphertext or vice-versa.

It's also worth noting that the goal of constant-time cryptographic operations is not to prevent all information leakage; rather, it's to prevent the attacker from acquiring new information that they did not possess before they attempted their attack. If you have a machine that will say "yes, that is the correct password" or "no, that is the wrong password", then it's fine for it to take time varying with the password you supply it - it is not fine for it to take time varying with the password it's checking against.

The problem is that, absent DOIT mode, there will be no way to have any operations take constant time regardless of the value of the operands. This is a problem if you're trying to build a system whose time taken does not vary with the value of the key; if all operand values can make the time taken vary, then you can't avoid a timing side channel that reveals information about the key. DOITM fixes this, by allowing you to go into a special mode that forces ALU operations to take constant time, instead of going faster if the values are in certain ranges; you can then use this special mode when doing operations involving the key (or data derived from the key), so that the timing of operations is not correlated with the value of the key.

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 22:56 UTC (Mon) by Sesse (subscriber, #53779) [Link] (6 responses)

> The devil's always in the details, and in my opinion no one knows all the details about what all needs to be protected, or how bad the upcoming vulnerabilities are going to be. We just know that the timing of some operations are going to be more dependent on the data being processed.

Why are you talking about “upcoming vulnerabilities”? It's documented on Intel's pages which instructions' timings are based on the DOIT mode (https://www.intel.com/content/www/us/en/developer/article...); they are all integer multiplication instructions except for VPLZCNT*. This isn't some embargoed CPU vulnerability where we don't know what's ahead.

Constant-time instructions and processor optimizations

Posted Feb 7, 2023 0:32 UTC (Tue) by farnz (subscriber, #17727) [Link] (2 responses)

As an aside, you linked the list of instructions that might have data-dependent timing even in DOIT mode if MXCSR is set to the "wrong" value - https://www.intel.com/content/www/us/en/developer/article... lists all the instructions where Intel might have data-dependent timings if DOIT mode is not enabled, and it includes some you wouldn't expect, like AESENC and AESDEC.

Constant-time instructions and processor optimizations

Posted Feb 7, 2023 9:01 UTC (Tue) by WolfWings (subscriber, #56790) [Link] (1 responses)

You're reading the table backwards.

The table "Documented Data Operand Independent Timing Instructions" if the "MCDT" column is false means that switching the MXCSR in question does NOT change the timing, meaning they're DOIT regardless.

That's a table of every single instruction they could list, with an explicit "does DOIT matter?" column. Only the parallel multiplies and leading-zero-count instructions are currently subject to DOIT.

Constant-time instructions and processor optimizations

Posted Feb 7, 2023 9:45 UTC (Tue) by excors (subscriber, #95769) [Link]

> The table "Documented Data Operand Independent Timing Instructions" if the "MCDT" column is false means that switching the MXCSR in question does NOT change the timing, meaning they're DOIT regardless.

I believe you're reading it wrong too. MXCSR is not the main DOIT flag; it's more of an erratum that you should be aware of (but might choose to ignore because it's quite minor) when you're trying to get data-independent timing.

If you set IA32_UARCH_MISC_CTL[DOITM]=1, then the instructions in that table with MCDT=False will be data-independent on all CPUs.

If you set IA32_UARCH_MISC_CTL[DOITM]=1, and either your CPU is new enough to have CPUID.(EAX=7H,ECX=2):EDX[5]=1 or it's one of the older CPUs listed by Intel or you set MXCSR=0x1fbf, then all the instructions in that table will be data-independent.

> That's a table of every single instruction they could list

It's not - it's missing loads of instructions, notably any DIVs as well anything floating-point, plus a bunch of miscellaneous ones like BSR and BSWAP etc. For anything not in that table, Intel is making no guarantees about its timing.

Constant-time instructions and processor optimizations

Posted Feb 7, 2023 0:47 UTC (Tue) by excors (subscriber, #95769) [Link] (2 responses)

I think you're reading that page wrong. The instructions listed under "Instructions That May Exhibit MCDT [MXCSR Configuration Dependent Timing] Behavior" (which are almost all vector integer multiplies) aren't the instructions that are affected by DOIT. Those are the instructions that may have (very slightly) data-dependent timing on certain CPUs even when DOIT is enabled. (That can be mitigated by modifying MXCSR to set a bunch of SSE status flags, for unclear reasons.)

There's a separate list of the instructions which are guaranteed to be data-independent under DOIT: https://www.intel.com/content/www/us/en/developer/article...

As far as I can see, there's no list of instructions which are known to be *not* data-independent without DOIT. Intel wants you to assume that all instructions may be data-dependent if you don't enable DOIT, plus all instructions not on that list even if you enable do DOIT - that still doesn't imply they are, just that Intel doesn't want to constrain all their future CPUs and/or doesn't want to verify all their old CPUs in order to provide any wider guarantees.

Constant-time instructions and processor optimizations

Posted Feb 7, 2023 4:35 UTC (Tue) by draco (subscriber, #1792) [Link] (1 responses)

If you're right about that, then I feel like our computing infrastructure (compilers, now processors) developers have lost the plot in their chase for performance.

If we ignore security in the name of performance, then attackers will take over our computers to run their software instead! Now our performance is **zero**, because our computers are no longer doing what we bought them to do!!

Constant-time instructions and processor optimizations

Posted Feb 7, 2023 10:58 UTC (Tue) by farnz (subscriber, #17727) [Link]

This is the opposite of ignoring security in the name of performance; CPUs have always had data-dependent optimizations (e.g. the variable time multiplies in the 8086 and 68000), which means in turn that it's never been the case that CPUs don't have timing side channels. However, cryptographers have found that by using a subset of operations, you can write code whose timing is not dependent on the value of the cryptographic key; this allows you to close the timing side channel that could be used to deduce the key given enough operations.

Historically, this has been done on an ad-hoc basis; measure the CPU's behaviour, producing something like Agner Fog's instruction timing information. With this information in hand, you can carefully choose operations whose timing is not correlated with information an attacker does not have; note, though, that if Agner has made a mistake putting together their tables, or when a new CPU comes out, you can have picked instructions that are not constant time for a given input source. Thus, every time a new CPU comes out, or whenever someone alleges that you've made a bad choice, you have to recheck everything in painstaking detail, since you're reverse-engineering the CPU's behaviour.

DOITM changes things round - it says that there's an architectural guarantee that certain instructions will execute in constant time relative to some or all of their inputs. As long as you run in this mode, and use only the instructions that are specified as safe, you have a guarantee that not only is your code not vulnerable to timing side channels on current CPUs, but that it will also not be vulnerable on future CPUs - Intel will make sure that DOIT mode does the right thing.

Constant-time instructions and processor optimizations

Posted Feb 6, 2023 12:17 UTC (Mon) by excors (subscriber, #95769) [Link] (5 responses)

> Why not think about this in a fundamentally probabilistic manner? The cryptographic code just needs to produce enough random jitter so the the noise/signal ratio is large enough it does not matter.

I think the fundamental problem is that the attacker can repeat the computation many millions of times, and can use statistics against you. (Or at least the cryptographic primitives should assume they can, because the system would be unpleasantly fragile if the security of low-level operations was heavily dependent on high-level rate limiting etc. Modern cryptographic primitives seem to focus increasingly on robustness, because experience shows they can't rely on their users to meet any subtle security requirements; e.g. modern hashes tend to be inherently immune to length extension attacks because they know people won't pad their inputs properly.)

Say your password comparison function takes 100ns if the nth character is incorrect, or 110ns if the nth character is correct, and then you add some random jitter with distribution N(μ, σ^2). If the attacker takes the average of a million measurements, then the averaged jitter will be N(μ, (σ/1000)^2), and you need to ensure σ/1000 is several times greater than the signal (10ns) to maintain a low SNR, so your jitter is >100x more expensive than the actual computation. And it can still be easily broken if the attacker can do a hundred million measurements.

These numbers are all made up, but (unless I'm calculating very wrong) they indicate the scale of the problem. The constant-time whack-a-mole approach isn't great but it's a lot more practical when performance matters, and it depends on relatively easily testable assumptions about the hardware rather than on assumptions about the security architecture of every system it's used in.

Constant-time instructions and processor optimizations

Posted Feb 9, 2023 13:07 UTC (Thu) by calumapplepie (guest, #143655) [Link] (4 responses)

What if you make the random jitter deterministically calculated from the password and the attempt? Like, wait an amount of time after each operation equal to the lower 8 bits of the SHA hash of "$PASSWORD+$ATTEMPT". The value you get from that will be the same every time you try an attempt, but can't be correlated with that attempt unless you already have the password or have broken the encryption algorithm, which makes the whole problem moot.

I suppose you could get around that by guessing loads of different passwords, and statistically associating the timing between them, but that makes the attack significantly more complicated. Possibly even impossible if the time-dependent being blocked behavior requires that the attempts have a specific form to them. And the more complex the inner loop of the timing attack needs to be (ie, generating lots of different attempts rather than trying the same one), the more likley that it will have data-dependent timing, which will in turn block it from spotting data-dependent timing of its own.

Constant-time operations will obviously still be faster, but adding this delay as a step in any especially exposed timing observation points could make any incidental or minor data-dependent time behavior outside of the core algorithms unusable for attackers, and would protect against bugs in said constant-time code.

Constant-time instructions and processor optimizations

Posted Feb 9, 2023 14:44 UTC (Thu) by excors (subscriber, #95769) [Link]

That sounds the same as https://lwn.net/Articles/922380/ , which (as discussed there) I think is useless for the password-comparison example - the attacker can just add a different suffix each.

Constant-time instructions and processor optimizations

Posted Feb 9, 2023 16:41 UTC (Thu) by farnz (subscriber, #17727) [Link] (2 responses)

The value you get from that will be the same every time you try an attempt, but can't be correlated with that attempt unless you already have the password or have broken the encryption algorithm, which makes the whole problem moot.

The problem is that the assumption under which constant time code is written is that the attacker knows everything but the secret key. So the attacker knows the algorithm in use, including the fact that your wait time is equal to the lower 8 bits of the SHA hash of "$PASSWORD+$ATTEMPT" - the only thing they don't know is $PASSWORD.

They're then doing statistical analysis, where they know $ATTEMPT, and they're looking at the change in time it takes to run the algorithm; that change is correlated with $PASSWORD, and thus they can deduce $PASSWORD based on the different times it takes for an attempt.

The threat model constant-time computation protects you against is an attacker who's stolen a copy of your password validation source code, and has managed to get the checker and password database into a restorable container - there's no way to do an attempt counter (because the attacker restores the container after each attempt, resetting the attempt counter), and the last line of defence is that the time taken to fail is purely a function of $ATTEMPT + NOISE, not of $ATTEMPT + $PASSWORD + NOISE. As soon as the time taken to fail an attempt is a function of $ATTEMPT + $PASSWORD + NOISE, the attacker can do statistical analyses that allow them to remove $ATTEMPT and NOISE, leaving only $PASSWORD in the data.

Constant-time instructions and processor optimizations

Posted Feb 21, 2023 22:42 UTC (Tue) by calumapplepie (guest, #143655) [Link] (1 responses)

It was my understanding that a hash could not be connected, even statistically, to its arguments: that even knowing SHA1($PASSWORD+$ATTEMPT), and being able to make arbitrary $ATTEMPTs, you still could not even constrain $PASSWORD to be more likley to have a 1 or a 0 as the first digit.

Also, I thought constant-time was more about over-network attacks, where you don't have the (encrypted) password database: if you have the database, then any time dependence in the checker would need be derived from the encrypted passwords, since if the decrypted password is acquired at any point by the validation (a prerequisite for timing to depend on it), then the attacker can just... do that step to get the decrypted password.

Constant-time instructions and processor optimizations

Posted Feb 22, 2023 11:40 UTC (Wed) by farnz (subscriber, #17727) [Link]

While constant-time is about over the network attacks, the reason for defining the threat model as I defined it is that this gives you a maximally sophisticated attacker - they have the source code, they have the hashed password database, they have control of the environment the password checker runs in. What they do not have is the decrypted password - and nor does the system that's validating passwords, since to do so, it has to have the password in plain text.

As a result, I assumed that when you wrote $PASSWORD, you meant the hashed password, which the attacker is expected to possess, and not the user's plain text password. If you store the user's plain text password, then I can bypass the cryptography completely by stealing that instead.

Enough of de-optimizations!

Posted Feb 7, 2023 6:22 UTC (Tue) by wtarreau (subscriber, #51152) [Link] (1 responses)

I'm really getting sick of all these attempts at ruining optimization efforts for everyone just because *some* users run in hostile environments.

Originally "PC" meant "personal computer", i.e. you're alone on it, it's *your* PC, you do whatever you want with it.

Nowadays there's always someone to tell you "be careful about speculation attacks which might allow someone in a concurrent VM on your machine to read your VM's memory" except I don't care about such a case, particularly when I'm developing and that my build time is essential to me, and I'm not running anything in a cloud (and no, I don't care either about the small risk of browser-induced timing attacks, there are so many more dramatic failures in a browser that these ones are rare).

Likewise we used to hear "it's really time to disable hyperthreading". In fact cloud vendors took a smarter approach, they're always selling you the vCPUs by pairs so that you get both threads of the same core.

And it seems that the new joke is going to be "please disable your CPU's optimizations, someone could possibly guess the private key you're generating". And what if I don't care because I'm not producing private keys all the day and instead am interesting in my CPU prefetching data faster and anticipating results to build faster ? In addition, while we're still having fun protecting ourselves against low-level attacks, the vast majority of malware continues to spread via the good old methods consisting in clueless users clicking links in HTML emails and starting infected executables that encrypt all their data... I definitely think we're trying hard to plug a tiny hole leaking water droplets next to the Niagara falls when it comes to end-user machines.

I would prefer to have a global kernel parameter "ultra-secure" that turns on every protection for those running VMs in shared environments and that the vast majority of others don't care about. Maybe at one point they will figure that such environments have become so slow and insecure that it was a bad idea from the start to make sensitive code operate on private data on someone else's CPU and RAM, which is first and foremost the starting point of all of this...

Enough of de-optimizations!

Posted Feb 7, 2023 7:51 UTC (Tue) by mb (subscriber, #50428) [Link]

Very well said.

And in addition to that I think people start to make up threat scenarios that are based on nothing.

Oh look, there's your sensitive data! You wouldn't want that to come into control of the bad guys! Hurry up, disable $Feature system wide, before it's too late!

That is just like snake oil selling.
That is the same FUD mechanism that anti virus vendors use (successfully; as in their business) on Windows.

It is good that Intel thought about timing attacks this time around and gave us a switch to control timing.
There is nothing scary and absolutely nothing wrong with that concept.
Maybe they could have been doing better with the switch implementation, but security wise I don't think this will break anything that's not already broken or is already known to need fixes for this.

Constant-time instructions and processor optimizations

Posted Feb 9, 2023 12:47 UTC (Thu) by calumapplepie (guest, #143655) [Link] (4 responses)

"Insecure by default" is probably the way to go for this. If you want security, you're likely willing to to trade it off for speed, and make an additional syscall, running all crypto code in a separate thread, etc. If you want performance, adding the cost of a syscall to every thread to disable the optimizations is a cost you don't want to pay.

Does this mean existing constant-time code gets hit? Sure. But constant time code is only in security-critical code on security-critical systems. If the locations of that code isn't already known, and the programs that use that code being regularly upgraded, then it is all-but certain that they are already vulnerable to some exploit. Perhaps not a timing side channel, but how old of a version of OpenSSL can you use without running into some exploit?

It doesn't make sense to slow down all code in the name of preventing timing side channels in code that is out-of-date and thus already likely has serious bugs.

Constant-time instructions and processor optimizations

Posted Feb 9, 2023 13:22 UTC (Thu) by Wol (subscriber, #4433) [Link] (3 responses)

> But constant time code is only in security-critical code on security-critical systems.

In other words, at least all business laptops? Which probably outnumber all other laptops?

There's too much "not my problem" here. It will be your problem if my work laptop is cracked and used to exfiltrate your credit cards, your NHS number, all the other sensitive info my employer holds on you ...

Cheers,
Wol

Constant-time instructions and processor optimizations

Posted Feb 9, 2023 14:43 UTC (Thu) by pizza (subscriber, #46) [Link]

> In other words, at least all business laptops? Which probably outnumber all other laptops?

Hardly. All that most laptops these days get used for ("business" or otherwise) is a container to run Chrome. Granted, "business" laptops also tend to run all their traffic over a proprietary VPN tunnel and have five different "security" suites snooping/slowing everything down to a crawl..

Constant-time instructions and processor optimizations

Posted Feb 9, 2023 16:39 UTC (Thu) by mb (subscriber, #50428) [Link]

>It will be your problem if my work laptop is cracked and used to exfiltrate your credit cards,
>your NHS number, all the other sensitive info my employer holds on you ...

And can you give an explanation of how that could actually happen in the real world?
What is the mechanism you are thinking of?

>There's too much "not my problem" here.

There's too much FUD here.

Constant-time instructions and processor optimizations

Posted Feb 21, 2023 22:36 UTC (Tue) by calumapplepie (guest, #143655) [Link]

Read the rest of the paragraph: I'm not saying that there isn't several layers of what I would consider to be security-critical constant-time code involved in the processing and sending of this message to your laptop from mine. I'm saying that that security-critical constant-time code is located in areas that are known to be security-critical, and are thus receiving regular updates.

The problem with this change to the chip is that existing compiled code will need to be updated, and source will need to have some new annotations sprinkled in. But if the source is not being actively maintained for those updates to be done, and if the compiled binaries aren't receiving security updates, for the amount of time that will need to pass between now and when these processors start appearing even in new machines, then we can assume that it is already compromised.


Copyright © 2023, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds