Another round of speculative-execution vulnerabilities

Posted Aug 9, 2023 8:07 UTC (Wed) by Wol (subscriber, #4433)
In reply to: Another round of speculative-execution vulnerabilities by flussence
Parent article: Another round of speculative-execution vulnerabilities

Maybe all this speculative execution stuff is a response to software and chip design going down a wrong path and chasing a local minimum at the top of a mountain ...

From what I can make out, modern CPUs are "C language execution machines", and C is written to take advantage of all these features with optimisation code up the wazoo.

Get rid of all this optimisation code, get rid of all this speculative silicon, start from scratch with sane languages and chips, ...

Sorry to get on the database bandwagon again, but I would love to go back 20 years, when I worked with databases that had snappy response times on systems with the hard disk going nineteen to the dozen. Yes the programmer actually has to THINK about their database design, but the result is a database that can start spewing results instantly the programmer SENDS the query, and a database that can FINISH the query faster than an RDBMS can optimise it ...

Cheers,
Wol


Another round of speculative-execution vulnerabilities

Posted Aug 9, 2023 8:49 UTC (Wed) by motk (guest, #51120) [Link]

Transputers redux!

Another round of speculative-execution vulnerabilities

Posted Aug 9, 2023 9:05 UTC (Wed) by eduperez (guest, #11232) [Link] (15 responses)

I used to code on an 8-bit processor, running at (almost) 4MHz; we had a 64KB memory map, but 16KB of that was ROM, and part of the remaining 48KB was reserved for the screen memory, printer buffer and other uses. We counted and optimized the processor cycles required to execute each instruction and routine, and memory usage was counted in bytes.

However, those times have long passed away, and there is no use in trying to bring them back. Except for some very specific use cases, it is way cheaper to buy a faster machine, than spend hours upon hours optimizing the code; all that counts is the "return on investment".

You just cannot keep the optimization and attention to detail levels of the past, with the development speed and costs required by the modern world.

Another round of speculative-execution vulnerabilities

Posted Aug 9, 2023 13:33 UTC (Wed) by butlerm (subscriber, #13312) [Link] (7 responses)

Modern development techniques for web applications in particular have contributed to making application response time much worse than it used to be and orders of magnitude beyond what it was on much slower systems forty years ago. Wondering what went wrong and why no one seems to care if their page takes five or ten seconds to update is a relevant inquiry.

Another round of speculative-execution vulnerabilities

Posted Aug 9, 2023 13:59 UTC (Wed) by yodermk (subscriber, #3803) [Link] (5 responses)

Yep, I often imagine a world where a large web application is written in Rust and runs as a single process on a single, vertically-scaled server. This bucks the "everything is a microservice" trend in a big way. But think of the benefits -- nearly every request could be served from in-memory in a single process. No Redis, no reaching out to other services for most things. Only requests that needed to result in durable, committed storage would have a slight delay. Besides that, operationally it would be dirt simple.

Main drawback is upgrades would require at least a bit of downtime. But, done right, it would be quite brief. The in-process caches would need to warm, though. The other drawback is the absolute need to be sure that no part of the system can crash under any circumstances. But Rust goes a long way in helping you there.

I'm learning Axum (a backend framework for Rust) and hope to be able to implement something like this someday.

Another round of speculative-execution vulnerabilities

Posted Aug 24, 2023 6:13 UTC (Thu) by ssmith32 (subscriber, #72404) [Link] (4 responses)

I dunno. Networks are pretty snappy compared to some of the systems discussed here - I would think a bunch of Rust microservices running on bare metal that correctly used HTTP would do fine. Vertical scaling is exactly what got us into the speculative execution mess. It's vertical scaling to support the many, many layers of abstraction needed to support whatever the hot language is for writing a simple microservice.

But if you keep the services simple - why bother with all the abstraction? Give them full control, and make them fast.

The real troublemaker is not microservices or distributed systems - it's hosting providers wanting to resell the same time on the same hardware over and over again.

Another round of speculative-execution vulnerabilities

Posted Aug 24, 2023 22:32 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

FWIW, AWS doesn't share CPUs between multiple customers, except on a couple of very cheap instance types (T2 and T3 instances). I believe the same goes for Azure.

Another round of speculative-execution vulnerabilities

Posted Aug 25, 2023 9:58 UTC (Fri) by farnz (subscriber, #17727) [Link] (2 responses)

Couple of questions:

  1. Is this documented by AWS anywhere? I can't find it in their official documentation, and the instance types documentation just says "Each vCPU on non-Graviton-based Amazon EC2 instances is a thread of x86-based processor, except for T2 instances and m3.medium.", which implies that two vCPUs assigned to different customers can be on the same core, just not using the same thread.
  2. How is the "each CPU core can only be used by one customer" enforced? Is it just relying on the kernel rarely migrating actively used vCPU threads between hardware threads, or is there scheduler affinity etc applied to enforce it?

Another round of speculative-execution vulnerabilities

Posted Aug 25, 2023 19:27 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

AWS documentation is a mess, but it's documented: https://docs.aws.amazon.com/whitepapers/latest/security-d...

> Fixed performance instances, in which CPU and memory resources are pre-allocated and dedicated to a virtualized instance for the lifetime of that instance on the host

FWIW, this design has been used from the very beginning. Even with the old Xen-based hypervisor, there was very little sharing of resources between customers. AWS engineers anticipated that the hardware might have issues allowing the state to be leaked between domains, so they tried to minimize the possible impact.

> How is the "each CPU core can only be used by one customer" enforced? Is it just relying on the kernel rarely migrating actively used vCPU threads between hardware threads, or is there scheduler affinity etc applied to enforce it?

CPUs are allocated completely statically to VMs. The current Nitro Hypervisor is extremely simplistic, and it is not capable of sharing CPUs between VMs.

Another round of speculative-execution vulnerabilities

Posted Aug 25, 2023 19:33 UTC (Fri) by farnz (subscriber, #17727) [Link]

Thanks for the link - it answers my question in full, and makes it clear that this is something that's architected into AWS. And yes, AWS documentation is a mess - it looks like I didn't find it because I wasn't looking at AWS whitepapers, but at EC2 documentation.

I had hoped that it worked the way you describe, because nothing else would meet my assumptions about how security on this would work, but I have had enough experience to know that when security is involved, hoping that people make the same assumptions as I do is a bad idea - better to see my assumptions called out in documentation, because then there's a very high chance that Amazon trains new engineers to make this set of assumptions.

Another round of speculative-execution vulnerabilities

Posted Aug 11, 2023 0:04 UTC (Fri) by khim (subscriber, #9252) [Link]

> Wondering what went wrong and why no one seems to care if their page takes five or ten seconds to update is a relevant inquiry.

That one is easy. There's just no one left who may care.

Everyone is trying to solve their own tiny, insignificant task. And when all these non-solutions to non-problems are combined and create something awful… who would even notice that, let alone fix it? Testers? They are happy if they have time to look at the pile of pointless non-problems in the bugtracker! Users? They are not the ones who pay for the software nowadays. Advertisers do that and they couldn't care less about what users experience.

Another round of speculative-execution vulnerabilities

Posted Aug 9, 2023 16:09 UTC (Wed) by Wol (subscriber, #4433) [Link] (4 responses)

> You just cannot keep the optimization and attention to detail levels of the past, with the development speed and costs required by the modern world.

Which language has the motto "if you make the right thing to do the easy way to do it, then people will do the right thing by default"?

Going back to one of my favourite war stories, where the consultants spent SIX MONTHS optimising an Oracle query so it could outperform the Pick system it was replacing. I'm prepared to bet that Pick query probably took about TEN MINUTES to write. (And the Oracle system, a twin Xeon 800, was probably 20 times more powerful than the Pentium 90 it was replacing!)

Pick "tables" are invariably 3rd or 4th normal form, because that's just the obvious, easy way to do it. Sure, you have to specify all your indices, but if you put an index on every foreign key, you've pretty much got everything of any importance - a simple rote rule that covers 99% of cases. (And no different from relational, you tell Pick it's (probably) a foreign key by indexing it, you tell an RDBMS to index it by telling it it's a foreign key. A distinction without a difference.)

Oh - and if the modern world requires horribly inflated development speeds and costs, that's their hard cheese. With your typical RDBMS project coming in massively over time and budget, surely going back to a system where the right thing is the obvious thing will massively improve those stats! Most of my time at work is spent debugging SQL scripts and Excel formulae - that's why I want to get Scarlet in there because, well, what's the quote? "Software is either so complex there are no obvious bugs, or so simple there are obviously no bugs, guess which is harder to write." Excel and Relational are in the former category, Pick is in the latter. More importantly, Pick actually makes the latter easy!

Cheers,
Wol

What is Pick and Scarlet?

Posted Aug 10, 2023 5:58 UTC (Thu) by fredrik (subscriber, #232) [Link] (1 responses)

@Wol You've mentioned Scarlet before, do you have a link where I can learn more about it?

Ditto for Pick, what is it, link? Thanks!

What is Pick and Scarlet?

Posted Aug 10, 2023 8:07 UTC (Thu) by Wol (subscriber, #4433) [Link]

@fredrik

https://github.com/geneb/ScarletDME

https://en.wikipedia.org/wiki/Pick_operating_system

Google groups openqm, scarletdme, mvdbms, u2-users, I guess there are more ...

Go to the linux raid wiki to get my email addy, and email me off line if you like ...

Pick/MV is like Relational/SQL - there are multiple similar implementations.

Cheers,
Wol

Another round of speculative-execution vulnerabilities

Posted Aug 11, 2023 0:17 UTC (Fri) by khim (subscriber, #9252) [Link] (1 responses)

> With your typical RDBMS project coming in massively over time and budget, surely going back to a system where the right thing is the obvious thing will massively improve those stats!

How would that work? Let's consider the three most important stats (in increasing order of importance):

  1. Amount of money in pockets of developers — how would that increase that?
  2. Amount of money management can stash in their pockets — how would that increase that?
  3. Amount of money CEO may get from bank loans — how would that increase that?

> More importantly, Pick actually makes the latter easy!

But where is the money in all that?

Another round of speculative-execution vulnerabilities

Posted Aug 11, 2023 0:48 UTC (Fri) by Wol (subscriber, #4433) [Link]

I'm very naive :-) , but I think you knew that :-)

Cheers,
Wol

Another round of speculative-execution vulnerabilities

Posted Aug 10, 2023 15:55 UTC (Thu) by skx (subscriber, #14652) [Link] (1 responses)

You sound very much like a ZX Spectrum user! I have similar memories and experiences.

I have a single-board Z80-based system on my desk, running CP/M, these days. I tinker with it - I even wrote a simple text-based adventure game in assembly and ported it to the spectrum.

But you're right, those days are gone outside small niches. Having time and patience to enjoy the retro-things is fun. But it's amazing how quickly you start to want more. (More RAM, internet access, little additions that you take for granted these days like readline.)

Another round of speculative-execution vulnerabilities

Posted Aug 11, 2023 5:47 UTC (Fri) by eduperez (guest, #11232) [Link]

> You sound very much like a ZX Spectrum user!

Yes, that was my first "computer", back when I was fourteen.

Another round of speculative-execution vulnerabilities

Posted Aug 9, 2023 10:02 UTC (Wed) by roc (subscriber, #30627) [Link] (1 responses)

You have to speculate heavily to get high single-thread performance, and single-thread performance will always matter because of Amdahl's Law.
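To make the Amdahl's Law point concrete, here is a minimal sketch of the usual formula. The function name and the 90% parallel fraction are illustrative assumptions, not figures from this thread.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Overall speedup when only `parallel_fraction` of the work scales with `workers`."""
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time regardless of how many workers exist.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the work parallelizes perfectly.
    for workers in (2, 8, 64, 1024):
        print(workers, round(amdahl_speedup(0.90, workers), 2))
```

With 90% of the work parallel, the speedup can never reach 10x however many cores are added, so the remaining serial 10% is bounded by single-thread performance.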

Some people commenting here claim they'd be happy with much lower performance. That's fine, but most people find some Web sites and phone apps useful, and those need high single-thread performance.

Another round of speculative-execution vulnerabilities

Posted Aug 11, 2023 0:23 UTC (Fri) by khim (subscriber, #9252) [Link]

> That's fine, but most people find some Web sites and phone apps useful, and those need high single-thread performance.

Nope. Not even close. Web sites would be equally sluggish no matter how much speculation your CPU does, simply because there is no one who cares to make them fast.

If speculation had been outlawed 10 or 20 years ago and all we had were fully in-order 100MHz 80486s… they would work at precisely the same speed they work at today on 5GHz CPUs.

The trick is that it's easy to go from sluggish website on 80486 100MHz device to sluggish web site on 5GHz device, but it's not clear how you can go back and if that's even possible at all.

Another round of speculative-execution vulnerabilities

Posted Aug 9, 2023 18:55 UTC (Wed) by bartoc (guest, #124262) [Link] (19 responses)

I doubt it. The reason all this dynamic optimization stuff works so well isn't because C is somehow a bad language but because real workloads are rather unpredictable and the correct optimization decisions depend on stuff that happens between when a program is compiled and when a particular instruction executes. This'll always be true in any system that does many things at the same time, when those things are determined by many different people at different times.

It's not at all clear what the "better" alternative to C is either, without sacrificing a ton of usability. Sure Rust is "better" than C, but ultimately it shares the same fundamental execution model. One could argue GLSL/WGSL/HLSL/etc, but the things that those languages lack from the C execution model (mutual recursion, an ABI, a stack to which registers can be spilt, etc) are seen as things holding them back, precisely because those things make shader languages less dynamic than C, and thus require absolute explosions of up front code generation, with all the compile time and I$ issues that brings.

Another round of speculative-execution vulnerabilities

Posted Aug 9, 2023 20:51 UTC (Wed) by Wol (subscriber, #4433) [Link]

> The reason all this dynamic optimization stuff works so well isn't because c is somehow a bad language but because real workloads are rather unpredictable and the correct optimization decisions depend on stuff that happens between when a program is compiled and when a particular instruction executes.

The problem with C isn't that real workloads are unpredictable. The problem with C is that the language behaviour is undefined and unpredictable. If you're writing something simple, there's not much difference between languages. Except that few problems are simple, C gives you very little help to cope, and indeed it's full of landmines that will explode at every opportunity.

Writing bug-free code in C is much harder than most other languages ...

Cheers,
Wol

Another round of speculative-execution vulnerabilities

Posted Aug 10, 2023 9:40 UTC (Thu) by anton (subscriber, #25547) [Link] (17 responses)

Actually speculative execution works so well because it turns out that a lot of execution is very predictable. It's so predictable that the branch predictor has an accuracy of ~99% (depending on the application). This means that the instruction fetcher can fetch ahead for hundreds of instructions, and the OoO execution engine can execute these instructions ahead in an order determined by the data dependencies, not otherwise by the program order. This allows modern CPUs to complete (i.e., execute) several instructions per cycle.
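As a rough illustration of why loop-heavy code is so predictable, here is a toy simulation of a single 2-bit saturating counter, a textbook scheme that is far simpler than the history-based predictors in real CPUs. The function names and the loop shape are assumptions made for this sketch.

```python
def two_bit_accuracy(outcomes):
    """Fraction of branches predicted correctly by one 2-bit saturating counter."""
    counter = 2  # states 0..3; states 2 and 3 predict "taken"
    correct = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken == taken:
            correct += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)


def loop_branch(iterations, repeats):
    """Outcomes of a loop's backward branch: taken until the final iteration exits."""
    return ([True] * (iterations - 1) + [False]) * repeats


if __name__ == "__main__":
    # Hypothetical workload: a 100-iteration loop entered 50 times.
    print(two_bit_accuracy(loop_branch(100, 50)))
```

In this toy the only misprediction is the loop exit, so a 100-iteration loop gives 99% accuracy by construction. Real code is messier, which is why real predictors also track branch history.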

I don't see that this has much to do with the programming language. Rust is as vulnerable to Spectre and Downfall as C is AFAICS. The only influence I see is that for a language like JavaScript that always bounds-checks array accesses, you have an easier time adding Spectre-V1 mitigations. But for Rust, which tries to optimize away the bounds-check overhead, you end up either putting in Spectre-V1 mitigation overhead (can this be done automatically?), slowing it down to be uncompetitive with C, or it is still Spectre-V1 vulnerable. Admittedly adding mitigations cannot be done automatically in C, because the compiler has no way to know the bounds in all cases.

The way to go is that the hardware manufacturers must fix (not mitigate) Spectre. They know how to avoid misspeculated->permanent state transitions for architectural state (Zenbleed is the exception that proves the rule), now apply it to microarchitectural state!

Another round of speculative-execution vulnerabilities

Posted Aug 10, 2023 11:52 UTC (Thu) by excors (subscriber, #95769) [Link] (2 responses)

Maybe it's more a combination of control flow being predictable and memory latency being unpredictable. Compilers (AOT and JIT) can have a go at predicting control flow using PGO, but I suspect it's largely impossible for them to predict which memory loads will come from L1 (~5 cycles) and which will come from RAM (~300 cycles) as that depends heavily on the dynamic state of the caches. That unpredictable latency prevents a compiler from doing good instruction scheduling by itself, so it has to rely on the CPU doing dynamic scheduling that can adapt to the actual latency of every single memory access. And the CPU can do that because the highly predictable control flow means that while it's waiting for RAM, it can speculatively gather hundreds of instructions to reschedule and execute out of order.

If memory latency is predictable then I think it's much easier for the compiler to statically schedule the instructions, and the CPU can be much simpler while maintaining decent performance. But that only seems practical with very small amounts of memory (e.g. microcontrollers with single-cycle latency to SRAM, but only hundreds of KBs) or very large numbers of threads (e.g. GPUs where each core runs 128 threads with round-robin scheduling, so each thread has 128 cycles between consecutive instructions in the best case, which can mask a lot of memory latency), not for general-purpose desktop-class CPUs.
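The round-robin argument can be checked with a small cycle-level toy model. It assumes (my simplification, not a description of any real GPU) that the core issues at most one instruction per cycle and that a thread which issues at cycle t cannot issue again until cycle t + latency.

```python
def issue_utilization(threads, latency, cycles=10_000):
    """Fraction of cycles in which a round-robin core finds a ready thread to issue."""
    ready_at = [0] * threads  # first cycle at which each thread may issue again
    issued = 0
    next_thread = 0
    for cycle in range(cycles):
        # Scan threads in round-robin order, starting after the last one that issued.
        for offset in range(threads):
            t = (next_thread + offset) % threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + latency
                issued += 1
                next_thread = (t + 1) % threads
                break
    return issued / cycles


if __name__ == "__main__":
    print(issue_utilization(128, 128))  # latency fully hidden by 128 threads
    print(issue_utilization(128, 300))  # RAM-like latency leaves idle issue slots
    print(issue_utilization(1, 300))    # a single in-order thread mostly waits
```

Under these assumptions, 128 threads keep the core busy every cycle at a 128-cycle latency, but at 300 cycles the issue slot sits idle more than half the time. A single thread at that latency is almost always stalled, which is the case the comment says is impractical for desktop-class CPUs.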

Another round of speculative-execution vulnerabilities

Posted Aug 10, 2023 15:14 UTC (Thu) by farnz (subscriber, #17727) [Link]

Even with PGO, control flow still has largely unpredictable regions, which depend upon the details of user input, and can only be predicted at compile time if the exact input the user will use is provided at compile time. This was one component of why Itanium's EPIC never lived up to its performance predictions; as compilers got better at exploiting compile-time known predictability, they also benefited OoO and speculative execution machines, which could exploit predictability that only appears at runtime.

For example, in an H.264 encoder or decoder, black bars are going to send your binary down a highly predictable codepath doing the same thing time and time again; your PGO compiled binary is not going to be set up on the assumption of black bars, because that's just one part of the sorts of input you might get. However, at runtime, the CPU will notice that you're going down the same codepath over and over again as you handle the black bars, and will effectively optimize further based on the behaviour right now. Once you get back to the main picture, it'll change the optimizations it's applying dynamically, because you're no longer going down that specific route through the code.
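A toy comparison of the two approaches on a phased workload like the one described above. Everything here is an assumption made for illustration: a single branch that is always taken during "black bars" and never taken during "picture", a static predictor that fixes the majority direction from a profile, and a 2-bit saturating counter standing in for the hardware predictor.

```python
def static_accuracy(outcomes, profile):
    """Always predict the direction that was most common in the training profile."""
    predict_taken = sum(profile) * 2 >= len(profile)
    hits = sum(1 for taken in outcomes if taken == predict_taken)
    return hits / len(outcomes)


def dynamic_accuracy(outcomes):
    """Predict with a 2-bit saturating counter that adapts as the program runs."""
    counter = 2  # states 0..3; states 2 and 3 predict "taken"
    hits = 0
    for taken in outcomes:
        if (counter >= 2) == taken:
            hits += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return hits / len(outcomes)


def phased_workload(phase_length=1000):
    """Hypothetical branch outcomes: black bars, then picture, then black bars."""
    black_bars = [True] * phase_length
    picture = [False] * phase_length
    return black_bars + picture + black_bars


if __name__ == "__main__":
    workload = phased_workload()
    # Even a perfect profile of this exact run only captures the majority direction.
    print(static_accuracy(workload, workload))
    print(dynamic_accuracy(workload))
```

With the defaults, the static predictor is right two times out of three, while the counter mispredicts only four times in 3000 branches (twice at each phase change). Real branches are nowhere near this clean, but the sketch shows the mechanism: the dynamic predictor re-learns at each phase change, and the static one cannot.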

Another round of speculative-execution vulnerabilities

Posted Aug 10, 2023 16:03 UTC (Thu) by anton (subscriber, #25547) [Link]

> Compilers (AOT and JIT) can have a go at predicting control flow using PGO

Profile-based static prediction has ~10% mispredictions, while modern history-based hardware branch prediction has about 1% mispredictions (for real numbers check the research literature, but the tendency is in that direction; and it's actually hard to compare the research, because static branch prediction research stopped about 30 years ago).

Concerning memory latency, I also see very good speedups of out-of-order over in-order for benchmarks like the LaTeX benchmark which rarely misses the L1 cache.

Also, the fact that the Itanium II designers chose small low-latency L1 caches, while OoO designers went for larger and longer-latency L1 caches (especially in recent years, with the load latency from D-cache growing to 5 cycles in Ice Lake ff.) shows that the unpredictability is a smaller problem for compiler scheduling than the latency.

The dream of static scheduling has led a number of companies (Intel and Transmeta being the most prominent) to spend substantial funds on it. Dynamic scheduling (out-of-order execution) has won for general-purpose computing, and the better accuracy of dynamic branch prediction played a big role in that.

With regard to Spectre and company, compiler-based speculation would exhibit Spectre vulnerabilities just as hardware-based speculation does. Ok, you can tell the compiler not to speculate, but that severely restricts your compiler scheduling, increasing the disadvantage over OoO. Better fix Spectre in the OoO CPU.

Another round of speculative-execution vulnerabilities

Posted Aug 11, 2023 13:26 UTC (Fri) by atnot (guest, #124910) [Link] (13 responses)

> I don't see that this has much to do with the programming language. Rust is as vulnerable to Spectre and Downfall as C is AFAICS.

You're thinking much too narrowly here in terms of what "C" is in this context. It has far less to do with the specific syntax and more with the general model of computation that derives from the original PDP11, i.e.:

Programs are a series of commands, whose effects become visible in order from top to bottom. The sequence of these commands can be arbitrarily replaced using a specific command, called a "branch". There is a singular, uniform thing called "memory", which is numbered from zero to infinity and you can create a valid read-write reference to any of it by using that number. And so on.

None of this is true internally for any modern compute device. It isn't even true for C anymore. But it was true for the creators of C, and as a result these assumptions were baked very deeply into the language, then tooling like gcc and LLVM, then languages that use that tooling like Rust, OpenCL, CUDA, and then architectures that wanted to be easily targetable by those tools like RISC-V and, most notably, AMD64. (As opposed to its Itanic cousin). It's so established that people don't even recognize these as specific design choices anymore, it's just "how computers work".

Rust is definitely a step away from C, and one that has at least some potential to improve how chips are designed in the future, if the tooling allows for it. But it's not a very big step in the grand scheme of things.

Another round of speculative-execution vulnerabilities

Posted Aug 11, 2023 22:40 UTC (Fri) by jschrod (subscriber, #1646) [Link] (12 responses)

I wouldn't call this PDP11 semantics - I would call this the semantics of any von Neumann computer architecture that exists nowadays. (I exclude quantum computing from that statement.) In fact, this is the paradigm of the Turing Machine.

That you have statements that are executed from top to bottom, where preconditions, invariants, and postconditions exist, is basically the fundament of theoretical computer science. No proof about algorithmic semantics or correctness would work without that assumption.

If this, in your own words, isn't true any more - can you please point me to academic work that formalizes that new behavior and its semantics?

After all, I cannot believe that theoretical computer scientists have ignored this development. It is too good an opportunity to write new articles for archived journals.

If no computer science work is published on your claim, can you please explain why research is ignoring this development?

Another round of speculative-execution vulnerabilities

Posted Aug 12, 2023 11:50 UTC (Sat) by atnot (guest, #124910) [Link] (11 responses)

> I would call this the semantics of any von Neumann computer architecture that exists nowadays. [...] In fact, this is the paradigm of the Turing Machine.

I reject this outright. There is an absolute world of wonderful compute models in between a Turing machine and C or a PDP11. Many of them are von Neumann machines, even. This should be clear from the fact that the key problem of the C model is that it's hard to model formally (see e.g. the size of the C memory model specification), while the Turing machine was purpose designed for formal modelling.

Let me just give some examples:

For a very soft start, we can look at something like the 6502, which is generally pretty boring apart from treating the first 256 bytes of memory specially. Largely because of this, it is not supported upstream in any of the big C compilers.

Then we can look at something like Itanium, in which bundles of instructions are executed in parallel and can not see the effects of each other, along with things like explicit branch prediction, speculation and software pipelining.

This is actually pretty similar to modern CPUs, except instead of having it explicitly encoded in the instruction stream, they try to re-derive that information at runtime, often by just guessing.

Then we have things like GPUs, which have multiple tiers of shared memory and primarily work with masks instead of branches. Although they are slowly becoming more C-like as people seek to target them with C and C++ code.

There's also a whole bunch of architectures like ARM CHERI and many with memory segmentation, where addresses and pointers are not the same thing.

We can also talk about various other things like lisp machines, Mill, Transmeta, EDGE and many more things I'm forgetting.

Then even further afield, you can find things like FPGAs, which are programmed using a functional specification of behavior much like TLA+. (The current fad is, of course, trying to run C on them, to limited success)

Now if you say "But most of these are all obscure architectures nobody uses", then yes that's the point. It's because they don't look enough like K&R's PDP11. Itanium is far from the only innovative architecture that C killed and as primarily a hardware person, that's deeply frustrating.

Another round of speculative-execution vulnerabilities

Posted Aug 12, 2023 15:05 UTC (Sat) by anton (subscriber, #25547) [Link] (6 responses)

What "C memory model specification" do you mean?

Why should the zero page of the 6502 be a reason not to support the 6502 in "big" C compilers? They can use the zero page like compilers for other architectures use registers (which leave no trace in C's memory model, either). Besides, there are C compilers for the 6502, like cc65 and cc64, so there is obviously no fundamental incompatibility between C and the 6502. The difficulties are more practical, stemming from having zero 16-bit registers, three 8-bit registers, only 256 bytes of stack, no stack-pointer-relative addressing, etc.

Concerning IA-64 (Itanium), this certainly was designed with existing software (much of it written in C) in mind, and there are C compilers for it, I have used gcc on an Itanium II box, and it works. C has not killed IA-64, out-of-order (OoO) execution CPUs have outcompeted it. IA-64, Transmeta and the Mill are based on the EPIC assumption that the compiler can perform better scheduling than the hardware, and it turned out that this assumption is wrong, largely because hardware has better branch prediction, and can therefore perform deeper speculative execution.

And the fact that OoO won over EPIC shows that having an architecture where instructions are performed sequentially is a good interface between software (not just written in C, but also, e.g., Rust) and hardware, an interface that allows adding a lot of performance-enhancing features under the hood.

Concerning Lisp machines, they were outcompeted by RISCs, which could run Lisp faster; which shows that they are not just designed for C. There actually was work on architectural support for LISP in SPUR, and some of it made it into SPARC, but one Lisp implementor wrote that their Lisp actually did not use the architectural support in their SPARC port, because the cost/benefit ratio did not look good.

Concerning GPUs, according to your theory C should have killed them long ago, yet they thrive. They are useful for some computing tasks and not good for others. In particular, let me know when you have a Rust or, say, CUDA compiler or OS kernel (maybe one written in Rust or CUDA) running on a GPU.

Another round of speculative-execution vulnerabilities

Posted Aug 14, 2023 9:50 UTC (Mon) by james (guest, #1325) [Link] (5 responses)

> C has not killed IA-64, out-of-order (OoO) execution CPUs have outcompeted it.

It's an interesting theoretical exercise to consider what would have happened if Meltdown and Spectre had been discovered sometime around 2000. Presumably the software workaround for Meltdown would have had to have looked like Red Hat's 4G/4G split, which could:

> cause a typical measurable wall-clock overhead from 0% to 30%, for typical application workloads (DB workload, networking workload, etc.). Isolated microbenchmarks can show a bigger slowdown as well - due to the syscall latency increase.

That would have made a big difference to the perceived advantages of Itanium.

Would the conservative and increasingly security-sensitive server world have adopted the position that OoO couldn't be trusted? (Once Itanium was released, Intel would almost certainly have made that part of their marketing message.)

In 2018, when in this timeline Meltdown and Spectre were discovered, the consensus of the security community was that more such attacks would be discovered, and time has sadly proven that to be correct — but we now have no other realistic option but to live with it. We had other options around 2000 — then-current in-order processors (from Sun, for example).

The triumph of OoO looks much more like an accident of history rather than something inherent to computer science to me.

Another round of speculative-execution vulnerabilities

Posted Aug 14, 2023 11:05 UTC (Mon) by anton (subscriber, #25547) [Link] (4 responses)

AFAIK a mitigation for Meltdown was indeed to not share the address space between kernel and user space, leading to TLB flushes on system calls. Intel fixed Meltdown relatively quickly in hardware, and AMD hardware has not been vulnerable to Meltdown AFAIK.

By contrast, neither Intel nor AMD (nor AFAIK ARM or Apple) has fixed Spectre in the more than 6 years since they were informed of it. This indicates that these CPU manufacturers don't believe that they can sell a lot of hardware by being the first to offer hardware with such a fix (and making it a part of their marketing message). So they think that few of their customers care about Spectre. But if they thought that many customers cared about Spectre, they would design OoO hardware without Spectre.

As for IA-64, it has architectural features for speculative loads, and is therefore also vulnerable to Spectre. This vulnerability can probably be mitigated by recompiling the program without using speculative loads (if we assume that the hardware does not perform any speculative execution, it's good enough to perform the speculative load and then not use the loaded data until the speculation is confirmed; for security the speculatively loaded data should be cleared in case of a failed speculation). This mitigation would reduce the performance of Itanium CPUs to be close to the performance of architectures without these speculative features, i.e., even lower than the Itanium performance that we saw.

OoO certainly has other options wrt. Spectre than to live with it. Just fix it. All the OoO hardware designers (the Zen2 designers are the exception that proves the rule) are able to squash speculative architectural state on a misprediction; they now just need to apply the same discipline to speculative microarchitectural state. E.g., if they had squashed the speculative branch predictor state on a misprediction, there would be no Inception, and if they had squashed the speculative AVX load buffer state on a misprediction, there would be no Downfall.

Another round of speculative-execution vulnerabilities

Posted Aug 15, 2023 4:26 UTC (Tue) by donald.buczek (subscriber, #112892) [Link] (3 responses)

> E.g., if they had squashed the speculative branch predictor state on a misprediction, there would be no Inception

A branch predictor which isn't allowed to learn, wouldn't that just be a rather useless static branch predictor like "always probably backwards" or "as hinted by machine code"?

Another round of speculative-execution vulnerabilities

Posted Aug 15, 2023 11:01 UTC (Tue) by anton (subscriber, #25547) [Link] (2 responses)

What makes you think that this would mean "isn't allowed to learn"? The fact that architectural state is not changed on a misprediction does not mean that architectural state is immutable, either.

A straightforward way is to learn from completed (i.e. architectural) branches, with the advantage that you learn from the ground truth rather than speculation.
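To make that concrete, here is a minimal sketch of a predictor that learns only from completed branches. The class name, the table size and the use of 2-bit saturating counters are assumptions chosen for illustration, not a description of any real CPU:

```python
class CommittedOnlyPredictor:
    """Table of 2-bit saturating counters, trained only when a branch retires."""

    def __init__(self, entries=1024):
        self.entries = entries
        # Counter values 0..3; every entry starts at 1 ("weakly not taken").
        self.counters = [1] * entries

    def _index(self, pc):
        # Hypothetical indexing scheme: low bits of the branch address.
        return pc % self.entries

    def predict(self, pc):
        """Predict taken when the counter is in its upper half (2 or 3)."""
        return self.counters[self._index(pc)] >= 2

    def retire(self, pc, taken):
        """Train on the architectural outcome of a completed branch."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Branches on a mispredicted path never reach retire(), so they leave no trace in the table, yet the predictor still learns from every branch that actually completes.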

If that approach updates the branch predictor too late in your opinion (and for the return predictor that's certainly an issue), a way to get speculative branch predictions is to have an additional predictor in the speculative state, and use that in combination with the non-speculative predictor. If a prediction turns out to be correct, you can turn the part of the branch predictor state that is based on that prediction from speculative to non-speculative (like you do for architectural state); if a prediction turns out to be wrong, revert the speculative branch predictor state to its state when the branch was speculated on (just like you do with speculative architectural state).

Another round of speculative-execution vulnerabilities

Posted Aug 15, 2023 14:26 UTC (Tue) by donald.buczek (subscriber, #112892) [Link] (1 responses)

> If a prediction turns out to be correct, you can turn the part of the branch predictor state that is based on that prediction from speculative to non-speculative (like you do for architectural state); if a prediction turns out to be wrong, revert the speculative branch predictor state to its state when the branch was speculated on (just like you do with speculative architectural state).

Why wouldn't such a branch predictor always give the initial answer? If correct, it would be sensible to stick to it and if wrong, you want to ignore that and revert to the state of the last correct guess or the initial state.

Assuming you want to apply that to binary taken/not-taken branch predictors and not only to branch target predictors?

Another round of speculative-execution vulnerabilities

Posted Aug 15, 2023 15:23 UTC (Tue) by anton (subscriber, #25547) [Link]

If the prediction is wrong, you throw away the speculative nonsense (and thus avoid Inception), but you record that the prediction was wrong. I had not written that earlier, sorry.

In more detail: If the mispredicted branch is non-speculative, you record it in the non-speculative predictor. If the mispredicted branch is still in the speculative part of execution (that would mean that you have a CPU that corrects mispredictions out-of-order; I don't know if real CPUs do that), you record it in the speculative part, and when this branch leaves the speculative realm, this record can also be propagated to the non-speculative predictor.
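A rough sketch of that scheme, covering only the simple case where branches are resolved in program order (it does not model correcting mispredictions out-of-order). All names are hypothetical, and the 2-bit counters are an assumption for illustration:

```python
from collections import deque


def _bump(counter, taken):
    """Move a 2-bit saturating counter towards taken (3) or not taken (0)."""
    return min(3, counter + 1) if taken else max(0, counter - 1)


class SquashablePredictor:
    def __init__(self):
        self.committed = {}       # pc -> counter; the non-speculative predictor
        self.in_flight = deque()  # (seq, pc, predicted_taken), oldest first
        self.next_seq = 0

    def predict(self, pc):
        """Combine non-speculative state with the pending speculative updates."""
        counter = self.committed.get(pc, 1)
        for _, spec_pc, taken in self.in_flight:
            if spec_pc == pc:
                counter = _bump(counter, taken)
        return counter >= 2

    def speculate(self, pc):
        """Predict a branch and keep the prediction as speculative state."""
        taken = self.predict(pc)
        seq = self.next_seq
        self.next_seq += 1
        self.in_flight.append((seq, pc, taken))
        return seq, taken

    def resolve_oldest(self, actual_taken):
        """The oldest in-flight branch leaves the speculative realm."""
        _, pc, predicted = self.in_flight.popleft()
        # Record the real outcome, whether or not the prediction was right.
        self.committed[pc] = _bump(self.committed.get(pc, 1), actual_taken)
        if predicted != actual_taken:
            # Misprediction: throw away all younger speculative state.
            self.in_flight.clear()
        return predicted == actual_taken
```

On a misprediction the outcome of the resolved branch is recorded, but nothing that was learned on the wrong path survives in either part of the predictor.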

Another round of speculative-execution vulnerabilities

Posted Aug 12, 2023 16:51 UTC (Sat) by farnz (subscriber, #17727) [Link] (3 responses)

Itanium failed to outperform AMD64 on hand-coded assembly as well as on C code. It wasn't killed by the C model, it was killed by a failure to deliver performance greater than that of other CPUs. VLIW CPUs like Transmeta failed because VLIW code is inherently low-density in memory, and our current bottleneck for performance tends to be L1 cache size. Mill has never reached a point where hand-written code in simulation outperforms hand-written code for AMD64 given the same simulated resources as AMD64. EDGE is an ongoing research project, and may (or may not) prove worthwhile - there's certainly not been an effort to build a good EDGE CPU that can be compared to something "C-friendly" like RISC-V.

Similar failures apply to Lisp Machines. While they had dedicated hardware to make running Lisp code faster, they lost out because RISC CPUs like SPARC and MIPS were even faster at running Lisp code for a given energy input than Lisp Machines were. Again, not about programming model, but about the Lisp Machines being worse hardware for running Lisp than MIPS or SPARC.

In terms of competing models of computation that have actually made it to retail sale, FPGAs are a commercial success, but are not programmed like CPUs, because they're defined as a sea of interconnected logic gates, and you are better off exploiting that via a Hardware Description Language than via something like C, FORTRAN or COBOL. GPUs are a commercial success; individual threads on a GPU are similar to a CPU with SIMD, with many threads per core (8 on Intel, more on others), and a hardware thread scheduler that allows you to have a pool of cores sharing thousands or even hundreds of thousands of threads.

None of this is about the "C model"; underpinning all of the noise is that humans struggle to coordinate concurrent logic in their heads, and prefer to think about a small number of coordination points (locks, message channels, rendezvous points, whatever) with a single thread of execution between those points. OoOE with speculative execution is one of the two local minima we've found for such a mental model of programming, and supports the case where a single thread of logic is the bottleneck. The other model that works well is the workgroup model used by GPU programming, where something distributes a very large number of input values to a pool of workers, and lets the workers build a large number of output values. Between the input and output values, there's very little (if not no) coordination between workers.
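As a small illustration of the workgroup shape (not of GPU performance), here is a sketch using Python's standard thread pool. The kernel function and the worker count are made up for the example, and CPython threads will not actually speed up CPU-bound work like this; the point is only that each worker sees its own input and never coordinates with the others:

```python
from concurrent.futures import ThreadPoolExecutor


def kernel(x):
    """Per-item work: reads only its own input and shares no state."""
    return x * x + 1


def run_workgroup(inputs, workers=4):
    """Distribute the inputs to a pool of workers and collect the outputs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands out the items and returns the results in input order.
        return list(pool.map(kernel, inputs))
```

The only coordination points are the hand-out of inputs at the start and the collection of outputs at the end, which is what keeps the model easy to reason about.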

And while the 6502 is not supported upstream in any of the big C compilers, neither are many other CPUs of the same vintage. The Z80 is not supported in any of the big C compilers, nor is the 6809, for example, and both of those were big-selling CPUs at the time the 6502 was current; the Z80 is also a lot friendlier to C than the 6502, since the Z80 does not limit you to a single 256 byte stack at a fixed location in memory, whereas the 6502 has a 256 byte stack fixed in page 1. I've never personally programmed a 6809 system, but I believe that it's also a lot more C friendly than the 6502.
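To show why that stack is awkward for C, here is a toy model of it. The class is hypothetical and models only push and pull; the starting stack pointer of 0xFF is an assumption (it is what startup code conventionally sets):

```python
class Stack6502:
    """Toy model of the 6502 hardware stack in page 1 (0x0100-0x01FF)."""

    BASE = 0x0100  # the stack page is fixed; only the low byte is adjustable

    def __init__(self):
        self.memory = bytearray(0x10000)  # 64KB address space
        self.sp = 0xFF                    # 8-bit stack pointer (assumed start)

    def push(self, value):
        """Store at 0x0100 + SP, then decrement SP modulo 256."""
        self.memory[self.BASE + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        """Increment SP modulo 256, then load from 0x0100 + SP."""
        self.sp = (self.sp + 1) & 0xFF
        return self.memory[self.BASE + self.sp]
```

After 256 pushes the stack pointer wraps around and the next push silently overwrites the oldest entry, so C-style stack frames with locals and deep call chains have to be emulated in software elsewhere in memory.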

Fundamentally, the thing that has killed every alternative to date is that the surviving processor types are simply faster for commercially significant problems than any competitor was, even with alternative programming models. This applies to VLIW, and to EPIC, and to Lisp Machines.

Another round of speculative-execution vulnerabilities

Posted Aug 14, 2023 20:08 UTC (Mon) by mtaht (guest, #11087) [Link] (2 responses)

I remain fond of the Mill set of ideas for many reasons, but was not aware of any benchmarks of the compiler, or public sim information? I have not kept track.

Weirdly enough, I do not care about IPC; what I care about is really rapid context and privilege switching, something that unwinding speculation and flushing the TLB for Spectre has really hurt. I am tired of building processors that can only go fast in a straight line. And like everyone here, tired of all these vulnerabilities.

The Mill held the promise of context or privilege switching in 3 clocks. The implicit zero feature and byte-level protections seemed like a win. But it has been a long 10+ years since that design was announced; have there been any updates?

Another round of speculative-execution vulnerabilities

Posted Aug 14, 2023 21:52 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

I recently perused the forum and it seems that they're in another funding round and looking to go from startup to a proper company (salaries, etc.). Technical progress (well, at least publicizing it) is blocked on that. To be fair, they are apparently in it for the money (based on the Q&A in at least one of the talks that have been released).

Another round of speculative-execution vulnerabilities

Posted Aug 17, 2023 14:43 UTC (Thu) by farnz (subscriber, #17727) [Link]

It's a while since I saw the information (around 10 years), so I don't have links to hand, and it was investor-targeted. They seemed to be making the same mistake as Itanium designers, though - they compared hand-optimized code on their Mill simulator to GCC output on a then current Intel chip (Haswell, IIRC), showing that simulated Mill was better than GCC output on Haswell. The claim was that compiler improvements needed for Mill would bring Mill's performance on compiled code ahead of Haswell's performance; but it failed to take into account that, with a lot of human effort, I could get better performance from Haswell with hand-optimized code than they got with GCC output, using GCC's output as a starting point.

I am inherently sceptical of "compiler improvement" claims that will benefit one architecture and not another; while I'll accept that the improvement is not evenly distributed, until Mill Computing can show that their architecture with their compiler can outperform Intel, AMD, Apple, ARM or other cores with a modern production-quality (e.g. GCC, LLVM) compiler for the same language, I will tend towards the assumption that anything that they improve in the compiler will also benefit other architectures.

This holds especially true for compiler improvements around scheduling, which is what Mill depends upon, and what Itanium partially needed to beat OoOE - improvements to scheduling of instructions benefit OoOE by making the static schedule closer to optimal, leaving the OoOE engine to deal with the dynamic issues only, and not statically predictable hazards.

Another round of speculative-execution vulnerabilities

Posted Aug 10, 2023 23:59 UTC (Thu) by khim (subscriber, #9252) [Link] (4 responses)

> Yes the programmer actually has to THINK about their database design

And that's the beginning and the end. Most people out there don't want to think.

And once these people had taken over… the whole house of cards started unraveling.

Today people don't want to think… about anything, really. They ignore as much as they can and concentrate on what's profitable.

Only… you can't eat paper, and the zeros and ones in central banks' servers are even more useless.

It will be interesting to see whether we find a way to avoid the collapse of western civilisation, but the chances are not good: most people not only don't understand why it's collapsing, they don't even notice that the collapse has not just started but is well underway.

Another round of speculative-execution vulnerabilities

Posted Aug 11, 2023 15:10 UTC (Fri) by Wol (subscriber, #4433) [Link] (3 responses)

The problem is journalists ...

A couple of days ago we had an article about Drax in one of our daily newspapers - so we're talking maybe 20-30% of newspaper readers reading this article.

A major part of the story is about the power station shutting down and avoiding having to pay rebates to consumers - some government subsidy that had to be repaid if they were generating and selling electricity above a certain price. So they shut down and sold their fuel elsewhere instead.

That fuel being woodchip. So a second, large, part of the journalist's story was about how Drax was one of our biggest greenhouse gas emitters and polluters in the country! The eco-friendliness of shipping the wood from Canada is certainly up for debate, but burning wood? That's one of the greenest fuels we've got!

When journalists - who are supposed to inform the public! - get their facts so badly out of kilter, what hope do the public have?

Cheers,
Wol

Another round of speculative-execution vulnerabilities

Posted Aug 11, 2023 15:26 UTC (Fri) by paulj (subscriber, #341) [Link] (2 responses)

Way OT now, but using wood for power, on the back of a highly oil based system to grow, process and transport (over massive distances) that wood is not that green.

Particularly if that wood is coming from old-growth forests that are being cleared. I don't know the details of Canadian wood pulp, but IIRC a lot of their wood is from clearing old woods.

A final issue is that commercial forestry (at least in the UK and Ireland) is from dense pine forestry plantations, which is kind of a disaster for the native ecosystem. Really, we need to reforest our denuded countries (UK and Ireland) with natural, long-life forests - really good carbon capture and storage!

Which means we need something else for power. Something that is a lot more space efficient than covering the country in dense commercial and largely dead pine forests (which probably still won't give us enough fuel). The answer is obvious, but greens have irrational dogma.

Another round of speculative-execution vulnerabilities

Posted Aug 11, 2023 16:14 UTC (Fri) by joib (subscriber, #8541) [Link] (1 responses)

It's good to see some pushback on the remarkably common but simplistic idea that since biomass sucks up CO2 when it grows and releases it when it burns, all is ok, and we can just burn wood as much as we want with no ill effects. In addition to climate change, the other big environmental crisis is biodiversity loss, largely driven by land use changes. Such as turning native forests into cropland, or for that matter biomass plantations.

Burning biomass is, in the end, a very inefficient way of turning sunlight into usable energy. There just isn't enough arable land on the planet to replace the energy we currently get from fossil fuels. There are other very low carbon energy production technologies that are much more area efficient, like wind, solar and nuclear energy.

Anyway, this isn't the correct forum to debate this. ;)

Another round of speculative-execution vulnerabilities

Posted Aug 14, 2023 8:35 UTC (Mon) by paulj (subscriber, #341) [Link]

Nuclear power is the only option that is compatible with a modern lifestyle, AFAICT.

I consider myself pretty green, but I abhor the common "green" stance on nuclear power, which is completely at odds with having both a) a biodiverse and sustainable planet and b) a modern way of life ("modern" implies high energy use in many, many ways, and only nuclear can reliably replace fossil fuels to provide this). If you make society choose between A and B, society will choose B. Sigh sigh sigh.

Another round of speculative-execution vulnerabilities

Posted Aug 11, 2023 16:32 UTC (Fri) by DemiMarie (subscriber, #164188) [Link]

What would you do? What would a reasonable language be?


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds