
How many in-order cores could one fit on a die?

Posted Aug 10, 2023 16:07 UTC (Thu) by DemiMarie (subscriber, #164188)
In reply to: Another round of speculative-execution vulnerabilities by willy
Parent article: Another round of speculative-execution vulnerabilities

How many of those non-speculative cores could fit on a die? Could one make up for the reduced single-threaded performance with higher core counts and hardware support for coordination between cores?



How many in-order cores could one fit on a die?

Posted Aug 10, 2023 16:34 UTC (Thu) by malmedal (subscriber, #56172) [Link] (7 responses)

> Could one make up for the reduced single-threaded performance with higher core counts and hardware support for coordination between cores?

In theory, but the people who have tried, e.g. Sun with Niagara and Intel with Larrabee, have so far failed...

How many in-order cores could one fit on a die?

Posted Aug 11, 2023 9:58 UTC (Fri) by paulj (subscriber, #341) [Link] (6 responses)

I'm not sure Niagara failed. It did surprisingly well for Sun. For the stuff it was good at, it was very very good at.

Larrabee failed, but... Intel tried to make that into a GPU competitor. And the amount of RAM was limited.

How many in-order cores could one fit on a die?

Posted Aug 11, 2023 17:06 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)

I was on the receiving end of trying to make Niagara work (the second one, with multiple FPUs), in Java, which is supposed to be its natural habitat.

It never worked well. Garbage collection was slow because even the "parallel" GC in the Sun JVM was not quite parallel, and the sequential parts caused huge delays because single-threaded execution was super-slow.

Later, we tried to use Tilera CPUs (massively parallel CPUs with 32 cores) for networking software, and it ALSO failed miserably. It turns out that occasional serialized code just overwhelms everything. I still have a MikroTik Tilera-based router from that experiment; I'm using it for my home network.
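
The "occasional serialized code just overwhelms everything" effect can be put into rough numbers with Amdahl's law. The following is a minimal sketch, not a measurement: the 5% serial fraction and the 4x per-core slowdown are made-up illustrative values, and the function names are hypothetical.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Best-case speedup over one core of the same kind (Amdahl's law)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


def speedup_vs_fast_core(serial_fraction: float, cores: int, slowdown: float) -> float:
    """Speedup relative to ONE fast core, when each core is `slowdown` times slower."""
    return amdahl_speedup(serial_fraction, cores) / slowdown


if __name__ == "__main__":
    serial = 0.05  # assumption: 5% of the work cannot be parallelised
    # Assumption: each simple in-order core is 4x slower than a fast core.
    many_slow = speedup_vs_fast_core(serial, cores=32, slowdown=4.0)
    few_fast = speedup_vs_fast_core(serial, cores=4, slowdown=1.0)
    print(f"32 slow cores: {many_slow:.2f}x, 4 fast cores: {few_fast:.2f}x")
```

With these assumed inputs, the four fast cores come out slightly ahead of the 32 slow ones, even though only 5% of the work is serial.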

How many in-order cores could one fit on a die?

Posted Aug 11, 2023 19:55 UTC (Fri) by malmedal (subscriber, #56172) [Link]

I also tried Java to make use of the Niagara. A major pain point was a number of non-thread-safe routines in the standard library.

Especially annoying since I was not making these calls directly; they came from third-party libraries, so it was practically impossible to figure out what could safely be run in parallel.

How many in-order cores could one fit on a die?

Posted Aug 14, 2023 8:48 UTC (Mon) by paulj (subscriber, #341) [Link] (2 responses)

I'm not sure Java was its natural habitat either, given the GC issues. It was designed for fairly parallel (multi-thread/process) C/C++ server software - web, SQL, etc.

Tilera: I worked on software for that too. The people who architected that software had actually done a pretty good job of making sure the packet-processing "hot" paths could all run independently, and each thread (1:1 to CPU core) had its own copy of the data required to process packets. Other, non-forwarding-path "offline" code would then, in the background, take the per-CPU packet data, process it, figure out what needed to be updated, and update each per-CPU hot-path/packet-processing data state accordingly. That worked very well.
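
The per-core-copy design described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual software: `Worker`, `publish`, and the route-table layout are hypothetical names, and real code would pin one thread per core rather than call the workers in turn.

```python
class Worker:
    """One packet-processing worker (1:1 with a CPU core in the real design)."""

    def __init__(self, routes):
        self.routes = dict(routes)  # private copy, never shared with other workers
        self.forwarded = 0          # per-worker statistics, also private

    def process(self, destination):
        """Hot path: touches only worker-local state and takes no locks."""
        table = self.routes  # read the reference once, so a swap cannot split a lookup
        port = table.get(destination)
        if port is not None:
            self.forwarded += 1
        return port


def publish(workers, new_routes):
    """Offline path: give every worker a fresh private copy of the new table.

    Each worker's table is replaced by a single reference assignment, so the
    hot path sees either the old table or the new one, never a mix.
    """
    for worker in workers:
        worker.routes = dict(new_routes)


if __name__ == "__main__":
    workers = [Worker({"10.0.0.1": 1}) for _ in range(4)]
    print([w.process("10.0.0.1") for w in workers])
    publish(workers, {"10.0.0.1": 2, "10.0.0.2": 3})
    print([w.process("10.0.0.1") for w in workers])
```

The cost of this layout is memory (one copy per core) and slower updates. In exchange, the hot path never waits on another core.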

The issue the shop I worked at had with Tilera was that it was unreliable. The hardware had weird lock-up bugs. I figured out ways to increase the MTBF of these hard lock-ups by taking more care in programming the Broadcom PHYs attached to the chip (I think they were on the ASIC, and part of the Tilera design - can't quite remember). But... MAU programming via I2C controllers shouldn't really be causing catastrophic lock-ups of the whole chip. We still had hard lock-ups though - we never fully figured them all out, or found workarounds for them.

It seemed a 'fragile' and sensitive chip.

How many in-order cores could one fit on a die?

Posted Aug 14, 2023 9:20 UTC (Mon) by paulj (subscriber, #341) [Link]

Uhm, MII MDIO rather, not MAU, I guess.

How many in-order cores could one fit on a die?

Posted Aug 14, 2023 15:38 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

> The issue the shop I worked at had with Tilera was that it was unreliable. The hardware had weird lock up bugs.

We found some strange lockups in glibc, something to do with pthreads and signals. We "solved" it by porting musl libc; at that time it was easier to do than figuring out how to build and debug glibc.

But yeah, lockups also happened.

How many in-order cores could one fit on a die?

Posted Aug 17, 2023 11:00 UTC (Thu) by davidgerard (guest, #100304) [Link]

Was also there, did that - went from a Niagara beast machine to Ubuntu VMs. Our actual reason was to get away from Oracle as absolutely fast as possible - but it turned out bogomips were what our apps actually wanted, and 300MHz vs 3GHz did in fact make our apps many times faster.

How many in-order cores could one fit on a die?

Posted Aug 10, 2023 17:16 UTC (Thu) by farnz (subscriber, #17727) [Link] (1 responses)

Most systems have a compute device in them, called a GPU, which is designed that way. For certain workloads, such as graphics rendering and machine learning, this is an amazing model, because there's a huge amount of parallelism to exploit (so-called "embarrassingly parallel" problems). For others, such as running a TCP/IP stack, it's not great, because much of the problem is serial, and you're better off pushing the problem to a CPU which is designed to run a single thread exceedingly fast.

How many in-order cores could one fit on a die?

Posted Aug 11, 2023 9:59 UTC (Fri) by paulj (subscriber, #341) [Link]

If you have hundreds to thousands of different TCP state machines to run, the many-simple-core machine may give you better throughput.

How many in-order cores could one fit on a die?

Posted Aug 11, 2023 9:52 UTC (Fri) by paulj (subscriber, #341) [Link] (4 responses)

Several times more. Sun Niagara T1 had 8 cores X 4 threads = 32 threads, at a time when the complex OOO CPUs had 4 cores max with 2-way SMT (8 threads).

That machine got considerably more throughput on highly parallel web workloads as a result (as long as you didn't run a web app in a language that indiscriminately used floating-point, like PHP, because they gave it one FPU to share between all cores!).

See link in another comment to a blog post with more details and references to a couple of really good papers - old, but still good reading.

How many in-order cores could one fit on a die?

Posted Aug 11, 2023 11:19 UTC (Fri) by malmedal (subscriber, #56172) [Link] (3 responses)

> as long as you didn't run a web app in a language that indiscriminately used floating-point, like PHP

At one point I tried explaining to people, complete with benchmarks, why the Niagaras were not a good fit for a specific PHP application. It is quite difficult to convince people that the new expensive system they just bought will never work as well as the existing, several-year-old servers it was supposed to replace.

Since the servers were bought and paid for I tried to find something useful for them to do, but did not really succeed.

How many in-order cores could one fit on a die?

Posted Aug 11, 2023 12:03 UTC (Fri) by paulj (subscriber, #341) [Link] (2 responses)

It was quite amazing that, in all the years it must have taken to get Niagara from concept through to actual servers, no one really considered that a lot of web applications (esp. then) are written in languages/frameworks that just use FP for all arithmetic, even though the CPU was designed explicitly for loads like web serving.

How many in-order cores could one fit on a die?

Posted Aug 14, 2023 10:21 UTC (Mon) by epa (subscriber, #39769) [Link] (1 responses)

PHP and other scripting languages like Perl treat numbers as double-precision floating point but a lot of the time they are only smallish integers in practice. With a small amount of silicon you could give each core a 'fake FPU' that performs the necessary integer operations. If it turns out the inputs or the result aren't integer, it waits for the real FPU to become available.
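
The 'fake FPU' idea can be modelled in software as a fast path that handles small-integer operands locally and hands everything else to the shared unit. This is a sketch under assumptions: the 2**31 limit, the `stats` counters, and the function names are all hypothetical, and nothing here describes real Niagara hardware.

```python
SMALL_LIMIT = 2 ** 31  # assumption: what the per-core integer path can represent


def is_small_int(x: float) -> bool:
    """True if the double holds an integer value that fits the integer path."""
    return x.is_integer() and abs(x) < SMALL_LIMIT


def add(a: float, b: float, stats: dict) -> float:
    """Add two doubles, using the per-core integer path when possible."""
    if is_small_int(a) and is_small_int(b):
        result = int(a) + int(b)
        if abs(result) < SMALL_LIMIT:
            stats["integer_path"] += 1  # handled locally, no waiting
            return float(result)
    stats["shared_fpu"] += 1  # would have to wait for the single real FPU
    return a + b


if __name__ == "__main__":
    stats = {"integer_path": 0, "shared_fpu": 0}
    for a, b in [(2.0, 3.0), (10.0, -4.0), (0.5, 1.0)]:
        print(a, "+", b, "=", add(a, b, stats))
    print(stats)
```

Addition is the easy case. Division, for example, can produce a non-integer result from integer inputs, so it would fall back to the real FPU far more often.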

How many in-order cores could one fit on a die?

Posted Aug 15, 2023 4:47 UTC (Tue) by donald.buczek (subscriber, #112892) [Link]

> PHP and other scripting languages like Perl treat numbers as double-precision floating point but a lot of the time they are only smallish integers in practice.

Not true for Perl, integers and doubles use native types [1].

[1]: https://github.com/Perl/perl5/blob/79c6bd015ed156a95e3480...

How many in-order cores could one fit on a die?

Posted Aug 13, 2023 20:16 UTC (Sun) by kleptog (subscriber, #1183) [Link]

This is basically the idea behind Erlang and its VM. It's a completely different style of programming (the Actor model) but it basically means your program consists of thousands of threads (typically called processes in this context) that are quickly created and destroyed and work by passing messages to each other. A webserver running on it may generate dozens of threads for each request that comes in.

You're working on a VM so there is some overhead, but the result is that your application can linearly scale with the number of cores. A 256-core machine will support twice as many requests per second as a 128-core machine. It was built for telephony exchanges, and it shows. For stuff like WhatsApp where you're managing millions of TCP connections and messages, it really shines.

It's a functional language though with no per-process shared mutable state. It avoids a lot of GC overhead because most threads die before the first GC pass is run. You simply toss all the objects associated with a thread when it exits without checking liveness.
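
For readers who have not seen the style, here is a rough imitation of it in Python: one short-lived "process" per request, each with its own mailbox, communicating only by messages. It is only a sketch of the shape of such a program. The `spawn` and `request_handler` names are hypothetical, and Python OS threads are far heavier than Erlang processes and do not scale across cores the same way.

```python
import queue
import threading


def spawn(behaviour, *args):
    """Start an actor running `behaviour` and return its mailbox."""
    mailbox = queue.Queue()
    thread = threading.Thread(target=behaviour, args=(mailbox, *args), daemon=True)
    thread.start()
    return mailbox


def request_handler(mailbox, reply_to):
    """Handle exactly one request, send the reply as a message, then exit."""
    request_id, payload = mailbox.get()
    reply_to.put((request_id, payload.upper()))
    # Returning ends the actor; everything it allocated becomes garbage at once.


def handle_requests(payloads):
    """Spawn one actor per request and collect the replies in request order."""
    replies = queue.Queue()
    for request_id, payload in enumerate(payloads):
        spawn(request_handler, replies).put((request_id, payload))
    results = dict(replies.get() for _ in payloads)
    return [results[i] for i in range(len(payloads))]


if __name__ == "__main__":
    print(handle_requests(["get /", "get /about", "post /login"]))
```

The handlers share no mutable state, so replies can arrive in any order and the collector sorts them out by request id.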

There is absolutely no way you could make the existing mass of JavaScript or C/C++ run in such a way. Maybe one day we will have AI systems smart enough to reformulate code in this way for us.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds