
Intel unveils 48-core cloud computing silicon chip (BBC)

The BBC covers Intel's latest x86 processor prototype. "Intel has unveiled a prototype chip that packs 48 separate processing cores on to a chunk of silicon the size of a postage stamp. The Single-chip Cloud Computer (SCC), as it is known, contains 1.3 billion transistors, the tiny on-off switches that underpin chip technology. Each processing core could, in theory, run a separate operating system."


Intel unveils 48-core cloud computing silicon chip (BBC)

Posted Dec 3, 2009 17:48 UTC (Thu) by clugstj (subscriber, #4020) [Link] (5 responses)

No offense to the BBC, but is this really where we should be getting tech news? They feel the need to explain what a transistor is!

Intel unveils 48-core cloud computing silicon chip (BBC)

Posted Dec 3, 2009 18:01 UTC (Thu) by sbergman27 (guest, #10767) [Link] (2 responses)

Yeah, but this monster has 27 million of those little switching thingies per core.

Intel unveils 48-core cloud computing silicon chip (BBC)

Posted Dec 3, 2009 18:15 UTC (Thu) by jsatchell (guest, #6236) [Link] (1 response)

It means each node is roughly the complexity of a P4, or maybe a P3 and a decent cache.

I can't imagine what they have done about package bandwidth - assuming all these CPUs want to access main memory. If they are just going to run benchmarks by themselves, there will be no problem.

Intel unveils 48-core cloud computing silicon chip (BBC)

Posted Dec 3, 2009 18:49 UTC (Thu) by Trelane (subscriber, #56877) [Link]

It's a NUMA SoC with 96MB onboard memory. Interestingly, each core has 2MB cache... ;)

*removes tongue from cheek.

More info linked from ... Slashdot!?

Posted Dec 3, 2009 19:01 UTC (Thu) by MarkWilliamson (guest, #30166) [Link] (1 response)

In what is surely a sign of the End Times ;-) Slashdot has provided a link
that I think others here will find informative: http://www.pcper.com/article.php?aid=825

The original Slashdot story is here:
http://hardware.slashdot.org/story/09/12/02/215207/Intel-...

Amongst other things, the article notes that each node is dual core, so
there are 24 processing nodes on the chip. There are several memory
controllers. Cache coherency (between nodes, I assume) is not handled by
hardware - a bit of a departure for Intel.

As a result of these design decisions, one thing which immediately occurred
to me was that the design might be useful for partitioning into smaller
virtual machines, each of which has its own dedicated memory and doesn't
need to worry about cache coherency. The VMM layer would handle any
explicit coherency control when required. Interestingly, the BBC article
suggests that Intel are talking about running many OS instances on a single
chip so I guess this might be what they are really thinking of.

I wonder what pain would be involved in getting a commodity OS such as
Linux to span the nodes in the system by managing software cache coherency.
Intel must have at least considered that, I'd have thought...

More info linked from ... Slashdot!?

Posted Dec 4, 2009 3:20 UTC (Fri) by drag (guest, #31333) [Link]

Well that would be the 'cloud' part of the CPU design then.

Throw together a couple of datacenters of these and you could run thousands of
customer VM instances and load-balance things regionally. Plus, such a high
density would easily allow you to scale your systems to meet threading
demands.

How fast does it compile the kernel?

Posted Dec 3, 2009 20:55 UTC (Thu) by felixfix (subscriber, #242) [Link] (2 responses)

And how many Libraries of Congress can it copy per second between cores?

How fast does it compile the kernel?

Posted Dec 3, 2009 21:24 UTC (Thu) by sbergman27 (guest, #10767) [Link] (1 response)

I've never found LoC to be a very interesting metric.

How fast does it compile the kernel?

Posted Dec 7, 2009 13:35 UTC (Mon) by Darkmere (subscriber, #53695) [Link]

Sure, it's a -great- metric:

One more thing we can quantify or at least estimate: The folks at the Packard Campus say that when their systems are fully online, they expect to be able to digitize between 3 and 5 petabytes of content per year. (That is to say, 3,000 to 5,000 terabytes, for those who are playing at home .....) And even at that rate, it would still take decades to digitize the existing content.
Source: How big is the LoC

Intel unveils 48-core cloud computing silicon chip (BBC)

Posted Dec 4, 2009 14:51 UTC (Fri) by dcoutts (subscriber, #5387) [Link] (2 responses)

The really interesting thing here is the lack of cache coherency. This breaks the C memory model, but perhaps other higher-level languages could adapt; in particular, languages with no shared variables, or ones which explicitly identify their shared variables.

The other issue is what an OS on these things looks like. Processes are OK, but they presumably cannot use shared memory in the traditional way. That presumably also limits the number of threads you can have in a process, or at least the number that can run concurrently (since you only have cache coherency within each 2-core node).

On the other hand, models like MPI have been working on machines like this for at least a decade. The large Cray shared memory machines use a model like this consisting of SMP nodes with non-coherent global memory addressing to other nodes. The MPI lib then takes care of the synchronisation.

Intel unveils 48-core cloud computing silicon chip (BBC)

Posted Dec 4, 2009 17:05 UTC (Fri) by MarkWilliamson (guest, #30166) [Link]

I think the lack of cache coherency is very interesting. The slides in the
PC Perspective article mention "page-level memory consistency". I would like to
know more about what this means and how they intend for systems software to
manage the memory coherency. I'm wondering how much effort would be
required to get a kernel / hypervisor to run across all of the nodes in the
system. There must be some plan as to how OS instances will get in there!

Since this is not a cache coherent architecture, the kernel's approach of
using shared data structures isn't going to Just Work. But there is a
single shared memory space so maybe there is a workaround ... disable
caches on shared data structures maybe (surely that'll hose performance,
though). Performance for kernel-intensive stuff might not matter so much
if you're just running virtual machines on all the nodes, I guess, so long
as KVM can make do with mostly local resources (and for any I/O with the
VM, the page-level coherency can easily be made explicit, I'd hope).

Should be really interesting to see what Intel come up with on this.

Intel unveils 48-core cloud computing silicon chip (BBC)

Posted Dec 6, 2009 0:46 UTC (Sun) by jeff_marshall (subscriber, #49255) [Link]

C doesn't admit the existence of threads (or processes, for that matter), so
I would contend that the memory model, or rather lack thereof, for
multiprocessing in C is not broken by this approach.

As a practical matter, it's trivial to implement message passing between
processes where cache coherency is not provided by the underlying hardware
given the ability to manually invalidate the cache.


Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds