
Intel unveils 48-core cloud computing silicon chip (BBC)

The BBC covers Intel's latest x86 processor prototype. "Intel has unveiled a prototype chip that packs 48 separate processing cores on to a chunk of silicon the size of a postage stamp. The Single-chip Cloud Computer (SCC), as it is known, contains 1.3 billion transistors, the tiny on-off switches that underpin chip technology. Each processing core could, in theory, run a separate operating system."


Intel unveils 48-core cloud computing silicon chip (BBC)

Posted Dec 3, 2009 17:48 UTC (Thu) by clugstj (subscriber, #4020) [Link] (5 responses)

No offense to the BBC, but is this really where we should be getting tech news? They feel the need to explain what a transistor is!

Intel unveils 48-core cloud computing silicon chip (BBC)

Posted Dec 3, 2009 18:01 UTC (Thu) by sbergman27 (guest, #10767) [Link] (2 responses)

Yeah, but this monster has 27 million of those little switching thingies per core.

Intel unveils 48-core cloud computing silicon chip (BBC)

Posted Dec 3, 2009 18:15 UTC (Thu) by jsatchell (guest, #6236) [Link] (1 response)

It means each node is roughly the complexity of a P4, or maybe a P3 and a decent cache.

I can't imagine what they have done about package bandwidth - assuming all these CPUs want to access main memory. If they are just going to run benchmarks by themselves, there will be no problem.

Intel unveils 48-core cloud computing silicon chip (BBC)

Posted Dec 3, 2009 18:49 UTC (Thu) by Trelane (subscriber, #56877) [Link]

It's a NUMA SoC with 96MB onboard memory. Interestingly, each core has 2MB cache... ;)

*removes tongue from cheek.

More info linked from ... Slashdot!?

Posted Dec 3, 2009 19:01 UTC (Thu) by MarkWilliamson (guest, #30166) [Link] (1 response)

In what is surely a sign of the End Times ;-) Slashdot has provided a link
that I think others here will find informative: http://www.pcper.com/article.php?aid=825

The original Slashdot story is here:
http://hardware.slashdot.org/story/09/12/02/215207/Intel-...

Amongst other things, the article notes that each node is dual core, so
there are 24 processing nodes on the chip. There are several memory
controllers. Cache coherency (between nodes, I assume) is not handled by
hardware - a bit of a departure for Intel.

As a result of these design decisions, one thing which immediately occurred
to me was that the design might be useful for partitioning into smaller
virtual machines, each of which has its own dedicated memory and doesn't
need to worry about cache coherency. The VMM layer would handle any
explicit coherency control when required. Interestingly, the BBC article
suggests that Intel are talking about running many OS instances on a single
chip so I guess this might be what they are really thinking of.

I wonder what pain would be involved in getting a commodity OS such as
Linux to span the nodes in the system by managing software cache coherency.
Intel must have at least considered that, I'd have thought...

More info linked from ... Slashdot!?

Posted Dec 4, 2009 3:20 UTC (Fri) by drag (guest, #31333) [Link]

Well that would be the 'cloud' part of the CPU design then.

Throw together a couple of datacenters of these and you could run thousands of
customer VM instances and load-balance things regionally. Plus, such a high
density would easily allow you to scale your systems to meet threading
demands.

How fast does it compile the kernel?

Posted Dec 3, 2009 20:55 UTC (Thu) by felixfix (subscriber, #242) [Link] (2 responses)

And how many Libraries of Congress can it copy per second between cores?

How fast does it compile the kernel?

Posted Dec 3, 2009 21:24 UTC (Thu) by sbergman27 (guest, #10767) [Link] (1 response)

I've never found LoC to be a very interesting metric.

How fast does it compile the kernel?

Posted Dec 7, 2009 13:35 UTC (Mon) by Darkmere (subscriber, #53695) [Link]

Sure, it's a -great- metric:

One more thing we can quantify or at least estimate: The folks at the Packard Campus say that when their systems are fully online, they expect to be able to digitize between 3 and 5 petabytes of content per year. (That is to say, 3,000 to 5,000 terabytes, for those who are playing at home .....) And even at that rate, it would still take decades to digitize the existing content.
Source: How big is the LoC

Intel unveils 48-core cloud computing silicon chip (BBC)

Posted Dec 4, 2009 14:51 UTC (Fri) by dcoutts (subscriber, #5387) [Link] (2 responses)

The really interesting thing here is the lack of cache coherency. This breaks the C memory model, but perhaps other higher-level languages could adapt; in particular, languages with no shared variables, or ones which explicitly identify their shared variables.

The other issue is what an OS on these things looks like. Processes are OK, but they presumably cannot use shared memory in the traditional way. That presumably also limits the number of threads you can have in a process, or at least the number that can run concurrently (since you only have cache coherency within each 2-core node).

On the other hand, models like MPI have been working on machines like this for at least a decade. The large Cray shared memory machines use a model like this consisting of SMP nodes with non-coherent global memory addressing to other nodes. The MPI lib then takes care of the synchronisation.

Intel unveils 48-core cloud computing silicon chip (BBC)

Posted Dec 4, 2009 17:05 UTC (Fri) by MarkWilliamson (guest, #30166) [Link]

I think the lack of cache coherency is very interesting. The slides in the
PC Perspective article mention "page-level memory consistency". I would like to
know more about what this means and how they intend for systems software to
manage the memory coherency. I'm wondering how much effort would be
required to get a kernel / hypervisor to run across all of the nodes in the
system. There must be some plan as to how OS instances will get in there!

Since this is not a cache coherent architecture, the kernel's approach of
using shared data structures isn't going to Just Work. But there is a
single shared memory space so maybe there is a workaround ... disable
caches on shared data structures maybe (surely that'll hose performance,
though). Performance for kernel-intensive stuff might not matter so much
if you're just running virtual machines on all the nodes, I guess, so long
as KVM can make do with mostly local resources (and for any I/O with the
VM, the page-level coherency can easily be made explicit, I'd hope).

Should be really interesting to see what Intel come up with on this.

Intel unveils 48-core cloud computing silicon chip (BBC)

Posted Dec 6, 2009 0:46 UTC (Sun) by jeff_marshall (subscriber, #49255) [Link]

C doesn't admit the existence of threads (or processes, for that matter), so
I would contend that the memory model, or rather lack thereof, for
multiprocessing in C is not broken by this approach.

As a practical matter, it's trivial to implement message passing between
processes where cache coherency is not provided by the underlying hardware
given the ability to manually invalidate the cache.


Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds