
MARS and The Cell Broadband Architecture

July 29, 2008

This article was contributed by Ian Ward

This article is based on a talk given by Geoff Levand at the Linux Symposium in Ottawa on July 24, 2008.

The latest TOP500 Supercomputers list was released last month and the new front-runner is using a processor quite unlike what you would find in your laptop.

The Cell Broadband Engine Architecture (simply referred to as "Cell" in this article) was developed jointly by Sony, Toshiba and IBM. The Cell is available in server hardware but is most commonly found in Sony's PlayStation 3 gaming console.

The Cell is interesting because of its unusual design and performance characteristics. It is described as a heterogeneous multicore CPU: it has one Power Processing Element (PPE), which is a general-purpose processor, and up to 8 Synergistic Processing Elements (SPEs). An SPE is a high-performance vector processing unit with 256KiB of local memory and its own DMA unit. The PPE, the SPEs, and the memory and I/O controllers are connected by a high-speed bus.

The PPE is quite slow compared to modern processors, so the SPEs must be used to achieve good performance. This means writing software that takes the Cell's design into consideration, because there is no simple way to optimize existing applications. Once an application has been designed to use the Cell's SPEs effectively, it may run many times faster than it would on a traditional CPU.

GCC with the Cell SDK can emit code for both the PPE and SPEs, including passing messages and managing overlays when the SPE code size exceeds 256KiB. The Linux kernel can also multitask the SPEs with its scheduler. These conveniences make it easier to write code for the Cell processor, but they can have a significant impact on performance. Preemptive multitasking on an SPE involves swapping all the local memory of the current process with the local memory of the process to be run, which costs both time and bus bandwidth. Ideally you would always have at least as many SPEs as processes you need to run, so that your process would never be swapped out.
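To get a feel for the cost of a preemptive switch, here is a rough back-of-the-envelope sketch. It assumes the whole 256KiB local store is written out and another 256KiB is read in, and it ignores registers and other SPE state. The `swap_cost` helper and the bandwidth figure are made up for illustration; the bandwidth is a placeholder, not a measured number.

```python
KIB = 1024

def swap_cost(local_store_bytes, bandwidth_bytes_per_s):
    """Return (bytes_moved, seconds) for one preemptive SPE context switch.

    The outgoing local store is written out and the incoming one is read
    in, so the bus carries twice the local store size.
    """
    bytes_moved = 2 * local_store_bytes
    return bytes_moved, bytes_moved / bandwidth_bytes_per_s

# Hypothetical sustained DMA bandwidth; a placeholder, not a measurement.
ASSUMED_BANDWIDTH = 10 * 10**9  # bytes per second

moved, seconds = swap_cost(256 * KIB, ASSUMED_BANDWIDTH)
print(f"{moved // KIB} KiB moved, roughly {seconds * 1e6:.1f} microseconds per switch")
```

Whatever the real bandwidth is, every preemptive switch moves half a mebibyte over the bus, and that traffic competes with the DMA transfers the application itself is trying to do.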

The Multicore Application Runtime System (MARS) framework is a prototype of a cooperative multitasking system for the Cell that tries to address the performance overhead of running many processes on the Cell's available SPEs. MARS uses a library on the PPE and a very small kernel on the SPEs.

MARS currently has a priority-based cooperative scheduler. This scheduler lets you specify how much context you need to save when your process is swapped out. In the "run complete" case, no context needs to be saved, which allows the next process to start much more quickly.
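The idea can be illustrated with a toy scheduler. This is not the MARS API, just a minimal Python sketch of priority-based cooperative scheduling: a generator stands in for a task whose context is kept across switches, and a plain function stands in for a "run complete" task with nothing to save. The names `run_by_priority`, `worker` and `one_shot` are invented for this example.

```python
import heapq
import itertools

def run_by_priority(tasks):
    """Run (priority, name, task) triples; a lower number runs first.

    A task is either a generator (it may yield, and its frame is the
    saved context) or a plain callable (it runs to completion, so no
    context is kept). Returns the order in which tasks were dispatched.
    """
    order = []
    counter = itertools.count()  # tie-breaker: equal priorities run FIFO
    queue = []
    for priority, name, task in tasks:
        heapq.heappush(queue, (priority, next(counter), name, task))
    while queue:
        priority, _, name, task = heapq.heappop(queue)
        order.append(name)
        if callable(task):
            task()               # "run complete": nothing to save
            continue
        try:
            next(task)           # run until the task's next yield point
        except StopIteration:
            continue             # task finished, drop it
        heapq.heappush(queue, (priority, next(counter), name, task))
    return order

def worker(steps):
    for _ in range(steps):
        yield                    # manual yield point

def one_shot():
    pass

order = run_by_priority([
    (1, "a", worker(2)),
    (1, "b", worker(1)),
    (0, "urgent", one_shot),
])
print(order)
```

In this sketch a task that yields goes back into the queue at its own priority, so tasks of equal priority take turns while a more urgent task always runs first.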

Processes running on the Cell's SPEs and PPE commonly need to synchronize with each other. With the existing Cell SDK, the only way to do so is to have your SPE busy-wait on a semaphore; the MARS scheduler gives you the option of swapping out the waiting process and doing other work instead.
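The difference between spinning and being swapped out can be sketched in a few lines. Again, this is a simplified illustration and not how MARS is implemented: a task that needs to wait hands the scheduler a condition, and the scheduler runs other tasks until that condition holds. The function `run_with_waits` and the `flag` dictionary are hypothetical, and the loop makes no attempt to detect a deadlock where every task is waiting.

```python
from collections import deque

def run_with_waits(tasks):
    """Round-robin cooperative loop over (name, generator) pairs.

    A task yields None to give up the processor, or yields a
    zero-argument predicate to stay parked until it returns True.
    Returns the order in which tasks were actually run.
    """
    ready = deque((name, task, None) for name, task in tasks)
    log = []
    while ready:
        name, task, waiting_on = ready.popleft()
        if waiting_on is not None and not waiting_on():
            # Still blocked: requeue it and let other tasks use the time.
            ready.append((name, task, waiting_on))
            continue
        log.append(name)
        try:
            waiting_on = next(task)
        except StopIteration:
            continue
        ready.append((name, task, waiting_on))
    return log

flag = {"ready": False}

def consumer():
    yield lambda: flag["ready"]  # wait without occupying the processor
    # execution resumes here only once the producer has set the flag

def producer():
    yield                        # some unrelated work
    flag["ready"] = True

log = run_with_waits([("consumer", consumer()), ("producer", producer())])
print(log)
```

The consumer is only dispatched again after the producer has set the flag; in the meantime the processor is free for the producer rather than being tied up in a polling loop inside the waiting task.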

Cooperative multitasking does have its downsides. You lose protection between your processes, and one process could hang and require intervention to release the PPE. It is also necessary to place manual yield points throughout your code or to design each process to be short-lived. However, if your application needs to make the most of the Cell architecture, MARS is a promising starting point and addresses the need for a more efficient approach to scheduling.



MARS and The Cell Broadband Architecture

Posted Jul 31, 2008 2:22 UTC (Thu) by mdnelson (subscriber, #45517) [Link]

Great article!

Just a few minor corrections: the local store on the SPE is actually 256KiB; and, the
RoadRunner supercomputer that topped the list was actually a hybrid design using both Cell
processors and AMD Opterons (although the article is focused on the Cell processor, I thought
it important to mention the Opterons too).

MARS and The Cell Broadband Architecture

Posted Jul 31, 2008 6:48 UTC (Thu) by salimma (subscriber, #34460) [Link]

That is a curious combination. Granted, even if normal POWER cores were used in conjunction
with Cell, you'd probably still want to compile code for the Cell PPE and POWER cores using
different optimizations, but why use two CPU architectures with different bit-endianness?

MARS and The Cell Broadband Architecture

Posted Aug 1, 2008 0:20 UTC (Fri) by mdnelson (subscriber, #45517) [Link]

I believe this arrangement of Opterons and Cell chips just led to the best performance (and
no doubt there was a question of price in there too...).

Correction

Posted Jul 31, 2008 13:44 UTC (Thu) by Webexcess (subscriber, #197) [Link]

Thank you for the corrections.

I was using some of my notes on the Cell from last year's OLS.  I wonder if the local store
got larger when they started using the 65nm process.

Correction

Posted Aug 1, 2008 0:24 UTC (Fri) by mdnelson (subscriber, #45517) [Link]

As far as I know Cell chips have always come with 256KB of LS, even on the first 90nm
variants. There were some rumours about it being increased to 512KB with a process shrink but
I believe that idea was thrown out.

a processor quite unlike what you would find in your laptop

Posted Jul 31, 2008 11:17 UTC (Thu) by zdzichu (subscriber, #17118) [Link]

MARS and The Cell Broadband Architecture

Posted Jul 31, 2008 16:17 UTC (Thu) by Quazatron (guest, #4368) [Link]

It really annoys me when someone uses KiB instead of KB. When you see 1 KB, you *know* it
stands for 1024 bytes, when you see 1 KiB, you look it up in wikipedia and find it stands
for... 1024 bytes!
It just introduces the doubt that if elsewhere the KB notation is used, somehow it must mean
1000 bytes instead... or maybe not?
This is the type of Standards Organization generated entropy that I, for one, could do
without.

MARS and The Cell Broadband Architecture

Posted Jul 31, 2008 17:47 UTC (Thu) by ndye (subscriber, #9947) [Link]

Blame the disk storage vendors:

They sold a 40 MB drive that stored "(at least) 40,000,000 bytes".  When we assumed we were
buying a MiB drive, and found a MB drive, we felt cheated ... by 4.86% !!!

Lately, with *GB drives, the difference was 7.37%.

With the impending *TB drives, the difference is 9.95%.  "Caveat emptor" and all that, but
still!

(Disk for sure, maybe tape ... not RAM ... surely?  Better check those SSDs carefully, eh?)

MARS and The Cell Broadband Architecture

Posted Aug 2, 2008 5:50 UTC (Sat) by Wisq (guest, #53222) [Link]

I tend to place the blame more on those who originally coined the term "kilobyte" to refer to
1024 bytes.

Using SI prefixes for powers of two is a blatant abuse of the SI system, but the difference
probably seemed minor at the time.  As you noted, it's not until you start getting into the
higher orders that the error becomes significant.

For people in metric countries that use metric measurements every day -- kilometres,
kilograms, etc. -- it's a whole lot easier to be annoyed by the misuse of "kilo" than by the
coining of "kibi" to clear things up.

(And to the original poster... are you seriously saying that every time you see "KiB", you
have to look it up on Wikipedia?  Especially when it's apparently your pet peeve?)

MARS and The Cell Broadband Architecture

Posted Aug 3, 2008 16:19 UTC (Sun) by cortana (subscriber, #24596) [Link]

It is 'kB' (or is it KB?) that has the ambiguous meaning. Seeing '256 KB' makes me wonder
whether they mean 262144 bytes, or 256000 bytes. '256 KiB', on the other hand, always means
262144 bytes.

MARS and The Cell Broadband Architecture

Posted Aug 7, 2008 11:20 UTC (Thu) by renox (subscriber, #23785) [Link]

>>When you see 1 KB, you *know* it stands for 1024 bytes<<

And you would be wrong: I work for a telecom company, in our document here 1 KB is 1000 Bytes.

Face it, the ones who changed the meaning of kilo were the ones who made the mistake.

MARS slides posted

Posted Jul 31, 2008 18:44 UTC (Thu) by Webexcess (subscriber, #197) [Link]

Geoff has just posted the slides for his talk:

ftp://ftp.infradead.org/pub/Sony-PS3/mars/MARS-OLS-2008.pdf

OLS 2007 Cell/B.E. Slides and Papers

Posted Aug 1, 2008 18:26 UTC (Fri) by Webexcess (subscriber, #197) [Link]

Arika Tsukamoto pointed me to the OLS 2007 Cell/B.E. BOF slides:
http://www.celinuxforum.org/CelfPubWiki/OttawaLinuxSympos...

There is also the talk given by Arnd Bergmann:
http://ols.fedoraproject.org/OLS/Reprints-2007/bergmann-R...

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds