July 29, 2008
This article was contributed by Ian Ward
This article is based on a talk given by Geoff Levand at the Linux Symposium in Ottawa on July 24, 2008.
The latest TOP500 Supercomputers list was released last month, and the new front-runner uses a processor quite unlike the one in your laptop.
The Cell Broadband Engine Architecture (referred to simply as "Cell" in this article) was developed jointly by IBM, Toshiba, and Sony. Cell is available in server hardware, but it is most commonly found in Sony's PlayStation 3 game console.
Cell is interesting because of its unusual design and performance characteristics. It is described as a heterogeneous multicore CPU: it has one Power Processing Element (PPE), a general-purpose processor, and up to eight Synergistic Processing Elements (SPEs). An SPE is a high-performance vector processing unit with 256KiB of local memory (its "local store") and its own DMA unit. The PPE, the SPEs, and the memory and I/O controllers are all connected by a high-speed bus.
The PPE is quite slow compared to modern processors, so the SPEs must be used to achieve good performance. That means writing software with the Cell's design in mind, because there is no simple way to optimize existing applications for it. Once an application has been designed to use the SPEs effectively, it may run many times faster than it would on a traditional CPU.
GCC, together with the Cell SDK, can emit code for both the PPE and the SPEs, including code for passing messages between them and for managing overlays when SPE code exceeds the 256KiB local memory. The Linux kernel can also multitask the SPEs using its scheduler. These conveniences make it easier to write code for the Cell, but they can have a significant impact on performance. Preemptive multitasking on an SPE involves swapping out the entire local memory of the current process and swapping in the local memory of the process to be run, which costs both time and bus bandwidth. Ideally, you would always have at least as many SPEs as processes to run, so that no process would ever be swapped out.
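To put a rough number on that cost, here is a small sketch of the arithmetic for one full swap of an SPE's 256KiB local memory. It assumes a switch writes out the whole outgoing local store and reads in the whole incoming one, it ignores register and other saved state, and it takes the bus bandwidth as an input. The function names are hypothetical, and the bandwidth used in the note that follows is an assumed round figure, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Size of one SPE's local memory (local store), as given in the article. */
#define SPE_LOCAL_STORE_BYTES (256u * 1024u)

/* Bytes moved across the bus by one full preemptive switch, ASSUMING the
   outgoing process's whole local store is written out and the incoming
   process's whole local store is read in. Registers and other saved
   state are ignored. */
uint64_t full_switch_bytes(void)
{
    return 2ull * SPE_LOCAL_STORE_BYTES;
}

/* Estimated time for one switch, in nanoseconds, for an ASSUMED sustained
   bandwidth given in bytes per second. */
double switch_time_ns(double bandwidth_bytes_per_s)
{
    assert(bandwidth_bytes_per_s > 0.0);
    return (double)full_switch_bytes() / bandwidth_bytes_per_s * 1e9;
}

/* Total bus traffic, in bytes, for a given number of switches. */
uint64_t total_switch_bytes(uint64_t switches)
{
    return switches * full_switch_bytes();
}
```

Each switch moves 512KiB. At an assumed 25.6GB/s, `switch_time_ns(25.6e9)` comes to about 20.5 microseconds, and a thousand switches add up to 500MiB of bus traffic that does no useful work.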
The Multicore Application Runtime System (MARS) framework is a prototype cooperative multitasking system for the Cell that tries to reduce the overhead of running many processes on the Cell's available SPEs.
MARS uses a library on the PPE and a very small kernel on the SPEs.
MARS currently has a priority-based cooperative scheduler that lets you specify how much context must be saved when your process is swapped out. In the "run complete" case, no context needs to be saved, so the next process can start much more quickly.
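The idea of declaring how much context to save can be modeled in a few lines of C. This is a hypothetical simulation, not the MARS API: `struct task`, `pick_next()`, and `run_all()` are invented names, each task's "work" is simply a count of run slices, and a task that yields always has its declared context saved, even if the scheduler happens to pick it again immediately.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a cooperative, priority-based SPE scheduler.
   These names are illustrative only; they are not the MARS API. */

enum task_state { TASK_READY, TASK_DONE };

struct task {
    const char *name;
    int priority;          /* higher value runs first */
    size_t context_bytes;  /* local memory to save each time the task yields */
    int steps_left;        /* simulated work: number of run slices remaining */
    enum task_state state;
};

/* Pick the highest-priority ready task (first one wins ties), or NULL. */
struct task *pick_next(struct task *tasks, size_t n)
{
    struct task *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].state != TASK_READY)
            continue;
        if (best == NULL || tasks[i].priority > best->priority)
            best = &tasks[i];
    }
    return best;
}

/* Run tasks until all complete. Each slice ends with the task either
   yielding (its declared context is saved) or completing ("run
   complete": nothing is saved). Returns the total bytes saved. */
size_t run_all(struct task *tasks, size_t n)
{
    size_t saved = 0;
    struct task *t;
    while ((t = pick_next(tasks, n)) != NULL) {
        assert(t->steps_left > 0);
        t->steps_left--;
        if (t->steps_left == 0)
            t->state = TASK_DONE;       /* run complete: no context saved */
        else
            saved += t->context_bytes;  /* yield: save the declared context */
    }
    return saved;
}
```

With a high-priority task that declares 4096 bytes of context and needs three slices, plus a low-priority task that needs one slice and declares none, `run_all()` reports 8192 bytes saved: two yields by the first task and nothing for either completion.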
Processes running on the SPEs and the PPE commonly need to synchronize with each other. With the existing Cell SDK, the only way to do that is to have your SPE busy-wait on a semaphore; the MARS scheduler instead gives you the option of swapping out the waiting process and doing other work.
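The difference between busy-waiting and yielding shows up even in a toy timing model. Everything here is an assumption for illustration: time is counted in whole slices on a single SPE, the PPE releases the semaphore at a fixed slice, and the cost of saving and restoring context on a yield is ignored. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one SPE, time measured in slices. The semaphore becomes
   available at slice `release_at`. Task W needs the semaphore and then
   has `w_work` slices of work; task X has `x_work` slices of independent
   work. Both functions return the slice at which all work is finished. */

/* Busy-wait: W holds the SPE, spinning, until the semaphore is released,
   then runs its work; only afterwards does X get the SPE. */
int finish_time_busy_wait(int release_at, int w_work, int x_work)
{
    assert(release_at >= 0 && w_work >= 0 && x_work >= 0);
    return release_at + w_work + x_work;
}

/* Yield: while the semaphore is unavailable, W gives up the SPE and X
   runs instead. W is preferred whenever it can make progress. */
int finish_time_yield(int release_at, int w_work, int x_work)
{
    assert(release_at >= 0 && w_work >= 0 && x_work >= 0);
    int t = 0;
    int w_left = w_work, x_left = x_work;
    while (w_left > 0 || x_left > 0) {
        bool sem_available = t >= release_at;
        if (w_left > 0 && sem_available)
            w_left--;   /* W holds the semaphore and makes progress */
        else if (x_left > 0)
            x_left--;   /* W is waiting (or finished), so X runs */
        /* otherwise nothing is runnable and the SPE idles this slice */
        t++;
    }
    return t;
}
```

For a semaphore released at slice 10, a waiting task with 5 slices of work, and an independent task with 8, busy-waiting finishes at slice 23, while yielding finishes at slice 15, because the independent work fills time that would otherwise be spent spinning.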
Cooperative multitasking does have its downsides. You lose protection between processes, and a single hung process could require intervention to release the PPE. It is also necessary either to place yield points manually throughout your code or to design each process to be short-lived. Still, if your application needs to make the most of the Cell architecture, MARS is a promising starting point and addresses the need for a more efficient approach to scheduling.
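Placing manual yield points usually means turning a long loop into a resumable step function. Here is a minimal sketch of that pattern, with hypothetical names that are not part of MARS. It sums an array one bounded chunk at a time and returns to the caller between chunks; the only state that must survive a yield is the resume index and the running total, which keeps the saved context small.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of a long-running job rewritten with manual yield points for a
   cooperative scheduler. Names are hypothetical, not the MARS API. */

enum step_result { STEP_YIELD, STEP_DONE };

struct sum_job {
    const int *data;
    size_t len;
    size_t next;        /* resume position: part of the context to keep */
    long long total;    /* running result: the rest of the context */
};

/* Process at most `chunk` elements, then return to the scheduler. */
enum step_result sum_job_step(struct sum_job *job, size_t chunk)
{
    assert(chunk > 0);
    size_t end = job->next + chunk;
    if (end > job->len)
        end = job->len;
    for (; job->next < end; job->next++)
        job->total += job->data[job->next];
    return job->next == job->len ? STEP_DONE : STEP_YIELD;
}
```

A scheduler would call `sum_job_step()` repeatedly, free to run other work between calls. For a ten-element array and a chunk size of 4, the job yields twice before reporting `STEP_DONE`.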