Linux support for ARM big.LITTLE
Posted Feb 15, 2012 15:08 UTC (Wed) by SEJeff (subscriber, #51588)
Posted Feb 15, 2012 15:21 UTC (Wed) by gioele (subscriber, #61675)
The idea _per se_ is not new at all, see the asymmetric multi-CPUs in the '70s or the Cell architecture in 2000.
Posted Feb 15, 2012 15:33 UTC (Wed) by hamjudo (subscriber, #363)
Wikipedia says the Cyber came out in 1964. There are many earlier examples.
Posted Feb 15, 2012 16:50 UTC (Wed) by drag (subscriber, #31333)
Then, of course, there are modern x86-64 systems where the GPU is now a coprocessor that can be used for anything, rather than something dedicated just to graphics.
I don't know how common it is to have disparate general purpose processors, however. With my GP2X the application had to be optimized to take advantage of the second processor.
Posted Feb 15, 2012 17:04 UTC (Wed) by jzbiciak (✭ supporter ✭, #5246)
Heck, even desktop PCs have been asymmetric multiprocessor since their introduction (8038 in the keyboard controller, 8088 running the apps), but the OS really only thinks about the main CPU.
The A7/A15 split is rather different: This wants to run SMP Linux across both types of cores, dynamically moving tasks between the A7 and A15 side of the world seamlessly. All of the processor cores are considered generic resources, just with different performance/power envelopes. That's rather different from the GP2X model.
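To make the "generic resources with different performance/power envelopes" idea concrete, here is a minimal sketch of load-based placement. It is a toy model, not how the Linux scheduler or any big.LITTLE switcher actually works; the `Core` and `Task` types, the load units, and the `UP_THRESHOLD` value are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    kind: str    # "A7" (low power) or "A15" (high performance)


@dataclass
class Task:
    name: str
    load: int    # recent demand in arbitrary, illustrative units


# Hypothetical threshold: tasks demanding more than this go to a big core.
UP_THRESHOLD = 60


def pick_core(task, cores):
    """Pick an A15 for heavy tasks and an A7 for light ones."""
    wanted = "A15" if task.load > UP_THRESHOLD else "A7"
    for core in cores:
        if core.kind == wanted:
            return core
    return cores[0]   # no core of the wanted kind: fall back to any core


cores = [Core("cpu0", "A7"), Core("cpu1", "A15")]
task = Task("browser", load=10)
idle_core = pick_core(task, cores)    # light load: placed on the A7

task.load = 90                        # the same task becomes busy
busy_core = pick_core(task, cores)    # re-evaluated: moved to the A15
```

The task itself is unchanged when it moves; only the placement decision differs, which is what separates this from the GP2X model where the application has to target the second processor explicitly.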
Posted Feb 15, 2012 18:16 UTC (Wed) by jmorris42 (subscriber, #2203)
True, but that is because of reasons above the silicon. In a typical bottom-of-the-line Android phone you already have multiple processing cores. Mine has two ARM cores, two DSPs plus the GPU. One ARM and one DSP are dedicated to the radio to keep that side a silo, but the processors can talk to each other and share one block of RAM. So there isn't a technical reason an OS couldn't be written for that old hardware that unified all five processing elements and dispatched executable code between the ARM cores as needed. It just wouldn't be a phone anymore.
The 'innovation' here is deciding to encourage the rediscovery of asymmetrical multiprocessing and to relearn and update the ways to deal with the problems it brings. There was a reason everyone moved to SMP: it made the software a lot easier. Now power consumption drives everything and software is already very complex, so the balance shifts.
Then they will stick in an even slower CPU for the radio in chips destined for phone use and wall it off in the bootloader just like now. It is the only way to keep the lawyers (and the FCC) happy.
Posted Feb 15, 2012 20:51 UTC (Wed) by phip (guest, #1715)
In a low-end multicore SOC with noncoherent CPUs, the DRAM is usually statically partitioned between the CPUs, which run independent OS images. Any interprocessor communication is done through shared buffers that are uncached or else carefully flushed at the right times (similar to DMA).
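The "carefully flushed at the right times" discipline can be illustrated with a small simulation. This is only a sketch: `NoncoherentCPU`, its methods, and the `MAILBOX` address range are hypothetical names invented here, and real hardware works on cache lines rather than single addresses.

```python
class NoncoherentCPU:
    """Toy CPU with a private write-back cache and no hardware coherency."""

    def __init__(self, dram):
        self.dram = dram      # shared backing store: address -> value
        self.cache = {}       # private cached copies
        self.dirty = set()    # addresses written but not yet written back

    def read(self, addr):
        if addr not in self.cache:             # miss: fill from DRAM
            self.cache[addr] = self.dram.get(addr, 0)
        return self.cache[addr]                # hit: may be stale

    def write(self, addr, value):
        self.cache[addr] = value               # stays private until flushed
        self.dirty.add(addr)

    def flush(self, addrs):
        """Write back dirty entries in addrs so other CPUs can see them."""
        for addr in addrs:
            if addr in self.dirty:
                self.dram[addr] = self.cache[addr]
                self.dirty.discard(addr)

    def invalidate(self, addrs):
        """Drop cached copies so the next read refetches from DRAM."""
        for addr in addrs:
            self.cache.pop(addr, None)
            self.dirty.discard(addr)


dram = {}
producer, consumer = NoncoherentCPU(dram), NoncoherentCPU(dram)
MAILBOX = range(0x100, 0x104)   # hypothetical shared buffer addresses

consumer.read(0x100)            # consumer caches the old value (0)
producer.write(0x100, 42)       # lives only in the producer's cache
stale = consumer.read(0x100)    # nothing was flushed, so this is still 0

producer.flush(MAILBOX)         # producer: write back before signalling
consumer.invalidate(MAILBOX)    # consumer: discard stale copies
fresh = consumer.read(0x100)    # now refetched from DRAM
```

Both halves are needed: the flush alone leaves the consumer reading its stale cached copy, and the invalidate alone refetches a value the producer never wrote back.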
Posted Feb 15, 2012 21:55 UTC (Wed) by zlynx (subscriber, #2285)
For example, the OS could force a cache flush on both CPUs when migrating a task.
Threaded applications would have a tough time, but even that has been handled in the past. For example, the compiler and/or threading library could either track memory writes between locks so as to make sure those cache addresses are flushed, or it could use the big hammer of a full cache flush before unlock.
Cache coherency is really just a crutch. Lots of embedded programmers and Cell programmers (no cache protocol on the SPE's) know how to work without it. :-)
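The two options mentioned above (tracking writes between locks versus a full flush before unlock) can be compared in a toy model. Everything here is an assumption for illustration: the `CPU` and `FlushingLock` classes are hypothetical, the "cost" is just a count of entries written back, and the model presumes that some working mutual-exclusion primitive exists, which `threading.Lock` stands in for.

```python
import threading


class CPU:
    """Toy CPU with a private write-back cache and no hardware coherency."""

    def __init__(self, dram):
        self.dram = dram
        self.cache = {}
        self.dirty = set()
        self.tracked = None   # writes made inside the current critical section

    def read(self, addr):
        if addr not in self.cache:
            self.cache[addr] = self.dram.get(addr, 0)
        return self.cache[addr]

    def write(self, addr, value):
        self.cache[addr] = value
        self.dirty.add(addr)
        if self.tracked is not None:
            self.tracked.add(addr)

    def write_back(self, addrs):
        """Write back the dirty entries among addrs; return how many."""
        lines = [a for a in addrs if a in self.dirty]
        for a in lines:
            self.dram[a] = self.cache[a]
            self.dirty.discard(a)
        return len(lines)


class FlushingLock:
    """Lock that makes protected writes visible to the next holder."""

    def __init__(self):
        self._mutex = threading.Lock()

    def acquire(self, cpu):
        self._mutex.acquire()
        # Drop clean copies, which may be stale; keep unflushed dirty data.
        cpu.cache = {a: v for a, v in cpu.cache.items() if a in cpu.dirty}
        cpu.tracked = set()

    def release(self, cpu, full_flush=False):
        addrs = set(cpu.dirty) if full_flush else cpu.tracked
        written = cpu.write_back(addrs)   # flush before giving up the lock
        cpu.tracked = None
        self._mutex.release()
        return written


dram = {}
a, b = CPU(dram), CPU(dram)
lock = FlushingLock()

a.write(0x10, 1)                  # private scratch data, outside any lock
lock.acquire(a)
a.write(0x20, 7)                  # shared data, inside the critical section
tracked_cost = lock.release(a)    # only the tracked write is flushed

lock.acquire(b)
seen = b.read(0x20)               # b observes a's protected write
lock.release(b)

a.write(0x30, 5)                  # more private scratch data
lock.acquire(a)
a.write(0x20, 8)
full_cost = lock.release(a, full_flush=True)   # big hammer: all dirty data
```

In this run the tracked release writes back one entry while the full flush writes back three, which is the trade-off between bookkeeping on every write and a heavier operation at every unlock.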
Posted Feb 15, 2012 22:31 UTC (Wed) by phip (guest, #1715)
It's not just migrating processes & threads - any global operating
system data structures need to be synchronized with cache flushes,
memory ordering barriers, mutexes, etc. before and after each access.
If you want to use multiple noncoherent cores to run a general-purpose OS,
the best approach is to treat it as a cluster (with each CPU running
its own OS image).
Posted Feb 15, 2012 22:35 UTC (Wed) by phip (guest, #1715)
Having a different instruction set on the different processor
cores moves the complexity to another level above noncoherence.
The usual programming model is to run a general-purpose OS on the
PowerPC processor(s), and treat the SPEs as a type of coprocessor
or peripheral device.
Posted Feb 15, 2012 23:02 UTC (Wed) by zlynx (subscriber, #2285)
No one runs an MP system with different instruction sets, true. It isn't impossible though. The OS would need to be built twice, once for each instruction set. It could share the data structures.
I wonder if Intel ever played around with this for an Itanium server? It seems that I remember Itanium once shared the same socket layout as a Xeon so this would have been possible on a hardware level.
Now, if you got very tricky and decided to require that all software be in an intermediate language, the OS could use LLVM, Java, .NET or whatever to compile the binary to whichever CPU was going to execute the process. That would really make core switching expensive! And you'd need some way to mark where task switches were allowed to happen, and maybe run the task up to the next switch point so you could change over to the same equivalent point in the other architecture.
A bit more realistic would be cores with the same base instruction set, but specialized functions on different cores. That could work really well. When the program hit an "Illegal Instruction" fault, the OS could inspect the instruction and find one of the system cores that supports it, then migrate the task there or emulate the instruction. Or launch a user-space helper to download a new core design off the internet and program the FPGA! That would let programmers use specialized vector, GPU, audio or regular expression instructions without worrying about which cores to use.
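The fault-driven scheme described above could look roughly like the following sketch. It is purely hypothetical: the `Core` and `Task` types, the instruction "groups", and the `EMULATORS` table are invented for illustration and do not correspond to any existing kernel interface.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    name: str
    extensions: frozenset   # optional instruction groups this core implements


@dataclass
class Task:
    name: str
    core: Core
    log: list = field(default_factory=list)


# Hypothetical software fallbacks, keyed by instruction group.
EMULATORS = {"vector": lambda task: task.log.append("emulated vector op")}


def on_illegal_instruction(task, group, cores):
    """Handle a fault on an instruction from `group`; return the action taken."""
    for core in cores:
        if group in core.extensions:
            task.core = core          # migrate; the instruction is retried there
            return "migrated"
    if group in EMULATORS:
        EMULATORS[group](task)        # no capable core: emulate in software
        return "emulated"
    return "killed"                   # nothing can execute it


cores = [Core("cpu0", frozenset()), Core("cpu1", frozenset({"regex"}))]
task = Task("grep", core=cores[0])

first = on_illegal_instruction(task, "regex", cores)    # cpu1 supports it
second = on_illegal_instruction(task, "vector", cores)  # no core does
third = on_illegal_instruction(task, "audio", cores)    # no core, no emulator
```

The order of the checks is a design choice: this sketch prefers migration over emulation, on the assumption that native execution on another core is cheaper than a software fallback.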
Posted Feb 19, 2012 18:06 UTC (Sun) by alison (✭ supporter ✭, #63752)
"it could use the big hammer of a full cache flush before unlock.
Cache coherency is really just a crutch. Lots of embedded programmers and Cell programmers (no cache protocol on the SPE's) know how to work without it. :-)"
Full cache flush == embedded "Big Kernel Lock" equivalent?
-- Alison Chaiken, email@example.com
Posted Feb 19, 2012 21:51 UTC (Sun) by phip (guest, #1715)
Hmm, that brings up another point I should have thought of earlier:
Non-coherent multi-CPU SOCs are also likely to not implement
atomic memory access primitives (i.e. Compare/Exchange, Test and Set,
Posted Feb 15, 2012 17:27 UTC (Wed) by nix (subscriber, #2304)
It's very common for ARM systems to have disparate processors. My GP2X handheld is a dual-core system. It has a regular ARM processor and then a second processor for accelerating some types of multimedia functions.
Posted Feb 23, 2012 19:41 UTC (Thu) by wmf (guest, #33791)
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds