Linux support for ARM big.LITTLE
Posted Feb 15, 2012 15:02 UTC (Wed) by tajyrink (subscriber, #2750)
In reply to: Linux support for ARM big.LITTLE by SEJeff
Parent article: Linux support for ARM big.LITTLE
Posted Feb 15, 2012 15:08 UTC (Wed)
by SEJeff (guest, #51588)
Posted Feb 15, 2012 15:21 UTC (Wed)
by gioele (subscriber, #61675)
The idea _per se_ is not new at all: see the asymmetric multiprocessing systems of the '70s [1], or the Cell architecture of the 2000s [2].
[1] https://en.wikipedia.org/wiki/Asymmetric_multiprocessing
Posted Feb 15, 2012 15:33 UTC (Wed)
by hamjudo (guest, #363)
Wikipedia says the Cyber came out in 1964. There are many earlier examples.
Posted Feb 15, 2012 16:50 UTC (Wed)
by drag (guest, #31333)
Then, of course, there are modern x86-64 systems where the GPU is now a coprocessor that can be used for general-purpose work, rather than something dedicated just to graphics.
I don't know how common it is to have disparate general-purpose processors, however. With my GP2X, applications had to be optimized to take advantage of the second processor.
Posted Feb 15, 2012 17:04 UTC (Wed)
by jzbiciak (guest, #5246)
Heck, even desktop PCs have been asymmetric multiprocessor systems since their introduction (an 8048 in the keyboard controller, an 8088 running the apps), but the OS really only thinks about the main CPU.
The A7/A15 split is rather different: This wants to run SMP Linux across both types of cores, dynamically moving tasks between the A7 and A15 side of the world seamlessly. All of the processor cores are considered generic resources, just with different performance/power envelopes. That's rather different from the GP2X model.
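As a rough illustration of "dynamically moving tasks" between the two clusters, here is a toy placement policy in Python. It assumes a per-task load figure on a 0-100 scale and two made-up thresholds with hysteresis; none of this is taken from the actual big.LITTLE patches, which use their own heuristics.

```python
from dataclasses import dataclass

# Hypothetical thresholds on a 0-100 load scale. The gap between them is
# hysteresis: a task whose load hovers near one threshold does not bounce
# between clusters on every scheduling tick.
UP_THRESHOLD = 80
DOWN_THRESHOLD = 30


@dataclass
class Task:
    name: str
    cluster: str = "A7"  # start on the power-efficient little cores


def place(task: Task, load: int) -> str:
    """Update and return the task's cluster after observing `load`."""
    if task.cluster == "A7" and load > UP_THRESHOLD:
        task.cluster = "A15"  # demanding work: move to a big core
    elif task.cluster == "A15" and load < DOWN_THRESHOLD:
        task.cluster = "A7"   # light work: move back to save power
    return task.cluster


if __name__ == "__main__":
    task = Task("browser")
    for load in [10, 50, 90, 60, 40, 20]:
        print(load, place(task, load))
```

The gap between the two thresholds matters because every migration has a cost; without it, a task running near a single cutoff would ping-pong between clusters.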
Posted Feb 15, 2012 18:16 UTC (Wed)
by jmorris42 (guest, #2203)
True, but that is for reasons above the silicon. A typical bottom-of-the-line Android phone already has multiple processing cores. Mine has two ARM cores, two DSPs, and a GPU. One ARM and one DSP are dedicated to the radio to keep that side siloed, but the processors can talk to each other and share one block of RAM. So there is no technical reason an OS couldn't be written for that old hardware that unified all five processing elements and dispatched executable code between the ARM cores as needed. It just wouldn't be a phone anymore.
The 'innovation' here is deciding to encourage the rediscovery of asymmetric multiprocessing, and to relearn and update the ways of dealing with the problems it brings. There was a reason everyone moved to SMP: it made the software a lot easier. Now power consumption drives everything, and software is already very complex, so the balance shifts.
Then they will stick in an even slower CPU for the radio in chips destined for phone use and wall it off in the bootloader just like now. It is the only way to keep the lawyers (and the FCC) happy.
Posted Feb 15, 2012 20:51 UTC (Wed)
by phip (guest, #1715)
In a low-end multicore SOC with noncoherent CPUs, the DRAM is usually statically partitioned between the CPUs, which run independent OS images. Any interprocessor communication is done through shared buffers that are either uncached or carefully flushed at the right times (similar to DMA).
-Phil
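A toy model of what "carefully flushed at the right times" means, assuming write-back caches with no hardware coherence: the sender must clean (write back) the shared buffer, and the receiver must invalidate its own copy, before the receiver reads. The `CPU` class and its method names are hypothetical; on real hardware these steps are cache-maintenance instructions, not Python calls.

```python
class CPU:
    """A CPU with a private write-back cache that nothing keeps coherent."""

    def __init__(self, memory):
        self.memory = memory  # shared DRAM: address -> value
        self.cache = {}       # private cache: address -> [value, dirty]

    def read(self, addr):
        if addr not in self.cache:  # miss: fill the line from DRAM
            self.cache[addr] = [self.memory.get(addr, 0), False]
        return self.cache[addr][0]

    def write(self, addr, value):
        # Write-back: the new value stays in this cache until cleaned.
        self.cache[addr] = [value, True]

    def clean(self, addr):
        """Write a dirty line back to DRAM (the 'flush' in the comment)."""
        line = self.cache.get(addr)
        if line is not None and line[1]:
            self.memory[addr] = line[0]
            line[1] = False

    def invalidate(self, addr):
        """Discard the cached copy so the next read goes to DRAM."""
        self.cache.pop(addr, None)


def send_naive(producer, consumer, addr, value):
    producer.write(addr, value)
    return consumer.read(addr)  # may return a stale value


def send_with_maintenance(producer, consumer, addr, value):
    producer.write(addr, value)
    producer.clean(addr)        # push the data out to shared DRAM
    consumer.invalidate(addr)   # drop any stale copy before reading
    return consumer.read(addr)
```

In this model the naive send fails twice over: the new value never leaves the producer's cache, and the consumer may also be holding an old copy of the line.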
Posted Feb 15, 2012 21:55 UTC (Wed)
by zlynx (guest, #2285)
For example, the OS could force a cache flush on both CPUs when migrating a task.
Threaded applications would have a tough time, but even that has been handled in the past. For example, the compiler and/or threading library could either track memory writes between locks so as to make sure those cache addresses are flushed, or it could use the big hammer of a full cache flush before unlock.
Cache coherency is really just a crutch. Lots of embedded programmers and Cell programmers (there is no cache protocol on the SPEs) know how to work without it. :-)
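The "full cache flush before unlock" idea can be sketched with a similar toy model. The `FlushingLock` class is hypothetical: it does a full write-back-and-invalidate on both acquire and release, and it assumes the lock word itself lives in uncached memory. The same pair of operations, flushing the source CPU and invalidating the destination, is what the OS would do when migrating a task.

```python
class CPU:
    """A CPU with a private write-back cache and no hardware coherence."""

    def __init__(self, memory):
        self.memory = memory  # shared DRAM
        self.cache = {}       # address -> [value, dirty]

    def read(self, addr):
        if addr not in self.cache:
            self.cache[addr] = [self.memory.get(addr, 0), False]
        return self.cache[addr][0]

    def write(self, addr, value):
        self.cache[addr] = [value, True]

    def flush_all(self):
        """Big hammer: write back every dirty line, then empty the cache."""
        for addr, (value, dirty) in self.cache.items():
            if dirty:
                self.memory[addr] = value
        self.cache.clear()


class FlushingLock:
    """Lock whose acquire/release perform the cache maintenance.

    The lock word is assumed to live in uncached memory, so it is modelled
    as a plain attribute rather than as a cached address.
    """

    def __init__(self):
        self.owner = None

    def acquire(self, cpu):
        assert self.owner is None, "toy model: no contention handling"
        self.owner = cpu
        cpu.flush_all()  # drop stale lines so reads see others' writes

    def release(self, cpu):
        cpu.flush_all()  # publish our writes before the lock is free
        self.owner = None


def increment(cpu, lock, addr):
    lock.acquire(cpu)
    cpu.write(addr, cpu.read(addr) + 1)
    lock.release(cpu)
```

Without the two `flush_all()` calls, each CPU would only ever see its own cached copy of the counter, and DRAM would never receive the updates.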
Posted Feb 15, 2012 22:31 UTC (Wed)
by phip (guest, #1715)
It's not just migrating processes & threads - any global operating
If you want to use multiple noncoherent cores to run a general-purpose,
Posted Feb 15, 2012 22:35 UTC (Wed)
by phip (guest, #1715)
Having a different instruction set on the different processor
The usual programming model is to run a general-purpose OS on the
Posted Feb 15, 2012 23:02 UTC (Wed)
by zlynx (guest, #2285)
No one runs an MP system with different instruction sets, true. It isn't impossible, though. The OS would need to be built twice, once for each instruction set, and the two builds could share the same data structures.
I wonder whether Intel ever played around with this on an Itanium server. I seem to remember that Itanium once shared a socket layout with Xeon, so this would have been possible at the hardware level.
Now, if you got very tricky and required all software to be shipped in an intermediate language, the OS could use LLVM, Java, .NET, or whatever to compile the binary for whichever CPU was going to execute the process. That would really make core switching expensive! You would also need some way to mark where task switches were allowed to happen, and perhaps run the task up to the next switch point so you could change over at the equivalent point in the other architecture.
A bit more realistic would be cores with the same base instruction set but specialized functions on different cores. That could work really well. When a program hit an "Illegal Instruction" fault, the OS could inspect the instruction and find a core in the system that supports it, then migrate the task there or emulate the instruction. Or launch a user-space helper to download a new core design off the internet and program the FPGA! That would let programmers use specialized vector, GPU, audio, or regular-expression instructions without worrying about which cores to use.
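The trap-and-migrate idea can be sketched as a fault handler that looks for a core advertising the missing feature and otherwise falls back to software emulation. Everything here is hypothetical: real hardware would report a faulting opcode that the OS would have to decode to learn which extension it belongs to, while this sketch skips that step and names the feature directly.

```python
class IllegalInstruction(Exception):
    def __init__(self, feature):
        super().__init__(feature)
        self.feature = feature


class Core:
    def __init__(self, name, features):
        self.name = name
        self.features = set(features)

    def execute(self, feature):
        if feature not in self.features:
            raise IllegalInstruction(feature)
        return f"{feature} on {self.name}"


# Hypothetical software fallbacks, keyed by feature name.
EMULATORS = {
    "vector": lambda: "vector emulated in software",
}


def run(instructions, current, cores):
    """Execute a task's instructions, migrating or emulating on faults."""
    log = []
    for feature in instructions:
        try:
            log.append(current.execute(feature))
        except IllegalInstruction as fault:
            capable = [c for c in cores if fault.feature in c.features]
            if capable:
                current = capable[0]  # migrate the task to a capable core
                log.append(current.execute(fault.feature))
            elif fault.feature in EMULATORS:
                log.append(EMULATORS[fault.feature]())
            else:
                raise  # nothing can run it: deliver the fault to the task
    return log
```

Picking `capable[0]` is the simplest possible choice; a real scheduler would also weigh load and power when choosing among capable cores.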
Posted Feb 19, 2012 18:06 UTC (Sun)
by alison (subscriber, #63752)
"it could use the big hammer of a full cache flush before unlock.
Full cache flush == embedded "Big Kernel Lock" equivalent?
-- Alison Chaiken, alchaiken@gmail.com
Posted Feb 19, 2012 21:51 UTC (Sun)
by phip (guest, #1715)
Hmm, that brings up another point I should have thought of earlier:
Posted Feb 15, 2012 17:27 UTC (Wed)
by nix (subscriber, #2304)
Posted Feb 23, 2012 19:41 UTC (Thu)
by wmf (guest, #33791)
[2] https://en.wikipedia.org/wiki/Cell_%28microprocessor%29
Anybody who programmed a CDC Cyber 6600 thought: wouldn't it be great if we could make a system where the little processors ran the same instruction set as the big processors?
> second CPU is treated as more like a peripheral.
that nobody does it for any mainstream OS.
system data structures need to be synchronized with cache flushes, memory ordering barriers, mutexes, etc. before and after each access.
the best approach is to treat it as a cluster (with each CPU running its own OS image).
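One common way for CPUs running separate OS images to talk is a single-producer, single-consumer ring in a shared memory region. The sketch below (class and method names hypothetical) shows only the index discipline. On real hardware the region would have to be uncached or explicitly flushed, as discussed earlier in the thread, and a memory barrier would be needed between writing the payload and publishing the new `head`.

```python
class Mailbox:
    """Fixed-size ring buffer in a shared (assumed uncached) region.

    Only the producer writes `head` and only the consumer writes `tail`,
    so neither side needs an atomic read-modify-write instruction.
    One slot is always left empty to tell 'full' apart from 'empty'.
    """

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0  # next slot the producer will fill
        self.tail = 0  # next slot the consumer will read

    def send(self, msg):
        nxt = (self.head + 1) % len(self.slots)
        if nxt == self.tail:
            return False             # full: the caller retries later
        self.slots[self.head] = msg  # write the payload first...
        self.head = nxt              # ...then publish it by moving head
        return True

    def receive(self):
        if self.tail == self.head:
            return None              # empty
        msg = self.slots[self.tail]
        self.tail = (self.tail + 1) % len(self.slots)
        return msg
```

With `Mailbox(4)`, three messages fit before `send` reports the ring as full; each `receive` frees one slot.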
Synergistic Processors (or on a GPU for that matter).
cores moves the complexity to another level above noncoherence.
PowerPC processor(s), and treat the SPEs as a type of coprocessor or peripheral device.
Cache coherency is really just a crutch. Lots of embedded programmers and Cell programmers (no cache protocol on the SPE's) know how to work without it. :-)"
Non-coherent multi-CPU SOCs are also likely not to implement atomic memory access primitives (e.g. Compare/Exchange, Test-and-Set, Load-Linked/Store-Conditional).
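Without atomic read-modify-write instructions, two CPUs can still achieve mutual exclusion using plain loads and stores. Peterson's algorithm is the textbook example; it is a swapped-in illustration here, not something the comment proposes. It assumes the shared variables live in uncached memory and that loads and stores are not reordered, which on a real ARM SoC would require explicit memory barriers; many SoCs sidestep the problem with a hardware spinlock or mailbox block. The sketch uses Python threads as stand-ins for the two CPUs.

```python
import threading
import time

# Shared state. In CPython these list reads and writes behave as
# sequentially consistent, which is what Peterson's algorithm assumes.
flag = [False, False]  # flag[i]: CPU i wants to enter
turn = [0]             # which CPU must yield when both want in
counter = [0]          # shared data protected by the lock


def lock(i):
    other = 1 - i
    flag[i] = True
    turn[0] = other  # let the other CPU go first if it is waiting
    while flag[other] and turn[0] == other:
        time.sleep(0)  # spin, yielding the interpreter


def unlock(i):
    flag[i] = False


def worker(i, iterations):
    for _ in range(iterations):
        lock(i)
        value = counter[0]  # deliberately non-atomic read-modify-write
        time.sleep(0)       # widen the window in which a race could occur
        counter[0] = value + 1
        unlock(i)


def run(iterations=200):
    counter[0] = 0
    threads = [threading.Thread(target=worker, args=(i, iterations)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

The split read-then-write of `counter` would lose updates without the lock; with it, every increment from both workers is counted.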
It's very common for ARM systems to have disparate processors. My GP2X handheld is a dual-core system: it has a regular ARM processor and a second processor for accelerating some types of multimedia functions.
This goes right back to ARM's prehistory. The BBC Microcomputer's 'Tube' coprocessor interface springs to mind: it let you plug in coprocessors with arbitrary buses, interfacing them to the host machine via a set of FIFOs. People tended to call the coprocessor 'the Tube' as well, which was a bit confusing given how varied the CPUs you could plug in there were.