Linux support for ARM big.LITTLE

Posted Feb 15, 2012 15:02 UTC (Wed) by tajyrink (subscriber, #2750)
In reply to: Linux support for ARM big.LITTLE by SEJeff
Parent article: Linux support for ARM big.LITTLE

I think it's NVIDIA's own approach to the same thing.

Linux support for ARM big.LITTLE

Posted Feb 15, 2012 15:08 UTC (Wed) by SEJeff (guest, #51588) [Link] (14 responses)

Right. Nvidia had the first shipping implementation vs ARM. I wonder if they got the idea from ARM, or ARM got the idea from the Tegra 3.

Linux support for ARM big.LITTLE

Posted Feb 15, 2012 15:21 UTC (Wed) by gioele (subscriber, #61675) [Link]

> I wonder if they got the idea from ARM, or ARM got the idea from the Tegra 3.

The idea _per se_ is not new at all, see the asymmetric multi-CPUs [1] in the '70s or the Cell architecture in 2000 [2].

[1] https://en.wikipedia.org/wiki/Asymmetric_multiprocessing
[2] https://en.wikipedia.org/wiki/Cell_%28microprocessor%29

Linux support for ARM big.LITTLE

Posted Feb 15, 2012 15:33 UTC (Wed) by hamjudo (guest, #363) [Link]

Anybody who programmed a CDC Cyber 6600 thought, wouldn't it be great if we could make a system where the little processors ran the same instruction set as the big processors?

Wikipedia says the Cyber came out in 1964. There are many earlier examples.

Linux support for ARM big.LITTLE

Posted Feb 15, 2012 16:50 UTC (Wed) by drag (guest, #31333) [Link] (10 responses)

It's very common for ARM systems to have disparate processors. My GP2X handheld is a dual-core system. It has a regular ARM processor and then a second processor for accelerating some types of multimedia functions.

Then, of course, there are modern x86-64 systems where the GPU is now a coprocessor that can be used for anything, rather than something dedicated just to graphics.

I don't know how common it is to have disparate general purpose processors, however. With my GP2X the application had to be optimized to take advantage of the second processor.

Linux support for ARM big.LITTLE

Posted Feb 15, 2012 17:04 UTC (Wed) by jzbiciak (guest, #5246) [Link] (8 responses)

Right, but IIRC, the OS runs as a single processor OS, and the second CPU is treated as more like a peripheral. You write your video / graphics processing code for the second ARM and then it becomes an application specific accelerator, not too much different than dedicated hardware, but tuned for a particular app.

Heck, even desktop PCs have been asymmetric multiprocessors since their introduction (an 8048 in the keyboard, an 8088 running the apps), but the OS really only thinks about the main CPU.

The A7/A15 split is rather different: This wants to run SMP Linux across both types of cores, dynamically moving tasks between the A7 and A15 side of the world seamlessly. All of the processor cores are considered generic resources, just with different performance/power envelopes. That's rather different from the GP2X model.
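To make "generic resources, just with different performance/power envelopes" a bit more concrete, here is a toy placement sketch. It is not how the Linux scheduler works; the capacity and power figures and the names `Core`, `pick_core` and `place` are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    capacity: int  # relative throughput, made-up units
    power: int     # relative power cost when busy, made-up units


# Hypothetical numbers: the "little" cores are slower but cheaper to run.
CORES = [
    Core("A7-0", capacity=40, power=1),
    Core("A7-1", capacity=40, power=1),
    Core("A15-0", capacity=100, power=4),
    Core("A15-1", capacity=100, power=4),
]


def pick_core(demand, load):
    """Return the cheapest core with enough spare capacity for `demand`.

    `load` maps core name -> capacity already in use. If nothing fits,
    fall back to the core with the most spare capacity.
    """
    def spare(core):
        return core.capacity - load.get(core.name, 0)

    fitting = [c for c in CORES if spare(c) >= demand]
    if fitting:
        # Prefer low power; break ties by most spare capacity.
        return min(fitting, key=lambda c: (c.power, -spare(c)))
    return max(CORES, key=spare)


def place(tasks):
    """Greedily place (task_name, demand) pairs; return {task: core name}."""
    load = {}
    placement = {}
    for task, demand in tasks:
        core = pick_core(demand, load)
        load[core.name] = load.get(core.name, 0) + demand
        placement[task] = core.name
    return placement
```

With these made-up numbers, light tasks land on the A7 side and only a task that does not fit there is sent to an A15. Every core sits in one list and the OS chooses among them, which is the contrast with the GP2X model.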

Linux support for ARM big.LITTLE

Posted Feb 15, 2012 18:16 UTC (Wed) by jmorris42 (guest, #2203) [Link] (7 responses)

> Right, but IIRC, the OS runs as a single processor OS, and the
> second CPU is treated as more like a peripheral.

True, but that is because of reasons above the silicon. In a typical bottom-of-the-line Android phone you already have multiple processing cores. Mine has two ARM cores, two DSPs plus the GPU. One ARM and one DSP are dedicated to the radio to keep that side a silo, but the processors can talk to each other and share one block of RAM. So there isn't a technical reason an OS couldn't be written for that old hardware that unified all five processing elements and dispatched executable code between the ARM cores as needed. It just wouldn't be a phone anymore.

The 'innovation' here is deciding to encourage the rediscovery of asymmetrical multiprocessing and to relearn and update the ways to deal with the problems it brings. There was a reason everyone moved to SMP: it made the software a lot easier. Now power consumption drives everything and software is already very complex, so the balance shifts.

Then they will stick in an even slower CPU for the radio in chips destined for phone use and wall it off in the bootloader just like now. It is the only way to keep the lawyers (and the FCC) happy.

Linux support for ARM big.LITTLE

Posted Feb 15, 2012 20:51 UTC (Wed) by phip (guest, #1715) [Link] (6 responses)

You need more than just shared DRAM to enable an MP OS. The CPUs need to be cache-coherent.

In a low-end multicore SOC with noncoherent CPUs, the DRAM is usually statically partitioned between the CPUs, which run independent OS images. Any interprocessor communication is done through shared buffers that are uncached or else carefully flushed at the right times (similar to DMA).
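A small simulation of what "carefully flushed at the right times" means. This is only a model, assuming private write-back caches with no coherence protocol; the `Cpu` class and its methods are hypothetical names, not any real hardware or kernel API.

```python
class Cpu:
    """A CPU with a private write-back cache and no coherence protocol."""

    def __init__(self, dram):
        self.dram = dram    # shared list standing in for DRAM
        self.cache = {}     # addr -> cached value
        self.dirty = set()  # addrs written but not yet written back

    def read(self, addr):
        if addr not in self.cache:       # miss: fill from DRAM
            self.cache[addr] = self.dram[addr]
        return self.cache[addr]

    def write(self, addr, value):
        self.cache[addr] = value         # stays in the cache until flushed
        self.dirty.add(addr)

    def flush(self, addrs):
        """Write the given dirty addresses back to DRAM."""
        for addr in addrs:
            if addr in self.dirty:
                self.dram[addr] = self.cache[addr]
                self.dirty.discard(addr)

    def invalidate(self, addrs):
        """Drop cached copies so the next read goes to DRAM.

        In this model, unflushed writes to these addresses are lost.
        """
        for addr in addrs:
            self.cache.pop(addr, None)
            self.dirty.discard(addr)


def demo():
    dram = [0] * 8
    producer, consumer = Cpu(dram), Cpu(dram)
    buf = range(0, 4)            # shared buffer: addresses 0..3

    consumer.read(0)             # consumer caches the old value
    producer.write(0, 42)
    stale = consumer.read(0)     # nobody flushed, so the old value is seen

    producer.flush(buf)          # sender writes back before signalling
    consumer.invalidate(buf)     # receiver discards its copy before reading
    fresh = consumer.read(0)
    return stale, fresh
```

`demo()` returns the value the consumer sees before and after the flush/invalidate pair. Without that pair the consumer keeps reading its stale cached copy, which is the same discipline a driver follows around a DMA transfer.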

-Phil

Linux support for ARM big.LITTLE

Posted Feb 15, 2012 21:55 UTC (Wed) by zlynx (guest, #2285) [Link] (5 responses)

Nah. There's no real reason the hardware has to be cache coherent. It is just a lot easier if it is.

For example, the OS could force a cache flush on both CPUs when migrating a task.

Threaded applications would have a tough time, but even that has been handled in the past. For example, the compiler and/or threading library could either track memory writes between locks so as to make sure those cache addresses are flushed, or it could use the big hammer of a full cache flush before unlock.
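A rough sketch of the two options, written as a single-threaded simulation rather than real lock code. The `Cpu` and `SoftwareCoherentLock` classes are invented for illustration and model private write-back caches with no hardware coherence.

```python
class Cpu:
    """Private write-back cache, no coherence protocol (toy model)."""

    def __init__(self, dram):
        self.dram = dram
        self.cache = {}
        self.dirty = set()
        self.writebacks = 0  # how many lines were pushed out to DRAM

    def load(self, addr):
        if addr not in self.cache:
            self.cache[addr] = self.dram[addr]
        return self.cache[addr]

    def store(self, addr, value):
        self.cache[addr] = value
        self.dirty.add(addr)

    def write_back(self, addrs):
        """Write back those of `addrs` (a set) that are dirty."""
        for addr in sorted(addrs & self.dirty):
            self.dram[addr] = self.cache[addr]
            self.writebacks += 1
        self.dirty -= addrs


class SoftwareCoherentLock:
    """Lock that does in software what a coherence protocol would do."""

    def __init__(self):
        self.owner = None
        self.tracked = set()

    def acquire(self, cpu):
        assert self.owner is None, "toy model: single-threaded, no waiting"
        self.owner = cpu
        self.tracked = set()
        # Discard clean lines so loads under the lock refetch from DRAM.
        for addr in [a for a in cpu.cache if a not in cpu.dirty]:
            del cpu.cache[addr]

    def store(self, addr, value):
        """Store on behalf of the owner and remember the address."""
        self.owner.store(addr, value)
        self.tracked.add(addr)

    def release(self, full_flush=False):
        cpu = self.owner
        # Big hammer: every dirty line. Tracked: only lines written under the lock.
        cpu.write_back(set(cpu.dirty) if full_flush else self.tracked)
        self.owner = None


def demo(full_flush):
    dram = [0] * 16
    a, b = Cpu(dram), Cpu(dram)
    lock = SoftwareCoherentLock()

    a.store(9, 7)   # unrelated private write outside the lock
    b.load(0)       # b now holds a stale copy of the shared counter

    lock.acquire(a)
    lock.store(0, a.load(0) + 1)
    lock.release(full_flush)

    lock.acquire(b)
    seen = b.load(0)  # refetched from DRAM after acquire's invalidate
    lock.release(full_flush)
    return seen, a.writebacks
```

Both `demo(False)` and `demo(True)` let the second CPU see the incremented counter. The difference is in `a.writebacks`: the tracked variant pushes out only the line written under the lock, while the full flush also pushes out the unrelated dirty line.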

Cache coherency is really just a crutch. Lots of embedded programmers and Cell programmers (no cache protocol on the SPE's) know how to work without it. :-)

Linux support for ARM big.LITTLE

Posted Feb 15, 2012 22:31 UTC (Wed) by phip (guest, #1715) [Link] (2 responses)

Strictly speaking that's true, but it is cumbersome enough in practice
that nobody does it for any mainstream OS.

It's not just migrating processes & threads - any global operating
system data structures need to be synchronized with cache flushes,
memory ordering barriers, mutexes, etc. before and after each access.

If you want to use multiple noncoherent cores to run a general-purpose OS,
the best approach is to treat it as a cluster (with each CPU running
its own OS image).

Linux support for ARM big.LITTLE

Posted Feb 15, 2012 22:35 UTC (Wed) by phip (guest, #1715) [Link] (1 responses)

I don't know of anyone running a general-purpose OS on the Cell
Synergistic Processors (or on a GPU for that matter).

Having a different instruction set on the different processor
cores moves the complexity to another level above noncoherence.

The usual programming model is to run a general-purpose OS on the
PowerPC processor(s), and treat the SPEs as a type of coprocessor
or peripheral device.

Linux support for ARM big.LITTLE

Posted Feb 15, 2012 23:02 UTC (Wed) by zlynx (guest, #2285) [Link]

For the Cell SPEs or GPU shaders they probably aren't capable enough to bother with having the OS run them directly. I only brought them up because there's no automatic cache coherence with them. They read the data into cache (the local 256 KB). They write the data out. Both read and write are done explicitly.

No one runs an MP system of different instruction sets, true. It isn't impossible though. The OS would need to be built twice, once for each instruction set. It could share the data structures.

I wonder if Intel ever played around with this for an Itanium server? It seems that I remember Itanium once shared the same socket layout as a Xeon so this would have been possible on a hardware level.

Now, if you got very tricky and decided to require that all software be in an intermediate language, the OS could use LLVM, Java, .NET or whatever to compile the binary to whichever CPU was going to execute the process. That would really make core switching expensive! And you'd need some way to mark where task switches were allowed to happen, and maybe run the task up to the next switch point so you could change over to the same equivalent point in the other architecture.

A bit more realistic would be cores with the same base instruction set, but specialized functions on different cores. That could work really well. When the program hit an "Illegal Instruction" fault, the OS could inspect the instruction and find one of the system cores that supports it, then migrate the task there or emulate the instruction. Or launch a user-space helper to download a new core design off the internet and program the FPGA! That would let programmers use specialized vector, GPU, audio or regular expression instructions without worrying about what cores to use.
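A toy sketch of that fault handler, leaving out the FPGA part. Everything here is hypothetical: the core list, the extension names, the `EMULATORS` table and the `on_illegal_instruction` function are made up to show the decision order (migrate if some core supports the instruction, otherwise emulate, otherwise give up).

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    name: str
    extensions: frozenset


@dataclass
class Task:
    name: str
    core: Core
    log: list = field(default_factory=list)


# Hypothetical machine: same base ISA everywhere, optional extras per core.
CORES = [
    Core("cpu0", frozenset()),
    Core("cpu1", frozenset({"vector"})),
    Core("cpu2", frozenset({"vector", "regex"})),
]

# Hypothetical software fallbacks for extensions that no core implements.
EMULATORS = {"crc": lambda task: task.log.append("emulated crc")}


def on_illegal_instruction(task, extension):
    """Handle a fault caused by an instruction from `extension`.

    Returns "migrated", "emulated", or "killed".
    """
    for core in CORES:
        if extension in core.extensions:
            task.core = core
            task.log.append(f"migrated to {core.name}")
            return "migrated"   # the task would retry the instruction there
    emulate = EMULATORS.get(extension)
    if emulate is not None:
        emulate(task)
        return "emulated"
    return "killed"             # nothing can run it
```

For example, a task started on `cpu0` that faults on a "regex" instruction would be moved to `cpu2` and carry on there, while a fault on "crc" would be handled by the software fallback without moving the task.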

Linux support for ARM big.LITTLE

Posted Feb 19, 2012 18:06 UTC (Sun) by alison (subscriber, #63752) [Link] (1 responses)

zlynx comments:

"it could use the big hammer of a full cache flush before unlock.
Cache coherency is really just a crutch. Lots of embedded programmers and Cell programmers (no cache protocol on the SPE's) know how to work without it. :-)"

Full cache flush == embedded "Big Kernel Lock" equivalent?

-- Alison Chaiken, alchaiken@gmail.com

Linux support for ARM big.LITTLE

Posted Feb 19, 2012 21:51 UTC (Sun) by phip (guest, #1715) [Link]

Big Kernel Lock...

Hmm, that brings up another point I should have thought of earlier:
Non-coherent multi-CPU SOCs are also likely to not implement
atomic memory access primitives (i.e. Compare/Exchange, Test and Set,
Load-Linked/Store-Conditional, etc.)

Linux support for ARM big.LITTLE

Posted Feb 15, 2012 17:27 UTC (Wed) by nix (subscriber, #2304) [Link]

> It's very common for ARM systems to have disparate processors. My GP2X handheld is a dual-core system. It has a regular ARM processor and then a second processor for accelerating some types of multimedia functions.

This goes right back to the ARM's prehistory. The BBC microcomputer's 'Tube' coprocessor interface springs to mind, allowing you to plug in coprocessors with arbitrary buses, interfacing them to the host machine via a set of FIFOs. People tended to call the coprocessor 'the Tube' as well, which was a bit confusing given how variable the CPUs were that you could plug in there.

Linux support for ARM big.LITTLE

Posted Feb 23, 2012 19:41 UTC (Thu) by wmf (guest, #33791) [Link]

I think the canonical reference here is "Single-ISA Heterogeneous Multi-Core Architectures" from MICRO 2003: http://www.microarch.org/micro36/html/pdf/kumar-SingleISA...