
Linux support for ARM big.LITTLE


Posted Feb 15, 2012 20:51 UTC (Wed) by phip (guest, #1715)
In reply to: Linux support for ARM big.LITTLE by jmorris42
Parent article: Linux support for ARM big.LITTLE

You need more than just shared DRAM to enable an MP OS. The CPUs need to be cache-coherent.

In a low-end multicore SOC with noncoherent CPUs, the DRAM is usually statically partitioned between the CPUs, which run independent OS images. Any interprocessor communication is done through shared buffers that are either uncached or carefully flushed at the right times (similar to DMA).
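A minimal C sketch of that kind of flushed shared buffer follows. It is only an illustration: `cache_clean_range()` and `cache_invalidate_range()` are hypothetical platform hooks (stubbed here as call counters so the code runs on an ordinary host), and the `struct mailbox` layout is an assumption, not something taken from a particular SOC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_MAX 64

/* Hypothetical platform hooks. On real noncoherent hardware these would
 * write back or discard the cache lines covering [p, p + len). Here they
 * only count calls so the sketch can run on a normal host. */
static unsigned clean_calls, invalidate_calls;
static void cache_clean_range(const void *p, size_t len)      { (void)p; (void)len; clean_calls++; }
static void cache_invalidate_range(const void *p, size_t len) { (void)p; (void)len; invalidate_calls++; }

/* One-slot, one-way mailbox placed in the shared DRAM region (assumed layout). */
struct mailbox {
    volatile uint32_t full;   /* 0 = empty, 1 = message ready */
    uint32_t len;
    uint8_t  data[MSG_MAX];
};

/* Producer CPU. Returns 0 on success, -1 if the previous message is unread. */
static int mbox_send(struct mailbox *mb, const void *msg, uint32_t len)
{
    assert(len <= MSG_MAX);
    cache_invalidate_range(mb, sizeof *mb);  /* re-read the flag from DRAM */
    if (mb->full)
        return -1;
    memcpy(mb->data, msg, len);
    mb->len = len;
    cache_clean_range(mb, sizeof *mb);       /* push the payload out first */
    mb->full = 1;                            /* real hardware also needs a write barrier here */
    cache_clean_range(mb, sizeof *mb);       /* then publish the flag */
    return 0;
}

/* Consumer CPU. Returns the message length, or -1 if nothing usable is there. */
static int mbox_recv(struct mailbox *mb, void *out, uint32_t cap)
{
    cache_invalidate_range(mb, sizeof *mb);  /* drop any stale cached copy */
    if (!mb->full)
        return -1;
    uint32_t len = mb->len;
    if (len > cap)
        return -1;
    memcpy(out, mb->data, len);
    mb->full = 0;
    cache_clean_range(mb, sizeof *mb);       /* tell the producer the slot is free */
    return (int)len;
}
```

On real hardware the flag and the payload would normally sit on separate cache lines, and the buffer would be aligned to the line size, so that one CPU's write-back cannot clobber the other's data.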

-Phil



Linux support for ARM big.LITTLE

Posted Feb 15, 2012 21:55 UTC (Wed) by zlynx (subscriber, #2285)

Nah. There's no real reason the hardware has to be cache coherent. It is just a lot easier if it is.

For example, the OS could force a cache flush on both CPUs when migrating a task.

Threaded applications would have a tough time, but even that has been handled in the past. For example, the compiler and/or threading library could either track memory writes between locks so as to make sure those cache addresses are flushed, or it could use the big hammer of a full cache flush before unlock.
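As a rough C sketch of the two options just described: a lock wrapper that remembers which addresses were written while the lock was held and cleans only those on unlock, falling back to a full flush when its small table overflows. Everything here is hypothetical (`nc_lock`, `nc_store_int()`, and the `cache_*` hooks, which are stubbed as counters); the invalidate on acquire is an extra assumption about what a real implementation would also need.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DIRTY 8

/* Hypothetical cache hooks, stubbed as counters so this runs on a host. */
static unsigned ranges_cleaned, full_flushes;
static void cache_clean_range(const void *p, size_t len) { (void)p; (void)len; ranges_cleaned++; }
static void cache_clean_all(void)      { full_flushes++; }
static void cache_invalidate_all(void) { /* would discard stale lines */ }

struct nc_lock {
    int    held;       /* stand-in for a real inter-CPU lock */
    int    overflow;   /* set when more writes happened than we can track */
    size_t ndirty;
    struct { const void *p; size_t len; } dirty[MAX_DIRTY];
};

static void nc_lock_acquire(struct nc_lock *l)
{
    assert(!l->held);          /* a real lock would spin or block here */
    l->held = 1;
    l->overflow = 0;
    l->ndirty = 0;
    cache_invalidate_all();    /* assumption: do not trust data cached before acquiring */
}

/* Writes to shared data go through this helper so they can be tracked. */
static void nc_store_int(struct nc_lock *l, int *addr, int value)
{
    assert(l->held);
    *addr = value;
    if (l->ndirty < MAX_DIRTY) {
        l->dirty[l->ndirty].p = addr;
        l->dirty[l->ndirty].len = sizeof *addr;
        l->ndirty++;
    } else {
        l->overflow = 1;
    }
}

static void nc_lock_release(struct nc_lock *l)
{
    assert(l->held);
    if (l->overflow) {
        cache_clean_all();     /* the "big hammer" */
    } else {
        for (size_t i = 0; i < l->ndirty; i++)
            cache_clean_range(l->dirty[i].p, l->dirty[i].len);
    }
    l->held = 0;
}
```

A compiler or threading library doing this automatically would have to instrument every store inside the critical section, which is where most of the cost comes from.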

Cache coherency is really just a crutch. Lots of embedded programmers and Cell programmers (no cache protocol on the SPE's) know how to work without it. :-)

Linux support for ARM big.LITTLE

Posted Feb 15, 2012 22:31 UTC (Wed) by phip (guest, #1715)

Strictly speaking that's true, but it is cumbersome enough in practice
that nobody does it for any mainstream OS.

It's not just migrating processes & threads - any global operating
system data structures need to be synchronized with cache flushes,
memory ordering barriers, mutexes, etc. before and after each access.

If you want to use multiple noncoherent cores to run a general-purpose OS,
the best approach is to treat it as a cluster (with each CPU running
its own OS image).

Linux support for ARM big.LITTLE

Posted Feb 15, 2012 22:35 UTC (Wed) by phip (guest, #1715)

I don't know of anyone running a general-purpose OS on the Cell
Synergistic Processors (or on a GPU for that matter).

Having a different instruction set on the different processor
cores moves the complexity to another level above noncoherence.

The usual programming model is to run a general-purpose OS on the
PowerPC processor(s), and treat the SPEs as a type of coprocessor
or peripheral device.

Linux support for ARM big.LITTLE

Posted Feb 15, 2012 23:02 UTC (Wed) by zlynx (subscriber, #2285)

The Cell SPEs and GPU shaders probably aren't capable enough to bother with having the OS run them directly. I only brought them up because there's no automatic cache coherence with them. They read the data into cache (the local 256 KB). They write the data out. Both the reads and the writes are done explicitly.

No one runs an MP system with different instruction sets, true. It isn't impossible, though. The OS would need to be built twice, once for each instruction set, but the two builds could share the same data structures.

I wonder if Intel ever played around with this for an Itanium server? I seem to remember that Itanium once shared the same socket layout as a Xeon, so this would have been possible at the hardware level.

Now, if you got very tricky and decided to require that all software be in an intermediate language, the OS could use LLVM, Java, .NET or whatever to compile the binary to whichever CPU was going to execute the process. That would really make core switching expensive! And you'd need some way to mark where task switches were allowed to happen, and maybe run the task up to the next switch point so you could change over to the same equivalent point in the other architecture.

A bit more realistic would be cores with the same base instruction set, but specialized functions on different cores. That could work really well. When the program hit an "Illegal Instruction" fault, the OS could inspect the instruction and find one of the system cores that supports it, then migrate the task there or emulate the instruction. Or launch a user-space helper to download a new core design off the internet and program the FPGA! That would let programmers use specialized vector, GPU, audio or regular expression instructions without worrying about what cores to use.
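The migrate-or-emulate decision could look roughly like this in C. It is a sketch under stated assumptions: the `EXT_*` feature bits, `struct core`, and `handle_illegal_insn()` are all made up for illustration, and decoding the faulting instruction into a required extension is assumed to have happened already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical optional instruction-set extensions, one bit each. */
#define EXT_VECTOR (1u << 0)
#define EXT_REGEX  (1u << 1)
#define EXT_AUDIO  (1u << 2)

struct core {
    int      id;
    uint32_t exts;   /* extensions this core implements */
};

enum fault_action { ACT_MIGRATE, ACT_EMULATE };

struct fault_result {
    enum fault_action action;
    int target_core;   /* core to run on after handling the fault */
};

/* Called from the illegal-instruction fault path once the faulting opcode
 * has been decoded into the extension it needs (decoder not shown). */
static struct fault_result
handle_illegal_insn(const struct core *cores, size_t ncores,
                    int current, uint32_t needed_ext)
{
    assert(cores != NULL);
    for (size_t i = 0; i < ncores; i++) {
        if (cores[i].id != current &&
            (cores[i].exts & needed_ext) == needed_ext)
            return (struct fault_result){ ACT_MIGRATE, cores[i].id };
    }
    /* No core implements it: stay put and emulate in software. */
    return (struct fault_result){ ACT_EMULATE, current };
}
```

A real scheduler would presumably also weigh migration cost against emulation cost instead of always migrating to the first capable core.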

Linux support for ARM big.LITTLE

Posted Feb 19, 2012 18:06 UTC (Sun) by alison (✭ supporter ✭, #63752)

zlynx comments:

"it could use the big hammer of a full cache flush before unlock.
Cache coherency is really just a crutch. Lots of embedded programmers and Cell programmers (no cache protocol on the SPE's) know how to work without it. :-)"

Full cache flush == embedded "Big Kernel Lock" equivalent?

-- Alison Chaiken, alchaiken@gmail.com

Linux support for ARM big.LITTLE

Posted Feb 19, 2012 21:51 UTC (Sun) by phip (guest, #1715)

Big Kernel Lock...

Hmm, that brings up another point I should have thought of earlier:
Non-coherent multi-CPU SOCs are also likely to not implement
atomic memory access primitives (e.g. Compare/Exchange, Test and Set,
Load-Linked/Store-Conditional, etc.)
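To make concrete what is lost without those primitives, here is a minimal test-and-set spinlock using the standard C11 `<stdatomic.h>` calls. The `struct spinlock` wrapper and its debug-only `held` field are assumptions added for illustration; the point is only that `spin_trylock()` depends on an atomic test-and-set that such an SOC might not offer across CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct spinlock {
    atomic_flag busy;   /* the hardware-backed test-and-set flag */
    bool        held;   /* debug only, touched only by the lock owner */
};

/* Returns true if the lock was free and is now ours. */
static bool spin_trylock(struct spinlock *s)
{
    /* Atomically sets the flag and returns its previous value. */
    if (atomic_flag_test_and_set_explicit(&s->busy, memory_order_acquire))
        return false;
    s->held = true;
    return true;
}

static void spin_lock(struct spinlock *s)
{
    while (!spin_trylock(s))
        ;   /* busy-wait until the owner releases the flag */
}

static void spin_unlock(struct spinlock *s)
{
    assert(s->held);   /* catch unlock without a matching lock */
    s->held = false;
    atomic_flag_clear_explicit(&s->busy, memory_order_release);
}
```

Without a working test-and-set between the CPUs, a lock like this would have to be replaced by a hardware semaphore block or a software-only mutual exclusion algorithm.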

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds