The ARM big.LITTLE architecture is an asymmetric multi-processor platform,
with powerful and power-hungry processors coupled with less-powerful (in
both senses) CPUs using the same instruction set. Big.LITTLE presents some
challenges for the Linux scheduler. Paul McKenney gave a readout of the
status of
big.LITTLE support at the ARM minisummit, which he really meant to serve as an
"advertisement" for
the scheduling micro-conference at the Linux Plumbers Conference that
started the next day.
The idea behind big.LITTLE is to do frequency and voltage scaling by other
means, he said. Because of limitations imposed by physics, there is a floor to
frequency and voltage scaling on any given processor, but that can be
worked around by adding another
processor with fewer transistors. That's what has been done with big.LITTLE.
There are basically two ways to expose the big.LITTLE system to Linux. The
first is to treat each pair as a single CPU, switching between them "almost
transparently". That has the advantage that
it requires almost no changes to the kernel and
applications don't know that anything has changed. But there is a delay
involved in making the switch, which isn't taken into account by the power
management code, so the power savings aren't as large as they could be. In
addition, that approach requires paired CPUs (i.e. one of each size), but some
vendors are interested in having one little and many big CPUs in their
big.LITTLE systems.
The other way to handle big.LITTLE is to expose all of the processors to
Linux, so that the scheduler can choose where to run its tasks. That
requires more knowledge of the behavior of processes, so Paul Turner has a
patch set that gathers that kind of
information. Turner said
that the scheduler currently takes averages on a per-CPU basis, but when
processes move between CPUs, some information is lost. His changes cause
the load average to move with the processes, which will allow the scheduler
to make better decisions.
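The difference between per-CPU and per-task averages can be sketched with a toy model. This is not Turner's code: the `Task` and `Cpu` classes, the `migrate()` helper, and the decay constant are all assumptions made for illustration. Each task keeps its own decaying average, and a CPU's load is simply the sum over the tasks it currently holds, so nothing is lost when a task moves.

```python
from dataclasses import dataclass, field

# Assumed for illustration: a task's past contribution halves every 32 periods.
DECAY = 0.5 ** (1 / 32)


@dataclass
class Task:
    name: str
    load_avg: float = 0.0  # decayed fraction of time the task was runnable

    def update(self, runnable_fraction: float) -> None:
        """Fold one period of history into the task's own average."""
        self.load_avg = self.load_avg * DECAY + runnable_fraction * (1 - DECAY)


@dataclass
class Cpu:
    tasks: list = field(default_factory=list)

    @property
    def load(self) -> float:
        # Derived from the tasks, so no history is stored on the CPU itself.
        return sum(t.load_avg for t in self.tasks)


def migrate(task: Task, src: Cpu, dst: Cpu) -> None:
    """Move a task between CPUs; its load history travels with it."""
    src.tasks.remove(task)
    dst.tasks.append(task)


busy, idle = Task("busy"), Task("idle")
cpu0, cpu1 = Cpu([busy, idle]), Cpu()
for _ in range(200):
    busy.update(1.0)   # always runnable
    idle.update(0.05)  # runnable 5% of the time
load_before = cpu0.load
migrate(busy, cpu0, cpu1)
```

After the migration, `cpu1` immediately reflects the busy task's history and `cpu0` is left with only the mostly idle task, while the total across both CPUs is unchanged.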
Turner's patches are on their third revision, and have been "baking on our
systems at Google" for a few months. There are no real to-dos outstanding,
he said. Peter Zijlstra said that he had wanted to merge the previous
revision, but that there was "some funky math" in the patches, which has
since been changed. Turner said that he measured a 3-4% performance
increase using the patches, which means we get "more accurate tracking at
lower cost". It seems likely that the patches will be merged soon.
McKenney said that Turner's patches have been adapted by Morten Rasmussen
to be used on
big.LITTLE systems. The measurements are used to try to determine where a
task should be run. Over time, though, the task's behavior can change, so
the scheduler checks to see if that has happened and if the placement still
makes sense. There are still questions about when "race to idle" versus
spreading tasks around makes the most sense, and there have been some
related discussions of that recently on the linux-kernel mailing list.
Currently, the CPU hotplug support is less than ideal for removing CPUs
that have gone idle. But Thomas Gleixner is reworking things to "make
hotplug suck less", McKenney said. For heavy workloads, the process of
offlining a processor can take multiple seconds. After Gleixner's rework,
that drops to 300ms, an order-of-magnitude decrease. Part of the
solution is to remove stop_machine() calls from the offlining
path. There are multiple reasons for making hotplug work better, McKenney
said, including improving read-copy update (RCU), reducing realtime
disruption, and providing a low-cost
way to clear things off of a CPU for a short time. He also noted that it
is not an ARM-only problem that is being solved here, as x86 suffers from
significant hotplug delays too.
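For readers who want to see the delay on their own hardware, CPUs can be taken offline from user space by writing to the per-CPU `online` file in sysfs. The helpers below are a minimal sketch: the function names and the `root` parameter (which lets the code be pointed at a scratch directory) are inventions for this example, and the timing assumes that the write does not return until the hotplug operation has finished.

```python
import time
from pathlib import Path

SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def online_file(cpu: int, root: Path = SYSFS_CPU_ROOT) -> Path:
    """Path of the sysfs control file for one CPU (cpu0 often has none)."""
    return root / f"cpu{cpu}" / "online"


def set_online(cpu: int, online: bool, root: Path = SYSFS_CPU_ROOT) -> float:
    """Write 1 or 0 to the control file and return the elapsed seconds."""
    start = time.perf_counter()
    online_file(cpu, root).write_text("1" if online else "0")
    return time.perf_counter() - start


def is_online(cpu: int, root: Path = SYSFS_CPU_ROOT) -> bool:
    """Read back the current state of the CPU."""
    return online_file(cpu, root).read_text().strip() == "1"
```

On a real system this needs root privileges; something like `set_online(1, False)` followed by `set_online(1, True)` would give a rough feel for the cost of one offline/online cycle.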
The session finished up with a brief discussion of how to describe the
architecture of a big.LITTLE system to the kernel. Currently, each
platform has its own way of describing the processors and caches in its
header files, but a more general way, perhaps using device tree or some
kind of runtime detection mechanism, is desired.