LC-Asia: A big LITTLE MP update
LWN recently looked at the big.LITTLE switcher, which pairs fast and slow processors and uses the CPU frequency subsystem to switch between them. The switcher approach has the advantage of being relatively straightforward to get working, but it also has a disadvantage: only half of the CPUs in the system can be doing useful work at any given time. It also has not yet been posted for review or merging into the mainline, though a posting is said to be planned for the near future, after products using this code begin to ship.
The alternative approach has gone by the name "big LITTLE MP". Rather than play CPU frequency governor games, big LITTLE MP aims to solve the problem directly by teaching the scheduler about the differences between processor types and how to distribute tasks between them. The big.LITTLE switcher patch touches almost no files outside of the ARM architecture subtree; the big LITTLE MP patch set, instead, is focused almost entirely on the core scheduler code. At Linaro Connect Asia, developers Vincent Guittot and Morten Rasmussen described the current state of the patch set and the plans for getting it merged in the (hopefully) not-too-distant future.
The big LITTLE MP patch set has recently seen a major refactoring effort. The first version was strongly focused on the heterogeneous multiprocessing (HMP) problem but, among other things, it is hard to get developers working on the rest of the kernel interested in HMP. So the new patch set aims to improve scheduling results on all systems, even traditional SMP systems where all CPUs are the same. The patch set is currently in internal review and is available on the Linaro git server. Some parts have been publicly posted recently; the rest should be more widely circulated soon.
The new patches are working well; for almost all workloads, their performance is similar to that achieved with the old patch set. The patches were developed with a view toward simplicity: they affect a critical kernel path, so they must be both simple and fast. Some of the patches, fixes for the existing scheduler, have already been posted to the mailing lists. The rest try to augment the kernel's scheduler with three simple rules:
- Small tasks (those that only use small amounts of CPU time for brief periods) are not worth the trouble to schedule in any sophisticated way. Instead, they should just be packed onto a single, slow core whenever they wake up, and kept there if at all possible.
- Load balancing should be concerned with the disposition of long-running tasks only; it should simply pass over the small tasks.
- Long-running tasks are best placed on the faster cores.
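
The three rules can be read as a small decision procedure. The Python sketch below is only an illustration, not code from the patch set: the `SMALL_TASK_LOAD` cutoff, the `Task` type, and the function names are all hypothetical, and load is modeled as a plain fraction of CPU time.

```python
from dataclasses import dataclass

# Hypothetical cutoff: tasks runnable less than 10% of the time count as "small".
SMALL_TASK_LOAD = 0.10


@dataclass
class Task:
    name: str
    load: float  # assumed model: fraction of recent time spent runnable (0.0 to 1.0)


def is_small(task):
    return task.load < SMALL_TASK_LOAD


def place_on_wakeup(task, pack_cpu, big_cpu_load):
    """Pick a CPU for a waking task.

    pack_cpu: id of the single slow core used for packing small tasks.
    big_cpu_load: dict mapping each fast-core id to its current load.
    """
    if is_small(task):
        return pack_cpu  # rule 1: pack small tasks onto one slow core
    # rule 3: long-running tasks go to a fast core (here, the least loaded one)
    return min(big_cpu_load, key=big_cpu_load.get)


def balance_candidates(tasks):
    """Rule 2: the load balancer only considers the long-running tasks."""
    return [t for t in tasks if not is_small(t)]
```

Choosing the least-loaded fast core in `place_on_wakeup()` is an arbitrary tie-breaker for the sketch; the talk did not specify how a fast core is selected.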
Implementing these policies requires a set of half a dozen patches. One of them is the "small-task packing" patch that was covered here in October 2012. Another works to expand the use of per-entity load tracking (which is currently only enabled when control groups and the CPU controller are being used) so that the per-task load values are always available to the scheduler. A further patch ensures that the "LB_MIN" scheduler feature is turned on; LB_MIN (which defaults to "off" in mainline kernels) causes the load balancer to pass over small tasks when working to redistribute the computing load on the system, essentially implementing the second policy objective above.
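
To make the effect of LB_MIN concrete, here is a rough Python model of a load balancer choosing tasks to pull off a busy CPU. It is a sketch under stated assumptions rather than the kernel's algorithm: the `LB_MIN_LOAD` cutoff, the load units, and the function name are hypothetical.

```python
# Hypothetical cutoff, in arbitrary scheduler load units.
LB_MIN_LOAD = 16


def pick_tasks_to_migrate(task_loads, imbalance, lb_min=True):
    """Return the names of tasks to pull from the busiest CPU.

    task_loads: dict mapping task name to its tracked load.
    imbalance: amount of load the balancer wants to move away.
    lb_min: when True, tasks below LB_MIN_LOAD are passed over.
    """
    moved = []
    remaining = imbalance
    for name, load in task_loads.items():
        if remaining <= 0:
            break  # enough load has been moved
        if lb_min and load < LB_MIN_LOAD:
            continue  # small task: not worth migrating
        moved.append(name)
        remaining -= load
    return moved


loads = {"logger": 4, "render": 600, "timer": 8, "decode": 500}
print(pick_tasks_to_migrate(loads, imbalance=700, lb_min=True))
print(pick_tasks_to_migrate(loads, imbalance=700, lb_min=False))
```

With the feature on, only the two heavy tasks are candidates; with it off, the balancer also spends effort moving the tiny `logger` and `timer` tasks, which barely affect the imbalance.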
After that, the patch set augments the scheduler with the concept of the "capacity" of each CPU; the unloaded capacity is essentially the clock speed of the processor. The load balancer is tweaked to migrate processes to the CPU with the largest available capacity. This task is complicated by the fact that a CPU's capacity may not be a constant value; realtime scheduling, in particular, can "steal" capacity away from a CPU to give to realtime-priority tasks. Scheduler domains also need to be tuned for the big.LITTLE environment with an eye toward reducing the periodic load balancing work that needs to be done.
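
The capacity idea can be illustrated with a short Python sketch. Everything in it is assumed for the example: the dictionary fields, the capacity numbers (loosely modeled on clock speed), and the rule that realtime usage and current load are simply subtracted from the unloaded capacity.

```python
def available_capacity(cpu):
    """Capacity left for ordinary tasks after realtime work and current load."""
    return max(cpu["capacity"] - cpu["rt_usage"] - cpu["load"], 0)


def pick_target_cpu(cpus):
    """Return the id of the CPU with the largest available capacity."""
    return max(cpus, key=available_capacity)["id"]


# Hypothetical system: two fast cores and one slow core.
cpus = [
    {"id": "big0", "capacity": 1600, "rt_usage": 900, "load": 300},
    {"id": "big1", "capacity": 1600, "rt_usage": 0, "load": 1300},
    {"id": "little0", "capacity": 1000, "rt_usage": 0, "load": 200},
]
print(pick_target_cpu(cpus))
```

In this made-up scenario the realtime work on `big0` has taken enough of its capacity that the slow core becomes the best migration target, which is the kind of non-constant capacity the load balancer has to account for.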
The final piece is not yet complete; it is called "scale invariance." Currently, the "load" put on the system by a process is a function of the amount of time that process spends running on the CPU. But if some CPUs are faster than others, the same process could end up with radically different load values depending on which CPU it is actually running on. That is suboptimal; the actual amount of work the process needs to do is the same in either case, and varying load values can cause the scheduler to make poor decisions. For now, the problem is likely to be solved by scaling the scheduler's load calculations by a constant value associated with each processor. Processes running on a CPU that is ten times faster than another will accumulate load ten times more quickly.
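
A small worked example shows why a per-CPU constant helps. The Python sketch below assumes a task that needs a fixed amount of work per period and two CPUs whose relative speeds differ by a factor of ten; the function names, the `WORK` value, and the speeds are illustrative assumptions only.

```python
def time_based_load(work, speed, period=100.0):
    """Load as the fraction of a period spent running (the current approach)."""
    runtime = min(work / speed, period)
    return runtime / period


def scaled_load(work, speed, period=100.0):
    """The same load multiplied by a per-CPU constant (here, the relative speed)."""
    return time_based_load(work, speed, period) * speed


WORK = 50.0               # hypothetical units of work needed per period
LITTLE, BIG = 1.0, 10.0   # hypothetical relative CPU speeds

for name, speed in (("little", LITTLE), ("big", BIG)):
    print(name, time_based_load(WORK, speed), scaled_load(WORK, speed))
```

The time-based value differs by a factor of ten between the two CPUs even though the task is doing identical work, while the scaled value comes out the same on both. The model only holds while the task does not saturate the CPU it is running on.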
Even then, the load calculations are not perfect for the HMP scheduling problem because they are scaled by the process's priority. A high-priority task that runs briefly can look like it is generating as much load as a low-priority task that runs for long periods, but the scheduler may want to place those processes in different ways. The best solution to this problem is not yet clear.
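
The priority problem is easy to reproduce with numbers. The Python sketch below uses a deliberately simplified model, load = weight × fraction of time runnable, which is an assumption made for illustration; the two weights are the values for nice -10 and nice 0 in the kernel's nice-to-weight table, and the runnable fractions are invented.

```python
# Weights for two nice levels, taken from the kernel's nice-to-weight table.
NICE_TO_WEIGHT = {-10: 9548, 0: 1024}


def weighted_load(nice, runnable_fraction):
    """Simplified model: load = priority weight * fraction of time runnable."""
    return NICE_TO_WEIGHT[nice] * runnable_fraction


# A high-priority task that runs briefly (10% of the time) ...
brief_high_prio = weighted_load(nice=-10, runnable_fraction=0.10)
# ... and a lower-priority task that runs almost constantly (93% of the time).
long_low_prio = weighted_load(nice=0, runnable_fraction=0.93)

print(brief_high_prio, long_low_prio)
```

The two results are nearly equal, so a scheduler looking only at the weighted load cannot tell the brief task, which might be packed onto a slow core, from the long-running one, which belongs on a fast core.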
A question from the audience had to do with testing: how were the developers testing their scheduling decisions? In particular, was the Linsched testing framework being used? The answer is that no, Linsched is not being used. It has not seen much development work since it was posted for the 3.3 kernel, so it does not work with current kernels. Perhaps more importantly, its task representation is relatively simple; it is hard to present it with something resembling a real-world Android workload. It is easier, in the end, to simply monitor a real kernel with an actual Android workload and see how well it performs.
The plan seems to be to post a new set of big LITTLE MP patches in the near future with an eye toward getting them upstream. The developers are a little concerned about that; getting reviewer attention for these patches has proved to be difficult thus far. Perhaps persistence and a more general focus will help them to get over that obstruction, clearing the way for proper scheduling on heterogeneous multiprocessor systems in the not-too-distant future.
[Your editor would like to thank Linaro for travel assistance to attend
this event.]
| Index entries for this article | |
|---|---|
| Kernel | Architectures/Arm |
| Kernel | big.LITTLE |
| Conference | Linaro Connect/2013 |
