Brief items
The current development kernel is 3.3-rc3, released on February 8. "No big
surprises, which is just how I like it. About a third of the patches are in
ARM, but the bulk of that is due to the removal of the unused DMA map code
from the bcmring support. So no complaints."
Stable updates: 3.0.21, 3.2.6 and 2.6.32.57 were released on February 13
with a long list of important fixes.
For those still using the 2.6.27 kernel, 2.6.27.60 was released on February 12,
quickly followed by 2.6.27.61 to fix the inevitable build error. There are
lots of fixes that have gone in since 2.6.27.59 was released last April.
-static inline void *kcalloc(size_t n, size_t size, gfp_t flags)
+static inline void *wtf_do_i_call_this(size_t n, size_t size, gfp_t flags)
 {
 	if (size != 0 && n > ULONG_MAX / size)
 		return NULL;
-	return __kmalloc(n * size, flags | __GFP_ZERO);
+	return __kmalloc(n * size, flags);
+}
+
+static inline void *kcalloc(size_t n, size_t size, gfp_t flags)
+{
+	return wtf_do_i_call_this(n, size, flags | __GFP_ZERO);
 }
--
Andrew Morton
I was thrilled a year ago at last to discover who Virginia is,
celebrated in mm/memory.c and mm/page-writeback.c
--
Hugh Dickins
I'm sitting here wondering what you meant to type when you typed
"ftrace pouch". I'm stumped! But you're not allowed to tell us -
that would take all the fun out of it.
--
Andrew Morton
The February 2012 issue of ;login: has a detailed overview of Btrfs [PDF]
written by developer Josef Bacik.
"Btrfs’s snapshotting is simple to use and understand. The snapshots
will show up as normal directories under the snapshotted directory, and you
can cd into it and walk around like in a normal directory. By default, all
snapshots are writeable in Btrfs, but you can create read-only snapshots if
you so choose. Read-only snapshots are great if you are just going to
take a snapshot for a backup and then delete it once the backup
completes. Writeable snapshots are handy because you can do things such as
snapshot your file system before performing a system update; if the update
breaks your system, you can reboot into the snapshot and use it like your
normal file system."
The Lima driver project has released the code for its open source graphics
driver supporting the Mali-200 and Mali-400 GPUs.
"The aim of this driver is to finally bring all the advantages of open source software to ARM SoC graphics drivers. Currently, the sole availability of binary drivers is increasing development and maintenance overhead, while also reducing portability, compatibility and limiting choice. Anyone who has dealt with GPU support on ARM, be it for a linux with a GNU stack, or for an android, knows the pain of dealing with these binaries. Lima is going to solve this for you, but some time is needed still to get there." (Thanks to Paul Wise.)
Kernel development news
By Jake Edge
February 15, 2012
On the first day of this year's Android
Builders Summit, a panel was held to discuss the Android patches to the
Linux kernel, including the progress on getting them upstream. The panel
consisted of Zach Pfeffer from Linaro, Tim Bird from Sony, Arnd Bergmann of
IBM/Linaro, and Greg Kroah-Hartman of the Linux Foundation (LF), with LWN
executive editor Jonathan Corbet moderating. The overall feeling from the
panel was that things with the Android kernel patches were proceeding
apace—recent kernels can boot into an Android user space—but
there is still work to be done. While I could not attend ABS this year,
this report comes via
the magic of streaming video.
Each of the panelists introduced themselves and described their connections
to Android. Pfeffer is the Android platform lead for Linaro
and is responsible for creating Android builds for each of the member
companies, Bird represents Sony as the architecture chair of the LF Consumer
Electronics working group, Bergmann has recently been working on the cleanup and consolidation of
the ARM subtree, and Kroah-Hartman maintains the staging tree where most of
the Android code currently lives.
Corbet kicked off the discussion by noting that he had recently looked at
the current Red Hat Enterprise Linux (RHEL) kernel, which includes more
than 7600 patches on top of the mainline kernel. He also
pointed out that the Fedora kernel carries a number of different patches
that aren't likely to go upstream anytime soon (utrace, in particular).
Given that, he asked, "why are we here?" Why are Android and its
patches treated differently from other distributions' kernels?
Kroah-Hartman said that the Android patches are only about 7000 lines of
code and that some UART drivers are larger. But people care more about the
Android patches because device makers and others have to pull down all
those patches and apply them to get a mainline kernel working with
Android. It is different from enterprise kernels, he said, because there
are no real downstream users of the source code of those. Bergmann also
pointed out that much of the change in the RHEL or Fedora kernels was for
things like drivers, which don't change the operating system core the way
some of the Android patches do.
Pfeffer noted that in the past the kernel developers were seen as "the
rebel alliance", but that now the Android developers have assumed that
role to some extent. Bird pointed out that part of the problem is that, due
to the success of Android, the kernels in board support packages (BSPs)
are being built from Android kernels rather than kernel.org kernels. That
essentially causes a split in the community.
The Android patches are largely in the mainline at this point (in staging),
Kroah-Hartman said, except for the wakelocks code. The 3.3-rc3 kernel can
boot an Android user space, but lacks the power management features that
wakelocks provide, so battery
life is poor. Bergmann said that he had seen a demo of Android running
on a mainline kernel, but that there is "still a long way to go".
One area that
needs attention, Bergmann said, is the user-space interfaces of some of the
Android features. Those may
not be what the kernel developers want to support long-term, so they need
to be addressed before the Android patches make their way out of staging
and those interfaces become
part of the kernel ABI.
The PMEM patches, which provide a means to reserve contiguous memory and to
share buffers between the kernel and user space, were the next topic.
Corbet noted that PMEM had been in and out of staging twice, but had never
been merged. Since then, Android has moved on and is not using PMEM any
longer. So, was it the right move not to merge PMEM, he asked.
The panel seemed in agreement that it was right not to merge those patches,
with Pfeffer noting that they were simply an expedient to get products out
the door. As ARM matures, he said, common usage models will come about,
rather than various quick fixes. Kroah-Hartman pointed out that the memory
management kernel developers told the PMEM developers "how to do it right",
but that no
one ever did that work, which is "a problem that we've had forever". Bird
agreed with Pfeffer, saying that he is pushing to get things into the
mainline, but that if there is pushback, "that's fine", as there are
sometimes "quick and dirty" things done to ship products.
Corbet pursued that topic with a mention of the contiguous memory allocator (CMA) that is
being pushed by Samsung and is aimed at Android. But he noted that Android
itself has moved from PMEM to the ION memory
allocator, which duplicates some of the CMA functionality while adding
another user-space interface. What should be done about ION, he asked.
Android is not "pushing" ION, Pfeffer said, and if the kernel community
doesn't want it, it shouldn't take it. There was a need for a solution
that didn't exist in the mainline, so Android went ahead with ION. It may
not necessarily be easy to work with all of the Android kernel patches, he
said, because of the time pressures Android is operating under. In the end, ION
could be handled much like PMEM was, he said.
But Bergmann pointed out that ION could be integrated with CMA and the DMA
buffer sharing work that are getting close to the mainline. ION and the
soon-to-exist mainline solutions are not mutually exclusive, he said.
Pfeffer said that the "entire room" doesn't really care which solution is
chosen; they just want something that has been integrated and tested. That
may be ION now, and something else down the road.
In a sense, this is a result of the longstanding conflict between the needs
of the embedded world vs. those of the rest of the kernel, Corbet posited. But
Kroah-Hartman disagreed, saying that the enterprise distributions have the
same problems, just on a different timescale. Those distributions need to
ship, often well before the code is upstream. Embedded isn't really
different; embedded developers just need to get their code upstream as well, he said.
One place where it is different, though, is that in the embedded world
things may move from hardware to software at a relatively rapid pace,
Pfeffer said. That
means that a driver written for hardware JPEG decoding may really only be
needed for one six-month product cycle, so that driver really doesn't need
to go upstream. Because of that, Bergmann said, many of those drivers are
designed to stay out of the mainline. Bird echoed that, saying that
sometimes a "fatalistic approach" to kernel development is the pragmatic
choice. Even for long-lived code, if there is no hope that it can go
upstream, developers will just try to focus on making it maintainable out of
tree.
For a long time, the trajectory of Android code was heading away from the
mainline, Pfeffer said, but that has started to correct itself. The kernel
does need more maintainers so that code can get upstream faster, though, he
said. Bergmann agreed that more maintainers would help, but the work needs
to be done in a more organized fashion with an eye toward getting the
user-space interfaces right.
Bird said that he doesn't expect there to be a need for
another panel of this sort because the problem is solving itself at this
point. Kroah-Hartman more or less agreed, noting that the Android problems
are "nothing new" and that the kernel community has been solving these
kinds of problems for 20 years.
With luck, the Android developers won't be adding much more to the core,
Corbet said, but what about drivers? CyanogenMod is trying to get Ice
Cream Sandwich (Android 4.0) in the hands of its users, but is running into
problems with getting drivers from some vendors. What can be done to solve
that problem?
Kroah-Hartman noted that things are getting better in that department, but
that some companies care about getting their drivers into the mainline,
while others don't. The latter don't realize that upstreaming will save them money
in the long run. He is often talking with various companies, so if there
are specific instances of problem drivers, he wants to know about them.
But kernel drivers are not the whole problem, Pfeffer said. In the
graphics and video realms, the line between the kernel and user space has
been "blown away", he said. There is now firmware, kernel interfaces,
binary blobs, and user-space interfaces to do graphics which are all
released in lockstep, and often all as binary blobs, which is not a good
thing. It isn't an engineering issue, but a legal one, he said. Google's
engineers do not want binary blobs, he said, and have been pushing back on
vendors for things like the Nexus line of phones.
Bergmann also pointed to
the open source driver for the Mali GPU as
an indicator of the direction where things are headed. If the other
graphics vendors don't get their act together with respect to free drivers,
they will not survive, he said.
With that, the 45-minute slot had expired. The upshot seems to be that
mainline kernel support for Android is moving along reasonably well. It
won't be too long before it will be possible to run Android on a mainline
kernel while still maintaining some reasonable battery life. Beyond that,
though, the process is working more or less as it should. The out-of-tree
Android patches were just another in a long line of hurdles that the kernel
community has overcome.
By Jonathan Corbet
February 15, 2012
Anybody who does low-level kernel programming for long enough learns that
the hardware is not their friend. Expecting the hardware to be nice is a
recipe for disaster; instead, one must treat the hardware as if it were a
clever and willful dog. With some effort, it can be made to perform
impressive tricks, but, given a moment of inattention, it will snag your
dinner from the grill and hide under the couch. The good news
is that Linux kernel developers understand the nature of their relationship
with the hardware and take great care not to trust it too far. Or, at
least, that is what we would like to think.
Consider this snippet of code from drivers/char/hpet.c:
do {
	m = read_counter(&hpet->hpet_mc);
	write_counter(t + m + hpetp->hp_delta, &timer->hpet_compare);
} while (i++, (m - start) < count);
Here, read_counter() is a thin macro wrapper around
readl(). The driver is writing to the timer compare register in a
loop, assuming that the "main counter" value read from the HPET will
eventually exceed the threshold value. Almost always, that is exactly what
happens. But if the HPET ever goes a little bit weird and stops returning
something meaningful when the main counter is read, the above code could
easily turn into an infinite loop. The kernel is trusting the hardware to
be rational, but the hardware may not choose to live up to that
expectation.
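One defensive alternative is to bound the loop, so that a misbehaving counter produces an error rather than a hang. The sketch below is a user-space simulation, not a patch to hpet.c: the struct fake_hpet type and fake_read_counter() are hypothetical stand-ins for the real register access, MAX_SPINS is an arbitrary limit, and the write to the compare register is left out so that only the loop bound is shown.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the HPET main counter; not a kernel API. */
struct fake_hpet {
	uint32_t value;
	bool stuck;	/* models hardware that has stopped counting */
};

static uint32_t fake_read_counter(struct fake_hpet *hw)
{
	if (!hw->stuck)
		hw->value += 10;	/* a healthy counter keeps moving */
	return hw->value;
}

#define MAX_SPINS 100000	/* arbitrary bound chosen for illustration */

/*
 * Wait until the counter has advanced at least 'count' ticks past
 * 'start'.  Returns the number of reads it took, or -1 if the bound was
 * hit, meaning the hardware is not behaving as expected.
 */
static long wait_for_ticks(struct fake_hpet *hw, uint32_t start, uint32_t count)
{
	long i;

	for (i = 0; i < MAX_SPINS; i++) {
		uint32_t m = fake_read_counter(hw);

		/* Unsigned subtraction copes with counter wraparound. */
		if ((uint32_t)(m - start) >= count)
			return i + 1;
	}
	fprintf(stderr, "counter did not advance; giving up\n");
	return -1;
}
```

With a healthy simulated counter, wait_for_ticks() returns after a handful of reads; with a stuck one, it gives up after MAX_SPINS reads and leaves the caller to decide how to report the failure.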
"Usbmuxd" is a daemon which facilitates communications with various Apple
iDevices. Recently, this
patch to usbmuxd was recognized to be a security fix for a bug
eventually designated as CVE-2012-0065. In short, this daemon would read a
serial
number string from the device and copy it into an internal array without
checking its length. Exploiting this vulnerability is not easy - it
requires the ability to plug in a USB device that has been designed to
overflow that particular buffer with something interesting. But it
is a vulnerability, and it is worth noting that an increasing number
of USB devices are really just Linux systems using the "USB gadget" code;
creating that malicious device would not be hard to do. So this
vulnerability could be interesting to the "leave a malicious USB stick in
the parking lot" school of attacker.
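The general shape of a fix for this class of bug is to bound every copy of device-supplied data by the size of the destination. What follows is a hedged illustration, not the actual usbmuxd patch: the struct device_info layout, the SERIAL_MAX size, and the copy_serial() helper are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SERIAL_MAX 256	/* hypothetical size of the internal array */

struct device_info {
	char serial[SERIAL_MAX];
};

/*
 * Copy a serial-number string reported by the device.  'src' and
 * 'src_len' come from the hardware and must not be trusted: the copy is
 * truncated to fit, and the result is always NUL-terminated.
 * Returns 0 if the string fit, 1 if it had to be truncated.
 */
static int copy_serial(struct device_info *info, const char *src, size_t src_len)
{
	size_t n = src_len;
	int truncated = 0;

	if (n > sizeof(info->serial) - 1) {
		n = sizeof(info->serial) - 1;
		truncated = 1;
	}
	memcpy(info->serial, src, n);
	info->serial[n] = '\0';
	return truncated;
}
```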
This bug, too, is the result of trusting the hardware. As seen here, the
hardware could be overtly evil. In other cases, it is just subject to
electrical wear, power spikes, cosmic rays, and the varying skills of those
who write the firmware - closed source which is never reviewed by anybody.
Even in a world where price pressures didn't mandate that each component
must cost as little as possible, hardware bugs would be a problem.
By now, the lesson
should be clear: driver developers should always regard their hardware with
extreme suspicion and take nothing for granted. The problem is that even
highly diligent developers (and reviewers) can easily let this kind of
bug slip by. In almost all cases, the driver appears to work just fine without the
extra sanity checks; the hardware plays along most of the time, after all,
until that especially inopportune moment arrives. Sometimes the developer
sees the resulting failure, leading to that "oh, yeah, I have to
make sure that the hardware doesn't flake there" moment that is discouragingly
common in driver development. Other times, some far away user sees strange
problems and nobody really knows why.
What would be nice would be a way for the computer to tell developers when
they are being overly trusting of the hardware; then it might be possible
to skip the "tracking down the weird problem" experience. As it happens,
such a way exists in the form of a static analysis tool called Carburizer,
developed by Asim Kadav, Matthew J. Renzelmann and Michael M. Swift. Those
wanting a lot of information about this tool can find it in this
one-page poster [PDF], this ACM
Symposium on Operating Systems Principles (SOSP) paper [PDF], or in this
rather over-the-top web site.
In short: Carburizer analyzes kernel code, looking for insufficiently
robust dealings with the hardware. Its key strength at the moment appears
to be the identification of possible infinite loops - loops whose exit
condition depends solely on a value obtained from the hardware. There are,
it seems,
over 1000
such loops in the 3.2.1 kernel. The tool also looks for cases where
unchecked values from hardware are used to index arrays or are used
directly as pointers, though the false-positive rate seems to be higher for
these checks. The result is an output file, like the one linked above, which
developers can then investigate.
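To make the array-index case concrete: a driver that uses a register value directly as an index is trusting the device to stay in range. A minimal sketch of the checked version follows; the status table and status_name() are hypothetical examples, not taken from Carburizer's output, and ARRAY_SIZE() is defined locally to mirror the kernel macro of the same name.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table indexed by a status code read from a device register. */
static const char *const status_names[] = {
	"idle", "busy", "error", "reset",
};

/*
 * 'hw_status' comes straight from the hardware.  Without the range
 * check, a flaky or malicious device could make us read far past the
 * end of the table.
 */
static const char *status_name(uint32_t hw_status)
{
	if (hw_status >= ARRAY_SIZE(status_names))
		return "unknown";
	return status_names[hw_status];
}
```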
Naturally enough, the tool shows some signs of its academic origins. It is
written in OCaml and requires some modifications to the kernel source tree before
it can be run. Carburizer also requires that multi-file drivers be merged
into one big file, with the result that the line numbers in the resulting
diagnostics do not correspond to the source tree everybody else has. That
may be part of why, despite a positive response to a
posting of the tool on kernel-janitors in January, 2011, little in the
way of actual fixes seems to have resulted. Or it may just be that, so
far, these results have only been seen by a relatively small group of
developers.
Interestingly, Carburizer can propose fixes of its own. These include
putting time limits into potentially infinite loops and adding bounds
checks to suspect array references. While it is at it, Carburizer fixes up
seemingly unnecessary panic() calls and adds logging code to
places where, it thinks, the driver neglects to report a hardware failure.
With a separate runtime module, it can even deal with stuck interrupts (the
driver is forced into a polling mode) and more. The resulting code has not
been posted for consideration, which is not entirely surprising; the fixes
are, necessarily, of a highly conservative "don't break the driver"
nature. Such fixes are almost certain not to be what a human would write
after looking at the code. But the tool is open source, so interested
developers can run it themselves to see what it would do.
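The stuck-interrupt fallback can be pictured as a watchdog that notices completed work piling up while no interrupts arrive, and then switches the driver to polling. The simulation below only illustrates that idea; it is not Carburizer's runtime module, and every name in it (struct fake_dev, watchdog_tick(), STUCK_TICKS) is hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

#define STUCK_TICKS 3	/* hypothetical: quiet ticks before we stop trusting the IRQ */

struct fake_dev {
	unsigned int pending;		/* work the device has completed */
	unsigned int irq_count;		/* interrupts actually delivered */
	unsigned int last_irq_count;	/* irq_count at the previous tick */
	unsigned int quiet_ticks;	/* consecutive ticks with work but no IRQ */
	bool polling;			/* set once the IRQ line is deemed stuck */
	unsigned int handled;		/* work items processed so far */
};

/* Shared by the interrupt path and the polling path. */
static void service_device(struct fake_dev *dev)
{
	dev->handled += dev->pending;
	dev->pending = 0;
}

/* Simulated interrupt handler. */
static void fake_irq_handler(struct fake_dev *dev)
{
	dev->irq_count++;
	service_device(dev);
}

/* Called from a periodic timer. */
static void watchdog_tick(struct fake_dev *dev)
{
	if (dev->polling) {
		service_device(dev);
		return;
	}
	if (dev->pending && dev->irq_count == dev->last_irq_count) {
		if (++dev->quiet_ticks >= STUCK_TICKS) {
			fprintf(stderr, "interrupt appears stuck; switching to polling\n");
			dev->polling = true;
			service_device(dev);
		}
	} else {
		dev->quiet_ticks = 0;
	}
	dev->last_irq_count = dev->irq_count;
}
```

In the healthy case the interrupt handler drains the work and the watchdog's counter keeps resetting; only a device whose completions go unannounced for STUCK_TICKS consecutive ticks is moved to polling.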
Meanwhile, even without automatic fixes, these results seem worthy of some
attention. Computers can be far better than humans at finding many classes
of bugs; when computers have been used in that role, some types of bugs
have nearly disappeared from the kernel. Maybe someday we'll have a
version of Carburizer that can be folded into a tool like checkpatch; for
now, though, we'll have to look at its complaints about our code separately
and decide what action is needed.
February 15, 2012
This article was contributed by Nicolas Pitre
ARM Ltd recently announced the big.LITTLE architecture, a twist on the
SMP systems that we've all grown accustomed to. Instead of having a bunch
of identical CPU cores put together in a system, the big.LITTLE
architecture pushes the concept further by pulling two different SMP
systems together: one being a set of "big" and fast processors, the other
consisting of "little" and power-efficient processors.
In practice this means having a cluster of Cortex-A15 cores, a
cluster of Cortex-A7 cores, and ensuring cache coherency between
them. The advantage of such an arrangement is that it allows for
significant power
saving when processes that don't require the full performance of the
Cortex-A15 are executed on the Cortex-A7 instead. This way,
non-interactive background operations, or streaming multimedia decoding,
can be run on the A7 cluster for power efficiency, while sudden screen
refreshes and similar bursty operations can be run on the A15 cluster to
improve responsiveness and interactivity.
So, how can this be supported in Linux? This is not as trivial as it may
seem initially. Let's suppose we have a system comprising a cluster of
four A15 cores and a cluster of four A7 cores. The naive approach would
suggest making the eight cores visible to the kernel and letting the
scheduler do its job just like with any other SMP system. But here's
the catch: SMP means Symmetric Multi-Processing, and in the big.LITTLE
case the cores aren't symmetric between clusters.
The Linux scheduler expects all available CPUs to have the same
performance characteristics. For example, there are provisions in the
scheduler to deal with things like hyperthreading, but this is still an
attribute which is normally available on all CPUs in a given system.
Here we're purposely putting together CPUs whose performance/power
characteristics differ significantly in the same system, and we expect
the kernel to make optimal use of them at all times, considering that we
want to get the best user experience together with the lowest possible
battery consumption.
So, what should be done? Many questions come to mind:
- Is it OK to reserve the A15 cluster just for interactive tasks and the
A7 cluster for background tasks?
- What if the interactive tasks are sufficiently light to be processed by
the small cores at all times?
- What about those background tasks that the user interface is actually
waiting on?
- How to determine if a task using 100% CPU on a small core should be
migrated to a fast core instead, or left on the small core because
it is not critical enough to justify the increased power usage?
- Should the scheduler auto-tune its behavior, or should user-space
policies influence it?
- If the latter, what would the interface look like to be useful and
sufficiently future-proof?
Linaro started an initiative
during the most recent Linaro Connect to
investigate this problem. It will require a high degree of
collaboration with the upstream scheduler maintainers and a good amount
of discussion. And given past history, we know that scheduler changes
cannot happen overnight... unless your name is Ingo, that is.
Therefore, it is safe to assume that this will take a significant amount
of time.
Silicon vendors and portable device makers are not going to wait though.
Chips implementing the big.LITTLE architecture will appear on the market
in one form or another, way before a full heterogeneous multi-processor
aware scheduler is available. An interim solution is therefore needed
soon. So let's put aside the scheduler for the time being.
ARM Ltd has produced a prototype software solution
consisting of a small hypervisor using the virtualization extensions of
the Cortex-A15 and Cortex-A7 to make both clusters appear to the guest
operating system as if there were only one Cortex-A15 cluster.
Because the cores within a given cluster are still symmetric, all the
assumptions built into the current scheduler still hold. With a
single call, the hypervisor can atomically suspend execution of the
whole system, migrate the CPU states from one cluster to the other, and
resume system execution on the other cluster without the guest
operating system being aware of the change, just as if nothing had
happened.
Taking the example above, Linux would see only four Cortex-A15 CPUs at all
times. When a switch is initiated, the registers for each of the 4 CPUs
in cluster A are transferred to corresponding CPUs in cluster B,
interrupts are rerouted to the CPUs in cluster B, then CPUs in cluster B are
resumed exactly where cluster A was interrupted, and, finally, the CPUs in
cluster A are powered off. And vice versa for switching back to the
original cluster. Therefore, if there are eight CPU cores in the system,
only four of them are visible to the operating system at all times. The
only visible difference is the observable execution speed, and of course
the corresponding change in power consumption when a cluster switch
occurs. Some latency is implied by the actual switch, of course, but
that should be very small and imperceptible to the user.
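The switch sequence described above can be modeled in a few lines. This is a simulation with hypothetical types (struct core, struct soc) and a made-up 16-register CPU context; real switcher code must also power cores up and down and reprogram the interrupt controller, which is reduced here to setting fields.

```c
#include <stdbool.h>
#include <string.h>

#define CPUS_PER_CLUSTER 4
#define NR_REGS 16	/* simplified: pretend a CPU context is 16 registers */

struct core {
	unsigned long regs[NR_REGS];
	bool powered;
};

struct cluster {
	struct core cores[CPUS_PER_CLUSTER];
};

struct soc {
	struct cluster clusters[2];	/* 0 = A15 ("big"), 1 = A7 ("LITTLE") */
	int active;			/* cluster the OS is currently running on */
	int irq_target;			/* cluster that receives interrupts */
};

/* Move all logical CPUs from the active cluster to the other one. */
static void cluster_switch(struct soc *soc)
{
	int from = soc->active;
	int to = 1 - from;
	int cpu;

	/* Bring up the inbound cores (implied, not spelled out in the text). */
	for (cpu = 0; cpu < CPUS_PER_CLUSTER; cpu++)
		soc->clusters[to].cores[cpu].powered = true;

	/* 1. Transfer each CPU's registers to its counterpart. */
	for (cpu = 0; cpu < CPUS_PER_CLUSTER; cpu++)
		memcpy(soc->clusters[to].cores[cpu].regs,
		       soc->clusters[from].cores[cpu].regs,
		       sizeof(soc->clusters[from].cores[cpu].regs));

	/* 2. Reroute interrupts to the inbound cluster. */
	soc->irq_target = to;

	/* 3. Resume execution where the outbound cluster was interrupted. */
	soc->active = to;

	/* 4. Power off the outbound cluster. */
	for (cpu = 0; cpu < CPUS_PER_CLUSTER; cpu++)
		soc->clusters[from].cores[cpu].powered = false;
}
```

Calling cluster_switch() a second time brings execution back to the original cluster with the same register contents, which is the "vice versa" case.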
This solution has advantages such as providing a mechanism which should
work for any operating system targeting a Cortex-A15 without
modifications to that operating system. It is therefore OS-independent
and easy to integrate. However, it brings a certain level of complexity
such as the need to virtualize all the differences between the A15 and
the A7. While those CPU cores are functionally equivalent, they may
differ in implementation details such as cache topology. That would force
every
cache maintenance operation to be trapped by the hypervisor and
translated into equivalent operations on the actual CPU core when the
running core is not the one that the operating system thinks is running.
Another disadvantage is the overhead of saving and restoring the full
CPU state because, by virtue of being OS-independent, the hypervisor
code may not know what part of the CPU is actually being actively used
by the OS. The hypervisor could trap everything to be able to know what
is being touched, allowing partial context transfers, but that would be
yet more complexity for a dubious gain. After all, the kernel already
knows what is being used in the CPU, and it can deal with differing
cache topologies natively, etc. So why not implement this switcher
support directly in the kernel given that we can modify Linux and do
better?
In fact, that's exactly what we are doing: taking the BSD-licensed ARM Ltd
switcher code and using it as a reference to put the switcher
functionality directly in the kernel. This way, we can get away with much
less support from the hypervisor code and improve switching performance
by not having to trap any cache maintenance instructions, by limiting the
CPU context transfer to the minimum set of active registers, and by
sharing the same address space with the kernel.
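The gain from partial context transfers comes from the kernel knowing which register sets are live. As a rough sketch, assume a simplified context in which the VFP/NEON registers only need to be moved if the current task has used them; the layout and the vfp_in_use flag are hypothetical, not the real ARM kernel structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified CPU context; not the real ARM layout. */
struct cpu_context {
	unsigned long core[16];		/* general registers: always live */
	unsigned long long vfp[32];	/* live only if the task used VFP/NEON */
	bool vfp_in_use;		/* known to the kernel, not to a hypervisor */
};

/* Copy only the live state; returns the number of register bytes moved. */
static size_t transfer_context(struct cpu_context *dst,
			       const struct cpu_context *src)
{
	size_t bytes = sizeof(src->core);

	memcpy(dst->core, src->core, sizeof(src->core));
	dst->vfp_in_use = src->vfp_in_use;
	if (src->vfp_in_use) {
		memcpy(dst->vfp, src->vfp, sizeof(src->vfp));
		bytes += sizeof(src->vfp);
	}
	return bytes;
}
```

An OS-independent hypervisor has no equivalent of vfp_in_use without trapping accesses, so it has to take the "copy everything" path every time.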
We can implement this switcher by modeling its functionality as a CPU
speed change, and therefore expose it via a cpufreq driver. This
way, contrary to the reference code from ARM Ltd which is limited to a
whole cluster switch, we can easily pair each of the A15 cores with one
of the A7 cores, and have each of those CPU pairs appear as a single
pseudo CPU with the ability to change its performance level via cpufreq.
And because the cpufreq governors are already available and understood
by existing distributions, including Android, we therefore have a
straightforward solution with a fast time-to-market for the big.LITTLE
architecture that shouldn't cause any controversy.
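One way to picture the cpufreq model: each A15/A7 pair exposes a single table of operating points, the lower ones backed by the A7 and the higher ones by the A15, and a frequency request that crosses the boundary triggers a switch for that pair only. The sketch below uses made-up frequencies and hypothetical names (pair_table, pseudo_cpu_set_freq()); it picks the lowest operating point at or above the request, similar in spirit to cpufreq's CPUFREQ_RELATION_L.

```c
#include <stddef.h>

enum core_type { LITTLE_A7, BIG_A15 };

/* One operating point of the pseudo CPU formed by an A15/A7 pair. */
struct opp {
	unsigned int khz;
	enum core_type core;
};

/* Hypothetical table; the kHz values are made up for illustration. */
static const struct opp pair_table[] = {
	{  350000, LITTLE_A7 },
	{  700000, LITTLE_A7 },
	{ 1000000, BIG_A15 },
	{ 1500000, BIG_A15 },
};

#define NR_OPPS (sizeof(pair_table) / sizeof(pair_table[0]))

struct pseudo_cpu {
	size_t opp;		/* index of the current operating point */
	unsigned int switches;	/* number of A7 <-> A15 migrations */
};

/*
 * Pick the lowest operating point at or above 'target_khz' (clamped to
 * the highest one); if it lives on the other core type, switch this
 * pair only.  Returns the frequency actually selected.
 */
static unsigned int pseudo_cpu_set_freq(struct pseudo_cpu *cpu,
					unsigned int target_khz)
{
	size_t i = 0;

	while (i < NR_OPPS - 1 && pair_table[i].khz < target_khz)
		i++;

	if (pair_table[i].core != pair_table[cpu->opp].core)
		cpu->switches++;	/* a real driver would migrate context here */

	cpu->opp = i;
	return pair_table[i].khz;
}
```

Because each pair changes "frequency" independently, one pseudo CPU can be running on its A15 while the others stay on their A7s, which a whole-cluster switch cannot do.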
Obviously, the "switcher", as we call it, does not replace the ultimate
goal of exposing all the cores to the kernel and letting the scheduler
make the right decisions. But it is nevertheless a nice self-contained
interim solution that will allow pretty good usage of the big.LITTLE
architecture while removing the pressure to come up with scheduler
changes quickly.
Patches and updates
Kernel trees
Core kernel code
Development tools
Device drivers
Filesystems and block I/O
Memory management
Architecture-specific
Security-related
Virtualization and containers
Miscellaneous
Page editor: Jonathan Corbet