Brief items
The 3.9 merge window is open, so there is no current development
kernel. See the separate article below for a summary of changes merged
into the mainline for 3.9 so far.
Stable updates:
3.4.33 and 3.0.66 were released on February 21; they
are single-patch updates fixing a security issue in the printk()
code.
3.5.7.6 was released on February 22,
and 3.7.10 (the final planned 3.7 update)
was released on February 27.
As of this writing, the 3.8.1,
3.4.34,
and 3.0.67 updates are in the review
process; they can be expected on or after February 28.
Comments (2 posted)
Note that as of
5eaf563e53294d6696e651466697eb9d491f3946,
you can now mount filesystems as an unprivileged user after a call
to unshare(CLONE_NEWUSER | CLONE_NEWNS), or a similar clone(2)
call. This means all those random filesystem bugs you have
laying around in the junk bin are now quite useful. ++tricks;
—
Jason A. Donenfeld
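For the curious, here is a minimal sketch (not taken from Donenfeld's message) of the
sequence he describes; it assumes a kernel containing that commit and a filesystem
type, such as tmpfs, that the kernel permits to be mounted from within a user namespace:

    /*
     * Minimal sketch: create new user and mount namespaces, then mount a
     * filesystem without privilege.  Real code would also set up uid/gid
     * mappings and check which filesystem types are user-mountable.
     */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <sys/mount.h>
    #include <stdio.h>

    int main(void)
    {
        if (unshare(CLONE_NEWUSER | CLONE_NEWNS)) {
            perror("unshare");
            return 1;
        }
        /* The process now has full capabilities in its own user namespace,
         * so mounts within the new mount namespace are permitted for
         * filesystems flagged as user-mountable. */
        if (mount("none", "/mnt", "tmpfs", 0, NULL)) {
            perror("mount");
            return 1;
        }
        printf("mounted a tmpfs instance in a private namespace\n");
        return 0;
    }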
I suspect part of the problem is scale. Most people don't
understand the scale at which the Linux Kernel and vendors handle
bug fixes and code changes. External people simply see a few poorly
handled security related issues and probably think "well how hard
can it be to properly handle a few extra security flaws?" but they don't
see that those 5 security issues were buried in 10,000 other code
fixes. The resources needed to audit every code change for a
security impact simply aren't available (and even if we had enough
talented people who exactly is going to pay them all?).
—
Kurt Seifried
This naming alone would inhibit [BUG_ON()] use through two channels:
- Putting the word 'CRASH' into your code feels risky,
dissonant and wrong (perfect code does not crash) and thus
needs conscious frontal lobe effort to justify it - while
BUG_ON() really feels more like a harmless assert to most
kernel developers, which is in our muscle memory through
years of training.
- CRASH_ON() takes one character more typing than WARN_ON(),
and we know good kernel developers are fundamentally lazy.
—
Ingo Molnar
Comments (19 posted)
Kernel development news
By Jonathan Corbet
February 27, 2013
As of this writing, just over 8,000 non-merge changesets have been pulled
into the mainline for the 3.9 development cycle — 7,600 since
last week's summary. Quite a few new features
of interest have been merged for the 3.9 kernel; the most significant of
those are listed below.
But first, a warning for development kernel testers: there are
reports of ext4 filesystem corruption with current mainline kernels. The
problem appears to have been identified and fixed, but it will
remain as a permanent hazard for anybody running bisections over the
3.9 merge window. Development kernels have not often lived up to their
fearsome reputation recently, but they can still bite at times.
- The ARM architecture has gained support for the KVM virtualization
mechanism on Cortex-A15 processors. Support for the ARM "power state
coordination interface" has been added so that virtual CPUs can be
powered up and down.
- The socket filtering mechanism has a new SO_LOCK_FILTER
option that prevents further changes to the filter. It is intended
for privileged programs that install a filter before running untrusted
code.
- TCP and UDP sockets have a new option, SO_REUSEPORT, that allows
multiple sockets to listen for new connections or packets
(respectively) on the same port at the same time. See this
commit message for more information; a usage sketch appears after this list.
- The netfilter connection-tracking code now supports "connection
labels," which are bitmasks that can be attached to tracking entries and
tested by netfilter rules.
- The wireless networking subsystem has gained core support for the
detection of radar systems operating on the networking frequencies;
this is a necessary component for dynamic
frequency selection in the 5GHz range.
- VMware's "VM Sockets" subsystem, a mechanism for communication between
virtual machines and a hypervisor, has been merged. Also merged is
the "Virtual Machine Communication Interface" subsystem for high-speed
communication between the host and guests.
- The networking layer has support for the "Multiple VLAN Registration
Protocol" (MVRP), which allows hosts to register their VLAN
memberships with network switches.
- The block layer's handling of pages under writeback has been changed to address the
performance penalty imposed by the previous "stable pages" work.
- The PowerPC architecture supports a new set of transactional memory
instructions; at this time, only user-space support is provided (the
kernel does not use these instructions). See Documentation/powerpc/transactional_memory.txt
for more information.
- The Xen virtualization subsystem gained support for ACPI-based CPU and
memory hotplugging, though, in both cases, only the "add" operation is
supported currently.
- The ext4 filesystem now supports hole punching in block-mapped files;
a user-space sketch appears after this list.
- A long list of old network drivers has been deleted; these include the
venerable 3c501, 3c505, and 3c507 drivers, various Intel i825xx
drivers, parallel port-based drivers(!), and many more. It is
expected that these drivers will not be
missed, as many of them did not work all that well in the first
place. As Paul Gortmaker put
it: "You know things are not good when the Kconfig help text suggests
you make a cron job doing a ping every minute." The
long-unused "WAN router" subsystem has also been removed.
- New hardware support includes:
- Systems and processors:
NVIDIA Tegra114 SoCs,
the ARM "dummy virtual machine" (a minimal stub platform for
virtualization uses),
Prodrive PPA8548 AMC modules, and
Tensilica Diamond 233L Standard core Rev.C processors.
- Audio:
NVIDIA Tegra20 AC97 interfaces.
- Block:
Renesas R-Car SATA controllers and
Broadcom BCM2835 SD/MMC controllers.
- Graphics:
Marvell MMP display controllers,
Samsung LMS501KF03 LCD panels,
Himax HX-8357 LCD panels,
Austrian Microsystems AS3711 backlight controllers,
TI LCDC display controllers, and
NXP Semiconductors TDA998X HDMI encoders.
- Input:
Steelseries SRW-S1 steering wheel devices.
- Miscellaneous:
STMicroelectronics ST33 I2C TPM devices,
STMicroelectronics accelerometers, magnetometers, and gyroscopes,
InvenSense ITG3200 digital 3-axis gyroscopes,
Invensense MPU6050 gyroscope/accelerometer devices,
NVIDIA Tegra20/30 SoC serial controllers,
Comtrol RocketPort EXPRESS/INFINITY serial adapters,
PCI-Express non-transparent bridges,
Maxim MAX77686 and MAX8997 realtime clocks (RTCs),
TI LP8788 RTCs,
TI TPS80031/TPS80032 RTCs,
Epson RX-4581 RTCs,
ST-Ericsson Ux500 watchdogs,
Intel Lynxpoint GPIO controllers,
Atmel Timer Counter pulse-width modulators,
TI/National LP5521 and LP5523/55231 LED controllers,
Intel iSMT SMBus host controllers, and
Broadcom BCM2835 I2C controllers.
- Networking:
8devices USB2CAN interfaces and
Inside Secure microread NFC interfaces.
- USB:
SMSC USB3503 USB 2.0 hub controllers.
- Video4Linux:
SuperH VEU mem2mem video processors,
TI DM365 VPFE media controllers,
Montage Technology TS2020-based tuners,
Masterkit MA901 USB FM radios,
OmniVision OV9650/OV9652 sensors, and
Samsung S5C73M3 sensors.
- Staging graduations: the
Analog Devices ADXRS450/3 Digital Output Gyroscope SPI driver,
Analog Devices ADIS16400 inertial sensor driver,
Analog Devices ADIS16080/100 yaw rate gyroscope driver,
Kionix KXSD9 accelerometer driver,
TAOS TSL2560, TSL2561, TSL2562 and TSL2563 ambient light sensor
driver, and
OMAP direct rendering driver have been moved out of the staging
tree and into the mainline kernel.
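To illustrate the SO_REUSEPORT item above, here is a rough sketch of the usage
pattern: every cooperating process creates its own socket and sets the option
before bind(), and the kernel then distributes incoming connections among the
listeners. The helper name is invented, and the fallback definition of
SO_REUSEPORT is only needed where 2013-era C library headers lack it:

    /* Sketch: each process calls this to obtain its own listening socket
     * bound to the same port. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>

    #ifndef SO_REUSEPORT
    #define SO_REUSEPORT 15
    #endif

    static int reuseport_listener(unsigned short port)
    {
        struct sockaddr_in addr;
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int one = 1;

        if (fd < 0)
            return -1;
        setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) || listen(fd, 128))
            return -1;
        return fd;
    }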
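To illustrate the ext4 hole-punching item above: nothing new is required in user
space, since the usual fallocate() call now also succeeds on block-mapped ext4
files. A quick sketch (the path, offset, and size are arbitrary, and the helper
name is invented):

    /* Sketch: deallocate 1MB in the middle of a file.  FALLOC_FL_KEEP_SIZE
     * must accompany FALLOC_FL_PUNCH_HOLE. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <linux/falloc.h>
    #include <stdio.h>
    #include <unistd.h>

    static int punch_megabyte(const char *path, off_t offset)
    {
        int fd = open(path, O_WRONLY);
        int ret;

        if (fd < 0)
            return -1;
        ret = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                        offset, 1024 * 1024);
        if (ret)
            perror("fallocate");
        close(fd);
        return ret;
    }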
Changes visible to kernel developers include:
- The netpoll mechanism now supports IPv6, allowing network consoles to
be run over IPv6 networks.
- Most drivers no longer depend on the EXPERIMENTAL
configuration option. So much code needed that option that it is
turned on almost universally, with the result that it does not
actually mean anything. So now it defaults to "yes," and it will soon
be removed entirely.
- The sound layer has a generic parser for Intel high definition audio
(HDA) codecs. Many drivers have been converted to use this parser,
resulting in the removal of a great deal of duplicated code.
- The __get_user_8() function is now available on 32-bit x86
systems; it will fetch a 64-bit quantity from user space. A hypothetical
usage fragment appears after this list.
- The module signing code has a few usability enhancements. The
sign-file utility has new options to specify which hash
algorithm to use or to simply provide the entire signature (which will
have been computed elsewhere). There is also a new
MODULE_SIG_ALL configuration option that controls whether
modules are automatically signed at modules_install time.
- The descriptor-based GPIO patch set
has been merged, with significant changes to how GPIO lines are
handled within the kernel.
- The new file_inode() helper should be used instead of the
traditional file->f_dentry->d_inode pointer chain.
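To illustrate the 64-bit get_user() item above, here is an excerpt from a
hypothetical ioctl() handler (not from any real driver), where arg is the
user-supplied pointer:

    /* With __get_user_8() in place, a 64-bit fetch from user space works
     * the same way on x86-32 as on 64-bit architectures, where it
     * previously had to be split into two 32-bit accesses. */
    u64 value;
    u64 __user *uptr = (u64 __user *)arg;

    if (get_user(value, uptr))
        return -EFAULT;
    /* ... use value ... */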
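A hypothetical driver fragment showing the intended file_inode() conversion:

    /* Old style:  S_ISDIR(file->f_dentry->d_inode->i_mode)  */
    static int example_is_directory(struct file *file)
    {
        return S_ISDIR(file_inode(file)->i_mode);
    }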
The merge window should stay open through approximately March 5, though,
one assumes, the rate of change will drop off somewhat toward the end.
Next week's edition will summarize the changes that go in for the final
part of the 3.9 merge window.
Comments (7 posted)
By Jake Edge
February 27, 2013
The ARM big.LITTLE architecture has been the subject of a number of
LWN articles (here's another) and conference talks, as well as a fair amount of
code. A number of upcoming systems-on-chip (SoCs) will be using the
architecture, so some kind of near-term solution for Linux support is
needed. Linaro's Mathieu Poirier came to the 2013 Embedded
Linux Conference to describe that interim solution: the in-kernel switcher.
Two kinds of CPUs
Big.LITTLE incorporates architecturally similar CPUs that have different
power and performance characteristics. The similarity must consist of a
one-to-one mapping between instruction sets on the two CPUs, so that code
can "migrate seamlessly", Poirier said.
Identical CPUs are grouped into clusters.
The SoC he has been using
for testing consists of three Cortex-A7 CPUs (LITTLE: less performance,
less power consumption) in one cluster and two
Cortex-A15s (big) in the other. The SoC was deliberately chosen to have a different number
of processors in the clusters as a kind of worst case to catch any problems
that might arise from the asymmetry. Normally, one would want the same
number of processors in each cluster, he said.
The clusters are connected with a cache-coherent interconnect, which can
snoop the cache to keep it coherent between clusters. There is an
interrupt controller on the SoC that can route any interrupt from or to any
CPU. In addition, there is support in the SoC for I/O coherency that can be
used to keep
GPUs or other external processors cache-coherent, but that isn't needed for
Linaro's tests.
The idea behind big.LITTLE is to provide a balance between power
consumption and performance. The first idea was to run CPU-hungry tasks on
the A15s, and less hungry tasks on the A7s. Unfortunately, it is "hard to
predict the future", Poirier said, which made it difficult to make the
right decisions because there is no way to know what tasks are CPU
intensive ahead of time.
Two big.LITTLE approaches
That led Linaro to a two-pronged approach to solving the problem:
Heterogeneous Multi-Processing (HMP) and the In-Kernel Switcher (IKS).
The two projects are running in parallel and are both in the same kernel
tree. Not only that, but you can enable either on the kernel command line
or switch at run time via sysfs.
With HMP, all of the cores in the SoC can be used at the same time, but the
scheduler needs to be aware of the capabilities of the different processors
to make its decisions. It will lead to higher peak performance for some
workloads, Poirier said. HMP is being developed in the open, and anyone
can participate, which means it will take somewhat longer before it is
ready, he said.
IKS is meant to provide a "solution for now", he said, one that can be used
to build products with. The basic idea is that one A7 and one A15 are
coupled into a single virtual CPU. Each virtual CPU in the system will
then have the same capabilities, thus isolating the core kernel from the
asymmetry of big.LITTLE. That means much less code needs to change.
Only one of the two processors in a virtual CPU is active at any given
time, so the decision on which of the two to use can be made at the CPU
frequency (cpufreq) driver level. IKS was released to Linaro members in
December 2012, and is "providing pretty good results", Poirier said.
An alternate way to group the processors would be to put all the A15s
together and all the A7s into another group. That turned out to be too
coarse as it was "all or nothing" in terms of power and performance. There
was also a longer synchronization
period needed when switching between those groups. Instead, it made more sense
to integrate "vertically", pairing A7s with A15s.
For the test SoC, the "extra" A7 was powered off, leaving two virtual CPUs
to use. The processors are numbered (A15_0, A15_1, A7_0, A7_1) and then
paired up (i.e. {A15_0, A7_0}) into virtual CPUs; "it's not rocket
science", Poirier said. One processor in each
group is turned off, but only the cpufreq driver and the switching logic
need to know that there are more physical processors than virtual processors.
The virtual CPU presents a list of operating frequencies covering the
ranges at which both the A7 and the A15 can operate. While the
numbers look like frequencies (ranging from 175MHz to 1200MHz in the
example he gave), they don't really need to be, as they are essentially
just indexes into a table in the cpufreq driver. The driver maps those
values to a real operating point for one of the two processors.
Switching CPUs
The cpufreq core is not aware of the big.LITTLE architecture, so the driver
does a good bit of work, Poirier said, but the code for making the
switching decision is simple. If the requested frequency can't be
supported by the current processor, switch to the other. That part is
eight lines of code, he said.
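Linaro's actual driver is not reproduced here, but the logic Poirier described
can be modeled in a few lines of illustrative C; the table entries, structure,
and helper names below are invented for the example and are not Linaro's code:

    /* Illustrative model only: each advertised "frequency" is really an
     * index selecting a cluster and a real operating point. */
    enum cluster { CLUSTER_A7, CLUSTER_A15 };

    struct vfreq_entry {
        unsigned int virt_khz;      /* what the cpufreq core sees */
        enum cluster cluster;       /* which physical CPU to use */
        unsigned int real_khz;      /* the actual operating point */
    };

    static const struct vfreq_entry vfreq_table[] = {
        {  175000, CLUSTER_A7,   350000 },
        {  500000, CLUSTER_A7,  1000000 },
        {  700000, CLUSTER_A15,  700000 },
        { 1200000, CLUSTER_A15, 1200000 },  /* "overdrive" end of the table */
    };

    /* Hypothetical helpers standing in for the real switcher and clock code. */
    extern void switch_cluster(enum cluster target);
    extern void set_real_frequency(unsigned int khz);

    /* The "eight lines" boil down to roughly this: if the requested entry
     * lives on the other cluster, switch, then set the real frequency. */
    static void set_target(const struct vfreq_entry *entry, enum cluster *cur)
    {
        if (entry->cluster != *cur) {
            switch_cluster(entry->cluster);
            *cur = entry->cluster;
        }
        set_real_frequency(entry->real_khz);
    }

In the real driver, of course, the table lives in the cpufreq frequency table
and the switch itself is carried out by the synchronization sequence described
below.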
For example, if virtual CPU 0 is running on the A7 at 200MHz and a request
comes in to go to 1.2GHz, the driver recognizes that the A7 cannot support
that. In that case, it decides to power down the A7 (which is called the
outbound processor) and power up the A15 (inbound).
There is a synchronization process that happens as part of the transition so
that the inbound
processor can use the existing cache.
That process is
described in Poirier's slides
[PDF], starting at slide 17.
The outbound processor powers up the inbound and continues executing normal
kernel/user-space code until
it receives the "inbound alive" signal. After sending that signal, the
inbound processor initializes both the cluster and interconnect if it is
the first
in its cluster (i.e. the other processor of the same type, in the other
virtual CPU, is powered down). It then waits for a signal from the outbound processor.
Once the outbound processor receives the "inbound alive" signal, the blackout period
(i.e. time when no kernel or user code is running on the virtual CPU)
begins. The outbound processor
disables interrupts, migrates the interrupt signals to the inbound
processor, then saves the current CPU context. Once that's done, it
signals the inbound processor, which restores the context, enables
interrupts, and continues executing from where the outbound processor left
off. All of that is possible because the
instruction sets of the
two processors are identical.
As part of its cleanup, the outbound processor creates a new stack for
itself so that it won't interfere with the inbound. It then flushes the
local cache and checks to see if it is the last one standing in its
cluster; if so, it flushes the cluster cache and disables the
cache-coherent interconnect. It then
powers itself off.
There are some pieces missing from the picture that he painted, Poirier
said, including "vlocks" and other mutual
exclusion mechanisms to handle simultaneous desired cluster power
states. Also missing was discussion of the "early poke" mechanism as well
as code needed to track the CPU and cluster states.
Performance
One of Linaro's main targets is Android, so it used the interactive power
governor for its testing. Any governor will work, he said, but will need
to be tweaked.
A second threshold (hispeed_freq2) was added to the interactive
governor to avoid going into "overdrive" on the A15 too quickly, as those
are "very power hungry" states.
For testing, BBench was used. It gives a performance score based on how
fast web pages are loaded. That was run with audio playing in the
background. The goal was to get 90% of the performance of two A15s, while
using 60% of the power, which was achieved. Different governor parameters
gave 95%
performance with 65% of the power consumption.
It is important to note that tuning is definitely required—without it you
can do worse than the performance of two A7s. "If you don't tune, all
efforts are wasted", Poirier said. The interactive governor has 15-20
variables, but Linaro mainly concentrated on hispeed_load and
hispeed_freq (and the corresponding *2 parameters added
for handling
overdrive). The basic configuration had the virtual CPU run on the A7 until
the load reached 85%, when it would switch to the first six
(i.e. non-overdrive) frequencies on the A15. After 95% load, it would use
the two overdrive frequencies.
The upstreaming process has started, with the cluster power management code
getting "positive remarks" on the ARM Linux mailing list. The goal is to
upstream the code entirely, though some parts of it are only available to
Linaro members at the moment. The missing source will be made public once
a member ships a product using IKS. But, IKS is "just a stepping stone",
Poirier said, and "HMP will blow this out of the water". It may take a
while before HMP is ready, though, so IKS will be available in the meantime.
[ I would like to thank the Linux Foundation for travel assistance to attend ELC. ]
Comments (1 posted)
By Jonathan Corbet
February 27, 2013
The kernel does not run programs in Microsoft's
Portable
Executable (PE) format. So when
a
patch came along adding support for those binaries — not to run
programs, but to use them as a container for trusted keys — the reaction
was not entirely positive. In truth, the reaction was
sufficiently negative to be widely quoted
across the net. When one looks beyond the foul language, though, there are
some fundamental questions about how Linux should support the UEFI secure
boot mechanism and how much the kernel community needs to be concerned
about Microsoft's wishes in this area.
The work done at Red Hat, SUSE, the Linux Foundation, and elsewhere is sufficient
to enable a distributor
to ship a binary distribution that will boot on a secure-boot-enabled
system. Such distributions are often built so that they will only load
kernel modules that have been signed by a trusted key, normally the
distributor's own key. That restriction naturally causes problems for
companies that ship binary-only modules; such modules will not be loadable
into a secure-boot system. Many developers in the kernel community are not
overly concerned about this difficulty; many of them, being hostile to the
idea of binary-only modules in the first place, think this situation is
just fine. Distributors like Red Hat, though, are not so sanguine.
One solution, of course, would be for those distributors to just sign the
relevant binary modules directly. As Matthew Garrett points out, though, there
are a number of practical difficulties with this approach, including the
surprisingly difficult task of verifying the identity and trustworthiness of the
company shipping the module. There's also the little problem that
signing binary-only modules might make Red Hat look bad in various parts of
our community and give strength to those claiming that such modules have no
GPL compliance problems. So Red Hat would like to find a way to enable
proprietary modules to be loaded without touching them directly, allowing
the company to pretend not to be involved in the whole thing.
Red Hat's solution is to convince the kernel to trust any signing key that
has been signed by Microsoft. Binary module vendors could then go to
Microsoft to get
their own key signed and present it to the kernel as being trustworthy;
the kernel would then agree to load modules signed with this key. This
only works, of course, if the kernel already
trusts Microsoft's key, but that will be the case for all of the secure
boot solutions that exist thus far. There is one other little problem in
that the only thing Microsoft will sign is a PE binary. So Red Hat's
scheme requires that the vendor's key be packaged into a PE binary for
Microsoft to sign. Then the kernel will read the binary file, verify
Microsoft's signature, extract the new key, and add that key to the ring of
keys it trusts. Once that is done, the kernel will happily load modules
signed by the new key.
This solution seems almost certain not to find its way into the mainline
kernel. In retrospect, it is unsurprising that a significant patch that is
seen as simultaneously catering to the wishes of Microsoft and binary
module vendors would run into a bit of resistance. That is even more true
when there appear to be reasonable alternatives, such as either
(1) having Red Hat sign the modules directly, or (2) having Red
Hat sign the vendor keys with its own key. Such solutions are unpopular
because, as mentioned above, they reduce Red Hat's plausible deniability;
they also make revocation harder and almost
certainly require vendors to get a separate signature for each distribution
they wish to support.
Linus has made it clear that he is not
worried about those problems, though. Security, he says, should be in the
control of the users; it should not be a mechanism used to strengthen a big
company's control. So, rather than wiring Microsoft's approval further
into the kernel, he would prefer that distributors encourage approaches
that educate users and improve their control; such approaches, he says,
would ultimately be more secure. Loading a module in this environment, he said,
would be a matter of getting the user to verify that the module is wanted
rather than verifying a signing key.
The other reason that this patch is running into resistance is that there
is widespread skepticism of the claim that the loading of unsigned modules
must be blocked in the first place. Proponents claim that module signing
(along with a whole set of other
restrictions) is needed to prevent Linux from being used as a way to
circumvent the secure boot mechanism and run compromised versions of
Windows. Microsoft, it is said, will happily blacklist the
Linux bootloader if Linux systems are seen as being a threat to Windows systems.
Rather than run that risk, Linux, while running under secure boot, must
prevent the running of arbitrary kernel code in any way. That includes
blocking the loading of unsigned kernel modules.
It seems that not all kernel developers are worried about this
possibility. Greg Kroah-Hartman asserted
that module signature verification is not mandated by UEFI. Ted Ts'o added that Microsoft would suffer public
relations damage and find itself under antitrust scrutiny if it were to act
to block Linux from booting. It also seems unlikely to some that an attacker could
rig a system to boot Linux, load a corrupted module, then chain-boot into a
corrupted Windows system without the user noticing.
For all of these reasons, a number of developers seem to feel that this
is a place where the kernel community should maybe push back rather than
letting Microsoft dictate the terms under which a system can boot on UEFI
hardware. But some of Red Hat's developers, in particular, seem to be
genuinely afraid of the prospect of a key revocation by Microsoft; Dave
Airlie put it this way:
Its a simple argument, MS can revoke our keys for whatever reason,
reducing the surface area of reasons for them to do so seems like a
good idea. Unless someone can read the mind of the MS guy that
arbitrarily decides this in 5 years time, or has some sort of
signed agreement, I tend towards protecting the users from having
their Linux not work anymore...
Others counter that, if Microsoft can revoke keys for any reason, there is
little to be done to protect the kernel in any case.
In the end, this does not appear to be an easy disagreement to resolve,
though some parts are easy enough: Linus has refused to accept the
key-loading patch, so it will not be merged. What may well happen is that the patch will drop out of
sight, but that distributors like Red Hat will quietly include it in their
kernels. That will keep this particular disagreement from returning to the
kernel development list, but it does little to resolve the larger question
of how much Linux developers should be driven by fear of Microsoft's power
as they work to support the UEFI secure boot mechanism.
Comments (42 posted)