Brief items
The 3.12 merge window is still open, so there is no development
kernel as of this writing.
Stable updates:
3.10.11, 3.4.61, and 3.0.95 were all released on September 7;
3.2.51 came out on September 11.
Dropping the spinlocks means more cores; unfortunately, a quad-core
seems to be the limit. Users must divide their time between reading
history and contributing to the present: some amount of persistent
data is a must on every user's machine. Pixel seems to be heading
in the wrong direction: that's what is stressing us out.
— Somebody seems to have unleashed a robot on linux-kernel.
Let's see if I can remember the candidates...
rcu_is_cpu_idle() # reversed sense from the others
rcu_is_ignored() # reversed sense from the others
rcu_is_not_active() # reversed sense from the others
rcu_is_watching_cpu()
rcu_read_check()
rcu_is_active()
rcu_is_active_local()
rcu_is_online()
rcu_is_watching_task()
rcu_is_watching_thread()
rcu_is_watching_you()
all_your_base_are_belong_to_rcu()
rcu_is_active_loco()
rcu_kilroy_was_here()
Maybe I should just lock them all in a room overnight and see which
are still alive in the morning.
— Paul McKenney struggles with naming
Kernel development news
By Jonathan Corbet
September 11, 2013
As of this writing, nearly 8,500 non-merge changesets have been pulled into
the mainline repository for the 3.12 development cycle; almost 5,000 of
those have been pulled since last week's summary. The process was slowed
somewhat when Linus's primary disk drive failed, but not even hardware
failure can stop the kernel process for long.
This development cycle continues to feature a large range of internal
improvements and relatively few exciting new features. Some of the
user-visible changes that have been merged include:
- The direct rendering graphics layer has gained the concept of "render
nodes," which separate the rendering of graphics from modesetting and
other display control; the "big three" graphics drivers all support
this concept. See this
post from David Herrmann for more information on where this work
is going.
- The netfilter subsystem supports a new "SYNPROXY" target that
simulates connection establishment on one side of the firewall before
actually establishing the connection on the other. It can be thought
of as a way of implementing SYN cookies at the perimeter, preventing
spurious connection attempts from traversing the firewall.
- The TSO sizing patches and FQ
scheduler have been merged. TSO sizing helps to eliminate bursty
traffic when TCP segmentation offload is being used, while FQ provides
a simple fair-queuing discipline for traffic transiting through the
system.
- The ext3 filesystem has a new journal_path= mount option that
allows the specification of an external journal's location using a
device path name.
- The Tile architecture has gained support for ftrace, kprobes, and full
kernel preemption. Also, support for the old TILE64 CPU has been
removed.
- The xfs filesystem is finally able to support user namespaces. The
addition of this support should make it easier for distributors to
enable the user namespace feature, should they feel at ease with the
security implications of such a move.
- Mainline support for ARM "big.LITTLE" systems is getting closer; 3.12
will include a new cpuidle driver that builds on the multi-cluster power management patches to
provide CPU idle support on big.LITTLE systems.
- The MD RAID5 implementation is now multithreaded, increasing its
maximum I/O rates when dealing with fast drives.
- The device mapper has a new statistics module that can track I/O
activity over a range of blocks on a DM device. See Documentation/device-mapper/statistics.txt
for details.
- The device tree code now feeds the entire flattened device tree text
into the random number pool in an attempt to increase the amount of
entropy available at early boot. It is not clear at this point how
much benefit is gained, since device trees are mostly or entirely
identical for a given class of device. It is possible for a device
tree to hold unique data — network MAC addresses, for example — but
that is not guaranteed, and some developers think that entropy would
be better served by just feeding the unique data directly.
- New hardware support includes:
- Systems and processors:
Freescale P1023 RDB and C293PCIE boards.
- Graphics:
Qualcomm MSM/Snapdragon GPUs.
The nouveau graphics driver has also gained proper power
management support, and the power management support for Radeon
devices has been improved and extended to a wider range of chips.
- Miscellaneous:
GPIO-controlled backlights,
Sanyo LV5207LP backlight controllers,
Rohm BD6107 backlight controllers,
IdeaPad laptop slidebars,
Toumaz Xenif TZ1090 GPIO controllers,
Kontron ETX/COMexpress GPIO controllers,
Fintek F71882FG and F71889F GPIO controllers,
Dialog Semiconductor DA9063 PMICs,
Samsung S2MPS11 crystal oscillator clocks,
Hisilicon K3 DMA controllers,
Renesas R-Car HPB DMA controllers, and
TI BQ24190 and TWL4030 battery charger controllers.
- Networking:
MOXA ART (RTL8201CP) Ethernet interfaces,
Solarflare SFC9100 interfaces, and
CoreChip-sz SR9700-based Ethernet devices.
- Video4Linux:
Renesas VSP1 video processing engines,
Renesas R-Car video input devices,
Mirics MSi3101 software-defined radio dongles (the first SDR
device supported by the mainline kernel),
Syntek STK1135 USB cameras,
Analog Devices ADV7842 video decoders, and
Analog Devices ADV7511 video encoders.
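The SYNPROXY item above builds on SYN cookies: instead of keeping per-connection state, the responder encodes what it needs to remember into the sequence number of its SYN-ACK, then recovers that value from the acknowledgment number in the client's ACK. The following is a minimal sketch of the classic cookie layout (a 5-bit time counter, a 3-bit MSS index, and a 24-bit keyed hash), assuming a hypothetical flow structure, MSS table, and FNV-based toy hash. It is not netfilter's code, it omits details such as folding in the client's initial sequence number, and a real implementation would use a cryptographic keyed hash.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow identifier (IPv4 addresses and ports). */
struct flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Hypothetical MSS table; only a 3-bit index travels in the cookie. */
static const uint16_t mss_table[8] = {
    536, 1024, 1220, 1300, 1400, 1440, 1452, 1460
};

/* Toy keyed hash: FNV-1a over the flow, the time counter, and a secret.
 * Illustration only; it is NOT cryptographically secure. */
static uint32_t toy_hash(const struct flow *f, uint32_t t, uint32_t secret)
{
    const uint32_t words[5] = {
        f->saddr, f->daddr, ((uint32_t)f->sport << 16) | f->dport, t, secret
    };
    uint32_t h = 2166136261u;
    for (int i = 0; i < 5; i++) {
        for (int b = 0; b < 4; b++) {
            h ^= (words[i] >> (8 * b)) & 0xffu;
            h *= 16777619u;
        }
    }
    return h;
}

/* Build a cookie: top 5 bits = t mod 32, next 3 bits = MSS index,
 * low 24 bits = truncated keyed hash of the flow and counter. */
uint32_t syn_cookie_make(const struct flow *f, uint32_t t,
                         unsigned mss_idx, uint32_t secret)
{
    uint32_t t5 = t & 0x1fu;
    return (t5 << 27) | ((uint32_t)(mss_idx & 7u) << 24) |
           (toy_hash(f, t5, secret) & 0xffffffu);
}

/* Validate a cookie recovered from the client's ACK. Returns the
 * encoded MSS, or 0 if the cookie is too old or does not match. */
uint16_t syn_cookie_check(const struct flow *f, uint32_t cookie,
                          uint32_t now, uint32_t max_age, uint32_t secret)
{
    uint32_t t5 = cookie >> 27;
    unsigned mss_idx = (cookie >> 24) & 7u;
    uint32_t age = (now - t5) & 0x1fu;   /* counter wraps modulo 32 */

    if (age > max_age)
        return 0;
    if ((toy_hash(f, t5, secret) & 0xffffffu) != (cookie & 0xffffffu))
        return 0;
    return mss_table[mss_idx];
}
```

A caller would pass the ACK's acknowledgment number minus one as `cookie`; only after the check succeeds would the proxy open the real connection to the server.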
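For the FQ item, here is a small sketch of credit-based round-robin fair queuing: each flow has its own queue, flows take turns, and a flow may send while its credit is positive, receiving a fixed quantum of bytes whenever it runs dry. This is the general deficit-round-robin idea rather than the sch_fq implementation; the structures, the fixed-size arrays, and the rule that idle flows forfeit their credit are all assumptions made for illustration.

```c
#include <assert.h>

#define MAX_FLOWS 8
#define MAX_PKTS  32

/* Hypothetical per-flow FIFO that records only packet lengths. */
struct flow_queue {
    int len[MAX_PKTS];
    int head, tail;   /* head: next packet to send; tail: next free slot */
    int credit;       /* bytes the flow may still send; may go negative */
};

/* Hypothetical scheduler; zero-initialize, then set nflows and quantum. */
struct rr_sched {
    struct flow_queue q[MAX_FLOWS];
    int nflows;
    int quantum;      /* credit granted each time a dry flow is visited */
    int cur;          /* flow currently being served */
};

static int flow_empty(const struct flow_queue *fq)
{
    return fq->head == fq->tail;
}

void sched_enqueue(struct rr_sched *s, int flow, int pkt_len)
{
    assert(flow >= 0 && flow < s->nflows);
    struct flow_queue *fq = &s->q[flow];
    assert(fq->tail < MAX_PKTS);
    fq->len[fq->tail++] = pkt_len;
}

/* Returns the index of the flow whose packet was sent (its length is
 * stored in *len), or -1 if every queue is empty. */
int sched_dequeue(struct rr_sched *s, int *len)
{
    int busy = 0;
    for (int i = 0; i < s->nflows; i++)
        busy |= !flow_empty(&s->q[i]);
    if (!busy)
        return -1;

    assert(s->quantum > 0);   /* guarantees the loop below terminates */
    for (;;) {
        struct flow_queue *fq = &s->q[s->cur];
        if (flow_empty(fq)) {
            fq->credit = 0;               /* idle flows do not bank credit */
        } else if (fq->credit > 0) {
            *len = fq->len[fq->head++];
            fq->credit -= *len;           /* a negative balance delays its next turn */
            return s->cur;
        } else {
            fq->credit += s->quantum;     /* top up, then yield the turn */
        }
        s->cur = (s->cur + 1) % s->nflows;
    }
}
```

With a 1500-byte quantum, a flow sending 500-byte packets gets three packets per turn while a flow sending 1500-byte packets gets one, so both move the same number of bytes per round.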
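The device-mapper statistics item describes I/O accounting over a range of blocks. Assuming that a tracked region is divided into equal-size areas, each with its own counters (Documentation/device-mapper/statistics.txt is the authority on how the real module does it), the bookkeeping could look something like this sketch. The structure and function names are hypothetical, the real module keeps a larger set of counters, and charging a whole I/O to the area holding its first sector is a simplification.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-area counters. */
struct area_stats {
    uint64_t reads, writes;
    uint64_t sectors_read, sectors_written;
};

/* Hypothetical tracked region of a block device. */
struct region {
    uint64_t start;   /* first sector covered */
    uint64_t len;     /* number of sectors covered */
    uint64_t step;    /* sectors per area */
    size_t nareas;
    struct area_stats *areas;
};

/* Returns 0 on success, -1 on bad arguments or allocation failure. */
int region_init(struct region *r, uint64_t start, uint64_t len, uint64_t step)
{
    if (step == 0 || len == 0)
        return -1;
    r->start = start;
    r->len = len;
    r->step = step;
    r->nareas = (size_t)((len + step - 1) / step);   /* last area may be short */
    r->areas = calloc(r->nareas, sizeof(*r->areas));
    return r->areas ? 0 : -1;
}

/* Account one I/O; returns 1 if counted, 0 if it starts outside the region. */
int region_account(struct region *r, uint64_t sector, uint64_t nsectors,
                   int is_write)
{
    if (sector < r->start || sector >= r->start + r->len)
        return 0;
    struct area_stats *a = &r->areas[(sector - r->start) / r->step];
    if (is_write) {
        a->writes++;
        a->sectors_written += nsectors;
    } else {
        a->reads++;
        a->sectors_read += nsectors;
    }
    return 1;
}

void region_free(struct region *r)
{
    free(r->areas);
    r->areas = NULL;
}
```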
Changes visible to kernel developers include:
- The GEM and TTM memory managers within the graphics subsystem are now
using a unified subsystem for the management of virtual memory areas,
eliminating some duplicated functionality.
- The new lockref mechanism can now mark
a reference-counted item as being "dead." The separate state is
needed because lockrefs can be used in places (like the dentry cache)
where an item can have a reference count of zero and still be alive
and usable. Once the structure has been marked as dead, though, the
reference count cannot be incremented and the structure cannot be used.
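The lockref change can be illustrated with a userspace toy that keeps a lock, a count, and a separate "dead" flag together. It shows why death has to be tracked apart from the count: a zero count still allows new references, while a dead object refuses them. Everything here is hypothetical and uses a plain mutex; the kernel's lockref packs the spinlock and count into one word so that the common case can be updated with a cmpxchg, and its helpers for this feature are lockref_mark_dead() and lockref_get_not_dead(), whose internals this sketch does not reproduce.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace toy, not the kernel's struct lockref. */
struct toy_lockref {
    pthread_mutex_t lock;
    int count;   /* zero is legal: the object may still be alive and cached */
    bool dead;   /* once set, no new references may be taken */
};

void toy_lockref_init(struct toy_lockref *r, int count)
{
    pthread_mutex_init(&r->lock, NULL);
    r->count = count;
    r->dead = false;
}

/* Take a reference unless the object has been marked dead. */
bool toy_get_not_dead(struct toy_lockref *r)
{
    bool ok;

    pthread_mutex_lock(&r->lock);
    ok = !r->dead;
    if (ok)
        r->count++;
    pthread_mutex_unlock(&r->lock);
    return ok;
}

/* Drop a reference; reaching zero does NOT kill the object. */
void toy_put(struct toy_lockref *r)
{
    pthread_mutex_lock(&r->lock);
    assert(r->count > 0);
    r->count--;
    pthread_mutex_unlock(&r->lock);
}

/* Mark the object dead, e.g. just before it is torn down. */
void toy_mark_dead(struct toy_lockref *r)
{
    pthread_mutex_lock(&r->lock);
    r->dead = true;
    pthread_mutex_unlock(&r->lock);
}
```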
The closing of the merge window still looks to happen on September 15, or,
perhaps, one day later to allow Linus to get back up to speed after his
planned weekend diving experience.
By Jake Edge
September 11, 2013
The reporting and handling of security issues is a tricky proposition.
There are numerous competing interests to try to balance, and a general
tendency toward secrecy that can complicate things further. Thus it is not
surprising that kernel developers are discussing security handling on the
Kernel Summit discussion mailing list (ksummit-2013-discuss).
It seems likely that discussion will pick up again at the summit itself,
which will be held in Edinburgh, October 23-25.
James Bottomley kicked off the discussion by noting
that several recent fixes had gone into the kernel without following the
normal process because they were "security fixes". Given that some of
those fixes caused problems of
various sorts, he is concerned about circumventing the process simply
because the patches fix security issues:
In both cases we had commits with cryptic messages, little explanation
and practically no review all in the name of security.
Our core processes for accepting code require transparency, review and
testing. Secrecy in getting code into the kernel is therefore
fundamentally breaking this and risking the kinds of problems we see in
each of the instances.
Bottomley would like to explore whether security vulnerabilities need to be
handled in secret at all. Since he suspects that idea may not be popular,
he suggested that looking into ways to inject more transparency into the
process would be a reasonable alternative.
Part of his theory is that "security people" who "love
secrecy" are running the vulnerability-handling process.
For example, it is not entirely clear who is on the closed kernel security
mailing list (security@kernel.org): it is made up either of "security
officers" (according to Documentation/SecurityBugs) or of "'normal' kernel
developers" (according to Greg Kroah-Hartman). The participants on that
list have no inherent interest in secrecy, Kroah-Hartman said, though he
did agree that posting a list of the members of security@kernel.org, which
has not yet happened, would help to make things more transparent. The
relationship between the kernel security list and the linux-distros mailing
list (a closed list for distribution security concerns and the successor to
vendor-sec) is also a bit murky and could use some clearing up, Bottomley
said.
A big part of the problem is that there are a few different constituencies to
try to satisfy, including
distributions (some of which, like enterprise distributions, may have
additional needs or wants), users (most of whom get their kernel from a
distributor or device maker), security researchers (who sometimes like to
make a big splash with their findings), and so on. While it might be tempting
to dismiss the security researchers as perpetrators of what Linus Torvalds
likes to call "the security circus", it is important to include them. They
are often the ones who find vulnerabilities, and annoying them can, sadly,
lead them to stop reporting what they find.
Secrecy in vulnerability handling may be important to the enterprise
distributions for other reasons, as Stephen Hemminger said.
Security vulnerabilities and response time are often used as a "sales" tool
in those markets, so that may lead to a push for more secrecy:
It seems to me that the secrecy is more about avoiding sensationalist
news reports that might provide FUD to competitors.
For the enterprise products this kind of FUD might impact buying
decisions and even the financial markets.
Torvalds's practice of hiding
the security implications of patches also plays a role here. He wants to
mask vulnerabilities so that "black hats" cannot easily grep
them from commit logs, but as James Morris pointed
out, that's not really effective: "The cryptic / silent fixes are
really only helping the bad guys. They are watching these commits and
doing security analysis on them."
It seems unlikely (though perhaps not completely impossible) that Torvalds would
change his mind on the issue, so various ideas for collecting known
security information about vulnerabilities, correlated with the commit(s)
that fixed them, were batted around. Clearly, some information about
security implications only comes to light after the commit has been made
(sometimes long after), so there is a need to collect it separately in
any case.
Kees Cook described
some of the information that could be collected, while Andy Lutomirski expanded
on the idea by suggesting separate CVE files stored in the kernel tree.
The idea
seemed fairly popular; others
chimed in with suggestions for collaborating with Debian and/or the
linux-distros mailing
list participants.
In a separate sub-thread, Lutomirski created
a template for how the information could be stored. Cook concurred
and suggested that the files could live under Documentation/CVEs
or something similar. It is clear that there is an interest in having more
data available on security vulnerabilities and fixes in the kernel, so
that could lead to a lively discussion in October.
Some seem to have already started down the path of more openness in the
security reporting realm.
Lutomirski recently publicly posted a fix that was
clearly marked as a security fix from the outset. Cook did much the same
with a list of vulnerabilities in the kernel's human
interface device (HID) code. Exploiting the HID bugs requires physical access and
specialized devices, but that may be part of the threat model for certain
users. These aren't the first reports of this kind;
others have been made from time to time. In fact, certain subsystems
(networking, in particular) essentially never use the closed list and
prefer to work on security problems and fixes in the open.
An even more recent example comes from Wannes Rombouts's report of a
networking security hole (a use-after-free), which was referred to the
netdev mailing list by security@kernel.org.
The implications of the bug were not completely clear (either to Rombouts or to
Hemminger, who replied), but Ben Hutchings
recognized that user namespaces could make
the problem more widespread (when and if they are enabled in most kernels
anyway). Though it is networking related—thus the referral to netdev,
presumably—this is the kind of vulnerability that could have been handled behind
closed doors. But because it was posted to an open list, the full implications
of the problem were discovered. In addition, for this bug (as well as for
Lutomirski's and Cook's
bugs), those affected have the ability to find out about the problems and
either patch their kernels or otherwise mitigate the problem. And
that is another advantage of openness.
By Jonathan Corbet
September 11, 2013
Most of the hand-wringing over the UEFI secure boot mechanism has long
passed; those who want to run Linux on systems with secure boot enabled
are, for the most part, able to do so. Things are quiet enough that one
might be tempted to believe that the problem is entirely solved. As it
happens, though, the core patches that implement the lockdown that some
developers think is necessary for proper secure boot support still have not
made their way into the mainline. The developer behind that work is still
trying to get it merged, though; in the process, he has brought back an old
idea that was last rejected in 1998.
By Matthew Garrett's reading of the secure boot requirements, a system
running in secure boot mode must not allow any user to change the
running kernel; not even root is empowered to do so. Just over one year
ago, Matthew posted a set of patches that
implemented the necessary restrictions. In secure boot mode (as defined by
the absence of a new capability called, at that time,
CAP_SECURE_FIRMWARE), the kernel would not allow the loading of
unsigned kernel modules, direct access to I/O ports or I/O memory, or,
most controversially, use of the kexec_load() system call to
reboot directly into a new kernel. As one might expect, not everybody
liked this type of restriction, which flies in the face of the longstanding
Unix tradition of giving root enough rope to shoot itself in the foot.
So there were discussions around various aspects of these patches, but one of
the biggest problems only came to light later. It seems that there is a
fundamental flaw in the capability model: it is nearly impossible to add
new capability bits without risking problems with applications that do not
know about the new bits. In particular:
- Some capability-aware applications work in a whitelist-oriented mode,
turning off every capability that they do not think they need. In
essence, such an application simply sets the capability mask to zero,
then sets the bits corresponding to the capabilities it wants. If a new
bit is added controlling functionality that such an application uses,
it will unknowingly disable a necessary capability and cease to work
properly. From the point of view of users of this application, this
kind of change constitutes an incompatible ABI change.
- Other applications work in a blacklist-oriented mode, turning off
capabilities that are known not to be needed. Such an application
starts from its current capability mask and clears only the bits
corresponding to the capabilities it knows it does not want. If some
sort of security-related functionality is put behind a new bit that is
unknown to this kind of application, that application will leave the
capability enabled. That, in turn, could make the application
insecure.
In this case, the biggest risk is that whitelist-style applications would
inadvertently turn off CAP_SECURE_FIRMWARE, essentially putting
themselves into secure boot mode even if the system as a whole is not
running in that mode. That could cause things to break in mysterious ways.
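Both failure modes can be seen in a few lines of bit arithmetic. In this sketch the bit numbers are hypothetical (they are not the values in <linux/capability.h>), CAP_NEW_FIRMWARE merely stands in for the proposed CAP_SECURE_FIRMWARE, and being "in secure boot mode" is modeled, as in the scheme described above, by the absence of that bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit numbers for illustration only. */
enum {
    CAP_NEEDED = 0,        /* a capability the application relies on */
    CAP_UNNEEDED = 1,      /* a capability it knows it can do without */
    CAP_NEW_FIRMWARE = 2,  /* stand-in for CAP_SECURE_FIRMWARE; unknown to the app */
};

#define CAPBIT(n) ((uint64_t)1 << (n))

/* Whitelist style: keep only the bits the application knows it needs. */
uint64_t whitelist_drop(uint64_t current, uint64_t known_needed)
{
    return current & known_needed;
}

/* Blacklist style: clear only the bits the application knows it can drop. */
uint64_t blacklist_drop(uint64_t current, uint64_t known_unneeded)
{
    return current & ~known_unneeded;
}

/* In the proposed scheme, lacking the new bit means the secure boot
 * restrictions apply to the process. */
bool acts_as_secure_boot(uint64_t caps)
{
    return !(caps & CAPBIT(CAP_NEW_FIRMWARE));
}
```

The whitelist application silently loses the new bit and ends up restricted as if secure boot were active; the blacklist application silently keeps it, which is harmless here but would be a hole if the new bit guarded something dangerous.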
What it comes down to is that, if one is designing a capability-based
system, one really must come up with the full list of needed capabilities
at the outset. Back in 1998, when capabilities for Linux were being hashed
out, nobody had UEFI secure boot in mind. So there is no relevant
capability bit available, and adding one now is not really an option.
More recently, Matthew posted a new patch set that eliminated the new
capability. Instead, all of the secure boot restrictions were tied to the
existing flag controlling whether unsigned kernel modules can be loaded.
Matthew's reasoning was that the restriction on
module loading exists to prevent the loading of arbitrary code into the
running kernel, so it made sense to lock down any other functionality that
might make it possible to evade that restriction. Other developers
disagreed, though, saying that they needed the ability to restrict module
loading while still allowing other functionality — kexec_load() in
particular — to be used normally. After some discussion, Matthew backed
down and withdrew the patches.
Eventually he came back with what he called his
final attempt at providing a kernel lockdown facility that wasn't tied
to the secure boot mechanism itself. This time around, we have a new
sysfs file at /sys/kernel/security/securelevel that accepts any of
three values. If it is set to zero (the default), everything works as it
always has, with no new restrictions. Setting it to one invokes "secure
mode," in which all of the restrictions related to secure boot go into
effect. Secure mode is also irrevocable; once it has been enabled, it
cannot be disabled (short of compromising the kernel, at which point the
battle is already lost). There is also an interesting "permanently
insecure" mode obtained by setting securelevel to -1; the
system's behavior is the same as with a setting of zero, but it is no
longer possible to change the security level.
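The three-valued securelevel semantics described above amount to a small state machine. This sketch models them under stated assumptions: the structure and function names are hypothetical, the error codes simply borrow errno conventions, kexec_load() stands in for the whole set of restricted operations, and rewriting the current value is treated as a harmless no-op, a detail the description above does not settle.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the setting described above:
 *   0 = normal (default), 1 = secure mode (irrevocable),
 *  -1 = permanently insecure (behaves like 0, but locked). */
struct securelevel {
    int level;
};

/* Mimics a write to the securelevel file; returns 0, -EINVAL, or -EPERM. */
int securelevel_write(struct securelevel *s, int new_level)
{
    if (new_level < -1 || new_level > 1)
        return -EINVAL;
    /* Once 1 or -1 has been set, the level can no longer change. */
    if (s->level != 0 && new_level != s->level)
        return -EPERM;
    s->level = new_level;
    return 0;
}

/* Example gate for one restricted operation (kexec_load() in the text). */
bool kexec_load_permitted(const struct securelevel *s)
{
    return s->level < 1;
}
```

In the secure boot case, early boot code would perform the equivalent of `securelevel_write(&sl, 1)` before any user process runs, closing the window in which the lockdown could be raced.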
In the UEFI secure boot setting, the bootstrap code would take pains to set
securelevel to one before allowing any processes to run. That
helps to avoid race conditions where the system is subverted before
the lockdown can be applied.
Some readers will, by now, have recognized that "securelevel" looks an
awful lot like the
BSD functionality that goes by the same name; it was clearly patterned
after BSD's version. Amusingly, this is not the first time that
securelevel has been considered for Linux; there was an extensive discussion on the
subject in early 1998, when Alan Cox was pushing strongly for a
securelevel feature. At that time, Linus rejected the feature because he
had something much better in mind: capabilities. As is usually the case,
Linus won out, and Linux got capabilities instead of securelevel.
More than fifteen years later, it seems that we might just end up with both
mechanisms. Thus far, Matthew's latest patch set has not resulted in many
screams of agony, so it might just pass review this time — though, at this
point, it is almost certainly too late for 3.12. Meanwhile, Vivek Goyal
has posted the first version of a signed kexec
patch set that would limit kexec_load() to signed images.
That would allow some useful features (kdump, for example) to continue to
work properly in the secure boot environment without leaving
kexec_load() completely open. That, too, will make the secure
boot restrictions a bit more palatable and increase their chances of being
merged.
Patches and updates
Filesystems and block I/O
- Marco Stornelli: pramfs (September 9, 2013)
Page editor: Jonathan Corbet