Brief items
The current development kernel is 2.6.37-rc7,
released on December 21. Linus says:
I'm still nervous about some of the regression reports for intel
graphics, so please keep testing and reporting. This is the last -rc
before xmas (or whatever your holiday may be), so now you all have a
few free days when you have nothing better to do than test out an -rc
release, right?
The full changelog can be found on
kernel.org.
Stable updates:
Willy Tarreau released 2.4.37.11 on
December 18. "It fixes a number of minor security issues, mainly information leaks
from the kernel stack on some 64-bit architectures, or possible NULL
derefs and crashes in some less commonly used protocols (eg: econet,
x25, irda)." He also notes that 2.4 will now be supported through
the end of 2011.
Comments (none posted)
I talked with Alexandre [Oliva] a few months ago, and we decided to
change the way Linux Libre deals with outside nonfree firmware.
The current practice is to change the code to fail instead of
trying to load any firmware.
The change is to obfuscate the names of the firmware files in the
Linux source code. That way, if a user tracks down what firmware
to install and installs it under the name that the code wants, it
will work. But Linux Libre will still not suggest installation of the
nonfree firmware file to handle a particular device.
--
Richard Stallman on freedom through
obscurity
The final BKL removal isn't really a big step forward for
Linux. It's more a symbolic gesture, but I prefer to leave those to
politicians and priests.
--
Andi Kleen
In conclusion: don't get surprised if technically inferior
propositions, such as proprietary 3D libraries coupled with
kernel-side interfaces, are met with strong or even vehement
opposition. Some people will be sufficiently moderated to tell you
that if you want to do such thing then you get to deal with it all
yourself and that they are not interested in any accommodation that
would help you. But it is clear that you will never get a
consensus for supporting such technically inferior solution in the
mainline tree, as from an Open Source point of view such a move
simply makes no sense.
Accepting such things in mainline would weaken the very principle
that has made Open Source in general and Linux in particular such a
success, while refusing it isn't going to affect the survival of
Open Source anyway. The compromise here would be only in the
corporate world's favor. And as the past history has shown in such
cases, the Open Source way always ends up prevailing eventually,
despite the lack of corporate assistance.
--
Nicolas Pitre
Anyone can try shipping this and risk a lawsuit, and all copyright
holders of the kernel can try suing people that distribute such
code. Most sensible people stay out of both the shipping
questionable code and the suing part, but apparently the entire
mobile phone industry is already doing both, so we can just wait
and see if anyone has deep enough pockets to bring this up in court
first.
--
Arnd Bergmann
Comments (16 posted)
By Jonathan Corbet
December 22, 2010
The SCSI protocol normally specifies a two-sided conversation between an
"initiator," which initiates requests, and a "target," which acts upon
those requests. Normally, the initiator is the host computer, while the
target is a storage device; it's thus not surprising that the bulk of the
SCSI-related code in the kernel implements the initiator role. There are
times, though, when it is useful to have a Linux system act as a SCSI
target, usually when that system is the interface to some sort of
complicated storage array. The target mode is currently implemented by the
STGT code, but that subsystem
has been seen as ripe for replacement for some time.
There are two main contenders to replace STGT: LIO and SCST. In the end, there's really
only room in the kernel for one SCSI target implementation, so there
naturally has been a fair amount of tension between these two projects.
Whenever the discussion turned to choosing one, it tended toward the ugly
side. SCSI maintainer James Bottomley has done his best to stay out of the
flames, but, in the end, he must make a decision and merge one of them.
A few months back, it began to become clear that LIO was going to be the
winner. More recently, James gave the green
light to begin merging this code for the 2.6.38 kernel. Suffice to say
that SCST maintainer Vladislav Bolkhovitin did
not take the decision well and did his best to restart the battle in a
wider context. James has stuck with his decision, though, saying that there is not much to choose
between technically, and that it came down to community:
Or said a different way: as long as you choose the most community
oriented of competing offerings, the community will fill any
perceived gaps. Conversely, you can destroy a project simply by
alienating the community. That's why community is more important
than feature set.
The previous discussions appear to have worn down most other participants,
so few people chose to join in this time around. There doesn't seem to be
anything to suggest that the decision will change at this point; unless
something surprising happens, LIO will be the in-kernel SCSI target
subsystem as of 2.6.38.
Comments (1 posted)
By Jonathan Corbet
December 22, 2010
The Openwall Linux developers have an interesting problem: they have
managed to create a distribution which is almost entirely free of
setuid-root binaries, with one exception: ping still needs to be setuid
root to be able to send ICMP echo packets. That seems a little untidy, so
the project put together a patch which allows ping to run as an
unprivileged user. It implements a new type of socket protocol
(IPPROTO_ICMP) which, despite its name, is not usable for ICMP
communications in general. The only type of message which is allowed
through is ICMP_ECHO (and the associated replies).
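As a rough illustration of what such a socket looks like from user space,
here is a minimal sketch of an echo client. It assumes the datagram-style
interface (a SOCK_DGRAM socket with IPPROTO_ICMP) used by this work; the
details of the posted patch may differ, and the loopback destination is
just an example.

#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip_icmp.h>
#include <sys/socket.h>

int main(void)
{
        /* No privileges needed, as long as the kernel (and, in the full
           Openwall version, the group check) allows it. */
        int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
        if (fd < 0) {
                perror("socket");
                return 1;
        }

        struct icmphdr hdr = {
                .type = ICMP_ECHO,              /* only echo requests get through */
                .un.echo.sequence = htons(1),   /* the id is managed by the kernel */
        };
        struct sockaddr_in dst = {
                .sin_family = AF_INET,
                .sin_addr.s_addr = inet_addr("127.0.0.1"),
        };

        if (sendto(fd, &hdr, sizeof(hdr), 0,
                   (struct sockaddr *)&dst, sizeof(dst)) < 0) {
                perror("sendto");
                return 1;
        }

        char buf[512];
        ssize_t n = recv(fd, buf, sizeof(buf), 0);  /* the echo reply, if any */
        printf("got %zd bytes back\n", n);
        close(fd);
        return 0;
}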
Interestingly, this patch has been trimmed down from the version which is
applied to Openwall kernels. In the full version, the ability to create
ICMP sockets is restricted to a specific group, which can be set by way of
a sysctl knob. The ping binary is then installed setgid. In this
way, full access to ICMP sockets is not given to unprivileged users, while
ping only gets enough privilege to create such sockets. The group
check was removed from the posted patch to make acceptance easier, but it
seems likely to be added back before the next posting.
For more information about the thinking behind this design, see this message from Solar Designer.
Comments (3 posted)
Kernel development news
By Jonathan Corbet
December 21, 2010
In the US, at least, the term "radar detection" is usually associated with
devices designed to warn heavy-footed drivers about police officers lurking
in the vicinity. As far as your editor knows, none of those devices run
Linux. Radar detection may become important for Linux in another context,
though: wireless networking - especially in a base station mode - will
require it. Some early work is now afoot to give that capability to the
Linux kernel.
Most wireless networking happens in the 2.4 GHz frequency band; as many
users will have noticed, that band tends to get crowded and noisy in
places. For this reason, both 802.11a and 802.11n specify a number of
channels in the 5GHz band as well. The relative lack of traffic at 5GHz
makes it attractive for this use, even though the effective range of an
access point is reduced somewhat. Pushing more wireless traffic to 5GHz
will greatly increase the total bandwidth available.
Naturally, there is a catch. While other uses of that frequency range are
few, among them are counted air traffic control and weather radars.
Interfering with these radars will be frowned upon by regulators who
have strange notions about how aviation safety should take priority
over that post-lunch Twitter update. These regulators typically show a
distinct lack of humor toward anybody who doesn't pay attention to their
rules; once again we see how wireless networking often tends to be the
leading edge of encounters between Linux and the regulatory environment.
To make the 5GHz band available for wireless networking in a safe manner,
various agencies have laid out specifications for how a wireless device
selects an operating channel. This scheme, called "dynamic frequency
selection" (DFS), requires that a "master" station listen to a channel for
a minimum period of time to ensure that no radars are operating there
before transmitting. Thereafter, the station must continue to listen for
radars; should one happen to move into the neighborhood, the station must
shut down all communications and move to a different channel. In essence,
wireless devices operating in the 5GHz band must actively avoid
transmitting on channels where radars are operating.
Most Linux systems will not have to concern themselves directly with radar
detection. A "slave" device, as might be found in a typical laptop, need
only follow the master device's instructions with regard to where it can
transmit. But any device which wants to function as a master - including
access points and anything running in ad hoc mode - must notice radars and
react accordingly.
Wireless adapters, having radio receivers tuned to the frequency range of
interest, can help with this process. Should a blast of RF energy hit the
antenna, the adapter can return an error to the host system indicating that
a radar-like patch of interference was encountered. It's not quite that
simple, though: random interference is far from unknown in the wireless
world. If a wireless
device bailed out of a channel every time it received some unexpected
interference, communication would be painful at best. So something a
little smarter needs to be done.
That something, of course, is to look for the specific patterns of
interference that will be generated by a radar. Radars emit short bursts
of RF radiation, followed by longer periods of listening for the returns.
The good news is that these patterns are fairly well defined in terms of
the radar's pulse width, pulse repetition interval, and frequency. The bad
news is that these parameters vary from one regulatory domain to the next.
So while the US has specified a specific set of patterns that a device must
recognize, the European Union has defined something different, and Japan
has a variant of its own. So radar detection must be specific to the
environment in which the device is operating.
A group of developers, mostly representing wireless hardware companies,
has started a project to implement DFS for Linux. A preliminary patch set
has been posted by Zefir Kurtisi to show how DFS might be done. These
patches add a simple function to the ieee80211 API:
void ieee80211_add_radar_pulse(u16 freq, u64 ts, u8 rssi, u8 width);
The hardware driver can use this function to inform the 802.11 core
whenever the interface reports the detection of a radar pulse. These
events will be tracked; if, over time, they match the pattern for radars
defined by the regulatory environment, the code will conclude that a radar
is operating and that evasive action is called for. If the hardware can do
full radar detection directly, the driver can report the existence of a
radar with:
void ieee80211_radar_detected(u16 freq);
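As a rough illustration, a driver's pulse handler might end up looking
something like the sketch below; the driver-side types and fields here
(my_hw, my_pulse_event) are hypothetical, and only the two functions shown
above come from the posted patches.

static void my_handle_pulse(struct my_hw *hw,
                            const struct my_pulse_event *ev)
{
        if (ev->hw_matched_pattern) {
                /* The hardware did the full pattern matching itself. */
                ieee80211_radar_detected(ev->freq);
                return;
        }

        /*
         * Otherwise hand the raw pulse (frequency, timestamp, signal
         * strength, width) to the 802.11 core, which accumulates pulses
         * and matches them against the patterns defined for the current
         * regulatory domain.
         */
        ieee80211_add_radar_pulse(ev->freq, ev->ts, ev->rssi, ev->width);
}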
The current patch is only able to detect one variety of European radar; it
is meant as a sort of proof of concept. The means by which parameters will
be loaded to describe radars in different jurisdictions is yet to be worked
out; one assumes that the existing regulatory compliance mechanism will be
used, but alternatives are being considered. One way or the other, Linux
should be able to coexist with radars in the 5GHz band in the near future.
A version which helps in the avoidance of speeding tickets may take a
little longer.
Comments (6 posted)
By Jonathan Corbet
December 19, 2010
Realtime scheduling for audio applications (or the lack thereof) has been a recurring theme over
a number of years. LWN last
visited this
issue in 2009, when the addition of rtkit was put forward as the
(pulseaudio-based) solution for casual audio use. Serious audio users -
those using frameworks like
JACK - have
always wanted more direct access to realtime scheduling, though. That
access has, for some years, been provided through resource limits. Now
it seems that a feature merged for the 2.6.25 kernel is, two years later,
beginning to cause
grief for some JACK users. The resulting discussion is an interesting
illustration of technical differences, how long it can take for new
features to filter through to users, and how one should best deal with the
kernel development community.
The combination of the RLIMIT_RTPRIO and RLIMIT_RTTIME resource limits
allows the system administrator to give specific users the ability to run
tasks with realtime priority for a bounded period of time. The feature is
easily configured in /etc/security/limits.conf and will prevent
casual users from locking up the system with a runaway realtime process.
This feature is limited in its flexibility, though, and is relatively easy
to circumvent, so it has never been seen as an ideal solution.
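For reference, a typical configuration of this kind gives members of a
single group the right to request realtime priority and to lock memory;
the group name and values below are illustrative examples only:

# /etc/security/limits.conf (or a file under limits.d)
@audio   -   rtprio    95
@audio   -   memlock   unlimited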
The better way, from the point of view of the scheduler developers, is to
use realtime group scheduling. Group scheduling uses control groups to
isolate groups of processes from each other and to limit the degree to which
they can interfere with each other; there has been an increase in interest
in group scheduling recently because this feature can be used to improve
interactivity on loaded systems. But group scheduling can also be used to
give limited access to realtime scheduling in a way which cannot be
circumvented and which guarantees that the system cannot be locked up by a
rogue process. It is a flexible mechanism which can be configured to
implement any number of policies - even if the full feature set has not yet
been implemented. More information on how this feature works can be found
in sched-rt-group.txt in the kernel
documentation tree.
If realtime group scheduling is enabled in the kernel configuration, access
to realtime priority based on resource limits is subordinated to the limits
placed on the control group containing any given process. So if a process
is run in a control group with no access to realtime scheduling, that
process will not be able to put itself into a realtime scheduling class
regardless of any resource limit settings. And that is where the trouble
starts.
The kernel, by default, grants realtime access to the "root" control group
- the one which contains all processes in the absence of some policy to the
contrary. So, with a default setup, processes will
be able to use resource limits to run with realtime priority. If, however,
(1) the libcgroup package
has been installed, and (2) that package has been configured to put
all user processes into a default (non-root) group, the situation changes. The
libcgroup default group does not have realtime access, so processes
expecting to be able to run in a realtime scheduling class will be
disappointed.
As it happens, the upcoming Ubuntu 11.04
release installs and configures libcgroup in just this
mode. That causes trouble for Ubuntu users running JACK-based audio
configurations; audio dropouts are not the "perfect 10" experience they had
been hoping for. In response, there has been quite a bit of complaining on the
JACK list, most of which has been aimed at the kernel. But it is not, in
fact, a kernel problem; the kernel is behaving exactly as intended - a
fact which has not made JACK developers feel any better.
As libcgroup developer Dhaval Giani pointed
out, there are a few ways to solve this problem. The easiest is to
simply turn off the default group feature with a one-line configuration
change; only slightly less easy is enabling realtime access for that default
group. The best solution, according to Dhaval, is to create a separate
control group for JACK which would provide realtime access to just the
processes which need it. That solution is slightly trickier than he had
imagined, mostly because JACK clients are not necessarily started by JACK
itself, so they won't be in the special JACK group by default. There are
ways of getting around this difficulty, but they may require
Linux-specific application
changes.
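To give a flavor of what Dhaval's preferred approach involves at the
kernel interface, here is a rough sketch using the raw control group
files; the mount point, group name, and bandwidth numbers are arbitrary
examples, and a real deployment would more likely express this through
libcgroup's configuration files.

# create a group for JACK and give it a realtime CPU budget
mkdir -p /cgroup
mount -t cgroup -o cpu cpu /cgroup
mkdir /cgroup/jack
echo 1000000 > /cgroup/jack/cpu.rt_period_us    # 1s period
echo  800000 > /cgroup/jack/cpu.rt_runtime_us   # up to 0.8s of RT time per period
# move a running jackd into the group (one PID per write)
echo $(pidof -s jackd) > /cgroup/jack/tasks

Clients started by other means would still have to be placed into the
group somehow, which is exactly the difficulty described above.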
The JACK developers were not greatly mollified by this information; in their
view, audio developers have been getting the short end of the stick from
the kernel community for years, and this change is just more of the same.
They would, it seems, rather stick with the solution they have, which has
been working for a few years now. As Paul
Davis put it:
But I hope you can perhaps understand how incredibly irritating it
is that *just* as almost [all] mainstream distros now finally come
with the mechanism to grant ordinary users SCHED_FIFO and memlock
without too much hassle (its taken more than 8 years to get there),
RT_GROUP_SCHED appears without any apparent appreciation for the
impact on what is probably the most widely used RT-scheduled
application "ecosystem" on Linux.
Many of the other thoughts expressed on the list were rather less polite.
The audio development community, it seems, feels that it is not being
treated with the respect that it deserves.
It is true that the audio folks have had a bit of a hard time of it. They
have made a few attempts to engage with the kernel community which have
been less than successful; since then, they have mostly just had to accept
what came their way. And what has come their way has not always been what
they felt they needed. As expressed by Alex
Stone, the audio community clearly feels that the kernel developers
should be paying more attention:
So no-one thought, while building this exciting new feature, to do
a quick test, or at least have a think about, of the significance
of the impact on jack/RT, given the nature of the feature as a
scheduler, and what many users think is JACK and jack based apps
importance in the linux community?
Sort of confirms the indifference to jack/RT as a significant
component in the linux audio/midi/video world, doesn't it?
One other sentence in Alex's message deserves special attention, though:
"If we don't yell, we don't get considered?" The answer to
that question is "yes." The kernel serves a huge community of users, many
of whom are represented within the kernel development community. It is
entirely unsurprising that groups which don't "yell" tend to find that
their needs are not always met. Any group which declines to participate,
feeling instead that it's so important that kernel developers should come
to them, is bound to be disappointed with future kernels. We all have to
yell when our toes are stepped on; the sooner we yell the better the
results will be.
That said, no amount of yelling at the kernel will help when the problem is
elsewhere. Ubuntu has created a configuration in which allowing
unprivileged access to realtime scheduling requires a bit more
administrative work than it did before. Fedora, which also installs
libcgroup, has, perhaps accidentally,
avoided this problem by not enabling the "default group" option. So one
might say that Ubuntu would be an appropriate target for any yelling on
this topic.
But increased use of control groups is clearly on the horizon for a number
of distributions; systemd depends
on them heavily. So the realtime audio community will need to work with
control groups, like it or not. The good news is that control groups
provide the needed features, and they do it in a way which is more secure
and which allows more control over policy.
The JACK community seems to have figured this out; there have already been
some patches posted to give JACK an understanding of control groups. It
would also appear that the libcgroup developers are working on the problem in the hope of
producing a solution which doesn't require application changes. Then,
hopefully, Linux audio developers will have a solution which they can
expect to rely on for many years (though they will want to keep an eye on
the progress of the deadline scheduling patches). Certainly this kind of
solution is something they have been wanting for a long time.
(Thanks to David Nielson for the heads-up).
Comments (73 posted)
By Jonathan Corbet
December 20, 2010
Operating system kernels, at their best, should not be noticed by user
space at all; in particular, the resource cost of the kernel should be as
small as possible. The Linux kernel has been written with that idea in
mind, but, for some people, anything is still too much. High-performance
computing users want all of the CPU time for themselves, while some
latency-sensitive users want their code to never have to wait for the
processor. These users have been asking for a way to run processes on at
least one CPU with no kernel interference at all - no timer ticks, no
interrupts, etc. Thus far, no satisfactory solution has been found; a new
patch set by Frederic Weisbecker is not such a solution yet, but it shows
another way of attacking the problem.
The idea behind Frederic's patch set is to
enable a process to disable the timer interrupt while it is running. If a
set of conditions can be met, this will allow the process to run without
regular interference from the timer tick. If other sources of interrupts
are directed away from the CPU as well, this process should be able to run
uninterrupted for some time. There are a few complications, though.
Actually going into the tickless mode is relatively easy; the process need
only write a nonzero value to /proc/self/nohz. The patch imposes
a couple of conditions on these processes: (1) the process must be
bound to the
CPU it is running on, and (2) no other process can be running in the
tickless mode on that CPU. If those conditions hold, the write to
/proc/self/nohz will succeed and the kernel will try to disable
the timer tick while that process runs.
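In code, a process asking for this mode might look something like the
following sketch; the CPU number is arbitrary, and the /proc/self/nohz
interface described above is, of course, still subject to change.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
        /* Condition 1: bind ourselves to a single CPU (CPU 3, arbitrarily). */
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(3, &set);
        if (sched_setaffinity(0, sizeof(set), &set)) {
                perror("sched_setaffinity");
                return 1;
        }

        /* Condition 2 - no other no-tick task on this CPU - is checked by
           the kernel; the write fails if it does not hold. */
        int fd = open("/proc/self/nohz", O_WRONLY);
        if (fd < 0 || write(fd, "1", 1) != 1) {
                perror("/proc/self/nohz");
                return 1;
        }
        close(fd);

        /* From here on, the kernel will try to keep the tick stopped
           while this process runs. */
        while (1)
                ;       /* CPU-bound work goes here */
}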
The key word here is "try"; there are a number of things which can keep the
disabling of the tick from happening. The first of those is any sort of
contention for the CPU. If any other processes are trying to run on the
same CPU, the scheduler tick must happen as usual so that decisions on
preemption can be made. Since a process can be made runnable from anywhere
in the system, Frederic's patch performs a potentially expensive
inter-processor interrupt whenever the second process is made runnable on
any CPU, regardless of whether that CPU is currently running in the no-tick
mode or not.
Another thing that can gum up the works is read-copy-update (RCU). If
there are any RCU callbacks which need to be processed on the CPU, that CPU
will not go into the no-tick mode. RCU also needs to be notified whenever
the CPU goes into a "quiescent state," so that it can know when it is safe
to invoke RCU callbacks on other CPUs. If RCU has indicated an interest in
knowing when the target CPU goes quiescent, once again, no-tick mode cannot
be entered. The CPU can also be forced out of the no-tick mode if RCU
develops a curiosity about quiescent states anywhere in the system.
Given that RCU is heavily used in contemporary kernels, one would think
that its needs would prevent no-tick mode most of the time. Another part
of the patch set tries to mitigate that problem with the realization that,
if a process is running in user space with the timer tick disabled, the
associated CPU is necessarily quiescent. When a CPU is running in this
mode, it will enter an "extended quiescent state" which eliminates the need
for notification to the rest of the system. The extended quiescent state
will probably increase the amount of no-tick time on a processor
considerably, but at a small cost: the architecture-level code must add
hooks to notify the no-tick code on every kernel entry and exit.
Reviews of the code, so far, have focused on various details which need to
be managed differently, but there has not been a lot of criticism of the
concept. It's early-stage code, so it doesn't take care of everything that
normally happens during the timer tick, a fact which reviewers have pointed
out. The biggest gripe, perhaps, has to do with the conditions
mentioned at the beginning of the article: the process must be bound to a
single CPU, and there can only be one no-tick process running on that CPU.
Peter Zijlstra said:
Well yes, this interface of explicitly marking a task and cpu as
task_no_hz is kinda restrictive and useless. When I run 4
cpu-bound tasks on a quad-core I shouldn't have to do anything to
benefit from this.
Frederic has indicated that the code can be changed to lift those
restrictions, but at the cost of some added complexity. Once the
restrictions are gone, it may make sense to just enable the no-tick mode
whenever the workload is right for it, regardless of a request (or the lack
thereof)
from any specific process. That would make the no-tick mode more generally
useful; it would also reduce the role of the timer tick just a little
more. The kernel would still be far from a fully tickless system, but
every step in that direction helps.
Comments (none posted)
Patches and updates
Kernel trees
Core kernel code
Development tools
Device drivers
Filesystems and block I/O
Memory management
Networking
Architecture-specific
Security-related
Benchmarks and bugs
Miscellaneous
Page editor: Jonathan Corbet