Brief items
The current 2.6 prepatch is 2.6.23-rc4, released by Linus (under the
code name "Pink Farting Weasel") on August 27. It has a rather large
pile of fixes; "most regressions" have been dealt with at this point. See
the short-form changelog for details, or the long-form changelog for lots
of details.
As of this writing, there have been no patches merged into the mainline
repository since the -rc4 release. There have been no -mm tree releases
over the last week.
The current stable 2.6 kernel is 2.6.22.5, released on August 22. It
contains about 20 patches for serious problems. The 2.6.22.6 review
process (involving a couple dozen more patches) is underway, with the
release being a bit overdue as of this writing.
For older kernels: 2.6.20.17 was released on
August 25 with a long list of fixes. 2.6.20.18, released on
August 28, reverts two of those fixes which turned out not to be such
a good idea after all.
Kernel development news
In other words, consuming half of your processor is (surprise!)
detrimental to multimedia playback performance. At this point, it
becomes clear that the process scheduler folks and the networking
folks are bitter enemies and do not converse.
-- Robert Love (not talking about Linux)
By Jonathan Corbet
August 24, 2007
For the past several years, the annual, invitation-only kernel developers'
summit has been held immediately prior to the Ottawa Linux Symposium. This
year is different, though: the summit is, instead, happening just after
LinuxConf Europe in Cambridge, UK. As usual, your editor will be there and
will be able to report from the event. The preliminary agenda has been
posted, though, as has the list of attendees [PDF]. So it is possible to
look forward and get a sense for what is likely to be discussed.
A few months ago, a discussion of interesting topics was held on the 2007
summit list. Many of the usual topics came around; there is always
plenty of interesting development work going on in the kernel community.
Andrew Morton objected
to many of the topics under discussion, though, saying that the summit was
not the appropriate venue to talk about them:
My overall take on kernel summit: we spend far too much time
talking about technical stuff. There is little benefit in doing
this: we conduct technical discussions over email and we do it
well, and there are many very good reasons for doing it that
way.... We fly halfway around the world to yap on about dentry
cache scalability? Spare me, we'd get more done by staying home.
Andrew's conclusion, which was seconded by a number of other developers,
was that the process-oriented discussions are always more interesting and
useful than the deep technical sessions. Discussions of virtualization,
memory management, or device drivers will always be uninteresting to a
significant part of the group, and they do not necessarily add much over
what can be done with email. But the process-oriented talk affects
everybody and is much harder to do electronically.
So this year's agenda is more high-level than in previous years. That does
not mean that there will be no technical talk, though. Some of the more
technical sessions will cover:
- Reports from mini-summits. The kernel is a big program, and
developers often find that subsystem-specific questions are better
addressed in smaller groups. At the summit, attendees from some
recent mini-summits (covering power management, filesystems, storage,
and virtualization, at least) will report back to the larger group.
- Real time and scheduler issues are on the agenda because there are
some big decisions to make. While much of the real-time tree has
found its way into the mainline, some of the more disruptive chunks
(sleeping spinlocks, threaded interrupt handlers) remain outside.
Also outside of the mainline is the syslets/threadlets patch set.
Hopefully some decisions will be made on whether these features should
be merged, and, if so, what needs to be done to get them into shape.
- There are a number of memory management issues out there, including
the variable page and variable block size patches, approaches to
deadlock avoidance, scalability work, and more. Also on the agenda is
the more process-oriented question of why memory management patches
are so hard to get into the mainline.
- Virtualization has fallen off the agenda because most of the
kernel-level work in this area has already been merged. The
containers developers are just getting going, though, and there are a
lot of questions about what their final destination is thought to be.
A full containers implementation could impose significant overhead -
on developers and on run-time performance - and could prove hard to
sell.
That's about it for the serious technical talks; everything else will have
a higher-level focus. The summit will start with a panel of distributor
kernel maintainers. To a great extent, distributors are the immediate
customers for the kernels that the developers put out; those distributors
are then charged with getting mainline releases into a condition that
allows them to be shipped to users. Distributor kernel maintainers tend to
be on the front line when things go wrong; they always hear about all the
problems. This panel will be a chance for those maintainers to talk about
the quality of the kernels they are getting from the mainline and how
things could be made to work better.
Once upon a time, the kernel stood alone and presented services to the
system by way of the system call interface. In current systems, instead,
users see a view of the system which is created by a whole set of
utilities, including the C library, udev, HAL, and more. Interactions
between these low-level components and the kernel are not always as smooth
as it could be, and, despite the best efforts of the kernel development
community, kernel releases have been known to occasionally break utilities
like udev. The "greater kernel ecosystem" session will cover these issues
and the general question of making the system as a whole work better
together. Establishing better control over the user-space API is likely
to come up, though the problem remains difficult.
There is a half-hour session on developer relations. The kernel
development community is visibly growing, and that is generally a good
thing. Ensuring the continued health of kernel development requires
bringing in a steady stream of new developers - from all over the world.
This session will be the place to talk about how that can be done, and how
participation from under-represented parts of the world can be improved.
Andrew Morton gets an hour to pound the table on kernel quality and related
issues. There still appears to be a consensus among the developers that
the kernel is not getting buggier, but that view is not universally held.
Everybody agrees that fewer bugs would be a good thing, though. So topics
like bug tracking, fixing the reviewer shortage, possible stabilization
releases, and so on, are likely to come up in this session.
Documentation is, inevitably, on the agenda - everybody wants more of it,
but, somehow, it fails to just show up on its own. Last year there was
some talk of imposing documentation requirements on new patches, but few
people took the idea all that seriously. So maybe some different ideas for
improving the situation will come about this time around. Also on the list
may be managing translations - an area of increasing interest - and
standardizing kernel messaging.
Various other process-oriented questions have been swept into a session
late on the second day. Are big code cleanups worth it? How can we
improve our handling of large patches which affect a number of different
subsystems? How do we deal with problematic maintainers? And, in general,
is the kernel process going too fast? But perhaps the discussion will be
dominated by Andrew Morton's suggestion that the developers form a union
and demand a massive pay raise.
There are other sessions on the agenda as well; see the posted version for
the full list. Whenever a group of this nature comes together, interesting
things are bound to come out of it. Tune into LWN around September 6
for coverage from the event.
By Jonathan Corbet
August 28, 2007
Once upon a time, block device drivers implemented the same
file_operations structure used by char drivers - despite the fact
that block drivers are quite different and many of the
file_operations methods had no relevance to them. By the 2.4
release, though, the block driver API had been significantly reworked, and
struct file_operations was no longer used. Instead, block drivers
have a
block_device_operations structure containing many of the
driver's exported operations. "Many" because certain other operations,
including the ones which actually enqueue I/O requests, end up being stored
in the request queue structure instead.
When the move to block_device_operations was done, a number of
methods were carried over directly from the file_operations
vector with their prototypes unchanged. Doing things this way minimized the pain
for driver maintainers, but it led to some interesting interface
artifacts. For example, consider the open() method:
int (*open)(struct inode *ino, struct file *filp);
When a char device or an actual file is being opened, filp points
to the internal file structure used by the kernel to manage the
open file. If a user-space process opens a block device directly,
filp will be used in the same way. Most of the time, though,
block devices are opened by the kernel as a step toward mounting a
filesystem stored there. In that case, there is no associated file
structure. That's why a perusal of the source reveals code like this:
/*
* This crockload is due to bad choice of ->open() type.
* It will go away.
* For now, block device ->open() routine must _not_
* examine anything in 'inode' argument except ->i_rdev.
*/
struct file fake_file = {};
struct dentry fake_dentry = {};
fake_file.f_mode = mode;
fake_file.f_flags = flags;
fake_file.f_path.dentry = &fake_dentry;
fake_dentry.d_inode = bdev->bd_inode;
Al Viro (who is responsible for much of the current API) has taken a look at this problem and
others. In the case of open(),
there is very little of the information passed in the inode and
file structure pointers which is actually used by drivers. And
some of that is used in hazardous ways - any driver which depends on
anything in fake_file lasting beyond the open() call will
find itself in trouble. There are other issues with the API as well,
leading Al to propose some significant changes. The result, which is
almost certain to be merged when it is ready (possibly as soon as 2.6.24),
will be a cleaner block
driver API - at the cost of changes for every existing driver.
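The lifetime hazard described above can be illustrated with a small user-space mock-up. Everything here is an assumption made for illustration: the mock_file and mock_drive types and both functions are hypothetical stand-ins, not kernel definitions. The sketch shows the safe pattern, in which a driver copies the values it needs at open() time rather than holding on to a pointer that may refer to a short-lived fake structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file; not the kernel's definition. */
struct mock_file {
    unsigned int f_mode;
    unsigned int f_flags;
};

/* Hypothetical per-device driver state. */
struct mock_drive {
    unsigned int open_mode;  /* copied by value, so safe to keep */
    int open_count;
};

/*
 * Old-style open(): filp may point at a fake structure living on the
 * caller's stack, so the driver copies what it needs and never stores
 * the pointer itself.
 */
static int mock_open(struct mock_drive *drv, const struct mock_file *filp)
{
    assert(drv != NULL && filp != NULL);
    drv->open_mode = filp->f_mode;
    drv->open_count++;
    return 0;
}

/*
 * Mimics an in-kernel open (for a mount, say): there is no real file
 * structure, so a temporary one is built just for the call.
 */
static int mock_kernel_open(struct mock_drive *drv, unsigned int mode,
                            unsigned int flags)
{
    struct mock_file fake_file = {0};

    fake_file.f_mode = mode;
    fake_file.f_flags = flags;
    return mock_open(drv, &fake_file);
    /* fake_file goes out of scope here; a saved pointer would dangle. */
}
```

A driver which had stored filp instead of filp->f_mode would be left with a dangling pointer as soon as mock_kernel_open() returned.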
The first change will be to move some of the flags found in
f_flags over to f_mode, which is not subject to being
changed by fcntl() calls from user space. As part of the move,
drivers will be expected not to change those flags - or any other part of
the file structure. This change will enable a cleanup of some
code in the much-maligned floppy driver, which currently stores some
information in that structure at open() time.
The new open() prototype is projected to be:
int (*open)(struct block_device *bdev, mode_t mode);
Here, mode holds the usual read/write flags, along with some of the
other open()-time flags like O_NDELAY. This value will
not be changed by the drivers and will not necessarily exist in any sort of
file structure. It will be stored safely in an undisclosed
location by the kernel and will be available at release() time,
when some drivers will need access to those flags.
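As a rough idea of what a driver's open() might look like under the projected prototype, here is a user-space sketch. The patch had not been posted at the time of writing, so this is guesswork: the mock_block_device type, the MOCK_MODE_* flag values, and the policy for media-less opens are all hypothetical and chosen only to show a driver making its decisions from the mode argument alone.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag values; the real encoding is not given in the text. */
#define MOCK_MODE_READ   0x1u
#define MOCK_MODE_WRITE  0x2u
#define MOCK_MODE_NDELAY 0x4u

/* Hypothetical stand-in for struct block_device. */
struct mock_block_device {
    int read_only;
    int media_present;
    int openers;
};

/* Shaped like the projected open(struct block_device *, mode_t). */
static int mock_bdev_open(struct mock_block_device *bdev, unsigned int mode)
{
    assert(bdev != NULL);

    /* Refuse write access to a read-only device. */
    if ((mode & MOCK_MODE_WRITE) && bdev->read_only)
        return -EROFS;

    /*
     * Assumed policy: an O_NDELAY-style open may succeed without
     * media, so that the device can still be reached for ioctl() calls.
     */
    if (!bdev->media_present && !(mode & MOCK_MODE_NDELAY))
        return -ENXIO;

    bdev->openers++;
    return 0;
}
```

Note that the function reads the mode but never modifies it, in keeping with the expectation that drivers leave those flags alone.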
Speaking of release(), that function, too, currently has an old
prototype:
int (*release)(struct inode *ino, struct file *filp);
In this case, filp is often passed as NULL by the kernel,
forcing drivers to check the value and implement some sort of default
behavior in the absence of a file structure. But, sometimes, drivers
need to know about some of the flags which were provided at open()
time. So the new release() method will look something like:
int (*release)(struct gendisk *disk, mode_t mode);
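The pairing of the two methods might work along the following lines. This is a user-space sketch under stated assumptions: the mock types, the MOCK_MODE_WRITE value, and the flush-on-last-writer policy are hypothetical. The point is only that release() receives the same mode that was given to open(), so no NULL check or guessed default is needed.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_MODE_WRITE 0x2u  /* hypothetical flag value */

/* Hypothetical stand-ins for struct gendisk and struct block_device. */
struct mock_gendisk {
    int openers;
    int writers;
    int flushes;
};

struct mock_block_device {
    struct mock_gendisk *disk;
};

static int mock_disk_open(struct mock_block_device *bdev, unsigned int mode)
{
    assert(bdev != NULL && bdev->disk != NULL);
    bdev->disk->openers++;
    if (mode & MOCK_MODE_WRITE)
        bdev->disk->writers++;
    return 0;
}

/* Shaped like the proposed release(struct gendisk *, mode_t). */
static int mock_disk_release(struct mock_gendisk *disk, unsigned int mode)
{
    assert(disk != NULL && disk->openers > 0);
    disk->openers--;
    if (mode & MOCK_MODE_WRITE) {
        disk->writers--;
        if (disk->writers == 0)
            disk->flushes++;  /* assumed policy: flush when the last writer leaves */
    }
    return 0;
}
```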
The changes do not stop there. Al points out that there is a bit of
confusion in the ioctl() interface:
int (*ioctl)(struct inode *ino, struct file *filp, unsigned cmd,
             unsigned long arg);
long (*unlocked_ioctl)(struct file *filp, unsigned cmd, unsigned long arg);
long (*compat_ioctl)(struct file *filp, unsigned cmd, unsigned long arg);
The different versions have different arguments - and even different return
types. Once again, drivers tend not to care about most of what can be
found in the inode and file structures - even when those
structures exist. So the new form of the ioctl() methods will be:
int (*ioctl)(struct block_device *bdev, mode_t mode, unsigned int cmd,
             unsigned long arg);
int (*compat_ioctl)(struct block_device *bdev, mode_t mode, unsigned int cmd,
                    unsigned long arg);
Note that unlocked_ioctl() is gone: it is arguably past time to
get rid of the big kernel lock (BKL) in the block ioctl()
implementation. So any driver still using the locked version
(ioctl() in the old API) will be modified to take the BKL
internally. Any block driver which still requires the BKL is probably in
need of a more serious review, though.
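The "take the BKL internally" conversion could look something like the sketch below. It is a user-space mock-up, not kernel code: mock_lock_kernel() and mock_unlock_kernel() are hypothetical stand-ins which model the lock as a simple depth counter, and the command number and device type are invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the BKL, modeled as a depth counter. */
static int mock_bkl_depth;

static void mock_lock_kernel(void)
{
    mock_bkl_depth++;
}

static void mock_unlock_kernel(void)
{
    assert(mock_bkl_depth > 0);
    mock_bkl_depth--;
}

/* Hypothetical device type and command number. */
struct mock_block_device {
    unsigned long sectors;
};

#define MOCK_GET_SECTORS 1u

/* Legacy handler body which was written assuming the lock is held. */
static int legacy_ioctl_locked(struct mock_block_device *bdev,
                               unsigned int cmd, unsigned long arg)
{
    assert(mock_bkl_depth > 0);
    switch (cmd) {
    case MOCK_GET_SECTORS:
        *(unsigned long *)(uintptr_t)arg = bdev->sectors;
        return 0;
    default:
        return -ENOTTY;
    }
}

/* New-style entry point: called without the lock, takes it internally. */
static int mock_ioctl(struct mock_block_device *bdev, unsigned int mode,
                      unsigned int cmd, unsigned long arg)
{
    int ret;

    (void)mode;  /* unused in this sketch */
    mock_lock_kernel();
    ret = legacy_ioctl_locked(bdev, cmd, arg);
    mock_unlock_kernel();
    return ret;
}
```

The wrapper keeps the old handler body unchanged while making the locking dependency explicit, which is also what makes it easy to find and review later.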
As of this writing, there have been no arguments against the change. The word from Linus is:
From your description, I have no objections - everything sounds
good. My only concern is how painful the patch ends up being (and a
worry about whether this will affect a metric truck-load of
external modules? That said, I can't really see us worrying about
those)
Al claims to have a patch in progress and ready to be posted soon, and that
the amount of pain should be relatively small - for in-tree drivers,
anyway. For those maintaining out-of-tree block drivers, the writing is on
the wall: a significant API change is coming.
By Jonathan Corbet
August 29, 2007
The sysctl() system call allows a suitably-privileged application
to tweak various kernel parameters. It is a useful feature which, as it
happens, is almost never used. The reason for that is the existence of the
/proc/sys virtual directory hierarchy which exports the same
functionality in a form which is much easier to use. Callers of
sysctl() have been encouraged to use /proc/sys instead
for a long time and the addition of new parameters to sysctl() is
considered to be against the rules. One year ago, sysctl() was
removed from the 2.6.19-rc kernels, only to be restored before the final
release.
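To give a sense of why /proc/sys is seen as the easier interface: a parameter there is just a file which can be read with ordinary I/O calls. A minimal sketch follows; the helper name read_sys_param() is hypothetical, and the path is passed in by the caller so that the same function works on any text file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a parameter file into buf and strip the
 * trailing newline. Returns 0 on success, -1 on any failure.
 */
static int read_sys_param(const char *path, char *buf, size_t len)
{
    FILE *fp;

    assert(path != NULL && buf != NULL && len > 0);

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    if (fgets(buf, (int)len, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

On a Linux system, a call like read_sys_param("/proc/sys/kernel/ostype", buf, sizeof(buf)) would be expected to fill buf with the contents of that file; no numeric name arrays or special system call are involved.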
sysctl() is part of the user-space ABI; it is supposed to continue
working forever. That is why the attempt to remove it was ultimately
rolled back. So it may be surprising to some to see a new removal attempt by Eric Biederman. His
latest patch adds a new deprecation warning and an entry in the feature
removal schedule putting the end of sysctl() in September, 2010.
Says Eric:
After adding checking to register_sysctl_table and finding a whole
new set of bugs. Missed by countless code reviews and testers I
have finally lost patience with the binary sysctl interface.
The binary sysctl interface has been sort of deprecated for years
and finding a user space program that uses the syscall is more
difficult then finding a needle in a haystack. Problems continue
to crop up, with the in kernel implementation. So since supporting
something that no one uses is silly, deprecate sys_sysctl with a
sufficient grace period and notice that the handful of user space
applications that care can be fixed or replaced.
Eric's claim is that this interface is so little-used that it is visibly
rotting. There is sufficiently little common code between the
sysctl() and /proc/sys implementations that it is easy
for the two to diverge. In the long term, he says, the kernel community
will do a better job of not breaking applications by getting rid of
sysctl() in favor of the interface which is actually used and
maintained.
The new patch has, predictably, drawn opposition from developers who do not
want to see the user-space ABI broken in this way. Alan Cox has also suggested that the deprecation warning
approach will not be successful in getting the few remaining users to
switch to /proc/sys:
The whole "whine a bit" process simply doesn't work when you are
trying to persuade people to move in a non-hobbyist context. They
don't want to move, the message is simply an annoyance, their
upstream huge package vendor won't change just to deal with it and
they'll class it as a regression from previous releases, an
incompatibility and file bugs until it goes away.
Andrew Morton, instead, is not opposed to
the patch:
I think it's worth a try. It might take two, three or five years,
who knows? If it turns out to be impractical then we we can just
change our minds later, no big loss.
While there is little disagreement with the policy that the user-space ABI
should never break, it does seem that there is room for discussion on how
that goal might best be met. Unused code has always had a tendency to
break accidentally, and sysctl() looks to be very close to being
entirely unused. One could, presumably, address this problem with some
sort of regression test suite - something the kernel could use more of in
general. But the maintenance of interfaces which are of almost entirely
historical interest is not really helpful to Linux users. So, perhaps,
there needs to be a way to remove system calls which have fallen into
disuse for a long-enough period. Should this patch go through, we shall
see whether three years is sufficient warning for such a change or not.
Page editor: Jonathan Corbet