
Kernel development

Brief items

Kernel release status

The current development kernel is still 2.5.31; Linus has not released a development kernel (as of this writing) since August 10.

Linus has not been idle, however; his BitKeeper repository (which may well be released as 2.5.32 by the time you read this) contains many changes. At the top of the list, of course, is the replacement of the IDE subsystem. Other stuff merged by Linus includes some NFS changes, the "scalable exit" patch from Ingo Molnar (see below) along with his other thread support improvements, an ACPI update, a set of page cache improvements from Andrew Morton, a new MTRR driver, more device model work, a new RTC driver, and a very long list of other fixes and updates.

The latest 2.5 status summary from Guillaume Boissiere came out on August 20.

The current stable kernel is 2.4.19. Marcelo released 2.4.20-pre4 on August 19; the biggest change in this prepatch is the addition of the JFS journaling filesystem.

The current prepatch from Alan Cox is 2.4.20-pre2-ac6. The "ac" series looks to be the testing area for new IDE patches for some time, and thus may be, at times, less stable than people have come to expect.


Kernel development news

IDE - now what?

As covered on this week's front page, all of Marcin Dalecki's "IDE cleanup" work has been removed from the 2.5.32 kernel and replaced with the 2.4 "foreport." That leaves the IDE code in a state not that far removed from where it was when the 2.5 series started, and the Halloween freeze date is getting closer. What is going to happen to the IDE code now, and who will do it?

At the moment, nobody is stepping forward to be the next IDE maintainer. For the time being it looks like Jens Axboe and Alan Cox are willing to oversee new IDE work and filter it on its way to Linus - but they will not necessarily do a lot of that work themselves. Alan has laid down some conditions, though:

I want order to this. That means all the driver cleanup goes into 2.4-ac (or "2.4-ide" or some suitable branch) first where we can verify we aren't hitting 2.5 generic bugs and ide corruption is a meaningful problem report. It means someone (not me) is the appointed 2.5 person and handles stuff going to 2.5 (I'm happy to identify stuff that tests ok in 2.4 as candidates). It also means random patches not going past me.

If we can do it that way I'll do the job. If Linus applies random IDE "cleanup" patches to his 2.5 tree that don't pass through Jens and me then I'll just stop listening to 2.5 stuff.

In other words, the 2.4-ac tree becomes the development area for new IDE work before it heads into 2.5. And Alan doesn't want to have to contend with patches taking other paths into 2.5. (Alan has also posted the set of attributes an IDE maintainer should have, for anybody who is interested in the job.)

What is going to happen with the IDE code? A few people have requested that somebody pick up Marcin's work and finish the job, but nobody who is actually working with IDE seems to have much interest in that. Quoting Alan again:

Its easier to go back to functionally correct code and do the job nicely than to fix the 2.5.3x code. Right now I'm working on Andre's current code in 2.4.20pre2-ac* starting off with only provably identical transforms between AndreCode and C and documenting it

So it looks like the 2.4 IDE implementation is here to stay. Or, at least, something based on it - Andre Hedrick, as it turns out, has not been idle during this time. He has a whole set of patches - many of which are already in the -ac series - for nice things like Serial ATA, pluggable low-level transport drivers, modular chipset support, etc. At this point, it's hard to imagine this code not moving into 2.5 once it proves stable.

Linus has his own plans for the future of the IDE code. These plans involve making some relatively minor changes to the current IDE core, mostly around moving some functionality up toward the block layer. Once that's done, development on a new "IDE-TNG" driver would begin. The existing IDE code at that point would be mostly frozen and thus remain stable; new work would happen in the new, scary, dangerous "TNG" driver. Support for older hardware would be removed from the TNG driver, allowing a great deal of historical cruft to be cleaned out.

In retrospect, creating a new version of the IDE subsystem was the obvious way to carry out a major reworking of this code. You simply cannot have a fundamental layer like IDE be unstable for months and expect to get a lot of other work done. The previous IDE transition (from the old "hd" driver) was handled in this manner. Had Marcin's work been done this way, he might well still be at it now.

As it is, the window of opportunity for major IDE work in 2.5 has closed. There is time for smaller cleanups and the addition of needed features, but nobody has any appetite for anything that would seriously destabilize IDE again this close to the freeze date.


Making threads die quickly

Ingo Molnar's work to improve the kernel's support of threads was covered here last week. This week, Ingo has moved on to the final part of a thread's life cycle: the exit() call. It turns out that the Linux exit() implementation has some real scalability problems, which are described and fixed in this patch.

The cost of killing a process is proportional to the total number of processes running. In situations where thousands of tasks are running (and, remember, some threaded applications run thousands of threads), the exit() call can become truly expensive.

Why is this happening? When a process exits, the kernel must "reparent" all of its children to keep the process hierarchy consistent. This should be a straightforward job, since each process keeps a list of its children in the task_struct structure. Unfortunately, due to some weirdness in how the ptrace() system call is handled, that list is not sufficient. ptrace(), it seems, rearranges the process tree so that the process being traced becomes a child of the process doing the tracing. To find processes which have been temporarily relocated to a "foster parent," the exit() system call must iterate over all processes in the system. And that, of course, is where the scalability problems come in.

Ingo's solution is simply to maintain a separate list of all processes which are being debugged with ptrace() at any given time. That list will generally be quite short. When a process exits, it is now necessary to look at its list of children and the ptrace list, but at no other processes. No more scalability problems.
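The difference between the two approaches can be sketched with a toy model. What follows is not kernel code and is not taken from Ingo's patch; it is a small Python simulation in which every name (Task, ptrace_attach(), orphans_by_full_scan(), orphans_by_lists(), build_demo()) is made up for illustration. It only counts how many tasks each strategy must visit to find the children of an exiting process.

```python
class Task:
    """Toy stand-in for a process; not the kernel's task_struct."""

    def __init__(self, pid, parent=None):
        self.pid = pid
        self.real_parent = parent  # the task that created this one
        self.parent = parent       # current parent; a tracer while ptraced
        self.children = []
        if parent is not None:
            parent.children.append(self)


def ptrace_attach(tracer, tracee, ptrace_list):
    """Model ptrace(): move the tracee under the tracer and remember it."""
    tracee.parent.children.remove(tracee)
    tracer.children.append(tracee)
    tracee.parent = tracer
    ptrace_list.append(tracee)


def orphans_by_full_scan(dying, all_tasks):
    """Old strategy: walk every task in the system."""
    visited, found = 0, []
    for task in all_tasks:
        visited += 1
        if task.real_parent is dying:
            found.append(task.pid)
    return sorted(found), visited


def orphans_by_lists(dying, ptrace_list):
    """New strategy: walk the child list plus the (short) ptrace list."""
    visited, found = 0, []
    for task in dying.children + ptrace_list:
        visited += 1
        if task.real_parent is dying and task.pid not in found:
            found.append(task.pid)
    return sorted(found), visited


def build_demo(n_unrelated=1000):
    """One worker with three children, one of them being debugged."""
    init = Task(1)
    worker = Task(2, init)
    debugger = Task(3, init)
    kids = [Task(10 + i, worker) for i in range(3)]
    others = [Task(100 + i, init) for i in range(n_unrelated)]
    ptrace_list = []
    ptrace_attach(debugger, kids[0], ptrace_list)
    return worker, [init, worker, debugger] + kids + others, ptrace_list


if __name__ == "__main__":
    worker, tasks, ptrace_list = build_demo()
    print(orphans_by_full_scan(worker, tasks))
    print(orphans_by_lists(worker, ptrace_list))
```

Both functions find the same three children, including the one that has been moved under the debugger, but the second visits only the exiting task's child list and the ptrace list. Its cost no longer grows with the number of unrelated tasks in the system.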


How random is random enough?

Oliver Xymoron posted a set of /dev/random patches this week, introducing them with:

I've done an analysis of entropy collection and accounting in current Linux kernels and founds some major weaknesses and bugs. As entropy accounting is only one part of the security of the random number device, it's unlikely that these flaws are compromisable, nonetheless it makes sense to fix them.

Entropy, of course, can be thought of as the amount of random data the kernel currently has available for the creation of random numbers. The entropy pool is filled by looking at (hopefully) random events as seen by the processor - such as the timing of device interrupts. Oliver's claim is that the kernel is vastly overestimating the amount of entropy it is accumulating, and thus handing out numbers that are not as random as expected.

Some of the trouble comes from over-optimistic assumptions of the amount of randomness really contained in interrupt timings. Simply put, the resolution of interrupt timing is not what the kernel thinks it is. Oliver also claims that interrupt timing is often observable or controllable by hostile users. The timing of network packets has long been considered suspect for this very reason; Oliver says that disk timing is subject to the same sort of manipulation. Oliver has also pointed out a bug in the way timing samples are merged into the entropy pool.
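To see why timer resolution matters, consider a deliberately simplified estimator. It is an assumption made for illustration, not the kernel's algorithm and not Oliver's patch: the hypothetical credit_bits() function below credits floor(log2(delta)) bits for the gap between consecutive events, capped at an arbitrary 12 bits per event, after first rounding each timestamp down to what the clock can actually distinguish.

```python
def credit_bits(timestamps_ns, resolution_ns, cap=12):
    """Toy entropy estimate for a series of event timestamps.

    resolution_ns is the granularity of the clock used to stamp events;
    cap is an arbitrary per-event limit chosen for this sketch.
    """
    total = 0
    prev_tick = None
    for ts in timestamps_ns:
        tick = ts // resolution_ns  # what the clock can really tell apart
        if prev_tick is not None:
            delta = abs(tick - prev_tick)
            # bit_length() - 1 equals floor(log2(delta)) for delta >= 1
            bits = delta.bit_length() - 1 if delta > 0 else 0
            total += min(bits, cap)
        prev_tick = tick
    return total


if __name__ == "__main__":
    events = [0, 3_000_000, 7_500_000, 18_000_000]  # made-up timestamps, in ns
    print(credit_bits(events, resolution_ns=1))          # assumes a 1 ns clock
    print(credit_bits(events, resolution_ns=1_000_000))  # assumes a 1 ms clock
```

With these made-up timestamps, assuming nanosecond resolution credits 36 bits, while the same events seen through a millisecond clock earn only 6. An estimator that believes its clock is finer than it really is will overstate the pool's contents in just this way.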

Finally, Oliver claims:

Worst of all, the accounting of entropy transfers between the primary and secondary pools has been broken for quite some time and produces thousands of bits of entropy out of thin air.

Interestingly, this last one may not be a real bug - read Ted Ts'o's explanation of why things are done this way for the details. Generating random numbers that are resistant to guessing is a difficult task.

Oliver's fixes have the result of greatly reducing the amount of entropy available to the system, and thus the number of random numbers that can be obtained from /dev/random. Linus doesn't like this aspect of the patch; he fears that making /dev/random difficult to use will just cause people to not use it.

Randomness is like security: if you make it too hard to use, then you're shooting yourself in the foot, since people end up unable to practically use it.

If /dev/random can not obtain enough entropy to be useful, says Linus, it's probably better to just get rid of it altogether.
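The tradeoff can be made concrete with another toy model. The EntropyPool class below is hypothetical, as are its 4096-bit capacity and the per-event credits used in the demonstration; it simply tracks an entropy count, credits it on each event, and hands out no more bytes than the count allows, which is roughly the behavior that makes a stricter estimate painful for users.

```python
class EntropyPool:
    """Toy accounting model; capacity and credits are illustrative only."""

    def __init__(self, capacity_bits=4096):
        self.capacity_bits = capacity_bits
        self.entropy_bits = 0

    def credit(self, bits):
        """Add estimated entropy, never exceeding the pool's capacity."""
        self.entropy_bits = min(self.capacity_bits, self.entropy_bits + bits)

    def read(self, nbytes):
        """Return how many bytes can be supplied without waiting."""
        granted = min(nbytes, self.entropy_bits // 8)
        self.entropy_bits -= granted * 8
        return granted


def bytes_granted(events, bits_per_event, request):
    """Credit a pool for a number of events, then attempt one read."""
    pool = EntropyPool()
    for _ in range(events):
        pool.credit(bits_per_event)
    return pool.read(request)


if __name__ == "__main__":
    print(bytes_granted(events=100, bits_per_event=12, request=64))  # generous
    print(bytes_granted(events=100, bits_per_event=2, request=64))   # strict
```

In this sketch, a hundred events credited at 12 bits apiece satisfy a 64-byte request with room to spare; credited at 2 bits apiece, the same events yield only 25 bytes, and the reader must wait for the rest.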

This discussion has reached no real resolution as of this writing, and the entropy patches have not been merged. Some sort of fix will likely go in at some point, once a compromise between "proper" entropy accounting and usefulness has been reached.


Patches and updates

Kernel trees

Architecture-specific

Build system

Core kernel code

Ingo Molnar O(1) sys_exit(), threading, scalable-exit-2.5.31-A6 "this patch is the next step in the journey to get top-notch threading support implemented under Linux."

Development tools

Device drivers

Documentation

Denis Vlasenko lk maintainers
Roger Gammans Re: Some JBD documenation

Filesystems and block I/O

Anton Altaparmakov NTFS 2.1.0 1/7: Add config option for writing "Below is the 1st of 7 ChangeSets updating NTFS to 2.1.0, which you will get when you bk pull the ntfs-2.5 repository. Together they implement file overwrite support for NTFS."
Christoph Hellwig Updated XFS merge status

Memory management

Rik van Riel rmap 14
Rik van Riel rmap 14a

Security-related

Oliver Xymoron (0/4) Entropy accounting fixes "I've done an analysis of entropy collection and accounting in current Linux kernels and founds some major weaknesses and bugs."
Oliver Xymoron (2/4) Update input drivers
Oliver Xymoron (3/4) SA_RANDOM user fixup

Miscellaneous

Thomas Molina 2.5 Problem Report status
Rusty Russell list_for_each_entry "Using two variables all the time is pissing me off".

Page editor: Jonathan Corbet


Copyright © 2002, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds