Brief items
The current development kernel is still 2.5.31; as of this writing,
Linus has not released a development kernel since August 10.
He has not been idle, however; his BitKeeper repository (which may well
be released as 2.5.32 by the time you read this) contains many changes.
At the top of the list, of course, is the replacement of the IDE subsystem.
Other stuff merged by Linus includes some NFS changes, the "scalable exit"
patch from Ingo Molnar (see below) along with his other thread support
improvements, an ACPI update, a set of page cache improvements from Andrew
Morton, a new MTRR driver, more device model work, a new RTC driver, and a
very long list of other fixes and updates.
The latest 2.5 status summary from Guillaume
Boissiere came out on August 20.
The current stable kernel is 2.4.19. Marcelo released 2.4.20-pre4 on August 19; the biggest
change in this prepatch is the addition of the JFS journaling filesystem.
The current prepatch from Alan Cox is 2.4.20-pre2-ac6. The "ac" series looks to be
the testing area for new IDE
patches for some time, and thus may be, at times, less stable than people
have come to expect.
Kernel development news
As covered on this week's front page, all of Marcin Dalecki's "IDE cleanup"
work has been removed from the 2.5.32 kernel and replaced with the 2.4
"foreport." That leaves the IDE code in a state not that far removed from
where it was when the 2.5 series started, and the Halloween freeze date is
getting closer. What is going to happen to the IDE code now, and who will
do it?
At the moment, nobody is stepping forward to be the next IDE maintainer.
For the time being it looks like Jens Axboe and Alan Cox are willing to
oversee new IDE work and filter it on its way to Linus - but they will not
necessarily do a lot of that work themselves. Alan has laid down some conditions, though:
I want order to this. That means all the driver cleanup goes into
2.4-ac (or "2.4-ide" or some suitable branch) first where we can
verify we aren't hitting 2.5 generic bugs and ide corruption is a
meaningful problem report. It means someone (not me) is the
appointed 2.5 person and handles stuff going to 2.5 (I'm happy to
identify stuff that tests ok in 2.4 as candidates). It also means
random patches not going past me.
If we can do it that way I'll do the job. If Linus applies random
IDE "cleanup" patches to his 2.5 tree that don't pass through Jens
and me then I'll just stop listening to 2.5 stuff.
In other words, the 2.4-ac tree becomes the development area for new IDE
work before it heads into 2.5. And Alan doesn't want to have to contend
with patches taking other paths into 2.5. (Alan has also posted the set of attributes an IDE maintainer should
have, for anybody who is interested in the job.)
What is going to happen with the IDE code? A few people have requested
that somebody pick up Marcin's work and finish the job, but nobody who is
actually working with IDE seems to have much interest in that. Quoting Alan again:
Its easier to go back to functionally correct code and do the job
nicely than to fix the 2.5.3x code. Right now I'm working on
Andre's current code in 2.4.20pre2-ac* starting off with only
provably identical transforms between AndreCode and C and
documenting it
So it looks like the 2.4 IDE implementation is here to stay. Or, at least,
something based on it - Andre Hedrick, as it turns out, has not been idle
during this time. He has a whole set of patches - much of which is already
in the -ac series - for nice things like Serial ATA, pluggable low-level
transport drivers, modular chipset support, etc. At this point, it's hard
to imagine this code not moving into 2.5 once it proves stable.
Linus has his own plans for the future of the
IDE code. These plans involve making some relatively minor changes to the
current IDE core, mostly around moving some functionality up toward the
block layer. Once that's done, development on a new "IDE-TNG" driver would
begin. The existing IDE code at that point would be mostly frozen and thus
remain stable; new work would happen in the new, scary, dangerous "TNG"
driver. Support for older hardware would be removed from the TNG driver,
allowing a great deal of historical cruft to be cleaned out.
In retrospect, creating a new version of the IDE subsystem was the obvious
way to carry out a major reworking of this code. You simply cannot have a
fundamental layer like IDE be unstable for months and expect to get a lot
of other work done. The previous IDE transition (from the old "hd" driver)
was handled in this manner. Had Marcin's work been done this way, he might
well still be at it now.
As it is, the window of opportunity for major IDE work in 2.5 has closed.
There is time for smaller cleanups and the addition of needed features, but
nobody has any appetite for anything that would seriously destabilize IDE
again this close to the freeze date.
Ingo Molnar's work to improve the kernel's support of threads was covered
here last week. This week, Ingo has moved on to the final part of a
thread's life cycle: the exit() call. It turns out that the Linux exit()
implementation has some real scalability problems, which are described and
fixed in this patch.
The cost of killing a process, it turns out, is proportional to the total
number of processes running. In situations where thousands of tasks are
running (and, remember, some threaded applications run thousands of
threads) the exit() call can become truly expensive.
Why is this happening? When a process exits, the kernel must "reparent"
all of its children to keep the process hierarchy consistent. This should
be a straightforward job, since each process keeps a list of its children
in the task_struct structure. Unfortunately, due to some
weirdness in how the ptrace() system call is handled, that list is
not sufficient. ptrace(), it seems, rearranges the process tree
so that the process being traced becomes a child of the process doing the
tracing. To find processes which have been temporarily relocated to a
"foster parent," the exit() system call must iterate over all
processes in the system. And that, of course, is where the scalability
problems come in.
Ingo's solution is simply to maintain a separate list of all processes
which are being debugged with ptrace() at any given time. That
list will generally be quite short. When a process exits, it is now
necessary to look at its list of children and the ptrace list, but
at no other processes. No more scalability problems.
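The difference between the two approaches can be sketched with a toy model. What follows is Python, not kernel code, and every name in it (ProcessTable, exit_full_scan(), exit_with_ptrace_list(), build()) is invented for illustration; it assumes, as a simplification, that orphans are always handed to process 1. Both exit paths apply the same fixup to each task they visit and return the number of tasks they had to examine.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

INIT_PID = 1  # in this model, orphans are always adopted by pid 1


@dataclass
class Task:
    pid: int
    real_parent: int   # the task that created this one
    parent: int        # current parent: the tracer while being traced
    children: List[int] = field(default_factory=list)  # tasks whose current parent is this one


class ProcessTable:
    def __init__(self) -> None:
        self.tasks: Dict[int, Task] = {INIT_PID: Task(INIT_PID, INIT_PID, INIT_PID)}
        self.ptraced: Set[int] = set()  # the separate list of traced tasks

    def fork(self, parent: int, pid: int) -> None:
        self.tasks[pid] = Task(pid, real_parent=parent, parent=parent)
        self.tasks[parent].children.append(pid)

    def ptrace_attach(self, tracer: int, target: int) -> None:
        """Move the target onto the tracer's child list (the "foster parent")."""
        t = self.tasks[target]
        self.tasks[t.parent].children.remove(target)
        t.parent = tracer
        self.tasks[tracer].children.append(target)
        self.ptraced.add(target)

    def _fixup(self, pid: int, dying: int) -> None:
        t = self.tasks[pid]
        if t.real_parent == dying:
            t.real_parent = INIT_PID      # orphaned: init adopts it
        if t.parent == dying:             # it sat on the dying task's child list
            t.parent = t.real_parent      # go (back) to the real parent
            self.tasks[t.parent].children.append(pid)
            self.ptraced.discard(pid)     # any tracing by the dying task ends

    def _remove(self, dying: int) -> None:
        self.tasks[self.tasks[dying].parent].children.remove(dying)
        self.ptraced.discard(dying)
        del self.tasks[dying]

    def exit_full_scan(self, dying: int) -> int:
        """Old behaviour: look at every task in the system."""
        others = [pid for pid in self.tasks if pid != dying]
        for pid in others:
            self._fixup(pid, dying)
        self._remove(dying)
        return len(others)

    def exit_with_ptrace_list(self, dying: int) -> int:
        """New behaviour: look only at the child list and the ptrace list."""
        candidates = list(self.tasks[dying].children)
        candidates += [pid for pid in self.ptraced if pid != dying]
        for pid in candidates:
            self._fixup(pid, dying)
        self._remove(dying)
        return len(candidates)


def build(n_bystanders: int) -> ProcessTable:
    """init, a parent (pid 2) with children 3-5, a tracer (pid 6) on pid 4, plus bystanders."""
    table = ProcessTable()
    table.fork(INIT_PID, 2)
    for pid in (3, 4, 5):
        table.fork(2, pid)
    table.fork(INIT_PID, 6)
    table.ptrace_attach(6, 4)
    for pid in range(100, 100 + n_bystanders):
        table.fork(INIT_PID, pid)
    return table


if __name__ == "__main__":
    old, new = build(1000), build(1000)
    print("full scan examined:", old.exit_full_scan(2))
    print("child list + ptrace list examined:", new.exit_with_ptrace_list(2))
```

With 1000 unrelated tasks in the table, the full scan in this model examines 1005 tasks when process 2 exits, while the list-based version examines three: the two children still on process 2's own list and the one traced task. Both leave the process tree in the same state; only the amount of work differs.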
Oliver Xymoron posted a set of /dev/random patches this week, introducing
them with:
I've done an analysis of entropy collection and accounting in
current Linux kernels and founds some major weaknesses and bugs. As
entropy accounting is only one part of the security of the random
number device, it's unlikely that these flaws are compromisable,
nonetheless it makes sense to fix them.
Entropy, of course, can be thought of as the amount of random data the
kernel currently has available for the creation of random numbers. The
entropy pool is filled by looking at (hopefully) random events as seen by
the processor - such as the timing of device interrupts. Oliver's claim is
that the kernel is vastly overestimating the amount of entropy it is
accumulating, and thus handing out numbers that are not as random as
expected.
Some of the trouble comes from over-optimistic assumptions of the amount of
randomness really contained in interrupt timings. Simply put, the
resolution of interrupt timing is not what the kernel thinks it is. Oliver
also claims that interrupt timing is often observable or controllable by
hostile users. The timing of network packets has long been considered
suspect for this very reason; Oliver says that disk timing is subject to
the same sort of manipulation. Oliver has also pointed out a bug in the
way timing samples are merged into the entropy pool.
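The resolution problem can be made concrete with a toy estimator. The Python below is not the kernel's algorithm: credited_bits() is a hypothetical function that credits each gap between successive timestamps with its bit length, capped at an assumed 12 bits, and the event times are made up. The point is only to show what happens when timestamps are reported in fine units (microseconds here) by a clock that really ticks only once per millisecond.

```python
def credited_bits(timestamps, cap=12):
    """Naive credit: the bit length of each timestamp delta, capped at `cap` bits."""
    total = 0
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = abs(cur - prev)
        total += min(delta.bit_length(), cap)
    return total


def through_coarse_clock(timestamps, tick):
    """The same events as seen by a clock that only advances every `tick` units."""
    return [t // tick * tick for t in timestamps]


if __name__ == "__main__":
    events_us = [0, 1300, 2950, 4020, 9000]          # invented event times
    reported = through_coarse_clock(events_us, 1000)  # what the estimator is fed
    in_ticks = [t // 1000 for t in reported]          # the information actually present
    print("credited at assumed resolution:", credited_bits(reported))
    print("credited at real resolution:   ", credited_bits(in_ticks))
```

In this example the estimator credits 43 bits when it takes the microsecond values at face value, but only 7 bits when the same samples are counted in real clock ticks. Neither figure is a true entropy measurement; the gap between them is what matters.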
Finally, Oliver claims:
Worst of all, the accounting of entropy transfers between the
primary and secondary pools has been broken for quite some time and
produces thousands of bits of entropy out of thin air.
Interestingly, this last one may not be a real bug - read Ted Ts'o's explanation of why things are done
this way for the details. Generating random numbers that are resistant to
guessing is a difficult task.
Oliver's fixes have the result of greatly reducing the amount of entropy
available to the system, and thus the number of random numbers that can be
obtained from /dev/random. Linus doesn't like this aspect of the patch; he fears
that making /dev/random difficult to use will just cause people to
not use it.
Randomness is like security: if you make it too hard to use, then
you're shooting yourself in the foot, since people end up unable to
practically use it.
If /dev/random cannot obtain enough entropy to be useful, says
Linus, it's probably better to just get rid of it altogether.
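Why a stricter estimate translates directly into fewer random numbers can be seen in a minimal accounting model. This Python sketch is an illustration only: EntropyPool is a hypothetical class, the 4096-bit capacity and the per-event credits of 11 and 2 bits are assumptions chosen for the example, and a read is modeled as simply returning as many bytes as the entropy count will cover.

```python
class EntropyPool:
    """Toy entropy accounting: events credit bits, reads debit them."""

    def __init__(self, capacity_bits=4096):
        self.capacity_bits = capacity_bits
        self.entropy_bits = 0

    def credit(self, bits):
        # The count can never exceed the size of the pool.
        self.entropy_bits = min(self.capacity_bits, self.entropy_bits + bits)

    def read(self, nbytes):
        """Return how many bytes can be handed out now; the rest must wait."""
        granted = min(nbytes, self.entropy_bits // 8)
        self.entropy_bits -= granted * 8
        return granted


def bytes_after_events(n_events, bits_per_event, request):
    """Credit n_events samples at a fixed rate, then try to read `request` bytes."""
    pool = EntropyPool()
    for _ in range(n_events):
        pool.credit(bits_per_event)
    return pool.read(request)


if __name__ == "__main__":
    print("optimistic estimate:  ", bytes_after_events(100, 11, 64), "bytes")
    print("conservative estimate:", bytes_after_events(100, 2, 64), "bytes")
```

After the same 100 events, the optimistic accounting satisfies the full 64-byte request, while the conservative accounting can deliver only 25 bytes before the reader has to wait for more events.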
This discussion has reached no real resolution as of this writing, and the
entropy patches have not been merged. Some sort of fix will likely go in
at some point, once a compromise between "proper" entropy accounting and
usefulness has been reached.
Patches and updates
Kernel trees
Build system
Core kernel code
Development tools
Device drivers
Documentation
Filesystems and block I/O
- Anton Altaparmakov: NTFS 2.1.0 1/7: Add config option for writing. "Below is the 1st of 7 ChangeSets updating NTFS to 2.1.0, which you
will get when you bk pull the ntfs-2.5 repository. Together they implement
file overwrite support for NTFS."
(August 21, 2002)
Memory management
- Rik van Riel: rmap 14.
(August 16, 2002)
- Rik van Riel: rmap 14a.
(August 19, 2002)
Architecture-specific
Security-related
- Oliver Xymoron: (0/4) Entropy accounting fixes. "I've done an analysis of entropy collection and accounting in current
Linux kernels and founds some major weaknesses and bugs."
(August 19, 2002)
Miscellaneous
- Rusty Russell: list_for_each_entry. "Using two variables all the time is pissing me off."
(August 21, 2002)
Page editor: Jonathan Corbet