Stable kernel 2.6.25 released
Stable kernel 2.6.25 released
Posted Apr 17, 2008 19:51 UTC (Thu) by willy (subscriber, #9762)
I know there's a lot of stuff in there, but you've not mentioned the TASK_KILLABLE work which
I think has a hell of an impact on end-users.
No longer must you reboot a machine because you typed 'ls' in an NFS-mounted directory after
you wandered out of range of your access point. Now, just press ^C, ls will die and you get
your prompt back.
NFS is only the beginning. Liam and I have covered the pieces that bug us. Now it's really
up to users to tell us where their tasks are hanging.
$ ps -eo pid,stat,wchan:40,comm |grep D
will give us useful information.
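For driver writers, the pattern this enables is tiny. A minimal sketch, using the
wait_event_killable() helper that came in with 2.6.25; the wait queue and completion
flag here are hypothetical:

	#include <linux/wait.h>
	#include <linux/sched.h>
	#include <linux/errno.h>

	static DECLARE_WAIT_QUEUE_HEAD(demo_wq);	/* hypothetical wait queue */
	static int demo_done;				/* hypothetical completion flag */

	static int demo_wait(void)
	{
		/* Sleeps in D state like TASK_UNINTERRUPTIBLE, but a fatal
		 * signal (SIGKILL, or an unhandled ^C) wakes the task so it
		 * can unwind instead of hanging forever. */
		if (wait_event_killable(demo_wq, demo_done))
			return -ERESTARTSYS;	/* killed; give up cleanly */
		return 0;
	}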
TASK_KILLABLE
Posted Apr 17, 2008 20:09 UTC (Thu) by pkolloch (subscriber, #21709)
Wow, I look forward to having that on my machine.
Finally... I can't tell you how much that annoyed me (and similar hangs) in the last 10
years...
Stable kernel 2.6.25 released
Posted Apr 17, 2008 20:09 UTC (Thu) by nix (subscriber, #2304)
For what it's worth, the sysadmins at work literally jumped for joy when
they heard of this. (This is using the word 'literally' correctly, too,
rather than the too-common usage meaning 'not literally': they jumped up
and down.)
Stable kernel 2.6.25 released
Posted Apr 18, 2008 6:17 UTC (Fri) by alankila (guest, #47141)
That is an amazingly useful feature. Kudos.
TASK_KILLABLE
Posted Apr 18, 2008 14:42 UTC (Fri) by corbet (editor, #1)
TASK_KILLABLE has been on my radar - I mentioned it in a merge window summary early in the
2.6.25 cycle. Always meant to look at it a bit more, but it fell through the cracks somehow.
That happens to me a lot.
Apologies, anyway. This is, indeed, a worthwhile development. The dreaded "stuck in D state"
problem has been there for as long as I've worked with Unix-like systems - a long time. It's
about time somebody made things work better.
Stable kernel 2.6.25 released
Posted Apr 19, 2008 7:41 UTC (Sat) by mkerrisk (subscriber, #1978)
> Now, just press ^C, ls will die and you get
> your prompt back.
Willy, you say ^C (i.e., SIGINT), but looking at fatal_signal_pending() the only signal that
seems to be matched is SIGKILL, so I'd have thought ^C wouldn't be sufficient. What am I
missing?
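For reference, the helpers in question read roughly as follows in 2.6.25's
include/linux/sched.h (quoted from memory, so treat this as a sketch); "fatal" here
literally means "SIGKILL is pending":

	static inline int __fatal_signal_pending(struct task_struct *p)
	{
		return unlikely(sigismember(&p->pending.signal, SIGKILL));
	}

	static inline int fatal_signal_pending(struct task_struct *p)
	{
		return signal_pending(p) && __fatal_signal_pending(p);
	}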
^C vs SIGKILL
Posted Apr 19, 2008 13:27 UTC (Sat) by willy (subscriber, #9762)
You are entirely correct in that ^C by convention sends a SIGINT to the current foreground
process. The kernel translates unhandled terminating signals into SIGKILL, so all I have to
do in my patchset is check whether SIGKILL is pending. If the process has installed a handler
for SIGINT, pressing ^C will have no effect and you will have to resort to kill -9. So I was
being sloppy when I said ^C, but most processes do not install a handler for SIGINT, and so
will receive a SIGKILL when the user presses ^C.
One notable task which does install a handler for SIGINT is bash, so trying to do filename
completion on a down NFS server will cause bash to only be killable by -9. That's a user
experience I'd like to improve, but I haven't come up with a good solution yet. I have some
bad solutions:
- fork() in bash to do filename completion. Who wouldn't love a fork() to happen every time
they press tab? ;-)
- Introduce new readdir() variants which indicate they're interruptible (not just killable),
then percolate that knowledge all the way down the stack. IIRC, Linus already NACKed that
idea.
Thanks for asking the question. Signal handling is a complex beast, and I initially thought
that I was going to have to do a more complex check for fatal_signal_pending(), but I just
implemented the check for SIGKILL first, and to my surprise ^C worked, so I dug a little
deeper and found the translation. Rewarded for being lazy ... who doesn't love that ;-)
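A toy userspace demonstration of the distinction (not from the thread; all names are
arbitrary): run this, and ^C merely interrupts pause(); remove the signal() call and
^C is fatal again.

	#include <signal.h>
	#include <unistd.h>

	static void on_int(int sig)
	{
		(void)sig;			/* swallow SIGINT, as bash does */
	}

	int main(void)
	{
		signal(SIGINT, on_int);		/* remove this and ^C kills us */
		for (;;)
			pause();		/* ^C now just interrupts pause() */
	}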
^C vs SIGKILL
Posted Apr 19, 2008 14:21 UTC (Sat) by mkerrisk (subscriber, #1978)
> The kernel translates unhandled terminating signals into SIGKILL
Where does that translation occur? I couldn't see it.
^C vs SIGKILL
Posted Apr 19, 2008 14:31 UTC (Sat) by willy (subscriber, #9762)
I see two places where we send SIGKILL in kernel/signal.c. The first is in
__group_complete_signal():
	if (sig_fatal(p, sig) && !(p->signal->flags & SIGNAL_GROUP_EXIT) &&
	    !sigismember(&t->real_blocked, sig) &&
	    (sig == SIGKILL || !(t->ptrace & PT_PTRACED))) {
		[...]
		do {
			sigaddset(&t->pending.signal, SIGKILL);
			signal_wake_up(t, 1);
		} while_each_thread(p, t);
The second is in zap_other_threads():
	sigaddset(&t->pending.signal, SIGKILL);
	signal_wake_up(t, 1);
I'm pretty sure the first case is where it happens.
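The test that decides whether the promotion happens is sig_fatal(), defined in the same
file; from memory it reads roughly like this - a signal is "fatal" only if its default
action terminates the process (it is neither ignored nor a stop signal by default) and
the task has left the handler at SIG_DFL:

	#define sig_fatal(t, signr) \
		(!siginmask(signr, SIG_KERNEL_IGNORE_MASK|SIG_KERNEL_STOP_MASK) && \
		 (t)->sighand->action[(signr)-1].sa.sa_handler == SIG_DFL)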
^C vs SIGKILL
Posted Apr 19, 2008 17:14 UTC (Sat) by mkerrisk (subscriber, #1978)
> I'm pretty sure the first case is where it happens.
Yes, looks right to me. Thanks for the pointer.
^C vs SIGKILL
Posted Apr 21, 2008 7:18 UTC (Mon) by tialaramex (subscriber, #21167)
I think leaving bash hung up until a system administrator sends KILL to it is acceptable in
this circumstance. I'd say the same about server applications I've written which have a signal
handler only to write a log message explaining why they died (otherwise you have mysteries
where the server isn't running and no-one knows why... until they discover a division by zero
error weeks later and realise it had exited on SIGFPE).
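That logging pattern might look something like the following minimal sketch; the signal
list and message are purely illustrative:

	#include <signal.h>
	#include <unistd.h>

	static void log_and_die(int sig)
	{
		static const char msg[] = "server: exiting on a fatal signal\n";

		(void)write(STDERR_FILENO, msg, sizeof(msg) - 1); /* signal-safe */
		signal(sig, SIG_DFL);
		raise(sig);	/* re-raise so the exit status stays honest */
	}

	int main(void)
	{
		const int sigs[] = { SIGINT, SIGTERM, SIGFPE, SIGSEGV };

		for (unsigned i = 0; i < sizeof(sigs) / sizeof(sigs[0]); i++)
			signal(sigs[i], log_and_die);
		/* ... the real server loop would go here ... */
		pause();
		return 0;
	}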
The real frustration isn't that Ctrl-C doesn't work, but that nothing works, and you've fixed
that. Having Ctrl-C be effective in such circumstances most of the time is merely a welcome
bonus. Similarly making these processes able to be killed by the OOM killer is also a bonus.
Is this an example of genuine innovation from Linux (even if it is one that most people won't
understand)? Or is it copied from some other Unix which I'm not familiar with?
^C vs SIGKILL
Posted Apr 21, 2008 11:52 UTC (Mon) by willy (subscriber, #9762)
I think you only find it acceptable because it's such a huge improvement over what went before
;-)
This idea, as far as I'm aware, is the product of Linus' Big Brain back in 2002. The strange
thing is that nobody bothered to do it before now.
Stable kernel 2.6.25 released
Posted Apr 18, 2008 1:35 UTC (Fri) by djabsolut (guest, #12799)
> better kernel support for Intel and ATI R500 graphics chipsets
Pardon my ignorance, but what exactly does that mean? The patches in question seem to make the
kernel aware of the device, but not much else. I'm asking this as I'm interested in the
kernel / X.org driver disconnect -- is there a page/site which would explain the current state
of graphics drivers within the kernel? (Or to put it another way, why are graphics drivers
being included in X.org instead of directly in the kernel?)
Stable kernel 2.6.25 released
Posted Apr 18, 2008 6:56 UTC (Fri) by dlang (guest, #313)
To answer your final question (why are drivers in X instead of the kernel): it basically
boils down to historical reasons; it's always been done that way. X works on many different
systems with different kernels, and it used the same drivers on all of them.
There's work being done currently to define an interface between kernel drivers and X that
will shift this boundary significantly, but they will need to keep much of the old code around
to support other (and older) systems.
Re: kernel/xorg disconnect
Posted Apr 18, 2008 7:34 UTC (Fri) by Duncan (guest, #6647)
> [W]hy are graphics drivers being included in
> X.org instead of directly in the kernel?
The following is based on what I've read as I've followed X events myself,
not on any particular first-hand knowledge I have, as I most certainly
don't, at least in this area! If aspects or even the entire picture are
wrong, please, someone correct me!
This is to a large degree historical legacy. X11, through xfree86 and
therefore its in-practice successor xorg, was designed to be a
platform-agnostic *ix graphics interface. As such, the push was formerly
to keep mostly unified drivers in X, putting only enough functionality
into the various kernels to expose the hardware interface, interrupts,
and so on, to the user-space X. Back when X was more a networkable
protocol with remote (from the user's perspective) client and local
server aspects, and less a direct driver of the 3D hardware common today
(to the point where 2D is legacy and not necessarily included at all;
the 3D hardware handles it), that made a lot of sense.
Today, direct (and therefore local) 3D hardware interaction is becoming
more important, with extension support now, and traditional X emulation
atop direct OpenGL implementations planned a generation or two in the
future. The latency and other issues caused by all the kernel/userspace
transitions needed to reach that hardware directly are taking their
toll, and the push is in the opposite direction. Development is heading
toward each platform implementing its own native drivers and code, with
higher-level common "stub" drivers in X. As mentioned, this is likely to
take the form of kernel drivers implementing OpenGL directly on the
hardware while exposing a common interface to software: both direct
2D/3D, likely mostly in the kernel, and traditional 2D X, likely
implemented mostly in X as a mini-driver interfacing with the kernel
drivers.
Note that a specific goal of this transition is to maintain, and
actually improve, X's traditional network transparency. By design, it's
not going away anytime soon. One improvement is that, thanks to the
layering, it should become possible to do 3D over the network as well
(over a suitable link, and with a certain loss in speed that is
nonetheless acceptable for some applications), where that is currently
extremely limited. The effect is likely to be somewhat like running
software 3D today: limited, and slower than direct hardware access, but
possible, where it basically isn't today.
However, that's still a way off, with a number of intermediate steps in
between. Linux and the BSDs (among others: Solaris, OS X...) will need
to gradually increase their driver functionality in step with xorg's
development and its exposure of mini-drivers and a new common
direct-hardware-access OpenGL API that the kernel drivers can all
implement on their side of the interface. (xfree86 still exists too,
AFAIK; at least some of the BSDs moved rather more slowly to xorg and
are only now defaulting to it, with both still options.) We're talking
several years of effort, on a journey likely to have various unexpected
twists and turns of its own, such that the eventual implementation may
not look much like what's described here at all, although chances are
it'll be fairly close, and only along the way will we find the different
paths that end up being taken.
Confusing to the newbie it certainly can be, but that's what I've gathered
from the various articles I've read. Hope it clarifies things a bit for
you as it has for me.
Duncan
Re: kernel/xorg disconnect
Posted Apr 18, 2008 15:49 UTC (Fri) by johnkarp (guest, #39285)
GLX, which is the X11 extension for OpenGL support, has supported
networked connections since the beginning. And it does work... when I was
a student I ran a graphical application from my dorm room to an SGI
workstation on the other side of campus, and it was quite usable.
Re: kernel/xorg disconnect
Posted Apr 18, 2008 23:01 UTC (Fri) by tialaramex (subscriber, #21167)
The kernel isn't going to do OpenGL. Doesn't want to do OpenGL. Doesn't need to do OpenGL.
OpenGL is huge. The kernel is interested only in mediating userspace access to your graphics
hardware. So it does not need to care whether you are sending co-ordinates mixed with color
information in the form of 16-bit "half precision" floating point values, which is something
you might do with OpenGL -- but instead only cares about whether your userspace program has
permission to write to this particular part of the card's RAM, or to twiddle some particular
register in the chipset.
And yes, like the other poster said, OpenGL already runs fine using GLX on a remote machine,
so long as you have enough bandwidth, and a good enough driver. nVidia's (sadly proprietary)
driver, plus a cheap 1000baseT network, is enough to play relatively recent 3D games on one
machine, with the display on another. I've seen Quake 3 played that way a few years back, and
one of the MMORPGs. Network transparency already applies just as much to 3D graphics as to
2D.
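As a rough illustration of that division of labour, here is a sketch against the generic
DRM ioctl interface (the device node and include path vary by system; error handling is
minimal):

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <drm/drm.h>	/* sometimes <libdrm/drm.h> */

	int main(void)
	{
		/* The kernel's DRM layer decides whether we may touch the
		 * card at all; anything invasive (mapping video RAM,
		 * register access) is likewise mediated or set up by it. */
		int fd = open("/dev/dri/card0", O_RDWR);
		if (fd < 0) { perror("open /dev/dri/card0"); return 1; }

		struct drm_version v = { 0 };	/* zeroed query: versions only */
		if (ioctl(fd, DRM_IOCTL_VERSION, &v) == 0)
			printf("DRM driver %d.%d.%d\n", v.version_major,
			       v.version_minor, v.version_patchlevel);
		return 0;
	}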
Re: kernel/xorg disconnect
Posted Apr 21, 2008 5:07 UTC (Mon) by djabsolut (guest, #12799)
> The kernel isn't going to do OpenGL. Doesn't want to do OpenGL. Doesn't need to do OpenGL.
> OpenGL is huge. The kernel is interested only in mediating userspace access to your graphics
> hardware.
Do correct me if I'm wrong -- isn't the job of the kernel not only to mediate access, but also
to abstract hardware? Allowing a user-space program to twiddle a particular register (chip
specific, by implication) sounds like a weaselly half-a-driver-here and half-a-driver-there
approach. The second part of the driver allegedly lives in user-space, but for all intents and
purposes it's an extension of the kernel -- this "user-space" program is directly banging the
hardware.
Shouldn't the kernel provide a set of primitives on which a hardware-neutral OpenGL / Xorg is
built in userspace?
Re: kernel/xorg disconnect
Posted Apr 21, 2008 19:18 UTC (Mon) by Spudd86 (subscriber, #51683)
OpenGL is a massive abstraction. What might make sense is for the kernel to provide an
abstraction that is lower-level and closer to the hardware than OpenGL, on which a generic
OpenGL implementation can be built.
This is essentially what the Gallium3D people are working on, although I can't recall how much
of that stuff is going to live in the kernel.
Modern 3D hardware can be driven primarily by writing to its memory, so the driver can
actually run in userspace with little or no penalty; it may even require fewer transitions
between kernel and userspace, and so actually be faster. This applies rather less to older,
less general graphics hardware, though.
To find out lots about this sort of thing, poke around on the nouveau wiki, since the
direction taken there is likely to be the direction taken by other open source drivers, and
they are moving some things into the kernel that are at the moment userspace-only.
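To make that concrete, the submission model being described looks very roughly like this
toy sketch; the ring layout and names are entirely hypothetical:

	#include <stdint.h>

	#define RING_SIZE 1024

	/* A hypothetical command ring the kernel has mapped into our
	 * address space: commands are queued with plain stores, with no
	 * syscall on the fast path. */
	struct cmd_ring {
		volatile uint32_t head;		/* hardware's read pointer */
		volatile uint32_t tail;		/* our write pointer */
		uint32_t cmds[RING_SIZE];
	};

	static void submit(struct cmd_ring *ring, const uint32_t *cmd,
			   unsigned int n)
	{
		for (unsigned int i = 0; i < n; i++)
			ring->cmds[(ring->tail + i) % RING_SIZE] = cmd[i];
		ring->tail = (ring->tail + n) % RING_SIZE;	/* publish */
	}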
dm-raid1 improvements
Posted Apr 19, 2008 7:31 UTC (Sat) by wolfgang.oertl (guest, #7418)
Another new feature worth mentioning is that drivers/md/dm-raid1.c (for mirroring) finally
does read balancing, rather than just reading from the first mirror (the older md
implementation in drivers/md/raid1.c already had this). The patch name is "dm raid1: handle
read failures" (2008-02-08).
dm-raid1 improvements
Posted Apr 19, 2008 16:15 UTC (Sat) by sbergman27 (guest, #10767)
What does this mean, exactly, for someone who set up a Fedora 8 server with software raid1
using the standard tools in anaconda? Have I not been getting read balancing, but will get it
when F8 moves to 2.6.25?
dm-raid1 improvements
Posted Apr 20, 2008 8:10 UTC (Sun) by wolfgang.oertl (guest, #7418)
I'm not familiar with Fedora, but you can check whether your setup uses the device mapper or
the "md" driver: type "mount" - if you see lines like /dev/mapper/some-thing, then dm is in
use and you don't have read balancing yet. OTOH, if /dev/md0 and the like are mounted, this is
not an issue. Another option is to monitor drive activity with some GUI tool to see whether
reads are distributed to all drives of the raid, or to have a look at /proc/partitions and
check the read counters.
dm-raid1 improvements
Posted Apr 20, 2008 10:46 UTC (Sun) by nix (subscriber, #2304)
Really?

nix@loki 10 /home/nix% df -Pk /
Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/raid/root 198344 47412 140692 26% /

Oh, I can't be using md then: odd, I didn't know they had RAID-5 md-raid
yet.

Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid5 sda6[0] hdc5[3] sdb6[1]
      76807296 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
[...]

Hint: dm allows for a stack of block devices, and you can layer md
both above it and (more usefully) below it. This particular system is
using the fairly common configuration of LVM-atop-MD, thus is a dm-atop-md
system. Checking whether you are using the device mapper is not the same
thing as checking whether you are using dm-raid.
dm-raid1 improvements
Posted Apr 20, 2008 17:59 UTC (Sun) by sbergman27 (guest, #10767)
Well, I am using the mapper. And if I dd from the /dev/mapper/VolGroupxx/LogVolxx/ to
/dev/null and then do the same with /dev/sdax, which is the first member of the raid1 array, I
get about 61MB/sec in both cases. So I guess I'm not getting read balancing in F8. Or, come
to think of it, is the read balancing that fine-grained? I recall somebody telling me, a
while back, that it only balanced reads from different processes, and not from the same
process, or something like that.
dm-raid1 improvements
Posted Apr 20, 2008 18:21 UTC (Sun) by nix (subscriber, #2304)
I can't imagine dm could tell which process was originating the request,
even if it wanted to. It's all one by that point.
dm-raid1 improvements
Posted Apr 20, 2008 18:44 UTC (Sun) by sbergman27 (guest, #10767)
Seemed odd to me, too. Another odd thing is that this kernel release, which I had not really
thought was anything all that special, actually has not one but at least two bombshell
features. How did I miss the loud outcry that surely occurred when read-balance support in
raid1 was lost? And why have both its return and the new TASK_KILLABLE functionality gotten
so little notice?
