LWN: Comments on "Interview with Con Kolivas (APC)" https://lwn.net/Articles/242764/ This is a special feed containing comments posted to the individual LWN article titled "Interview with Con Kolivas (APC)". en-us Wed, 08 Oct 2025 23:15:16 +0000 Wed, 08 Oct 2025 23:15:16 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net About documentation... https://lwn.net/Articles/268145/ https://lwn.net/Articles/268145/ anders.blomdell@control.lth.se <div class="FormattedComment"><pre> I had similar problems with sluggish performance when doing heavy disk I/O; turning off Hyper-Threading (in the BIOS) on the processor put the system back to more normal behavior (i.e. worst-case blocking for syslog dropped from 25 seconds to 9 milliseconds). </pre></div> Thu, 07 Feb 2008 12:32:46 +0000 time it takes to get a project into the upstream kernel https://lwn.net/Articles/244027/ https://lwn.net/Articles/244027/ anandsr21 If you want to reach point A from point B when there is no known path, then evolution is the fastest way to find the path. There is nothing better. It will try all remotely possible paths and discard the bad ones. It will also find the most efficient path. I think evolution is the best way to go for Open Source Software development. Anything else is wasting time. 
Resources, on the other hand, are meant to be wasted, as you can't control them anyway.<br> Thu, 02 Aug 2007 07:54:41 +0000 Interview with Con Kolivas (APC) https://lwn.net/Articles/243764/ https://lwn.net/Articles/243764/ daenzer <font class="QuotedText">&gt; It's widely known that firefox, for example, works much better under</font><br> <font class="QuotedText">&gt; Windows than Linux.</font><br> <p> From what I've heard so far that seems more likely due to its inefficient use of X than due to the kernel though, FWIW.<br> Tue, 31 Jul 2007 15:38:12 +0000 kernel developers who care about the desktop https://lwn.net/Articles/243584/ https://lwn.net/Articles/243584/ mingo <p> Con Kolivas wrote: <p> <blockquote> Have you tried using ionice? You'll find it has never worked. </blockquote> <p> hi Con - this is the first time i've seen you characterise ionice as "never working", so your statement is quite surprising to me - could you please elaborate on this criticism? Have you reported your problems with ionice to lkml? <p> I've used ionice myself and it works well within its boundaries. <p> Mon, 30 Jul 2007 09:17:10 +0000 time it takes to get a project into the upstream kernel https://lwn.net/Articles/243475/ https://lwn.net/Articles/243475/ maney <i>That after billions of years of mindless tinkering something interesting results doesn't mean that it is "efficient".</i> <p> Granted that efficient may not be the best possible word, but look at what you just said. Evolution got us from complete mindlessness to sapience - brains from nothing. <i>Efficient</i> is damning it with faint praise... <p> <i>If there was an easy way to keep changes separate, that didn't imply intense maintaining efforts, none of this would happen.</i> <p> And if pigs had wings, they would fly. (is that too blunt? 
if so, I think it is nonetheless exactly true: not requiring considerable effort is tantamount to asking for the rate of change to be turned down, and while there could be some good reasons for that, I don't think that making it easier for external patches to limp along without ever progressing towards being included (or rejected) is remotely one such.) Fri, 27 Jul 2007 23:42:42 +0000 kernel developers who care about the desktop https://lwn.net/Articles/243306/ https://lwn.net/Articles/243306/ conman Have you tried using ionice? You'll find it has never worked.<br> Thu, 26 Jul 2007 23:08:55 +0000 Benchmarks, scheduler instrumentation, PowerTop https://lwn.net/Articles/243296/ https://lwn.net/Articles/243296/ mingo Yeah, it's on by default - got merged early in the .23 merge window and is included in the 2.6.23-rc1 kernel (and -git*). Thu, 26 Jul 2007 21:48:25 +0000 Benchmarks, scheduler instrumentation, PowerTop https://lwn.net/Articles/243219/ https://lwn.net/Articles/243219/ pheldens Call me blind, but how do I turn on CFS in 23-rc1(-git*)?<br> I don't see the .config options anywhere, or is it on by default?<br> Thu, 26 Jul 2007 16:33:59 +0000 time it takes to get a project into the upstream kernel https://lwn.net/Articles/243190/ https://lwn.net/Articles/243190/ rmstar <p><i>In Linux we reject _lots_ of code, and that's the only way to create a quality kernel. It's a bit like evolutionary selection: breathtakingly wasteful and incredibly efficient at the same time.</i> <p>In general, evolutionary algorithms have never had a serious breakthrough because turning on your brain (as long as one is involved) tends to produce much better results. The bottom line is: evolutionary selection is just wasteful, period. That after billions of years of mindless tinkering something interesting results doesn't mean that it is "efficient". 
<p>Back to the linux kernel, it seems to me that the fundamental human problem behind kernel development is that stuff has to get "approved" and merged in. If there was an easy way to keep changes separate, that didn't imply intense maintaining efforts, none of this would happen. We would have dozens of schedulers and VMs, the best would be used most, progress would be very fast, and there would be fewer fights and frustration. <p>The fact that good, motivated people who have a positive impact are leaving in frustration is not good at all. Please stop rationalizing it. Thu, 26 Jul 2007 15:31:35 +0000 desktop or server latency https://lwn.net/Articles/243177/ https://lwn.net/Articles/243177/ mingo <p> Yeah, the discussion veered off to general kernel latencies. <p> As for your "xterm takes a long time to start during a dd", does CFQ (and, optionally, ionice) solve the problem? <p> Thu, 26 Jul 2007 14:39:36 +0000 kernel developers who care about the desktop https://lwn.net/Articles/243175/ https://lwn.net/Articles/243175/ mingo <p> ionice will solve that problem for you - it has been available since 2.6.13. <p> Thu, 26 Jul 2007 14:36:09 +0000 Interview with Con Kolivas (APC) https://lwn.net/Articles/243136/ https://lwn.net/Articles/243136/ rwmj Well, I get similar problems on my Athlon machines :-(<br> <p> I suspect that the problem may lie with SATA itself. It certainly<br> feels much worse than IDE did, but again just subjectively - I<br> haven't got any hard figures.<br> <p> Rich.<br> Thu, 26 Jul 2007 11:58:25 +0000 Interview with Con Kolivas (APC) https://lwn.net/Articles/243110/ https://lwn.net/Articles/243110/ nix Um, what do you think hardware acceleration *is*? 
You're talking about software offloaded to a specialized coprocessor as if it weren't hardware acceleration, but of course it is: the coprocessor is often specialized to some degree, or has privileged access to hardware the CPU can't see, and has nothing else to do, so it can do things with harsh latency bounds.<br> <p> Thu, 26 Jul 2007 09:28:01 +0000 Interview with Con Kolivas (APC) https://lwn.net/Articles/243069/ https://lwn.net/Articles/243069/ bojan <font class="QuotedText">&gt; Until there's a compelling case that The Desktop is obviously better under Linux than Windows, there won't be the numbers, and until there are the numbers, the kernel devs won't be inclined to do the work to make it better.</font><br> <p> People don't always select products based on "better" - sometimes cheaper is more important, other times more flexible and so on. So, hopefully, the acceptance will eventually rise to the point where companies will have people working full time on Linux features that are desktop related, just like they have now for the server stuff.<br> <p> But, yeah, it is a chicken-and-egg problem to some extent, unfortunately. But I don't think it's because of "goodness". It probably has more to do with the fact that the vast majority of businesses have one or more Windows apps they absolutely cannot do without, so it keeps them tied to that platform.<br> Thu, 26 Jul 2007 02:45:11 +0000 kernel developers who care about the desktop https://lwn.net/Articles/243048/ https://lwn.net/Articles/243048/ ras Your point about IO &amp; VM schedulers being the dominant cause for slowdowns rings true, and will become more so with the move to multiple CPUs. Most desktop machines have a load factor of less than 4, so with 4 CPUs every task that wants to run can have its own CPU and there is nothing to schedule.<br> <p> The response time of the machine when a disk-bound task is running is a major issue. 
To the extent that in 2.6.8, you could literally kill a machine with a mke2fs because it flooded the block cache, invoking the OOM killer which then took out some important process or other. It doesn't do that with newer kernels, but it can still delay the start up of vi by a minute or two. <br> <p> Think about it. A single mke2fs, nice'ed to 19, can bring a 4 CPU machine to its knees. You can tell the CPU scheduler you want mke2fs to only take a small percentage of the available CPU time, but apparently there is no way to tell the IO scheduler it's not allowed to hog 95% of memory by swamping the block cache. Yuk!<br> Thu, 26 Jul 2008 01:52:37 +0000 About documentation... https://lwn.net/Articles/243032/ https://lwn.net/Articles/243032/ i3839 I get compile errors with that patch, I'll send them by email.<br> <p> Wed, 25 Jul 2007 23:41:31 +0000 About documentation... https://lwn.net/Articles/243026/ https://lwn.net/Articles/243026/ i3839 Interesting info. But nothing about the fields I asked about. ;-)<br> <p> Anyway, if those knobs only appear with CONFIG_SCHED_DEBUG enabled, I think it's better to document them in the Kconfig entry than in that documentation file. That way people interested in it can find it easily, and if the debug option ever disappears the help file won't need to be updated. When deciding whether to enable an option people look at the Kconfig text, so give all the info they need to know there.<br> <p> sum_exec_runtime/sum_wait_runtime also seems interesting. The ratio is 1 to 1442 for ksoftirqd (it ran for 5 ms and it waited 7 seconds for that, ouch). While wait_runtime_overruns is 232 and zero underruns. (Sure those fields aren't swapped accidentally?)<br> <p> events/0 info is also interesting, it has a se.block_max of 1.1 second, which seems suspiciously high.<br> <p> se.wait_runtime includes the time a task slept, right? 
Otherwise it should be zero for all tasks that are sleeping, and that isn't the case.<br> <p> Another strange thing is that really a lot of tasks have almost the same block_max of 1.818 or 1.1816 seconds. The lower digits are so close together that it seems like all tasks were blocked and unblocked at the same time. Oh wait, that is probably caused by resume from/suspend to ram.<br> <p> Wed, 25 Jul 2007 23:38:02 +0000 Linus and the desktop https://lwn.net/Articles/243024/ https://lwn.net/Articles/243024/ i3839 Enabling composite can help a lot too. It has improved a lot since the last time I tried (running xcompmgr -a).<br> <p> Hardware is cheap as dirt nowadays, but I still can't find much motivation to upgrade my Duron 1300 and 256 MB ram. Listening to other people's complaints it almost seems that the more ram and the faster your cpu, the slower the system is. ;-)<br> <p> Wed, 25 Jul 2007 22:57:15 +0000 About documentation... https://lwn.net/Articles/243019/ https://lwn.net/Articles/243019/ i3839 Ok, I'll keep an eye on it. Running PREEMPT here.<br> <p> I suppose the best way to track any anomalies down is by applying latency-tracing-v2.6.23-rc1-combo.patch from your homepage? WAKEUP_TIMING seems slightly redundant now. I'll enable it anyway.<br> <p> Wed, 25 Jul 2007 22:50:32 +0000 desktop or server latency https://lwn.net/Articles/243020/ https://lwn.net/Articles/243020/ jwb I'm not sure how we went from I/O latency to CPU scheduler latency. The fact that an xterm takes a full minute to start has little or nothing to do with the CPU in my experience. It has to do with the fact that the process wants to mmap libXt, and the kernel puts that request at the back of some gigantic list of operations and doesn't call it back for tens of seconds. 
So fiddling with the CPU scheduler isn't going to change that problem.<br> Wed, 25 Jul 2007 22:35:24 +0000 desktop or server latency https://lwn.net/Articles/243017/ https://lwn.net/Articles/243017/ mingo <p> <blockquote> <i> Actually I have taken a deeper look and it seems that my kernel preemption configuration under feisty is the same as it was in edgy, CONFIG_PREEMPT_VOLUNTARY=y. Maybe the patch set applied is a bit different, or maybe desktop latency has just degraded from 2.6.17 to 2.6.20. Let us hope someone there is still willing to work on it. </i> </blockquote> <p> Hm, could you try the -rt kernel? (2.6.22.1-rt8 would be a good pick. It also includes CFS.) <p> The -rt kernel has the latest round of latency fixes - not all are upstream yet. For the past 2-3 years we've been working hard on eliminating various sources of latencies. (and the scheduler was never a big factor for that.) <p> Since v2.6.12 or so we did over 100 latency fixes alone (some of those were large kernel features like PREEMPT_VOLUNTARY or PREEMPT_BKL), as part of the -rt project. I maintained the low-latency patchset before that for years. <p> If -rt does not fix your problem then please report it to lkml and to me (or to rt-linux-users@vger.kernel.org), we are definitely interested in not yet fixed sources of latencies. <p> Wed, 25 Jul 2007 22:29:13 +0000 About documentation... https://lwn.net/Articles/243012/ https://lwn.net/Articles/243012/ i3839 se.sleep_max is more interesting if you want to get the most out of dynticks, I suppose. Currently I'm mainly interested in wait_max.<br> <p> Yes, I noticed how multithreaded apps were handled, but forgot to account for that in my grep. The numbers are in the same range (But I reset X and FF, and I don't see X high in the list now; FF is, though).<br> <p> (I'm interested in this because I wonder where the strange keyboard/mouse behaviour I irregularly get comes from. 
Warping mouse, "sticking" keys, since 2.6.22 or so, but could be shaky hardware too. Pain to debug).<br> <p> Wed, 25 Jul 2007 22:21:28 +0000 About documentation... https://lwn.net/Articles/243016/ https://lwn.net/Articles/243016/ mingo Ok. Perhaps the 100+ msecs ksoftirqd delay was during bootup. Or if you are running a !CONFIG_PREEMPT kernel such delays could happen too. But if you are running a CONFIG_PREEMPT kernel and ksoftirqd shows such large latencies even after resetting its counters, that would be an anomaly. (if so then please report it to me and/or to lkml in email.) Wed, 25 Jul 2007 22:18:52 +0000 About documentation... https://lwn.net/Articles/243008/ https://lwn.net/Articles/243008/ i3839 ksoftirqd has prio 115 and X has prio 120. I didn't nice anything, so it's all default (all kernel threads at -5, user processes 0, except for pppd at -2 and udevd at -4).<br> <p> Wed, 25 Jul 2007 22:03:47 +0000 About documentation... https://lwn.net/Articles/243006/ https://lwn.net/Articles/243006/ i3839 Yes, I figured that out by doing the only logical thing I could think of. :-)<br> Good interface.<br> <p> Wed, 25 Jul 2007 21:54:21 +0000 Interview with Con Kolivas (APC) https://lwn.net/Articles/243004/ https://lwn.net/Articles/243004/ NCunningham Heh. I assume that's tongue in cheek. Distros seeking to get closer to mainline isn't at all the problem. It would be nice if they picked up our patches, but it's perfectly understandable that they should want to minimise the work involved in maintaining their kernels - I'm seeking to do that too.<br> <p> No, the problem I have is that Andrew and Linus have stated flat out that they don't want to merge another implementation of Suspend to disk, but at the same time Andrew at least is jumping up and down over kexec based hibernation as if it's going to be the panacea. He's almost ready to merge the first patch that appears that's remotely a step towards the idea. 
That's what's getting to me.<br> Wed, 25 Jul 2007 21:45:41 +0000 About documentation... https://lwn.net/Articles/242992/ https://lwn.net/Articles/242992/ mingo <blockquote> <i> If people are expected to ever use these knobs, it might be good to document what those wakeup and stat variants are, and the meaning of sched_features. When that's done all fields are easy to understand. </i> </blockquote> <p> Yeah, i'll do that. _Normally_ you should not need to change any knobs - the scheduler auto-tunes itself. That's why they are only accessible under CONFIG_SCHED_DEBUG. (But it helps when diagnosing scheduler problems that you can tune various aspects of it without having to reboot the kernel.) <p> One other interesting field is sum_exec_runtime versus sum_wait_runtime: the accumulated amount of time spent on the CPU, compared to the time the task had to wait for getting on the CPU. <p> The "sum_exec_runtime/nr_switches" number is also interesting: it shows the average time ('scheduling atom') a task has spent executing on the CPU between two context-switches. The lower this value, the more context-switching-happy a task is. <p> se.wait_runtime is a scheduler-internal metric that shows how much out-of-balance this task's execution history is compared to what execution time it could get on a "perfect, ideal multi-tasking CPU". So if wait_runtime gets negative that means it has spent more time on the CPU than it should have. If wait_runtime gets positive that means it has spent less time than it "should have". CFS sorts tasks in an rbtree with this value as a key and uses this value to choose the next task to run. (with lots of additional details - but this is the raw scheme.) It will pick the task with the largest wait_runtime value. (i.e. the task that is most in need of CPU time.) <p> This mechanism and implementation is basically not comparable to SD in any way, the two schedulers are so different. 
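The wait_runtime bookkeeping described above can be sketched in miniature. This is purely illustrative Python, not kernel code: CFS keeps tasks in an rbtree keyed by wait_runtime and the real accounting has many more details, but a plain max() over a list gives the same selection rule.

```python
class Task:
    def __init__(self, name, wait_runtime=0):
        self.name = name
        self.wait_runtime = wait_runtime  # ns of CPU time "owed" to the task

def pick_next(tasks):
    # Run the task with the largest wait_runtime (most in need of CPU time).
    return max(tasks, key=lambda t: t.wait_runtime)

def account(tasks, running, delta_ns):
    # Time spent on the CPU drives the running task's wait_runtime negative;
    # everyone else, waiting on the runqueue, accumulates a fair share.
    running.wait_runtime -= delta_ns
    for t in tasks:
        if t is not running:
            t.wait_runtime += delta_ns / (len(tasks) - 1)
```

After a task runs, its key shrinks and another task bubbles up to the "leftmost" position, which is the whole selection mechanism in a nutshell.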
Basically the only common thing between them is that both aim to schedule tasks "fairly" - but even the definition of "fairness" is different: SD strictly considers time spent on the CPU and on the runqueue, CFS takes time spent sleeping into account as well. (and hence the approach of "sleep average" and the act of "rewarding" sleepy tasks, which was the main interactivity mechanism of the old scheduler, survives in CFS. Con was fundamentally against sleep-average methods. CFS tried to be a no-tradeoffs replacement for the existing scheduler and the sleeper-fairness method was key to that.) <p> These (and other) design differences and approaches - not surprisingly - produced two completely different scheduler implementations. Anyone who has tried both schedulers will attest to the fact that they "feel" different and behave differently as well. <p> Due to these fundamental design differences the data structures and algorithms are necessarily very different, so there was basically no opportunity to share code (besides the scheduler glue code that was already in sched.c), and there's only 1 line of code in common between CFS and SD (out of thousands of lines of code): <p> <pre>
/*
 * This idea comes from the SD scheduler of Con Kolivas:
 */
static inline void sched_init_granularity(void)
{
        unsigned int factor = 1 + ilog2(num_online_cpus());
</pre> <p> This boot-time "ilog2()" tuning based on the number of CPUs available is a tuning approach i saw in SD and i asked Con whether i could use it in CFS. (to which Con kindly agreed.) <p> Wed, 25 Jul 2007 21:23:21 +0000 Solarix booting from ZFS? https://lwn.net/Articles/243000/ https://lwn.net/Articles/243000/ akumria <blockquote>Grub still needs to read the boot archive somehow.</blockquote> <p> Indeed. I believe they use <a href="http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/grub/grub-0.95/stage2/fsys_zfs.c" title="GRUB ZFS code">Sun GPLv2 ZFS code</a> to do so. 
</p> Wed, 25 Jul 2007 21:16:48 +0000 About documentation... https://lwn.net/Articles/242989/ https://lwn.net/Articles/242989/ mingo <blockquote> <i> Anyway, when hunting for latency spikes, sluggish apps and similar creatures, I guess the se.wait_max and se.block_max are most interesting? </i> </blockquote> <p> Yes. There's also se.sleep_max - that's the maximum time the task spent sleeping voluntarily. ('se' stands for 'scheduling entity' - a task here) <p> block_max stands for the maximum involuntary delay. (waiting for disk IO, etc.) <p> wait_max stands for the maximum delay that a task saw, from the point it got on the runqueue to the point it actually started executing its first instruction. <p> Note that for multithreaded apps such as firefox all the worker threads are not in /proc/[PID]/sched but in /proc/[PID]/task/[TID]/sched. Often firefox latencies are in those threads, not in the main thread. <p> Wed, 25 Jul 2007 20:50:57 +0000 About documentation... https://lwn.net/Articles/242988/ https://lwn.net/Articles/242988/ mingo <blockquote> <i> So assuming that these values are in nanoseconds, ksoftirqd waited at most 111 ms before it could finally run, and X 81 ms. </i> </blockquote> <p> Yes, the values are in nanoseconds. What priority does it have? [the prio field in /proc/[PID]/sched file] If it's niced to +19 then a longer delay is possible because other, high-prio tasks might delay it. <p> Wed, 25 Jul 2007 20:46:37 +0000 About documentation... https://lwn.net/Articles/242987/ https://lwn.net/Articles/242987/ mingo <blockquote><i> any way to reset those stats? </i></blockquote> <p> Yeah: you can reset them on a per-task/thread basis by writing 0 to the /proc/[PID]/sched file. Then they'll go down to 0. (Unprivileged users can do it too, to their own tasks. root can reset it for everyone.) <p> You can reset it periodically as well if you want to sample/profile their typical sleep/block behavior. <p> Wed, 25 Jul 2007 20:43:53 +0000 About documentation... 
https://lwn.net/Articles/242972/ https://lwn.net/Articles/242972/ i3839 When searching for documentation about that /proc/&lt;PID&gt;/sched file, which I couldn't find, I noticed that sched-design.txt is outdated and could be removed, and that the new sched-design-CFS.txt is already lagging behind:<br> <p> $ ls /proc/sys/kernel/sched_*<br> /proc/sys/kernel/sched_batch_wakeup_granularity_ns<br> /proc/sys/kernel/sched_child_runs_first<br> /proc/sys/kernel/sched_features<br> /proc/sys/kernel/sched_granularity_ns<br> /proc/sys/kernel/sched_runtime_limit_ns<br> /proc/sys/kernel/sched_stat_granularity_ns<br> /proc/sys/kernel/sched_wakeup_granularity_ns<br> <p> But only sched_granularity_ns is documented.<br> <p> If people are expected to ever use these knobs, it might be good to document what those wakeup and stat variants are, and the meaning of sched_features. When that's done all fields are easy to understand.<br> <p> Interpreting and using /proc/&lt;PID&gt;/sched and /proc/sched_debug would also be much easier if they were documented, though as it's a debugging feature it's less important. But still.<br> <p> Anyway, when hunting for latency spikes, sluggish apps and similar creatures, I guess the se.wait_max and se.block_max are most interesting?<br> <p> A bit of poking to get the top offenders turns up:<br> <p> proc # grep se.wait_max */sched | sort -n -k 3 | tail -n 2<br> 1182/sched:se.wait_max : 81381345<br> 3/sched:se.wait_max : 111139352<br> <p> proc # grep se.block_max */sched | sort -n -k 3 | tail -n 3<br> 1182/sched:se.block_max : 3749201713<br> 367/sched:se.block_max : 3938538101<br> 721/sched:se.block_max : 4027921788<br> <p> proc # ps 3 1182 367 721<br> PID TTY STAT TIME COMMAND<br> 3 ? S&lt; 0:00 [ksoftirqd/0]<br> 367 ? S&lt; 0:02 [kjournald]<br> 721 ? Ss 0:00 /usr/sbin/syslogd -m 0<br> 1182 ? 
Ss+ 17:25 X :0 -dpi 96 -nolisten tcp -br<br> <p> <p> So assuming that these values are in nanoseconds, ksoftirqd waited at most 111 ms before it could finally run, and X 81 ms.<br> <p> And kjournald, syslogd and X blocked at most about 4 seconds on IO, which is a bit worrying, especially X as it's in both lists.<br> <p> kjournald probably does huge IO requests to get optimal throughput, but even then 4 seconds is bad.<br> <p> syslogd does synchronous writes, so it's not that strange that it's at the top of the list. But it won't write much more than 100KB at once, so that's slightly scary too, but not too worrisome.<br> <p> As for X, it doesn't seem to do much file IO, so it's probably blocking on something else. Video hardware? Unix sockets? Unclear, but 4s isn't healthy. Hopefully both cases happened at X startup and it's less serious than it looks (any way to reset those stats?).<br> <p> Wed, 25 Jul 2007 20:16:14 +0000 Interview with Con Kolivas (APC) https://lwn.net/Articles/242963/ https://lwn.net/Articles/242963/ drag *shrug*<br> <p> <p> Think of your processor as if it's a black hole. <br> <p> Wafers are getting bigger and purer. Fab processes are getting faster and smaller. You just have SO MANY TRANSISTORS. They got so many that they are throwing 4 or more cores on a single die. <br> <p> Intel has a research processor that has 80 cores on it.<br> <p> WTF is any desktop going to do with 80 cores? Sure 2 cores is great, and 4 is pretty good. 8 is so-so, but if you're looking at 32 or 64 cores you simply are not going to see any improvement in performance! So what is happening is that all the functionality of your computer is just going to get sucked into that processor, piece by piece. Your video card, your wifi, your north bridge, your sound midi, your modem, etc etc etc. 
<br> <p> It's cheaper, faster, more reliable, more energy efficient.<br> <p> <p> <p> <font class="QuotedText">&gt; They caught up because of specialized hardware, specifically 3DFX cards and sound cards</font><br> <p> <p> Physics acceleration? It's a joke. <br> <p> Sound card acceleration? It's dead and dying, killed by its own patents. You can blame Creative for that one. Wait till their patents dry up, then you'll see real innovation in realistic 3D sound. It'll be all software, though.<br> <p> 3d Acceleration? The movement is towards CPU cores of different types, specialized for specific workloads. <br> <p> State of the art "hardware acceleration" for video graphics nowadays is that you take a proprietary shader language and compile it into binaries to run on your GPU. Sound familiar? Doesn't sound like 'hardware acceleration' to me, it sounds more like regular old software on a special cpu.<br> <p> If it wasn't for the fact that ATI and Nvidia were such A-holes about their 'IP' we would be compiling our software to run on both the CPU and on the GPU. GCC would decide which would be faster and you would be able to use those massive amounts of memory bandwidth for something actually useful. <br> <p> ATI and Nvidia are heading towards GPGPU. Intel is heading towards media optimized x86-like cores. Either way it will be faster and be useful for so much more than current video cards are used for. <br> <p> <p> Hardware raid? Software raid is faster... eventually it will be better.<br> <p> <p> ---------------------<br> <p> <p> Modern OSes are bloated, no doubt about that. But the solution isn't hardware... the solution is fixing the OS.<br> <p> <p> The Linux kernel is already kick-ass. Sure it has issues, but it's still better than OS X's kernel or Windows' kernel or Solaris's kernel. It's the best there is at what it does well. 
If Linux devs can figure out solutions to the remaining driver issues and fix userspace-to-kernel ABI/API breakage issues then there will be almost no reason to use anything else.<br> <p> <p> For this I think embedded development is helping a lot. You can spend all day banging your head against the Linux kernel, but it won't compare to fixing some memory usage issues with GTK in terms of positive impact.<br> <p> <p> Projects like Maemo for the Nokia N800 or OpenMoko for the Neo1973, where you have a nearly full Linux install with X, networking, and GTK/Gnome working well on a hand-held device, are hugely positive. I think things like that are eventually going to help improve desktop performance considerably....<br> <p> If Gnome can be made to run well on a phone with 128 megs of ram, 64 megs worth of flash drive, and a 300mhz ARM cpu then it's going to kick ass on a modern desktop. This sort of thing is probably the most important thing right now, I think.<br> <p> Hopefully KDE4 will be everything they promise it will be...<br> Wed, 25 Jul 2007 20:02:51 +0000 pluggable I/O schedulers, pluggable CPU schedulers https://lwn.net/Articles/242977/ https://lwn.net/Articles/242977/ mingo <blockquote> Would it be possible to merge the I/O schedulers into one, and then expose the different behaviors as config knobs? Y'know, with "AS" and "CFQ" as particular groups of settings? Or is that just as bad if not worse? </blockquote> <p> Your suggestion makes sense and i think it would likely result in fundamentally better code that gives us one codebase, but it still doesn't give us an IO scheduler that does the right thing no matter what we throw at it (the user would still have to turn that knob). So in that sense it would be little change from the current state of affairs. <p> The problem is <i>not</i> primarily the kernel-internal code duplication - we can handle such things pretty well, the kernel's nearly 8 million lines of code now and growing fast. 
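The merged-scheduler-with-presets idea under discussion can be sketched very loosely. All knob names and values below are invented for illustration; they are not real kernel tunables, and a real implementation would live in the block layer, not userspace.

```python
# Hypothetical presets: a named group of tunables rather than a
# separate scheduler implementation. "as"/"cfq" here only echo the
# real schedulers' names; the knobs themselves are made up.
PRESETS = {
    "as":  {"anticipation_ms": 6, "time_sliced": False},
    "cfq": {"anticipation_ms": 0, "time_sliced": True},
}

def configure(preset, **overrides):
    # Start from a named preset, then apply per-knob overrides,
    # which is the "groups of settings" part of the suggestion.
    knobs = dict(PRESETS[preset])
    knobs.update(overrides)
    return knobs
```

As mingo notes, this unifies the codebase but still leaves the user holding the knob: `configure("cfq")` versus `configure("as", anticipation_ms=10)` is the same choice the elevator= option forces today.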
We've got 3 SLAB implementations and that's not a problem because they were never made user (and application) visible. <p> The problem is the externally visible imperfection of the kernel's behavior, and when end-users learn to depend on it. If we try to remove a knob we promised to users at some stage, those who are negatively affected complain (and rightfully so). <p> It should also be seen in perspective: these issues are not the end of the world, our I/O schedulers are pretty damn good already, and modularity is not an issue at all compared to some other kernel problems we are facing - but if code is submitted to an otherwise well-working subsystem these little factors are what make or break a particular patch's upstream acceptance. <p> Wed, 25 Jul 2007 19:55:01 +0000 desktop or server latency https://lwn.net/Articles/242961/ https://lwn.net/Articles/242961/ man_ls Still there are some problems which <i>are</i> kernel related, and it would seem that kernel devs should care about those. Windows XP (or marginally better) is not good enough for many people. <p> The picture that Kolivas draws in the article is that at the beginning 2.6 was not too good latency-wise, and e.g. music players were skipping all the time: it made Mac OS X look smooth in comparison. It is funny that an Australian anaesthetist was the one to call people's attention to it. <p> Actually I have taken a deeper look and it seems that my kernel preemption configuration under feisty is the same as it was in edgy, <code>CONFIG_PREEMPT_VOLUNTARY=y</code>. Maybe the patch set applied is a bit different, or maybe desktop latency has just degraded from 2.6.17 to 2.6.20. Let us hope someone there is still willing to work on it. 
Wed, 25 Jul 2007 19:14:22 +0000 pluggable I/O schedulers, pluggable CPU schedulers https://lwn.net/Articles/242962/ https://lwn.net/Articles/242962/ flewellyn This may be a foolish question, but never let it be said that I'm not willing to ask those.<br> <p> Would it be possible to merge the I/O schedulers into one, and then expose the different behaviors as config knobs? Y'know, with "AS" and "CFQ" as particular groups of settings?<br> <p> Or is that just as bad if not worse?<br> Wed, 25 Jul 2007 18:37:16 +0000 desktop or server latency https://lwn.net/Articles/242944/ https://lwn.net/Articles/242944/ mikov Well, I have the standard Debian Etch kernel and to be honest I don't even really know how it is configured. Debian isn't known for the best desktop experience, though :-)<br> <p> There is no doubt that interactivity could be improved, but it already is better than Windows XP, so I can understand why it is not a priority.<br> <p> The kernel developers themselves use Linux on their own desktops, so we can't really say that they are oblivious to any problems. They don't consider the problems _that_ important, and I agree with them. I, for one, feel perfectly fine with my Linux desktop. Granted, there are many things that could be improved, but they are not all kernel related.<br> <p> For example, the common GUI event loop paradigm simply cannot produce really responsive applications. What is needed is some sort of client-server architecture, with a real-time priority UI thread with a guaranteed response time, not doing anything besides processing input and rendering, and a back-end worker thread (or threads) doing all actual work. 
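The UI/worker split described above can be shown in miniature (illustrative Python; all names are invented): the "UI" side only hands work off and collects results, while a worker thread does the actual work off the UI's critical path.

```python
import queue
import threading

def worker(jobs, results):
    # Drain jobs until the shutdown signal (None) arrives;
    # job * 2 is a stand-in for real back-end work.
    for job in iter(jobs.get, None):
        results.put(job * 2)

jobs, results = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(jobs, results))
t.start()
jobs.put(21)    # the "UI" posts a request without blocking
jobs.put(None)  # shut the worker down
t.join()
```

The queues are the whole trick: the UI thread never waits on the work itself, only on its own event source, which is what keeps it responsive.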
The division between the UI and back-end threads is not always very clear-cut, and it is always tempting to simplify things by putting them in the UI thread, so to make the rules simple, the UI thread must not invoke any syscalls except the carefully crafted UI library functions, and must have no loops or recursion at all.<br> <p> Granted it would be hard to do, especially in Unix where threads have never been very popular. C may not be the best language for it - something higher level might be needed. However it would deliver more interactivity gains than any kernel improvements.<br> <p> Wed, 25 Jul 2007 17:15:29 +0000 Interview with Con Kolivas (APC) https://lwn.net/Articles/242938/ https://lwn.net/Articles/242938/ malor I noticed a similar problem with OSX on the Mac Pro; when running a dd, the system slows to a dead stop. dd consumes all available I/O and the system essentially stops responding until it's done. And while I don't have dd on this Windows box, I've noticed that the system gets very, very slow when VMWare is creating an image. <br> <p> They're different systems, but they're both using Intel chipsets, and because of that, I'm wondering if it might be something about Intel's SATA controllers. My Athlons never did this; they maintained much better responsiveness under load. <br> <p> I also found that the Mac stayed useful if I didn't give dd a blocksize. Up to some amount, which I think was 256 but don't remember absolutely for sure, it maintained decent performance. It started slowing at X+1 (I think 257), and got worse linearly up until 512. It didn't get any worse after that, but it's hard to get any worse than total system lockout. <br> <p> I haven't tested blocksizes under Windows, but between seeing both OSX and Windows do the same no-response-under-heavy-IO thing, and hearing your story, I'm wondering if Intel is doing something dumb with I/O.<br> <p> If you're not on Intel, of course, that blows that idea out of the water. 
:)<br> <p> Wed, 25 Jul 2007 16:22:29 +0000 Solarix booting from ZFS? https://lwn.net/Articles/242937/ https://lwn.net/Articles/242937/ paulj <p><a href="http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/grub/grub-0.95/stage2/">GRUB ZFS stage1.5 source code</a> </p><p> It's pretty difficult to find, you have to go to <a href="http://cvs.opensolaris.org">cvs.opensolaris.org</a> and search for 'zfs', and scroll down a bit. Really obscure.. </p><p> ;) </p> Wed, 25 Jul 2007 16:03:03 +0000 time it takes to get a project into the upstream kernel https://lwn.net/Articles/242933/ https://lwn.net/Articles/242933/ msmeissn You should try again with exec-shield.<br> <p> The non-soft-NX parts look mergeable at least to my eyes.<br> Wed, 25 Jul 2007 15:41:18 +0000