
Interview with Con Kolivas (APC)

APC interviews (ex-)kernel developer Con Kolivas. "If there is any one big problem with kernel development and Linux it is the complete disconnection of the development process from normal users. You know, the ones who constitute 99.9% of the Linux user base."


Interview with Con Kolivas (APC)

Posted Jul 24, 2007 21:31 UTC (Tue) by mgb (guest, #3226) [Link] (16 responses)

A big thank you to CK for identifying the problem. Now maybe Linus and his friends can start to work towards a solution.

I hope the kernel has been a fun playground for developers for the past few years. I've certainly enjoyed reading LWN's coverage. But most of the 2.6 developments have been of no benefit to regular users. And meanwhile the kernel's RAM requirements just keep growing.

Interview with Con Kolivas (APC)

Posted Jul 24, 2007 21:46 UTC (Tue) by jospoortvliet (guest, #33164) [Link] (14 responses)

Well, the work on IO was good for us desktop users, that's for sure. And CFS might turn out to be an advantage as well. Inotify is cool, and did adaptive readahead get in yet? Of course I'd love to mention swap-prefetch here, but it probably won't make it >:(

Indeed, http://kernelnewbies.org/Linux26Changes won't reveal many big infrastructure changes good for the desktop. Many drivers, though, which we should be grateful for... The realtime patches are useful as well, and there is suspend & friends. Power saving for laptops (dynticks) and, to a lesser extent, virtualisation and security improvements all benefit users too. But there have been 22 releases since 2.6.0, and I can't name more than 10 big desktop improvements :(

Interview with Con Kolivas (APC)

Posted Jul 24, 2007 23:27 UTC (Tue) by jwb (guest, #15467) [Link] (13 responses)

The I/O work was good for desktop users? Using the very latest release kernel on a really fast machine -- 4 CPUs and 4 GB of memory -- a streaming I/O load of 30-50MB per second from one disk to another causes latencies in excess of one full minute for random I/O requests. Starting xterm while this streaming I/O load is running takes more than 5 minutes.

Desktop performance on Linux is an absolute joke once you really start to load it down. It is quite obvious that server throughput has been optimized and desktop performance forgotten. I wish I had the knowledge and the free time to work on the problem myself.

Interview with Con Kolivas (APC)

Posted Jul 25, 2007 3:33 UTC (Wed) by mikov (guest, #33179) [Link] (6 responses)

Ahem :-) Have you tried a similar experiment under Windows? On the hardware I am typing on now (which admittedly is very weak - an Athlon XP 2000), the Linux desktop is visibly more responsive than Windows XP when doing practically anything. (With the exception of Internet browsing - Firefox is noticeably faster on Windows, which is really very annoying and forces me to use Konqueror under Linux as much as I can.)

It could be that the kernel hackers don't have great incentive to improve desktop responsiveness, because it already is superior to the main competition.

Con's interview seems a little bit like complaining. He couldn't get his patches into the kernel, so he is sad and angry. That is understandable. However as far as I can tell, Ingo's CFS is actually superior to Con's staircase scheduler. I think the mature thing for Con to do would be to continue his involvement in the kernel. It isn't an all or nothing game.

(On the other hand, I can see how it must be extremely hard and frustrating to do it on a 100% voluntary basis in your free time.)

desktop or server latency

Posted Jul 25, 2007 8:07 UTC (Wed) by man_ls (guest, #15091) [Link] (5 responses)

It could be just different kernel configurations. I used to have Edgy Eft with desktop configuration (Preemptible Kernel (Low-Latency Desktop)) and my machine, similar to yours, worked fine for desktop usage. However when I upgraded to Feisty Fawn somehow my kernel got configured for server usage (No Forced Preemption (Server)) and now it feels noticeably slower.

I guess it is fine as long as both options work well. CK in the interview complains that kernel devs don't pay much attention to the desktop, but it is obvious that 2.6 has got much better in these few years, probably thanks in part to himself. It is a pity to see him go. Luckily it seems that there are a few devs left who do care, like Ingo Molnar.

desktop or server latency

Posted Jul 25, 2007 17:15 UTC (Wed) by mikov (guest, #33179) [Link] (4 responses)

Well, I have the standard Debian Etch kernel and to be honest I don't even really know how it is configured. Debian isn't known for the best desktop experience, though :-)

There is no doubt that interactivity could be improved, but it already is better than Windows XP, so I can understand why it is not a priority.

The kernel developers themselves use Linux on their own desktops, so we can't really say that they are oblivious to any problems. They don't consider the problems _that_ important, and I agree with them. I, for one, feel perfectly fine with my Linux desktop. Granted, there are many things that could be improved, but they are not all kernel related.

For example, the common GUI event-loop paradigm simply cannot produce really responsive applications. What is needed is some sort of client-server architecture: a real-time-priority UI thread with a guaranteed response time, doing nothing besides processing input and rendering, and a back-end worker thread (or threads) doing all the actual work. The division between the UI and back-end threads is not always clear-cut, and it is always tempting to simplify things by putting work in the UI thread. So, to keep the rules simple, the UI thread must not invoke any syscalls except the carefully crafted UI library functions, and must have no loops or recursion at all.

Granted it would be hard to do, especially in Unix where threads have never been very popular. C may not be the best language for it - something higher level might be needed. However it would deliver more interactivity gains than any kernel improvements.
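A minimal sketch of the split described above, with purely illustrative names (this is not any existing toolkit's API): a worker thread does the slow jobs and hands finished results to a queue, while the "UI" loop only drains the queue and renders.

```python
# Sketch: keep slow work out of the UI thread via a worker + result queue.
import queue
import threading
import time

render_queue = queue.Queue()

def worker():
    """Back-end thread: does all the actual (slow) work."""
    for job in ("load file", "parse", "layout"):
        time.sleep(0.01)                # stand-in for real work
        render_queue.put("done: " + job)
    render_queue.put(None)              # sentinel: no more results

def ui_loop():
    """'UI' thread: never works, only renders finished results."""
    rendered = []
    while True:
        item = render_queue.get()       # a real UI would multiplex this with input events
        if item is None:
            break
        rendered.append(item)           # "render" the finished result
    return rendered

threading.Thread(target=worker, daemon=True).start()
results = ui_loop()
print(results)
```

In a real toolkit the UI loop would multiplex the queue with input events, and the real-time priority would come from the OS scheduler; the queue is the part that keeps slow work out of the UI thread.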

desktop or server latency

Posted Jul 25, 2007 19:14 UTC (Wed) by man_ls (guest, #15091) [Link] (3 responses)

Still there are some problems which are kernel related, and it would seem that kernel devs should care about those. Windows XP (or marginally better) is not good enough for many people.

The picture that Kolivas draws in the article is that at the beginning 2.6 was not too good latency-wise, and e.g. music players were skipping all the time: it made Mac OS X look smooth in comparison. It is funny that an Australian anaesthetist was the one to call people's attention to it.

Actually I have taken a deeper look and it seems that my kernel preemption configuration under feisty is the same as it was in edgy, CONFIG_PREEMPT_VOLUNTARY=y. Maybe the patch set applied is a bit different, or maybe desktop latency has just degraded from 2.6.17 to 2.6.20. Let us hope someone there is still willing to work on it.
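For anyone wanting to make the same check, the preemption model of a running kernel can be inspected from the build config. A command sketch - file locations vary by distro, and /proc/config.gz only exists if the kernel was built with CONFIG_IKCONFIG_PROC:

```shell
# Where most distros install the build config (path varies):
grep PREEMPT /boot/config-$(uname -r)

# Or, if the kernel exports its own config:
zcat /proc/config.gz | grep PREEMPT
```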

desktop or server latency

Posted Jul 25, 2007 22:29 UTC (Wed) by mingo (guest, #31122) [Link] (2 responses)

Actually I have taken a deeper look and it seems that my kernel preemption configuration under feisty is the same as it was in edgy, CONFIG_PREEMPT_VOLUNTARY=y. Maybe the patch set applied is a bit different, or maybe desktop latency has just degraded from 2.6.17 to 2.6.20. Let us hope someone there is still willing to work on it.

Hm, could you try the -rt kernel? (2.6.22.1-rt8 would be a good pick. It also includes CFS.)

The -rt kernel has the latest round of latency fixes - not all are upstream yet. For the past 2-3 years we've been working hard on eliminating various sources of latencies. (and the scheduler was never a big factor for that.)

Since v2.6.12 or so we have done over 100 latency fixes (some of them large kernel features like PREEMPT_VOLUNTARY or PREEMPT_BKL) as part of the -rt project. I maintained the low-latency patchset for years before that.

If -rt does not fix your problem then please report it to lkml and to me (or to rt-linux-users@vger.kernel.org); we are definitely interested in sources of latency that are not yet fixed.

desktop or server latency

Posted Jul 25, 2007 22:35 UTC (Wed) by jwb (guest, #15467) [Link] (1 responses)

I'm not sure how we went from I/O latency to CPU scheduler latency. The fact that an xterm takes a full minute to start has little or nothing to do with the CPU in my experience. It has to do with the fact that the process wants to mmap libXt, and the kernel puts that request at the back of some gigantic list of operations and doesn't call it back for tens of seconds. So fiddling with the CPU scheduler isn't going to change that problem.

desktop or server latency

Posted Jul 26, 2007 14:39 UTC (Thu) by mingo (guest, #31122) [Link]

Yeah, the discussion veered off to general kernel latencies.

As for your "xterm takes a long time to start during a dd", does CFQ (and, optionally, ionice) solve the problem?
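For readers wanting to try this, both knobs are driven from the command line. A command sketch (needs root; sda/sdc and the example output are placeholders):

```shell
# Show and switch the active I/O scheduler for a disk (sysfs knob):
cat /sys/block/sda/queue/scheduler        # e.g. "noop anticipatory deadline [cfq]"
echo cfq > /sys/block/sda/queue/scheduler

# With CFQ active, run the bulk copy in the idle I/O class so it only
# gets disk time nobody else wants (ionice is part of util-linux):
ionice -c 3 dd if=/dev/sda of=/dev/sdc bs=1M
```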

Interview with Con Kolivas (APC)

Posted Jul 25, 2007 11:49 UTC (Wed) by jbouzane (guest, #43125) [Link] (5 responses)

I didn't believe what you said, so I set out to prove it myself. I was actually rather surprised at the results, though I still think you're exaggerating.

The labels on the left say which IO scheduler was used. The first column below is the time taken for oowriter by itself to start. The next column is oowriter startup time with dd running copying a 4 GB DVD image from /dev/sda to /dev/sdc (SATA). The third column is the same test, except with the dd process running at nice -n +19

Scheduler        alone   +dd    +dd nice 19
Anticipatory     13s     154s   135s
CFQ              11s     200s    90s

Note that for all tests over 95 seconds or so, the dd was run in a loop because it completed in about that amount of time.

The machine is an Intel Core 2 Duo 2.4 GHz with 2 GB of RAM and 2 Seagate 400 GB hard drives on a 2.6.21.5 SMP kernel with preemption (including BKL) and RT mutexes. All tests were run after clearing the page cache using /proc/sys/vm/drop_caches

The same tests with xterm instead of OpenOffice give:

Scheduler        alone   +dd    +dd nice 19
Anticipatory      4s      25s    22s
CFQ               5s      50s    50s

So I think your 5-minute claim is wrong. However, I do see 10x slowdowns, sometimes 20x with CFQ.
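The methodology above can be sketched as a script. This unprivileged stand-in uses temp files rather than raw devices and a trivial small write instead of oowriter, so the absolute numbers mean nothing - it only shows the shape of the harness (the real runs also dropped the page cache via /proc/sys/vm/drop_caches, which needs root):

```shell
# Time a small I/O task alone, then again under a competing streaming write.
tmp=$(mktemp -d)

# Baseline: small synced write with an idle disk.
t0=$(date +%s%N)
dd if=/dev/zero of="$tmp/small" bs=4k count=1 conv=fsync 2>/dev/null
t1=$(date +%s%N)

# Competing streaming load in the background.
dd if=/dev/zero of="$tmp/stream" bs=1M count=64 2>/dev/null &
load=$!

# Same small write, now contending with the stream.
t2=$(date +%s%N)
dd if=/dev/zero of="$tmp/small2" bs=4k count=1 conv=fsync 2>/dev/null
t3=$(date +%s%N)

wait "$load"
echo "idle: $(( (t1 - t0) / 1000000 )) ms, loaded: $(( (t3 - t2) / 1000000 )) ms"
rm -rf "$tmp"
```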

pluggable I/O schedulers, pluggable CPU schedulers

Posted Jul 25, 2007 13:00 UTC (Wed) by mingo (guest, #31122) [Link] (2 responses)

So I think your 5-minute claim is wrong. However, I do see 10x slowdowns, sometimes 20x with CFQ.
Yes - and note that such problems are one of the reasons why some of the Linux IO subsystem maintainers are today (partly) regretting that they exposed pluggable I/O schedulers to user-space.

The I/O subsystem now has two reasonably good I/O schedulers, AS and CFQ, but the maintainers would like to use CFQ exclusively. The problem is, they cannot do that anymore: some key apps still run best on AS and there's not enough user pressure to move the issue. So it will be very hard to arrive at one good I/O scheduler that works well out of the box, without the user having to tweak anything. (And most users and admins don't tweak their systems.) Pluggable I/O schedulers did, however, help development and prototyping.

In the CPU scheduler space the same question and maintenance issue come up, but with tenfold magnitude: unlike disks, CPUs are a fundamentally more "shared" and more "stateless" resource (despite caching) for which we have to offer robust multitasking. Disks store information much more persistently, workloads are bound to particular disks much more persistently (than tasks are bound to CPUs), and disks are also a lot less parallel due to the fundamental physical limitations of rotating platters.

The default CPU scheduler has to be good enough for all purposes, and we don't want to splinter our technology into too many "this app works best with that scheduler on that hardware and with that kernel config" niches.

So the technological case for pluggable CPU schedulers was never truly strong, and it's even weaker today now that we've got direct experience with pluggable I/O schedulers.

[Sidenote: I think Con made a small mistake in equating CFS's modularization to PlugSched. The modularization within CFS is of a fundamentally different type: it modularizes the scheduling policies, which are already a distinct part of the ABI (SCHED_FIFO, SCHED_RR, SCHED_OTHER, SCHED_BATCH, SCHED_IDLE). This was a nice internal cleanup of the scheduler. PlugSched never did that; it was always submitted as an additional complication allowing build- and boot-time switching to a completely different CPU scheduler, not as a cleanup of the already pretty complex scheduler code. I have recently suggested to the current PlugSched maintainer (Peter Williams) that he rework PlugSched along similar lines - that would result in a much cleaner approach.]

pluggable I/O schedulers, pluggable CPU schedulers

Posted Jul 25, 2007 18:37 UTC (Wed) by flewellyn (subscriber, #5047) [Link] (1 responses)

This may be a foolish question, but never let it be said that I'm not willing to ask those.

Would it be possible to merge the I/O schedulers into one, and then expose the different behaviors as config knobs? Y'know, with "AS" and "CFQ" as particular groups of settings?

Or is that just as bad if not worse?

pluggable I/O schedulers, pluggable CPU schedulers

Posted Jul 25, 2007 19:55 UTC (Wed) by mingo (guest, #31122) [Link]

Would it be possible to merge the I/O schedulers into one, and then expose the different behaviors as config knobs? Y'know, with "AS" and "CFQ" as particular groups of settings? Or is that just as bad if not worse?

Your suggestion makes sense and I think it would likely result in fundamentally better code that gives us one codebase, but it still doesn't give us an I/O scheduler that does the right thing no matter what we throw at it (the user would still have to turn that knob). So in that sense it would be little change from the current state of affairs.

The problem is not primarily the kernel-internal code duplication - we can handle such things pretty well; the kernel is nearly 8 million lines of code now and growing fast. We've got three SLAB implementations, and that's not a problem because they were never made user (and application) visible.

The problem is the externally visible imperfection of the kernel's behavior, and end-users learning to depend on it. If we try to remove a knob we once promised to users, those who are affected negatively complain (and rightfully so).

It should also be seen in perspective: these issues are not the end of the world, our I/O schedulers are pretty damn good already, and modularity is not an issue at all compared to some other kernel problems we are facing - but if code is submitted to an otherwise well-working subsystem these little factors are what make or break a particular patch's upstream acceptance.

Interview with Con Kolivas (APC)

Posted Jul 25, 2007 16:22 UTC (Wed) by malor (guest, #2973) [Link] (1 responses)

I noticed a similar problem with OSX on the Mac Pro; when running a dd, the system slows to a dead stop. dd consumes all available I/O and the system essentially stops responding until it's done. And while I don't have dd on this Windows box, I've noticed that the system gets very, very slow when VMWare is creating an image.

They're different systems, but they're both using Intel chipsets, and because of that, I'm wondering if it might be something about Intel's SATA controllers. My Athlons never did this; they maintained much better responsiveness under load.

I also found that the Mac stayed usable if I didn't give dd a blocksize. Up to some amount - I think 256, but I don't remember for sure - it maintained decent performance. It started slowing at X+1 (I think 257) and got worse linearly up to 512. It didn't get any worse after that, but it's hard to get worse than total system lockout.

I haven't tested blocksizes under Windows, but between seeing both OSX and Windows do the same no-response-under-heavy-IO thing, and hearing your story, I'm wondering if Intel is doing something dumb with I/O.

If you're not on Intel, of course, that blows that idea out of the water. :)

Interview with Con Kolivas (APC)

Posted Jul 26, 2007 11:58 UTC (Thu) by rwmj (subscriber, #5474) [Link]

Well, I get similar problems on my Athlon machines :-(

I suspect that the problem may lie with SATA itself. It certainly feels much worse than IDE did, but again that's just subjective - I haven't got any hard figures.

Rich.

Interview with Con Kolivas (APC)

Posted Jul 25, 2007 14:38 UTC (Wed) by renox (guest, #23785) [Link]

>A big thank you to CK for identifying the problem.

It's not like this is a new problem... It has been known for a long time that the Linux kernel is much more tuned for enterprise needs than for end-user needs.

Part of the problem is funding; another part is that it is hard to benchmark "application X is slow" and show that it is a kernel issue.

Maybe this (bad) publicity will trigger the Linux developers to do something about this situation, but I doubt it.

Interview with Con Kolivas (APC)

Posted Jul 24, 2007 21:56 UTC (Tue) by laoguy (guest, #46414) [Link] (9 responses)

Yeah. Great interview.
There goes the last developer purely interested in the desktop.
Linus sure is starting to look like the whiny apologist for big
business that he really is.
Thanks Con.

Oh, please

Posted Jul 24, 2007 22:28 UTC (Tue) by JoeBuck (subscriber, #2330) [Link] (5 responses)

There are other developers interested in the desktop.

kernel developers who care about the desktop

Posted Jul 25, 2007 10:49 UTC (Wed) by mingo (guest, #31122) [Link] (4 responses)

Yes. I, for one, am very much interested in the desktop, and I know that many other kernel developers are too - including Linus. During the past 3 years hundreds of patches (and a handful of major features) went from the -rt tree (which I maintain together with Thomas Gleixner) into the upstream kernel. [And there are still 380 patches to go as of v2.6.23-rc1-rt2 - some of them have been waiting for upstream integration nearly 4 years - ouch! :-/]

Most of those patches were inspired by problems that are primarily related to the desktop. One of the largest features of v2.6.21 (dynticks and high resolution timers) is a feature that is mainly relevant on the desktop (laptops in particular).

It never hurts if a kernel feature helps all categories of Linux use - and in general, just about anything sane that helps the desktop helps in the server space too, if formulated generally enough. Inotify (now upstream) helps servers too. Dynticks (now upstream) helps servers, high-resolution timers (now upstream) help servers, PREEMPT_BKL (getting rid of the big kernel spinlock, now upstream too) helps servers as well, etc.

In terms of desktop latencies, the CPU scheduler was never really a huge issue in practice, we had lots of much graver latency problems that were patiently tracked down and fixed. (and, as many have pointed it out already, I/O schedulers and the VM policies have a much bigger role in the everyday latencies that a desktop user sees - the CPU/task scheduler is a distant third in that aspect.)

But CPU schedulers do stir emotions: everyone who is interested in OSs has some notion of what a CPU scheduler is, and everyone is affected by it, so everyone has an opinion about how it should (or should not) be done. Fixes for serious but otherwise dull latencies deep in the kernel are hard to understand, so they stir far less emotion. (Which is a big benefit: it makes them all that much easier to merge ;-)

I can understand Con's frustration with the kernel patch process. And while I disagree with Con about some technological issues related to CPU scheduling (and conceded others, not only in the SD/CFS scheduler discussion but in the past too: of the 250+ modifications to sched.c in the past 2 years alone [authored by over 80 different kernel hackers], 15 changes were from Con), I actually fully agree with Con's points about swap-prefetch and disagree with the MM maintainers' decision about it.

I'd not be surprised if the core problem that lies behind swap-prefetch is something that we wanted to see fixed on servers too.

In any case, regardless of the technological merits, it never feels good to have a patch rejected - I was upset about it myself whenever it happened, and I'm upset about it even today when it happens. Rejection of my patches still happens quite frequently. (And if anyone knows about some insider old boys' network that gets kernel patches past Linus easily, please let me know - I'd like to join! ;-)

kernel developers who care about the desktop

Posted Jul 26, 2007 1:52 UTC (Thu) by ras (subscriber, #33059) [Link] (3 responses)

Your point about the I/O and VM schedulers being the dominant cause of slowdowns rings true, and will become more so with the move to multiple CPUs. Most desktop machines have a load factor of less than 4, so with 4 CPUs every task that wants to run can have its own CPU and there is nothing to schedule.

The response time of the machine when a disk-bound task is running is a major issue - to the extent that in 2.6.8 you could literally kill a machine with a mke2fs, because it flooded the block cache and invoked the OOM killer, which then took out some important process or other. It doesn't do that with newer kernels, but it can still delay the startup of vi by a minute or two.

Think about it. A single mke2fs, niced to 19, can bring a 4-CPU machine to its knees. You can tell the CPU scheduler you want mke2fs to take only a small percentage of the available CPU time, but apparently there is no way to tell the I/O scheduler it's not allowed to hog 95% of memory by swamping the block cache. Yuck!

kernel developers who care about the desktop

Posted Jul 26, 2007 14:36 UTC (Thu) by mingo (guest, #31122) [Link] (2 responses)

ionice will solve that problem for you - it has been available since 2.6.13.

kernel developers who care about the desktop

Posted Jul 26, 2007 23:08 UTC (Thu) by conman (guest, #14830) [Link] (1 responses)

Have you tried using ionice? You'll find it has never worked.

kernel developers who care about the desktop

Posted Jul 30, 2007 9:17 UTC (Mon) by mingo (guest, #31122) [Link]

Con Kolivas wrote:

Have you tried using ionice? You'll find it has never worked.

Hi Con - this is the first time I've seen you characterise ionice as "never working", so your statement is quite surprising to me. Could you please elaborate on this criticism? Have you reported your problems with ionice to lkml?

I've used ionice myself and it works well within its boundaries.

Linus and the desktop

Posted Jul 24, 2007 22:55 UTC (Tue) by dowdle (subscriber, #659) [Link] (2 responses)

While we haven't heard too much from Linus in the media over the last couple of years... he always seemed to say publicly that he favored solutions benefiting all users, not just a subset... and he always seemed to refer to desktop users as a group he wanted to protect from the performance problems that might be caused by giving enterprise users preference. I'm not sure if that was actually the case... or just how he came across on lkml over the last few years.

I don't think it unreasonable for people to ask for proof of performance improvements... and not wanting to adopt something without such proof. Science is kinda like that.

-ck's comments do raise the question: what would we be doing with all that speed if we actually had it? I mean, wouldn't hardware sales slow to a crawl if your machine seemed speedy for more than a year or two? We couldn't have that now, could we?

Linus and the desktop

Posted Jul 25, 2007 4:12 UTC (Wed) by zlynx (guest, #2285) [Link] (1 responses)

I think that you can have most of the speed.

Use the console. In every case I can recall that didn't involve network access or giant DB queries, stuff happens instantly.

Use X with accelerated drivers, and use software written with the FLTK X toolkit. I am really impressed with FLTK - programs written with it seem to open before I finish clicking the launch button.

But anyway, I have had more than enough computer since about 2003. My laptop has a 2.2GHz Athlon64 in it. I'd sort of like to upgrade, but the CPU really is more than enough.

Linus and the desktop

Posted Jul 25, 2007 22:57 UTC (Wed) by i3839 (guest, #31386) [Link]

Enabling composite can help a lot too. It has improved a lot since the last time I tried it (running xcompmgr -a).

Hardware is cheap as dirt nowadays, but I still can't find much motivation to upgrade my Duron 1300 and 256 MB of RAM. Listening to other people's complaints, it almost seems that the more RAM and the faster the CPU, the slower the system. ;-)

Interview with Con Kolivas (APC)

Posted Jul 24, 2007 21:58 UTC (Tue) by pjdc (guest, #6906) [Link] (19 responses)

These Kolivas comment threads are always great. I love the smell of nonsense in the morning!

Interview with Con Kolivas (APC)

Posted Jul 24, 2007 22:30 UTC (Tue) by NCunningham (guest, #6457) [Link] (3 responses)

I don't think it's nonsense at all. I'm only a step behind on the same path.

Interview with Con Kolivas (APC)

Posted Jul 25, 2007 8:57 UTC (Wed) by rsidd (subscriber, #2582) [Link] (2 responses)

That's sad to hear. But even if the powers-that-be on LKML are resistant to Con Kolivas's scheduler or your suspend2 or other such patches, why aren't distros taking them up? For many years distros have been shipping far-from-vanilla kernels, and one would imagine that suspend2 would be much in demand.

Interview with Con Kolivas (APC)

Posted Jul 25, 2007 9:37 UTC (Wed) by pjdc (guest, #6906) [Link] (1 responses)

The conspiracy goes deeper than you realise. The larger distros have adopted a cynical policy of "mainline convergence" to further their secret agenda of preventing Con's and Nigel's code from reaching the very people who need it most and keeping Linus king of the heap. If you want a picture of the future of Linux, imagine a boot stomping on a human face, forever.

Interview with Con Kolivas (APC)

Posted Jul 25, 2007 21:45 UTC (Wed) by NCunningham (guest, #6457) [Link]

Heh. I assume that's tongue in cheek. Distros seeking to get closer to mainline isn't at all the problem. It would be nice if they picked up our patches, but it's perfectly understandable that they should want to minimise the work involved in maintaining their kernels - I'm seeking to do that too.

No, the problem I have is that Andrew and Linus have stated flat out that they don't want to merge another implementation of Suspend to disk, but at the same time Andrew at least is jumping up and down over kexec based hibernation as if it's going to be the panacea. He's almost ready to merge the first patch that appears that's remotely a step towards the idea. That's what's getting to me.

Interview with Con Kolivas (APC)

Posted Jul 24, 2007 23:39 UTC (Tue) by drag (guest, #31333) [Link] (14 responses)

Well, the hardware vs software stuff seems nonsensical. Software is easier to update, it's cheaper, and it is much, much more flexible than any hardware. This is just how it goes. Bugs in software can be fixed; bugs in hardware (generally) cannot.

Maybe he is daydreaming back to the days of the Amiga, with lots of specialized hardware to take over multimedia-style tasks. Using these specialized chips and processors the Amiga was able to provide functionality and performance that wasn't matched until years later.

But the truth is that everything is sinking into the CPU. It's Moore's law. There are simply more transistors in today's CPUs than entire computers used to have - and not very old computers, either.

The way things are going, in a decade or so the fastest desktop system is going to be about the size of a paperback novel and will have over a hundred CPU cores of various types on a single die - some generic, others specialized. It'll fit in the palm of your hand. The major issue will be what to do with all the I/O ports. It'll run full-tilt on batteries for at least 24 hours before sleeping.

And, on average, new computers will be as cheap as a pair of shoes.

But on the other stuff he is pretty much right.

The desktop has been ignored compared to the enterprise. This is because that is where the Linux market is at right now. I think it will change, but...

Maybe what people like CK should do is just give up on the kernel. Go and improve userspace.

Linux userspace is the place that needs massive amounts of work. Also keep in mind that the Linux kernel is just that, a KERNEL. Kernels can be replaced. Userspace is where the important stuff is for the desktop.

So make improvements, set up benchmarks, do whatever makes it fun for you. Be a bee in the Linux kernel developers' ear... I think that if you get to the point where you can quantify problems, and present them in metrics and numbers, then they are much easier to act on.

If it is still not looking good... then maybe a fork is in order. Maybe it's not a good idea to have the server and desktop kernels be the same kernel. There is no rule that says it has to be.

I don't know. But there are always alternatives.

Interview with Con Kolivas (APC)

Posted Jul 25, 2007 0:30 UTC (Wed) by zooko (guest, #2589) [Link] (6 responses)

I'm pretty interested in OpenSolaris. I installed Nexenta CP on my new server, and it automatically detected my two hard drives and installed ZFS, mirrored on the two drives, for everything including boot! That was cool.

I've since had a lot of fun reporting bugs [1]. I haven't been in such a "target rich environment" for bug reporting since the good old days of Debian circa 1996!

[1] https://launchpad.net/nexenta

Solarix booting from ZFS?

Posted Jul 25, 2007 1:33 UTC (Wed) by ncm (guest, #165) [Link] (5 responses)

Last I heard, solarix wouldn't boot from a ZFS filesystem; the ZFS-via-FUSE-on-linux people were gloating that they had the only system that would boot from ZFS. Does solarix boot direct from ZFS now? (And, if so, does that mean somebody wrote Forth code that understands ZFS layout?)

Solarix booting from ZFS?

Posted Jul 25, 2007 1:48 UTC (Wed) by zooko (guest, #2589) [Link] (4 responses)

Yes, my Nexenta CP install boots from a ZFS filesystem. Or else it pulls some kind of fast one on me, and leaves me with a running system that *claims* that the only filesystem present is ZFS...

Just kidding. It really is ZFS boot:

http://mail.opensolaris.org/pipermail/opensolaris-discuss...

This is Solaris/x86, not Solaris/Sparc, and I think it doesn't use the Forth bootloader, but I'm not sure.

Solarix booting from ZFS?

Posted Jul 25, 2007 1:57 UTC (Wed) by pjdc (guest, #6906) [Link] (3 responses)

Solaris x86 uses grub and an initrd-like arrangement ("boot archive"), so I guess they just sling the zfs junk in there.

Solarix booting from ZFS?

Posted Jul 25, 2007 2:47 UTC (Wed) by pjdc (guest, #6906) [Link] (2 responses)

Of course that can't be the whole story. Grub still needs to read the boot archive somehow. The Solaris 10 box I have access to doesn't seem to have a /boot/grub/zfs_stage1_5, but I do get a whopping three hits on Google for "zfs_stage1_5".

Solarix booting from ZFS?

Posted Jul 25, 2007 16:03 UTC (Wed) by paulj (subscriber, #341) [Link]

GRUB ZFS stage1.5 source code

It's pretty difficult to find: you have to go to cvs.opensolaris.org, search for 'zfs', and scroll down a bit. Really obscure...

;)

Solarix booting from ZFS?

Posted Jul 25, 2007 21:16 UTC (Wed) by akumria (guest, #7773) [Link]

Grub still needs to read the boot archive somehow.

Indeed. I believe they use Sun GPLv2 ZFS code to do so.

Interview with Con Kolivas (APC)

Posted Jul 25, 2007 1:51 UTC (Wed) by bojan (subscriber, #14302) [Link] (3 responses)

> Desktop has been ignored compared to enterprise. This is because this is where the Linux market is at right now.

Yeah, money talks, as always. If/when we see more acceptance of Linux on the desktop, things may indeed change.

Interview with Con Kolivas (APC)

Posted Jul 25, 2007 7:39 UTC (Wed) by ms (subscriber, #41272) [Link] (2 responses)

Yeah, money talks, as always. If/when we see more acceptance of Linux on the desktop, things may indeed change.

Chicken and egg, though. It's widely known that Firefox, for example, works much better under Windows than under Linux. Until there's a compelling case that The Desktop is obviously better under Linux than Windows, there won't be the numbers, and until there are the numbers, the kernel devs won't be inclined to do the work to make it better.

Interview with Con Kolivas (APC)

Posted Jul 26, 2007 2:45 UTC (Thu) by bojan (subscriber, #14302) [Link]

> Until there's a compelling case that The Desktop is obviously better under Linux than Windows, there won't be the numbers, and until there are the numbers, the kernel devs won't be inclined to do the work to make it better.

People don't always select products based on "better" - sometimes cheaper is more important, other times more flexible and so on. So, hopefully, the acceptance will eventually rise to the point where companies will have people working full time on Linux features that are desktop related, just like they have now for the server stuff.

But, yeah, it is a chicken and egg to some extent, unfortunately. But I don't think it's because of "goodness". It probably has more to do with the fact that the vast majority of businesses have one or more Windows apps they absolutely cannot do without, so it keeps them tied to that platform.

Interview with Con Kolivas (APC)

Posted Jul 31, 2007 15:38 UTC (Tue) by daenzer (subscriber, #7050) [Link]

> It's widely known that firefox, for example, works much better under
> Windows than Linux.

From what I've heard so far that seems more likely due to its inefficient use of X than due to the kernel though, FWIW.

Interview with Con Kolivas (APC)

Posted Jul 25, 2007 4:35 UTC (Wed) by russell (guest, #10458) [Link] (2 responses)

Moore's law applies not only to CPUs; it applies to specialised hardware as well. As for software, the opposite of Moore's law seems to apply: it gets slower and bigger as time goes by. So don't say software will save us. It has only slowed us down so far, but fortunately hardware has kept pace (just). PCs did not catch up to the Amiga because the CPU became faster. They caught up because of specialized hardware, specifically 3DFX cards and sound cards.

Interview with Con Kolivas (APC)

Posted Jul 25, 2007 20:02 UTC (Wed) by drag (guest, #31333) [Link] (1 responses)

*shrug*

Think of your processor as if it's a black hole.

Wafers are getting bigger and purer. Fab processes are getting faster and smaller. You just have SO MANY TRANSISTORS. They got so many that they are throwing 4 or more cores on a single die.

Intel has a research processor that has 80 cores on it.

WTF is any desktop going to do with 80 cores? Sure, 2 cores is great, and 4 is pretty good. 8 is so-so, but by the time you're looking at 32 or 64 cores you simply are not going to see any improvement in performance! So what is happening is that all the functionality of your computer is just going to get sucked into that processor, piece by piece. Your video card, your wifi, your north bridge, your sound, your MIDI, your modem, etc etc etc.

It's cheaper, faster, more reliable, more energy efficient.

> They caught up because of specialized hardware, specifically 3DFX cards and sound cards

Physics acceleration? It's a joke.

Sound card acceleration? It's dead and dying, killed by its own patents. You can blame Creative for that one. Wait till their patents dry up, then you'll see real innovation in realistic 3D sound. It'll be all software, though.

3D acceleration? The movement is towards CPU cores of different types, specialized for specific workloads.

State-of-the-art "hardware acceleration" for video graphics nowadays is that you take a proprietary shader language and compile it into binaries to run on your GPU. Sound familiar? It doesn't sound like 'hardware acceleration' to me; it sounds more like regular old software on a special CPU.

If it wasn't for the fact that ATI and Nvidia were such A-holes about their 'IP', we would be compiling our software to run on both the CPU and the GPU. GCC would decide which would be faster, and you would be able to use those massive amounts of memory bandwidth for something actually useful.

ATI and Nvidia are heading towards GPGPU. Intel is heading towards media-optimized x86-like cores. Either way it will be faster and be useful for so much more than current video cards are used for.

Hardware raid? Software raid is faster... eventually it will be better.

---------------------

Modern OSes are bloated, no doubt about that. But the solution isn't hardware.. the solution is fixing the OS.

The Linux kernel is already kick-ass. Sure it has issues, but it's still better than OS X's kernel or Windows' kernel or Solaris's kernel. It's the best there is at what it does well. If Linux devs can figure out solutions to the remaining driver issues and fix userspace-to-kernel ABI/API breakage issues, then there will be almost no reason to use anything else.

For this I think embedded development is helping a lot. You can spend all day banging your head against the Linux kernel, but in terms of positive impact it won't compare to fixing some memory usage issues in GTK.

Projects like Maemo for the Nokia N800 or OpenMoko for the Neo1973, where you have a nearly full Linux install with X, networking, and GTK/Gnome working well on a hand-held device, are hugely positive. I think things like that are eventually going to help improve desktop performance considerably...

If Gnome can be made to run well on a phone with 128 megs of RAM, 64 megs of flash, and a 300MHz ARM CPU, then it's going to kick ass on a modern desktop. This sort of thing is probably the most important thing right now, I think.

Hopefully KDE4 will be everything they promise it will be...

Interview with Con Kolivas (APC)

Posted Jul 26, 2007 9:28 UTC (Thu) by nix (subscriber, #2304) [Link]

Um, what do you think hardware acceleration *is*? You're talking about software offloaded to a specialized coprocessor as if it weren't hardware acceleration, but of course it is: the coprocessor is often specialized to some degree, or has privileged access to hardware the CPU can't see and has nothing else to do so it can do things with harsh latency bounds.

Con hit the nail on the head and ignored it

Posted Jul 25, 2007 12:22 UTC (Wed) by dion (guest, #2764) [Link] (16 responses)

Con discovered what the Powertop people did; that anything that can be quantified will be improved.

What I don't understand is why he didn't keep improving the benchmarks after seeing that they were too hard to use.

It seems to me that the first step to get improved interactivity on the desktop is to write a benchmark that illuminates the problem.

Benchmarks, scheduler instrumentation, PowerTop

Posted Jul 25, 2007 13:37 UTC (Wed) by mingo (guest, #31122) [Link] (15 responses)

Con discovered what the Powertop people did; that anything that can be quantified will be improved.

Benchmarks like Con's interbench (a latency measurement test-suite) are very useful because they provide standardized metrics of system behavior and allow the precise tracking of them.

Instrumentation of latencies is an orthogonal thing, and it allows users to be involved much more directly - as PowerTop has proven.

Incidentally, i happened to be the one who wrote most of the kernel instrumentation code that enabled PowerTop (/proc/timer_stats, introduced upstream as part of dynticks) - and i'm quite impressed by the user-space tool, the phenomenon and the momentum that Arjan managed to build on top of what i originally envisioned to be a raw, obscure, kernel-developers-only debugging feature.

So in CFS i've added various pieces of instrumentation to allow a powertop-like tool to be built: the scheduler now tracks worst-case blocking, worst-case sleeping latencies, how many times a task context-switched, etc.

Those stats are already used to report CFS regressions, but in addition to that a new tool could be written (let's call it "latencytop" :-) which would display the longest latencies that various tasks are observing (or the tasks with the largest context-switch rate, etc.).

Enable CONFIG_SCHED_DEBUG for the new scheduler instrumentation code (all included in v2.6.23-rc1); the new stats are in /proc/[PID]/sched. These stats were very useful while developing CFS; nothing beats users being able to report: "my kmail just saw a 10.5 seconds latency while I built a kernel - Ingo you suck!".

About documentation...

Posted Jul 25, 2007 20:16 UTC (Wed) by i3839 (guest, #31386) [Link] (12 responses)

When searching for documentation about that /proc/<PID>/sched file, which I couldn't find, I noticed that sched-design.txt is outdated and could be removed, and that the new sched-design-CFS.txt is already lagging behind:

$ ls /proc/sys/kernel/sched_*
/proc/sys/kernel/sched_batch_wakeup_granularity_ns
/proc/sys/kernel/sched_child_runs_first
/proc/sys/kernel/sched_features
/proc/sys/kernel/sched_granularity_ns
/proc/sys/kernel/sched_runtime_limit_ns
/proc/sys/kernel/sched_stat_granularity_ns
/proc/sys/kernel/sched_wakeup_granularity_ns

But only sched_granularity_ns is documented.

If people are expected to ever use these knobs, it might be good to document what those wakeup and stat variants are, and the meaning of sched_features. When that's done all fields are easy to understand.

Interpreting and using /proc/<PID>/sched and /proc/sched_debug would also be much easier if they were documented, though as it's a debugging feature it's less important. But still.

Anyway, when hunting for latency spikes, sluggish apps and similar creatures, I guess the se.wait_max and se.block_max are most interesting?

A bit poking to get the top offenders turns up:

proc # grep se.wait_max */sched | sort -n -k 3 | tail -n 2
1182/sched:se.wait_max : 81381345
3/sched:se.wait_max : 111139352

proc # grep se.block_max */sched | sort -n -k 3 | tail -n 3
1182/sched:se.block_max : 3749201713
367/sched:se.block_max : 3938538101
721/sched:se.block_max : 4027921788

proc # ps 3 1182 367 721
PID TTY STAT TIME COMMAND
3 ? S< 0:00 [ksoftirqd/0]
367 ? S< 0:02 [kjournald]
721 ? Ss 0:00 /usr/sbin/syslogd -m 0
1182 ? Ss+ 17:25 X :0 -dpi 96 -nolisten tcp -br

So assuming that these values are in nanoseconds, ksoftirqd waited at most 111 ms before it could finally run, and X 81 ms.

And kjournald, syslogd and X blocked at most about 4 seconds on IO, which is a bit worrying, especially X as it's in both lists.

kjournald probably does huge IO requests to get optimal throughput, but even then 4 seconds is bad.

syslogd does synchronous writes, so it's not that strange that it's at the top of the list. But it won't write much more than 100KB at once, so that's slightly scary too, though not too worrisome.

As for X, it doesn't seem to do much file IO, so it's probably blocking on something else. Video hardware? Unix sockets? Unclear, but 4s isn't healthy. Hopefully both cases happened at X startup and it's less serious than it looks (any way to reset those stats?).
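(The ranking above can also be scripted. Here is a quick sketch that mirrors the grep/sort pipeline, assuming the field layout shown in the output above; the sample values are the ones quoted:)

```python
# Parse "PID/sched:se.wait_max : NS" lines like those grepped above
# and rank tasks by worst-case wait, converting nanoseconds to
# milliseconds. The layout is assumed from the CONFIG_SCHED_DEBUG
# output quoted in this comment.

def worst_waits_ms(lines):
    """Return {pid: wait_max in ms} from grep-style sched lines."""
    result = {}
    for line in lines:
        left, value = line.rsplit(":", 1)   # split off the ns value
        pid = int(left.split("/", 1)[0])    # leading "PID/sched" part
        result[pid] = int(value) / 1_000_000  # ns -> ms
    return result

samples = [
    "1182/sched:se.wait_max : 81381345",
    "3/sched:se.wait_max : 111139352",
]
for pid, ms in sorted(worst_waits_ms(samples).items(),
                      key=lambda kv: -kv[1]):
    print(f"pid {pid}: worst-case wait {ms:.1f} ms")
```

This reproduces the conversion done by hand above: 111139352 ns for ksoftirqd is roughly 111 ms.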

About documentation...

Posted Jul 25, 2007 20:43 UTC (Wed) by mingo (guest, #31122) [Link] (1 responses)

any way to reset those stats?

Yeah: you can reset them on a per-task/thread basis by writing 0 to the /proc/[PID]/sched file. Then they'll go down to 0. (Unprivileged users can do it too, to their own tasks. root can reset it for everyone.)

You can reset it periodically as well if you want to sample/profile their typical sleep/block behavior.

About documentation...

Posted Jul 25, 2007 21:54 UTC (Wed) by i3839 (guest, #31386) [Link]

Yes, I figured that out by doing the only logical thing I could think of. :-)
Good interface.

About documentation...

Posted Jul 25, 2007 20:46 UTC (Wed) by mingo (guest, #31122) [Link] (4 responses)

So assuming that these values are in nanoseconds, ksoftirqd waited at most 111 ms before it could finally run, and X 81 ms.

Yes, the values are in nanoseconds. What priority does it have? [the prio field in /proc/[PID]/sched file] If it's niced to +19 then a longer delay is possible because other, high-prio tasks might delay it.

About documentation...

Posted Jul 25, 2007 22:03 UTC (Wed) by i3839 (guest, #31386) [Link] (3 responses)

ksoftirqd has prio 115 and X has prio 120. I didn't nice anything, so it's all default (all kernel threads at -5, user processes 0, except for pppd at -2 and udevd at -4).

About documentation...

Posted Jul 25, 2007 22:18 UTC (Wed) by mingo (guest, #31122) [Link] (2 responses)

Ok. Perhaps the 100+ msecs ksoftirqd delay was during bootup. Or if you are running a !CONFIG_PREEMPT kernel such delays could happen too. But if you are running a CONFIG_PREEMPT kernel and ksoftirqd shows such large latencies even after resetting its counters, that would be an anomaly. (if so then please report it to me and/or to lkml in email.)

About documentation...

Posted Jul 25, 2007 22:50 UTC (Wed) by i3839 (guest, #31386) [Link] (1 responses)

Ok, I'll keep an eye on it. Running PREEMPT here.

I suppose the best way to track any anomalies down is by applying latency-tracing-v2.6.23-rc1-combo.patch at your homepage? WAKEUP_TIMING seems slightly redundant now. I'll enable it anyway.

About documentation...

Posted Jul 25, 2007 23:41 UTC (Wed) by i3839 (guest, #31386) [Link]

I get compile errors with that patch, I'll send them by email.

About documentation...

Posted Jul 25, 2007 20:50 UTC (Wed) by mingo (guest, #31122) [Link] (1 responses)

Anyway, when hunting for latency spikes, sluggish apps and similar creatures, I guess the se.wait_max and se.block_max are most interesting?

Yes. There's also se.sleep_max - that's the maximum time the task spent sleeping voluntarily. ('se' stands for 'scheduling entity' - a task here)

block_max stands for the maximum involuntary delay. (waiting for disk IO, etc.)

wait_max stands for the maximum delay that a task saw, from the point it got on the runqueue to the point it actually started executing its first instruction.

Note that for multithreaded apps such as firefox all the worker threads are not in /proc/[PID]/sched but in /proc/[PID]/task/[TID]/sched. Often firefox latencies are in those threads not in the main thread.

About documentation...

Posted Jul 25, 2007 22:21 UTC (Wed) by i3839 (guest, #31386) [Link]

se.sleep_max is more interesting if you want to get most out of dynticks I suppose. Currently I'm mainly interested in wait_max.

Yes, I noticed how multithreaded apps were handled, but forgot to account for that in my grep. The numbers are in the same range. (But I reset X and FF, and I don't see X high in the list now; FF is, though.)

(I'm interested in this because I wonder where the strange keyboard/mouse behaviour I irregularly get comes from. Warping mouse, "sticking" keys, since 2.6.22 or so; but it could be shaky hardware too. A pain to debug.)

About documentation...

Posted Jul 25, 2007 21:23 UTC (Wed) by mingo (guest, #31122) [Link] (1 responses)

If people are expected to ever use these knobs, it might be good to document what those wakeup and stat variants are, and the meaning of sched_features. When that's done all fields are easy to understand.

Yeah, i'll do that. _Normally_ you should not need to change any knobs - the scheduler auto-tunes itself. That's why they are only accessible under CONFIG_SCHED_DEBUG. (But it helps when diagnosing scheduler problems that you can tune various aspects of it without having to reboot the kernel.)

One other interesting field is sum_exec_runtime versus sum_wait_runtime: the accumulated amount of time spent on the CPU, compared to the time the task had to wait for getting on the CPU.

The "sum_exec_runtime/nr_switches" number is also interesting: it shows the average time ('scheduling atom') a task has spent executing on the CPU between two context-switches. The lower this value, the more context-switching-happy a task is.

se.wait_runtime is a scheduler-internal metric that shows how much out-of-balance this task's execution history is compared to what execution time it could get on a "perfect, ideal multi-tasking CPU". So if wait_runtime gets negative that means it has spent more time on the CPU than it should have. If wait_runtime gets positive that means it has spent less time than it "should have". CFS sorts tasks in an rbtree with this value as a key and uses this value to choose the next task to run. (with lots of additional details - but this is the raw scheme.) It will pick the task with the largest wait_runtime value. (i.e. the task that is most in need of CPU time.)
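As an illustration of that selection rule only (the real CFS uses an rbtree and far more careful bookkeeping; every name and constant below is invented), a toy model that always runs the task with the largest wait_runtime behaves fairly on its own:

```python
import heapq

# Toy illustration of "pick the task with the largest wait_runtime".
# A binary heap stands in for CFS's rbtree; we store -wait_runtime so
# the min-heap pops the neediest task first. tick() charges the runner
# and credits the waiters so the total wait_runtime stays balanced.
# This is a deliberate simplification, not the CFS implementation.

class ToyFairScheduler:
    def __init__(self, tasks):
        self.heap = [(0.0, name) for name in tasks]
        heapq.heapify(self.heap)

    def tick(self, slice_ns):
        neg_wait, current = heapq.heappop(self.heap)
        fair_share = slice_ns / (len(self.heap) + 1)
        # Subtracting the same constant from every key preserves the
        # heap invariant, so the list stays a valid heap.
        self.heap = [(w - fair_share, n) for w, n in self.heap]
        # The runner used slice_ns but was only "owed" fair_share.
        heapq.heappush(self.heap,
                       (neg_wait + slice_ns - fair_share, current))
        return current

sched = ToyFairScheduler(["kmail", "make", "xterm"])
history = [sched.tick(1_000_000) for _ in range(6)]
print(history)
```

Over six ticks each task runs exactly twice: whoever has accumulated the most owed time is always chosen next, which is the fairness property described above.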

This mechanism and implementation is basically not comparable to SD in any way, the two schedulers are so different. Basically the only common thing between them is that both aim to schedule tasks "fairly" - but even the definition of "fairness" is different: SD strictly considers time spent on the CPU and on the runqueue, CFS takes time spent sleeping into account as well. (and hence the approach of "sleep average" and the act of "rewarding" sleepy tasks, which was the main interactivity mechanism of the old scheduler, survives in CFS. Con was fundamentally against sleep-average methods. CFS tried to be a no-tradeoffs replacement for the existing scheduler and the sleeper-fairness method was key to that.)

This (and other) design differences and approaches - not surprisingly - produced two completely different scheduler implementations. Anyone who has tried both schedulers will attest to the fact that they "feel" differently and behave differently as well.

Due to these fundamental design differences the data structures and algorithms are necessarily very different, so there was basically no opportunity to share code (besides the scheduler glue code that was already in sched.c), and there's only 1 line of code in common between CFS and SD (out of thousands of lines of code):

/*
 * This idea comes from the SD scheduler of Con Kolivas:
 */
static inline void sched_init_granularity(void)
{
        unsigned int factor = 1 + ilog2(num_online_cpus());

This boot-time "ilog2()" scaling based on the number of CPUs available is a tuning approach i saw in SD, and i asked Con whether i could use it in CFS (to which Con kindly agreed).
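The effect of that factor is easy to see. Here is a small sketch, with ilog2() mimicking the kernel's integer log2 helper (floor of log base 2):

```python
# The tuning factor above grows logarithmically with CPU count:
# factor = 1 + ilog2(num_online_cpus()). ilog2() here mimics the
# kernel helper for positive integers.

def ilog2(n):
    return n.bit_length() - 1

factors = {ncpus: 1 + ilog2(ncpus) for ncpus in (1, 2, 4, 8, 64)}
for ncpus, factor in factors.items():
    print(f"{ncpus:3d} CPUs -> factor {factor}")
```

So a uniprocessor box gets factor 1, a 4-way box factor 3, and even 64 CPUs only push it to 7; granularity scales gently rather than linearly with CPU count.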

About documentation...

Posted Jul 25, 2007 23:38 UTC (Wed) by i3839 (guest, #31386) [Link]

Interesting info. But nothing about the fields I asked about. ;-)

Anyway, if those knobs only appear with CONFIG_SCHED_DEBUG enabled, I think it's better to document them in the Kconfig entry than in that documentation file. That way people interested in it can find it easily, and if the debug option ever disappears the help file won't need to be updated. When deciding whether to enable an option people look at the Kconfig text, so give all info they need to know there.

The sum_exec_runtime/sum_wait_runtime ratio also seems interesting. It is 1 to 1442 for ksoftirqd (it ran for 5 ms and waited 7 seconds for that, ouch). Meanwhile wait_runtime_overruns is 232 with zero underruns. (Sure those fields aren't swapped accidentally?)

events/0 info is also interesting; it has a se.block_max of 1.1 seconds, which seems suspiciously high.

se.wait_runtime includes the time a task slept, right? Otherwise it should be zero for all tasks that are sleeping, and that isn't the case.

Another strange thing is that really a lot of tasks have almost the same block_max of 1.818 or 1.1816 seconds. The lower digits are so close together that it seems like all tasks were blocked and unblocked at the same time. Oh wait, that is probably caused by resume from/suspend to RAM.

About documentation...

Posted Feb 7, 2008 12:32 UTC (Thu) by anders.blomdell@control.lth.se (guest, #50377) [Link]

I had similar problems with sluggish performance when doing heavy disk I/O, turning off
Hyper-Threading (in the BIOS) on the processor put the system back to more normal behavior
(i.e. worst-case blocking for syslog dropped from 25 seconds to 9 milliseconds).

Benchmarks, scheduler instrumentation, PowerTop

Posted Jul 26, 2007 16:33 UTC (Thu) by pheldens (guest, #19366) [Link] (1 responses)

Call me blind, but how do I turn on CFS in 23-rc1(-git*)?
I don't see the .config options anywhere, or is it on by default?

Benchmarks, scheduler instrumentation, PowerTop

Posted Jul 26, 2007 21:48 UTC (Thu) by mingo (guest, #31122) [Link]

Yeah, it's on by default - got merged early in the .23 merge window and is included in the 2.6.23-rc1 kernel (and -git*).

Interview with Con Kolivas (APC)

Posted Jul 25, 2007 12:22 UTC (Wed) by ctpm (guest, #35884) [Link] (6 responses)

I just wish people would stop touting these conspiracy theories as if
this was something personal against CK.

IMHO, what we saw was simply an evolutionary process at work; maybe the
Fan Club who argues that SD should have been included in mainline instead
of CFS should start working to prove it with _benchmarks_ and presenting
_technical_ details about it -- instead of making all this drama which,
I'm sure, will be a lovely starting point for the FUD generators used by
the proprietary software companies.

Cláudio

Interview with Con Kolivas (APC)

Posted Jul 25, 2007 13:47 UTC (Wed) by freggy (guest, #37477) [Link] (5 responses)

Please RTFA.

Con Kolivas has been maintaining swap prefetching for more than a year. Users kept saying that it had a positive effect on their workloads, and gave examples of cases where it helped. Because Linux maintainers remained sceptical, Con developed a theoretical benchmark, which proves that swap prefetching can have very positive effects.

Now maintainers are again discussing whether it could be useful, are asking users to give examples where it helps, and propose to modify user-space programs without actually saying how. If you want examples where it helps: go get them in the mailing list archives, users have been posting them for more than a year. You want numbers? Con wrote a benchmark. Several other developers have supported inclusion in mainline. Continuing to discuss this again and again is just ignorant, and personally I think it's hypocritical behaviour.

As a Linux user, I perfectly understand Con's frustration. This whole discussion madness is driving us crazy, and does not fix the problem for which we have a working solution right now.

time it takes to get a project into the upstream kernel

Posted Jul 25, 2007 14:26 UTC (Wed) by mingo (guest, #31122) [Link] (4 responses)

Con Kolivas has been maintaining swap prefetching for more than a year.
Btw., while I support the upstream integration of the swap-prefetch code (i ran Con's testapp, reported the numbers, reported regressions, tested fixes, reviewed the code, gave my Ack, etc.), i'd like to point out that the above characterisation is quite unfair towards lots of other kernel developers who currently have one feature or another queued for upstream integration.

Lots of features had to wait several years and go through lots of iterations before they were merged. Some are still not merged as of today. Some were rejected and abandoned.

Let me give you three examples of major pieces of code i wrote which never went upstream: the 4G/4G VM feature, exec-shield and Tux. I wrote the 4G/4G feature more than 4 years ago, and it's quite a bit of code:

60 files changed, 1942 insertions(+), 706 deletions(-)

Compare this to swap-prefetch:

23 files changed, 857 insertions(+), 6 deletions(-)

So it's in the same ballpark in terms of complexity.

4G:4G was a major effort on my part, but the VM maintainers (and Linus) rejected it (fundamentally) for a number of reasons. In hindsight, they were more correct than wrong, but i sure was upset about it.

exec-shield i wrote more than 4 years ago too; that too was rejected by the VM maintainers (for reasons i still don't agree with :-). Bits of it went upstream, but a fair chunk didn't. Was i upset about the decision? Sure i was; that is natural when someone spends a lot of time on a project.

But there are other examples as well: the KDB patchset was first posted around 1998, 9 years ago. It was rejected numerous times and it is still not upstream. The scalable pagecache patches have been in the works for a long time as well, and they are still not upstream - although they were written by one of the VM maintainers (Nick Piggin), the same people who are currently not convinced about swap-prefetch (yet). There are countless other examples. (In fact we rejected code from Linus a number of times too - more than once he sent out some idea-patch which was rejected and someone else wrote something better.)

In Linux we reject _lots_ of code, and that's the only way to create a quality kernel. It's a bit like evolutionary selection: breathtakingly wasteful and incredibly efficient at the same time.

time it takes to get a project into the upstream kernel

Posted Jul 25, 2007 15:41 UTC (Wed) by msmeissn (subscriber, #13641) [Link]

You should try again with exec-shield.

The non-soft-NX parts look mergeable at least to my eyes.

time it takes to get a project into the upstream kernel

Posted Jul 26, 2007 15:31 UTC (Thu) by rmstar (guest, #3672) [Link] (2 responses)

In Linux we reject _lots_ of code, and that's the only way to create a quality kernel. It's a bit like evolutionary selection: breathtakingly wasteful and incredibly efficient at the same time.

In general, evolutionary algorithms have never had a serious breakthrough because turning on your brain (as long as one is involved) tends to produce much better results. The bottom line is: evolutionary selection is just wasteful, period. That after billions of years of mindless tinkering something interesting results doesn't mean that it is "efficient".

Back to the linux kernel, it seems to me that the fundamental human problem behind kernel development is that stuff has to get "approved" and merged in. If there was an easy way to keep changes separate, that didn't imply intense maintaining efforts, none of this would happen. We would have dozens of schedulers and VMs, the best would be used most, progress would be very fast, and there would be less fights and frustration.

The fact that good, motivated people that have a positive impact are leaving in frustration is not good at all. Please stop rationalizing it.

time it takes to get a project into the upstream kernel

Posted Jul 27, 2007 23:42 UTC (Fri) by maney (subscriber, #12630) [Link]

That after billions of years of mindless tinkering something interesting results doesn't mean that it is "efficient".

Granted that efficient may not be the best possible world, but look at what you just said. Evolution got us from complete mindlessness to sapience - brains from nothing. Efficient is damning it with faint praise...

If there was an easy way to keep changes separate, that didn't imply intense maintaining efforts, none of this would happen.

And if pigs had wings, they would fly. (is that too blunt? if so, I think it is nonetheless exactly true: not requiring considerable effort is tantamount to asking for the rate of change to be turned down, and while there could be some good reasons for that, I don't think that making it easier for external patches to limp along without ever progressing towards being included (or rejected) is remotely one such.)

time it takes to get a project into the upstream kernel

Posted Aug 2, 2007 7:54 UTC (Thu) by anandsr21 (guest, #28562) [Link]

If you want to reach point B from point A when there is no known path, then evolution is the fastest way to find the path. There is nothing better. It will try all remotely possible paths and discard the bad ones. It will also find the most efficient path. I think evolution is the best way to go for Open Source software development. Anything else is wasting time. Resources, on the other hand, are meant to be wasted, as you couldn't control them anyway.


Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds