Con Kolivas returns with a new scheduler [LWN.net]

Overloaded name

Posted Sep 1, 2009 15:53 UTC (Tue) by proski (subscriber, #104) [Link] (2 responses)

There are already two filesystems called bfs - one in SCO and one in BeOS. Calling a scheduler "BFS" would only add to the confusion.

Overloaded name

Posted Sep 1, 2009 17:38 UTC (Tue) by mattdm (subscriber, #18) [Link]

But do those stand for the same thing? I doubt it!

Overloaded name

Posted Sep 1, 2009 19:23 UTC (Tue) by jengelh (guest, #33263) [Link]

Hm. When I read BFS I had to think of the similarly-named filesystem (befs) and BFQ (CFQ-like I/O scheduler). Mind ye people where you pick names from! TLAs are of limited space, much like IPv4, but worse. Pick a name. Like the AES (ick, TLA!) contestants used to - Rijndael, Serpent, etc.

Looking for the discussion

Posted Sep 1, 2009 16:09 UTC (Tue) by alex (subscriber, #1355) [Link] (2 responses)

So I fire up my mail client and go to read the discussion on lkml after the announcement of this new scheduler. The mailing list seems curiously quiet and my searches don't seem to show anything up. Could it be this is currently just another out of tree patch and out of tree development?

I applaud anyone who wants to wade into the dragon-infested lair that is the scheduler. However, I wish people would learn the lessons about the perils of not engaging lkml as early as possible.

Looking for the discussion

Posted Sep 1, 2009 16:14 UTC (Tue) by glikely (subscriber, #39601) [Link] (1 responses)

From Con's FAQ: "Because it's designed in such a way that mainline would never be interested in adopting it, which is how I like it."

It doesn't sound like Con has any intention of trying to get it merged into mainline, or posting it to the LKML.

Looking for the discussion

Posted Sep 1, 2009 16:16 UTC (Tue) by jonth (guest, #4008) [Link]

Pre-emptive storming off - pure genius!

Con Kolivas returns with a new scheduler

Posted Sep 1, 2009 16:23 UTC (Tue) by Baylink (guest, #755) [Link] (3 responses)

That's not storming off. Hell, that's not even a really decent Goodbye, Cruel List.

You want storming?

http://images.baylink.com/jra/buhbye.txt

*That's* storming off. Well, ok, politely. :-)

Con Kolivas returns with a new scheduler

Posted Sep 1, 2009 23:55 UTC (Tue) by coriordan (guest, #7544) [Link] (1 responses)

tl;dr

What part should I search for for the storming off bit?

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 15:19 UTC (Wed) by Baylink (guest, #755) [Link]

"you are none of you human beings"?

I said it was polite... :-)

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 7:15 UTC (Wed) by Los__D (guest, #15263) [Link]

Looks more like being kicked for posting off-topic for the 1000th time, to me.

Con Kolivas returns with a new scheduler

Posted Sep 1, 2009 16:29 UTC (Tue) by kerick (subscriber, #53036) [Link] (35 responses)

Am I correct that this scheduler is essentially worthless for any recent ( >=k8 ) AMD and i5/7 due to its lack of NUMA support? I used to use ck kernels all the time and would be disappointed if he has specifically targeted every processor but the ones I have.

It says "lower spec"

Posted Sep 1, 2009 16:48 UTC (Tue) by ncm (guest, #165) [Link] (18 responses)

A scheduler optimized for netbooks and cellphones would not be such a bad thing.

It says "lower spec"

Posted Sep 1, 2009 16:52 UTC (Tue) by alex (subscriber, #1355) [Link] (17 responses)

It's not going to get far on cellphones and notebooks without dynamic ticks. Spake the FAQ:

"Configure your kernel with 1000Hz, preempt ON and disable dynamic ticks."

It says "lower spec"

Posted Sep 1, 2009 18:34 UTC (Tue) by drag (guest, #31333) [Link] (14 responses)

Ya.. he needs to fix that. Otherwise it's very interesting.

Until the Linux kernel gets everything all situated out I think one of the big things that distros can probably do to improve user experience is provide Desktop/Multimedia optimized kernels.

Normally I am a big fan of 'do one thing and get it right before worrying about features' approach, but it's irritating to have to recompile my kernel to get proper responsiveness because the kernels from Debian are all optimized for server use.

$ grep -i preempt /boot/config-2.6.*
/boot/config-2.6.26-2-686:CONFIG_PREEMPT_NOTIFIERS=y
/boot/config-2.6.26-2-686:CONFIG_PREEMPT_NONE=y
/boot/config-2.6.26-2-686:# CONFIG_PREEMPT_VOLUNTARY is not set
/boot/config-2.6.26-2-686:# CONFIG_PREEMPT is not set
/boot/config-2.6.30-1-686:# CONFIG_PREEMPT_RCU is not set
/boot/config-2.6.30-1-686:# CONFIG_PREEMPT_RCU_TRACE is not set
/boot/config-2.6.30-1-686:CONFIG_PREEMPT_NOTIFIERS=y
/boot/config-2.6.30-1-686:CONFIG_PREEMPT_NONE=y
/boot/config-2.6.30-1-686:# CONFIG_PREEMPT_VOLUNTARY is not set
/boot/config-2.6.30-1-686:# CONFIG_PREEMPT is not set

I mean, seriously.. How long has Preempt support been around? *cry*
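
For anyone who wants to check more than one config file at once, here is a minimal sketch that reports which preemption model a kernel config selects. It assumes only the three option names visible in the grep output above; the function name `preemption_model` and the inline sample are made up for illustration.

```python
def preemption_model(config_text):
    """Return the preemption option set to 'y' in a kernel config, or None.

    Only the three options seen in the grep output above are considered.
    """
    models = ("CONFIG_PREEMPT_NONE", "CONFIG_PREEMPT_VOLUNTARY", "CONFIG_PREEMPT")
    for line in config_text.splitlines():
        line = line.strip()
        # Lines such as "# CONFIG_PREEMPT is not set" are comments: skip them.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key in models and value == "y":
            return key
    return None


# Sample text copied from the 2.6.26 lines of the grep output above.
sample = """\
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
"""

print(preemption_model(sample))
```

To run it on a real system, read the text of a file such as /boot/config-2.6.26-2-686 and pass it to the function.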

It says "lower spec"

Posted Sep 1, 2009 22:30 UTC (Tue) by cortana (subscriber, #24596) [Link] (13 responses)

Has anyone ever filed a bug requesting that these options be enabled?

It says "lower spec"

Posted Sep 2, 2009 1:54 UTC (Wed) by N0NB (guest, #3407) [Link] (12 responses)

About a year or two ago I filed a Debian bug report asking for a desktop enabled version of the kernel and essentially received a "thanks but no thanks" closure message. :-/

Desktop Debian = Ubuntu

Posted Sep 2, 2009 4:55 UTC (Wed) by zlynx (guest, #2285) [Link] (9 responses)

I don't think that I know anyone who runs Debian on the desktop anymore. It seems most run Debian on servers and Ubuntu on the desktop, if they like Debian'ish systems.

Desktop Debian = Ubuntu

Posted Sep 2, 2009 5:43 UTC (Wed) by jordanb (guest, #45668) [Link] (3 responses)

Try meeting some more people. It's a whole world out there.

Desktop Debian = Ubuntu

Posted Sep 2, 2009 7:39 UTC (Wed) by patrick_g (subscriber, #44470) [Link] (1 responses)

>>> Try meeting some more people. It's a whole world out there.

What is your solution? Recompiling the kernel and losing all the advantages of a distribution kernel? Using a server-optimized kernel on your laptop?

Desktop Debian = Ubuntu

Posted Sep 2, 2009 13:00 UTC (Wed) by nye (subscriber, #51576) [Link]

IIRC a couple of years back Con Kolivas had a feature to allow switching the scheduler at runtime, which was deemed a pointless feature and rejected.

Desktop Debian rocks

Posted Sep 2, 2009 8:32 UTC (Wed) by man_ls (guest, #15091) [Link]

Alternatively, try finding out some hard data. For example, Debian's own popularity contest. Taking base admin packages as a baseline (83696), of which over 95% are regularly used, you may find that about 56% use X11 libraries regularly, about half installed desktop-base and 26% use metacity. GNOME is even more popular with 56% having installed gnome-keyring and 32% using it regularly. KDE is less popular with only 23% installing kdebase-data and 10% using any package regularly (interesting, didn't know there was such a disparity with GNOME), while XFCE shows up at 3.7%.

You might say that these are servers being admin'd graphically. Let us see typically desktop-y applications: quick browsing shows regular users of Firefox (iceweasel really) at 33%, libgstreamer at 27%, evince at 26%, libgphoto2 and openoffice both at 25%. To put these figures in perspective, Apache is at 44% and Samba at 27%.

There is a lot of bias in the sample: only utter geeks would install popularity-contest, and only properly connected machines will show up. I would counter that both things pretty much describe Debian's audience. IMHO saying that 50% of Debian users have it as a desktop is a good estimate.

Desktop Debian = Ubuntu

Posted Sep 2, 2009 11:54 UTC (Wed) by N0NB (guest, #3407) [Link]

I prefer Debian over Kubuntu as my Kubuntu partition is still infested with GNOME nonsense. At least on Debian I can eliminate the GNOME stuff (or not install it in the first place) without breaking the distribution. That said, right now desktop effects with compositing are working much better in Karmic than Sid, both are up-to-date.

Desktop Debian = Ubuntu

Posted Sep 2, 2009 20:41 UTC (Wed) by branden (guest, #7029) [Link] (3 responses)

"I don't think that I know anyone who runs Debian on the desktop anymore."

Nice to meet you. I'm Branden.

Desktop Debian = Ubuntu

Posted Sep 2, 2009 20:47 UTC (Wed) by zlynx (guest, #2285) [Link] (1 responses)

But do I know you?

I mean, when I said it, I meant people I actually know in person and I know what they're running.

Now, as I don't make a habit of going around asking people if they're running Ubuntu or Debian or what, my sample size is about 5 people.

One of those I know used to run a Debian laptop and I *think* he is running Ubuntu now but possibly not. I know the other 4 are running Ubuntu.

Desktop Debian = Ubuntu

Posted Sep 2, 2009 21:01 UTC (Wed) by alex (subscriber, #1355) [Link]

Hey I suggested my other half install Debian on her old laptop as she wanted to experiment with free software. I was slightly worried she would have issues with the wireless but it all installed and ran fine. I think Debian makes a fantastic desktop OS if you're not concerned about the latest whizz bang graphics effects or media players. It's stable and lightweight and very developer friendly.

Desktop Debian = Ubuntu

Posted Sep 3, 2009 13:10 UTC (Thu) by nix (subscriber, #2304) [Link]

Former DPLs need not apply ;}

(of course I'm running Debian on the desktop too, and adminning it remotely for my mum, who's in the same boat, and all she wants is something that Just Works and isn't virally infestable...)

It says "lower spec"

Posted Sep 2, 2009 9:45 UTC (Wed) by cortana (subscriber, #24596) [Link] (1 responses)

Indeed, I should have searched before I posted.

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=311185 -- as of 2007, more testing for stability required; 'realtime patch' also mentioned, I have no idea if that still exists for current kernels or whether it's been merged

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=496871 -- request for benchmarks (along with "please stop waffling", great way to interact with users, kernel team...)

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=539209 -- filed recently (July 09) but no reply

Does anyone actually have any benchmarks demonstrating the efficacy of the pre-emption options?

It says "lower spec"

Posted Sep 2, 2009 16:57 UTC (Wed) by drag (guest, #31333) [Link]

It's hard to benchmark.

Technically a preemptive kernel will be slower than a non-preemptive one in most complex benchmarks. This is because going from process to process rapidly means more context switches and thus you lose out on CPU cache locality and all that.

But since the desktop is idle 99% of the time then it's easy to make the justification that it's worth it to say "Ya it takes a couple milliseconds longer to open a webpage, but this way I can do it without getting my music interrupted or keep my game/movie framerate high."

Intel developed a tool called latencytop that can be used to identify processes that are hogging the system and can cause usability or deterministic-timing problems.

Remember the point to having 'realtime' performance is not to make things _faster_ per se.. it's to make things more deterministic. So you know how long it will take to get something done. On a very hard-ish realtime system you can say "It's going to take a maximum of 30msec to accomplish X task" and you can depend on it. On a typical Linux server system it may take 5-10msec most of the time to do the same amount of work, but if something else is going on then it may take 500msec or more; You can't tell how long something is going to take, even though it's likely to get done faster on average.

This sort of trade off is what you need to keep your video smooth, games fast, music interrupt free, scientific measurements accurate, robotic assembly machines from zapping the wrong parts of a chassis, etc etc. Anytime you need to interact with the real world....

So ya.. benchmarks are very difficult and are skirting the issue.
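
The difference between "faster on average" and "deterministic" described above can be put in numbers with a small sketch. The sample values below are invented to echo the 30msec, 5-10msec and 500msec figures in this comment; they are not measurements of any real system.

```python
from statistics import mean

# Hypothetical completion times in milliseconds (illustrative only).
# "Realtime-ish" system: every run lands just under a 30 ms bound.
realtime_ms = [29] * 100
# "Server" system: usually about 7 ms, but one run in a hundred stalls for 500 ms.
server_ms = [7] * 99 + [500]


def summarize(samples):
    """Return the average and the worst-case latency of a list of samples."""
    return {"mean": mean(samples), "worst": max(samples)}


realtime = summarize(realtime_ms)
server = summarize(server_ms)

# The server wins on the average, the realtime-ish system wins on the worst case.
print("realtime:", realtime)
print("server:  ", server)
```

A throughput benchmark mostly reflects the mean, which is why it tends to favour the server-style configuration even when the rare 500 ms stall is what the user actually notices.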

It says "lower spec"

Posted Sep 2, 2009 13:41 UTC (Wed) by guus (subscriber, #41608) [Link] (1 responses)

Notice the line directly above the note about configuring for 1000Hz and disabling tickless:

"THESE ARE OPTIONAL FOR LOWEST LATENCY. YOU DO NOT NEED THESE!"

So with tickless kernels, you get a slightly less low latency, but it would work fine.

It says "lower spec"

Posted Sep 2, 2009 15:12 UTC (Wed) by rsidd (subscriber, #2582) [Link]

There is no changelog to the FAQ so I'm not sure, but I think that line is new. Ditto for the NUMA line. I'm sure Con Kolivas uses reasonably contemporary machines, even if he disdains optimising for machines with 4096 CPUs.

Con Kolivas returns with a new scheduler

Posted Sep 1, 2009 17:01 UTC (Tue) by intgr (subscriber, #39733) [Link] (11 responses)

I can't see why it wouldn't work well on a typical single-processor multi-core k8 or i7 chip. One CPU package only contains a single memory controller, so all the memory is uniform.

Am I missing something here?

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 1:52 UTC (Wed) by lethal (guest, #36121) [Link] (10 responses)

Not quite so simple. There are plenty of single-CPU systems with NUMA characteristics. Memory-only NUMA nodes are becoming fairly commonplace, both in commodity and especially in embedded platforms. Indeed, many Linux-based cellphones are shipping with NUMA enabled by default today -- and more recently also on the microcontroller side, albeit without page migration.

Any scheduler that fails to take issues like NUMA, SMP, dynamic ticks, etc. into account while claiming to be "looking forward" will remain nothing but a toy scheduler for an insular workload. All of these have effectively become commonplace, to the extent that simply discounting them out of hand reads a lot more like looking back than forward, especially given that many low memory systems depend on all of these capabilities.

In any event there isn't anything wrong with out of tree pet projects, especially for trying out new things. If this new scheduler improves things for certain workloads, then hopefully someone will step up to work with upstream and improve things there incrementally.

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 15:01 UTC (Wed) by jzbiciak (guest, #5246) [Link] (9 responses)

Any scheduler that fails to take issues like NUMA, SMP, dynamic ticks, etc. into account while claiming to be "looking forward" will remain nothing but a toy scheduler for an insular workload

I believe what Con meant by "forward looking" when he described his scheduler is that it doesn't adjust time slices based on recent history (ie. "backward looking") of a task's run/sleep behavior. From Con's post:

I feel the scheduler should being forward looking only (not calculating sleep)

His explicit rejection of NUMA and high-CPU-count machines makes it clear he's only really interested in what constitutes a "typical desktop" today. From Con's post:

Machines with NUMA will probably suck a lot with this because it pays no deference to locality of the NUMA nodes when deciding what cpu to use.

I imagine the loss of raw MIPS due to lack of NUMA awareness on even a high end personal desktop isn't too great either. Modern desktop machines are NUMA, but they're fairly mild NUMA from what I gather. (Why else would my Opteron's BIOS offer me the option to interleave my memory across all nodes? That'd be disaster on a more extreme NUMA architecture, but it supposedly provides a performance boost on older non-NUMA-aware OSes.)

In fact, I'd further imagine the average user would trade actual increased responsiveness for a few % loss on peak benchmark performance numbers. At the very least, I imagine Con might. :-)

All that said, the place where I experience the greatest loss in responsiveness is Firefox, not the rest of my desktop, and Firefox is just a single thread at the moment. Con, can you come fix Firefox? ;-)

Con Kolivas returns with a new scheduler

Posted Sep 3, 2009 2:37 UTC (Thu) by charris (guest, #13263) [Link] (8 responses)

I find that gnome terminals will ignore my bluetooth keyboard for up to several seconds. I don't know if this is because of the bluetooth driver, X, or the scheduler, but it sure is annoying. And this on a 3GHz quad core system.

Con Kolivas returns with a new scheduler

Posted Sep 3, 2009 13:15 UTC (Thu) by nix (subscriber, #2304) [Link] (4 responses)

It's certainly not the bluetooth driver or X, because I see this on the console of a one-die four-core Nehalem (Core i7) with no X running at all and a PS/2 keyboard. It's lightly loaded by these standards (load average 8.1, which as it's hyperthreaded is pretty much equivalent to 'everything loaded but not much'), and surely isn't swapping (12Gb RAM, 6Gb *free*, not even used for cache). Yet I see three-second pauses in keyboard activity.

I'll turn on latencytop and see what it says, but the pauses are fairly rare so it might be interesting to interpret.

Con Kolivas returns with a new scheduler

Posted Sep 3, 2009 13:39 UTC (Thu) by jzbiciak (guest, #5246) [Link]

I've only gotten multi-second pauses like that when I had a hard drive going bad, or I was swapping rather furiously.

With that much RAM, I wonder if VM housekeeping itself could cause such lags. 12Gb would have 3 million 4K pages. If you are running at 3GHz and did something that averaged 3000 cycles/page across all 3 million pages, that's 3 seconds.
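
The back-of-the-envelope estimate above can be checked directly. This sketch assumes "12Gb" means 12 GiB of RAM and uses the 4K page size, 3000 cycles/page and 3GHz figures from the comment; the cycles-per-page number is a guess in the original, not a measured cost.

```python
# Assumption: "12Gb" in the comment above means 12 GiB of RAM.
ram_bytes = 12 * 2**30
page_bytes = 4 * 2**10          # 4K pages
pages = ram_bytes // page_bytes

cycles_per_page = 3000          # guessed average cost from the comment
clock_hz = 3_000_000_000        # 3 GHz

stall_seconds = pages * cycles_per_page / clock_hz

print(f"{pages} pages, about {stall_seconds:.2f} s to touch them all")
```

So a single pass over every page at that cost lands close to the three-second pauses reported, which is what makes the housekeeping theory at least plausible.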

Con Kolivas returns with a new scheduler

Posted Sep 3, 2009 17:31 UTC (Thu) by charris (guest, #13263) [Link] (2 responses)

I also see "stuck" keys where a letter will repeat until the buffer is full. I see this on various hardware with various keyboards, so it isn't an actual stuck key. Do you see that also?

Con Kolivas returns with a new scheduler

Posted Sep 3, 2009 18:14 UTC (Thu) by jzbiciak (guest, #5246) [Link]

That sounds more like a dropped event somewhere such as dropped interrupt or the like. I haven't experienced that on any Linux box so far.

Stuck keys.

Posted Sep 4, 2009 2:47 UTC (Fri) by ncm (guest, #165) [Link]

I've seen it on 2.6.29. Unplugging and re-plugging the (USB) keyboard didn't fix it. Only switching to console and back did it. I was inclined to blame the new X input system. It hasn't happened since I upgraded X and also went to 2.6.30, so who knows?

Input pauses

Posted Sep 4, 2009 4:09 UTC (Fri) by ncm (guest, #165) [Link] (2 responses)

I see input-event stalls for up to six seconds in Firefox (Debian Iceweasel, actually), routinely. To get it I just have to leave it running for a week or so, with a few dozen pages, some of which self-update.

I doubt the scheduler would help; during those six seconds, one CPU is pegged at 100%. This is with nothing swapped. I suspect it's doing a garbage collection scan during each pause. I'd welcome suggestions for how to discover what's going on.

Google Chrome is coming along too slowly.

Input pauses

Posted Sep 4, 2009 4:32 UTC (Fri) by jzbiciak (guest, #5246) [Link]

I know Firefox's stalls are Firefox's fault, not the Linux scheduler's. I was begging Con to do interactivity work on Firefox. ;-)

(As in, contribute to the Mozilla team.)

Input pauses

Posted Sep 4, 2009 8:17 UTC (Fri) by Cato (guest, #7643) [Link]

I've had these pauses for several seconds sometimes in Firefox 3.5.2, but they seem to be associated with large writes e.g. un-tarring a huge file at the same time - so I suspect it's the well known "Firefox 3.0 + ext3 + fsync" issue, reported in http://lwn.net/Articles/284126/ and elsewhere.

Con Kolivas returns with a new scheduler

Posted Sep 1, 2009 17:51 UTC (Tue) by Velmont (guest, #46433) [Link] (1 responses)

Worthless? In the FAQ it seems pretty reasonable. Would be fun to try. I also used -ck kernels in the good old days

For years we've been doing our workloads on linux to have more work than we had CPUs because we thought that the "jobservers" were limited in their ability to utilise the CPUs effectively (so we did make -j6 or more on a quad core machine for example). This scheduler proves that the jobservers weren't at fault at all, because make -j4 on a quad core machine with BFS is faster than *any* choice of job numbers on CFS.

Con Kolivas returns with a new scheduler

Posted Sep 7, 2009 10:09 UTC (Mon) by jlokier (guest, #52227) [Link]

The real reason we used to use "make -j MORE_THAN_NR_CPUS" was to get useful work done when a process waits on I/O. Con's scheduler doesn't help with that: if gcc foo.c needs to read foo.c from disk, you must have more jobs than CPUs, or you have an idle CPU.

What's changed is nowadays most of us have enough RAM to keep the whole kernel source and object files in cache, so there's no I/O delay after the first compile of the day.

I'm surprised Con's scheduler compiles more quickly than CFS, and that has to be worth looking into.
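
The I/O argument above can be illustrated with a deliberately crude model. It assumes every job spends a fixed fraction of its time blocked on I/O and that the rest can be spread freely over the CPUs; the function name `busy_cpus` and the 25% figure are invented for illustration, and nothing here models BFS or CFS themselves.

```python
def busy_cpus(jobs, cpus, io_wait_fraction):
    """Estimate how many CPUs are kept busy on average.

    Crude fluid approximation: each job is runnable for
    (1 - io_wait_fraction) of the time, and the number of busy CPUs
    can never exceed the number of CPUs in the machine.
    """
    runnable = jobs * (1.0 - io_wait_fraction)
    return min(float(cpus), runnable)


# Cold cache: assume each compile job waits on disk 25% of the time.
print(busy_cpus(jobs=4, cpus=4, io_wait_fraction=0.25))  # one CPU's worth is idle
print(busy_cpus(jobs=6, cpus=4, io_wait_fraction=0.25))  # extra jobs fill the gap

# Warm cache: sources and objects are in RAM, so nothing waits on I/O.
print(busy_cpus(jobs=4, cpus=4, io_wait_fraction=0.0))
```

Under these assumptions, -j6 only helps while there is I/O wait to hide; once everything is cached, -j4 already saturates a quad core.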

Con Kolivas returns with a new scheduler

Posted Sep 4, 2009 9:31 UTC (Fri) by DavidG (guest, #60628) [Link]

FWIW: for AMD, only Opterons have NUMA and AFAIK Intel i7 don't have NUMA either.
So you're safe to use BFS.

Con Kolivas returns with a new scheduler

Posted Sep 5, 2009 3:28 UTC (Sat) by realnc (guest, #60393) [Link]

"BIG NUMA machines will probably suck a lot with this because it pays no deference to locality of the NUMA nodes when deciding what cpu to use. It just keeps them all busy. The so-called "light NUMA" that constitutes commodity hardware these days seems to really like BFS."

The point is that big NUMA machines aren't used for Desktops ;)

Con Kolivas returns with a new scheduler

Posted Sep 1, 2009 17:50 UTC (Tue) by realnc (guest, #60393) [Link] (1 responses)

I missed the guy. The way the kernel devs treated him in the past was awful. It made me hate the kernel devs for the first time.

Con Kolivas returns with a new scheduler

Posted Sep 1, 2009 18:03 UTC (Tue) by bjacob (guest, #58566) [Link]

Don't remember about that specifically, but I agree that at the very least, Con Kolivas' work hasn't gotten the attention it deserved at the time. I distinctly remember running linux 2.6.4-ck1 and 2.6.7-ck4 and these were the most responsive desktop settings I've ever had. I could start a heavy compilation job, and concurrently a heavy disk I/O job, and still have perfect desktop responsiveness. I wish that the kernel that's shipped in my opensuse 11.2 were able to do the same, but it's not.

Con Kolivas returns with a new scheduler

Posted Sep 1, 2009 18:07 UTC (Tue) by einstein (guest, #2052) [Link]

Sure, bfs is admittedly not for the top 500 supercomputer crowd, but it sounds compelling for the typical gaming/multimedia oriented linux desktop user who wants low latency and good performance even on modest hardware.

Thanks Con man, I will check this out!

Con Kolivas returns with a new scheduler

Posted Sep 1, 2009 18:15 UTC (Tue) by lseubert (guest, #4168) [Link] (1 responses)

Welcome back Con. He makes a wry, amusing point with this one:

Great XKCD cartoon about hacking on kernel features used by very few people.

Is Con planning on making BFS available through git?

Con Kolivas returns with a new scheduler

Posted Sep 4, 2009 7:49 UTC (Fri) by shane (subscriber, #3335) [Link]

From the linked-to text:

GIT repository?

Sorry, it's not the right tool for me so it's not worth me investing the time
in setting one up.

Con Kolivas returns with a new scheduler

Posted Sep 1, 2009 18:49 UTC (Tue) by zooko (guest, #2589) [Link]

Here is a better link to document Con Kolivas's departure from mainline kernel dev: http://apcmag.com/why_i_quit_kernel_developer_con_kolivas... . It is an interesting, thought-provoking interview that more developers should read.

I'm delighted to see that he's producing kernel patches again.

Con Kolivas returns with a new scheduler

Posted Sep 1, 2009 19:08 UTC (Tue) by flewellyn (subscriber, #5047) [Link]

So, what's the actual algorithm? I'd be very interested in that.

Con Kolivas returns with a new scheduler

Posted Sep 1, 2009 19:22 UTC (Tue) by hingo (guest, #14792) [Link] (22 responses)

http://imgs.xkcd.com/comics/supported_features.png

Con Kolivas returns with a new scheduler

Posted Sep 1, 2009 20:08 UTC (Tue) by jordanb (guest, #45668) [Link] (21 responses)

As funny as that may be, it's worth observing that People With Money (the HPC crowd) are willing to put up lots of it to make sure Linux runs well on large systems. It's not geekery that drives HPC features in Linux, it's a real economic need.

And the reason why the "flash experience" is poor in Free Software is because it's a proprietary format that has to be reverse-engineered at great effort. Why the proprietary Adobe player is crappy on Linux (and OSX)... is presumably because Adobe doesn't see it to be in their best interest to offer any more than token support for non-Wintel platforms.

Con Kolivas returns with a new scheduler

Posted Sep 1, 2009 21:56 UTC (Tue) by drag (guest, #31333) [Link] (11 responses)

That would be nice if it's entirely true... but it's not.

Linux has a number of serious deficits in terms of video drivers and API support on its desktop that make what Adobe does on Linux very very difficult. It's much harder for Adobe to get good and consistent performance on Linux than it is on Windows or even OS X due to a number of reasons.

-------------------------------

Here is an example.

Look at DirectX vs OpenGL + SDL. SDL and OpenGL is about the closest we have to DirectX.

As you know gaming development is now dominated by DirectX. Now.. why would developers choose DirectX over OpenGL? OpenGL is fine and can take advantage of hardware just as well when you add vendor extensions and OpenGL is cross platform. So there has to be other things that the developers find valuable over DirectX.

Now you can point out developer tools and hype/marketing and that sort of thing, which is true. However I am convinced that one of the major reasons why DirectX displaced OpenGL as the API of choice for gaming developers is because of _consistency_.

Even in Windows each vendor has its own OpenGL stack. Sure you get a generic one from Microsoft, but when you install proprietary drivers then you end up using their proprietary stack.

Intel, ATI, Nvidia.. all have different OpenGL implementations. And vendor extensions are different from one to another. They have different performance profiles, different effects, and so on and so forth.

So in effect when you program for OpenGL on Windows you're not targeting 'windows'. Windows is like the genus. Each video card provides you a different species of platform.

Whereas with DirectX, Microsoft forces each vendor to have a consistent hardware platform and provides the unified software stack. When you program for DirectX there is not going to be much difference between end users' computers. This dramatically lowers the cost, effort, and support overhead of making games.

So in this way one of the biggest reasons DirectX wins is not because of the technology, it's because the developers have an easier time with it.

Now fast forward to Linux....

Linux is extremely schizophrenic. It's MUCH worse than anything you'd ever face in Windows.

Open source drivers vs nvidia drivers vs ati drivers vs open source nvidia drivers vs open source ati drivers. For a huge part of the target audience you can't even depend on there being _ANY_ acceleration at all. And what acceleration is present, for all except proprietary nvidia users, is very inconsistent, buggy, and universally slow.

API support is spotty and incomplete. The only people who have any sort of consistency and performance approaching what you can get in Windows are proprietary Nvidia driver users... and even people using nvidia hardware have to go through extra steps to install the drivers, otherwise (until very recently) they lacked any sort of 3D acceleration at all.

Look at the discussion that went on here with Gnome 3.0 and Gnome shell. You have half of everybody bitching that requiring any sort of acceleration at all is just going to break their machines and make everything perform like shit, make it buggy, use too much resources, etc etc. When in fact when you take advantage of acceleration it _SHOULD_ make things _faster_ and more efficient... A composited desktop is more efficient and faster than one that is not, for example. (it's just a better design.) Yet the common user experience on Linux is almost exactly opposite.

Any computer made in the last 13 years or so has relatively decent acceleration. Enough for many tasks. Any computer made in the last 7 years or so should have the capabilities to run simple and older OpenGL games and whatnot with acceptable performance. Even IGP stuff.

Yet it's very likely that a machine sold _yesterday_, when Linux is installed, will lack the ability to play Quakelive/Quake3 with good performance.. which is based on a _open_source_ game engine that was originally released for Linux 10 years ago.

This makes doing any sort of graphics work, if your goal is to support anybody other than just proprietary Nvidia users, just pure hell on Linux.

This is primarily why Adobe Flash on Linux sucks and why you'll see that the majority of open source and indie game development is happening on Windows.

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 2:13 UTC (Wed) by PaulWay (guest, #45600) [Link] (9 responses)

All of this is an amusing, but somewhat biased viewpoint.

DirectX has been changing constantly with every release. Not only do new features get added but they've changed how things worked and removed old stuff. I've had endless problems trying to get old games to run on versions of DirectX later than the one they were designed for. And this still doesn't address the issues of getting the best performance out of the hardware - Windows game designers often have to write special code to work with different graphics and sound hardware because of 'special' features, just as hardware drivers have been known to add in workarounds for some games' broken behaviour...

So realistically this isn't actually dissimilar from the 'chaos' you describe in Linux. The idea that sound and video 'just work' on Windows or OS X is a myth. I've seen far too many Windows developers harp on about how difficult it must be to program for different kernel versions, sound drivers, etc and gloss over huge swathes of 'custom' code. OpenGL makes this look like a piece of nice, stable cake by comparison.

And then you've got the fact that Microsoft has been shown to be redoing its APIs just so that it can get a development lead for its own products at the cost of its competitors. I'd rather deal with the Linux developer community, arguments and stormings-off and outright refusals and all, than be told outright "we've just redesigned the fundamental API that your product uses, and we're releasing a product that competes with yours that uses that API, and it's coming out at the same time that you're getting the API specs."

Have fun,

Paul

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 5:01 UTC (Wed) by zlynx (guest, #2285) [Link] (8 responses)

Every version of DirectX back to freaking *5* is still supported. I can play Master of Orion 2, a 1996 game, on Windows 7 13 years later.

Microsoft is the *master* of backward compatibility.

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 5:44 UTC (Wed) by PaulWay (guest, #45600) [Link] (1 responses)

*shrugs* Well, I can't play Battlezone (1996, DirectX 5) or Battlezone 2 (1997, DirectX 7) on my Windows XP (DirectX 9c) machine. Both games bork out complaining that DirectX is not installed. So I guess those masters of compatibility missed something somewhere.

Have fun,

Paul

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 8:07 UTC (Wed) by zlynx (guest, #2285) [Link]

I won't say MS is perfect at it. I will say that I think they are the best at backward compatibility.

For DirectX, I can say that my older games seem to work fine.

A Google search shows me people playing all sorts of Battlezone on XP, Vista and Vista64. It looks like there are patches.

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 8:03 UTC (Wed) by trochej (guest, #35052) [Link] (5 responses)

There are companies that are much better at backwards compatibility than Microsoft. I use a kernel module compiled specifically for a Sun Solaris system released eight years ago. I load it in the kernel of the latest development release of OpenSolaris. It just works and it is guaranteed. You can link code compiled with the latest release of the compiler suite with libraries produced by a release that is years older. If you can't, it is a bug and you file it. Heck, I can link my SunStudio code with gcc produced code, to some extent, and it was designed that way. So you really can do this. I suspect that there are companies even better at this than Sun.

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 14:34 UTC (Wed) by jordanb (guest, #45668) [Link] (4 responses)

FWIU IBM z/OS is compatible with every OS in its lineage back to the OS/360 from the 1960s.

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 16:15 UTC (Wed) by zlynx (guest, #2285) [Link] (2 responses)

AFAIK, IBM does it with virtualization. When running 1960's software it is running the 1960's OS on simulated 1960's hardware.

Con Kolivas returns with a new scheduler

Posted Sep 3, 2009 4:56 UTC (Thu) by k8to (guest, #15413) [Link] (1 responses)

Isn't that kind of aside from the point? I mean how they do it is pretty much their choice. They do it.

Con Kolivas returns with a new scheduler

Posted Sep 3, 2009 13:28 UTC (Thu) by zlynx (guest, #2285) [Link]

I was just commenting on how they do it. I am not saying anything bad about it.

Con Kolivas returns with a new scheduler

Posted Sep 6, 2009 11:46 UTC (Sun) by trasz (guest, #45786) [Link]

z/OS keeps full compatibility only for applications that don't mess with system internals. If your program does something strange (although supported by IBM), it's possible that it will cease working in a few releases from now. Of course there is a whole process of phasing out features, so it won't be a nasty surprise to you or your customers - but still, they are definitely not keeping 30 years of backward compatibility for that.

Welcome To The Jungle

Posted Sep 3, 2009 13:09 UTC (Thu) by gouyou (guest, #30290) [Link]

It reminds me of this blog post from the Adobe/Linux/Flash guy: Welcome To The Jungle.

Con Kolivas returns with a new scheduler

Posted Sep 1, 2009 21:57 UTC (Tue) by rrdharan (subscriber, #41452) [Link] (8 responses)

> Why the proprietary Adobe player is crappy on Linux
> (and OSX)... is presumably because Adobe doesn't see
> it to be in their best interest to offer any more than
> token support for non-Wintel platforms.

It's interesting that Linux users like to complain about the poor quality of proprietary software implementations for their platform. What vendor actually makes *good* proprietary software for Linux? All I ever see are complaints about Nvidia, complaints about VMware, complaints about Adobe, etc.

It's possible that all of the developers at all these companies are stupid or incompetent, but I'd put forth the claim that it's simply really hard to do quality *proprietary* software for Linux. Things change way too fast, APIs break all the time, and you pretty much have to rely on the community being able to fix your code, which they can't do, because it's proprietary.

It's at least a more reasonable and less offensive position to come out and just say that proprietary software sucks (for all the usual reasons/ethos/ideology etc.), but it's annoying to see the complaints directed at these vendors with the implicit assumption that they could somehow do better while retaining their current software distribution, development, and revenue models. I don't think they can.

Con Kolivas returns with a new scheduler

Posted Sep 1, 2009 23:17 UTC (Tue) by drag (guest, #31333) [Link] (4 responses)

> It's possible that all of the developers at all these companies are stupid or incompetent, but I'd put forth the claim that it's simply really hard to do quality *proprietary* software for Linux. Things change way too fast, APIs break all the time, and you pretty much have to rely on the community being able to fix your code, which they can't do, because it's proprietary.

In this specific case I am talking about.. the state of Linux graphics support.. makes life hell for both open source and proprietary drivers equally. It's all graphics, games, media playback, that suffers. Not just proprietary.

That's why I mentioned the situation with Gnomeshell and its clutter dependency (which requires opengl) and people complaining about it. It should be a slam-dunk good thing, but it isn't because Linux graphics right now sucks.

To be fair things are improving quite a bit and I am hoping that the Gallium design will bring the one biggest feature that Linux requires: consistency.

They did it for Wifi with mac80211 and Network-Manager. They are getting there with PulseAudio... but right now graphics and GPU support is the major thing lacking.

I've mentioned it here before that GPU support is now a hard requirement for desktop. In the future it's only going to get worse if Linux does not improve.

Already Intel is introducing its PineView stuff for its next generation graphics stuff. That integrates the GPU into the CPU die. And guess what? I think it's PowerVR-based. So for some highly mobile Intel platforms you will require proprietary drivers just to use your CPU fully. ATI and Nvidia are going that direction also.

Nvidia currently sucks, but if you're a gamer on Linux or you need to do heavy 3D stuff on Linux it's plainly obvious that anybody not using Nvidia proprietary drivers is a second class citizen. The developers are all using Nvidia (more or less). Even with fully open source software.

I just feel that although Flash sucks a lot of its problems are in fact Linux problems and they are getting blamed for things that are beyond their control.

Con Kolivas returns with a new scheduler

Posted Sep 1, 2009 23:25 UTC (Tue) by dlang (guest, #313) [Link] (1 responses)

as near as I can tell ATI (AMD) is moving away from proprietary drivers, not towards them.

Nvidia has always been proprietary and seems to have no interest in budging

so if Intel is moving from open to proprietary the score is one moving each direction and one standing still. hardly a decisive movement in any direction.

if Intel is in fact not moving to proprietary drivers (and given their history with linux and involvement with X.org development, this would seem like an odd thing for them to do), then the result is two of the big three moving to open drivers, and one staying with proprietary drivers.

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 4:12 UTC (Wed) by pabs (subscriber, #43278) [Link]

Anyone with an nVidia card should be helping out the nouveau project:

http://nouveau.freedesktop.org/

Here is where intel stumbled WRT free graphics drivers:

http://mjg59.livejournal.com/111853.html

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 5:48 UTC (Wed) by patrick_g (subscriber, #44470) [Link] (1 responses)

>>> Intel is introducing it's PineView stuff for it's next generation graphics stuff. That integrates the GPU into the CPU die. And guess what? I think it's PowerVR-based

Are you sure about that?

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 19:58 UTC (Wed) by drag (guest, #31333) [Link]

No I am not.

But I think that it's a PowerVR core, like the Poulsbo/GMA 500 stuff which required proprietary drivers for 3D support.

Con Kolivas returns with a new scheduler

Posted Sep 1, 2009 23:46 UTC (Tue) by jwb (guest, #15467) [Link] (1 responses)

What, you really think so? VMware makes a phenomenally good product for
Linux. There's plenty of other good proprietary software, like Eagle CAD,
Google Earth, etc. Even Adobe's other product, Reader, works well on Linux.

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 5:06 UTC (Wed) by zlynx (guest, #2285) [Link]

I agree.

And Nvidia's Linux drivers are awesome. Rebuild your kernel without the 4K tiny stacks that were forced down everyone's throats, and their drivers will run great.

They always did for me.

The open source nv driver on the other hand, could make a snail look fast.

ISVs

Posted Sep 2, 2009 9:12 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

No, I think these vendors really do just suck.

They suck on other platforms too. Microsoft's biggest obstacle for most of its existence has been its ISVs. DOS for example, was supposed to be (somewhat) platform independent, so that Microsoft could escape IBM and bring all the user's software with it. But ISVs wrote lots of software that depended on the raw PC hardware, and it was stuck there until the PC clone. I've used a machine that wasn't a PC clone, lots of programs just don't work, or don't work correctly.

It's actually got worse - as so often the user isn't the customer. More and more, ISVs produce software with sponsorship, advertising injection, or just plain malware included. The customer wants eyeballs, or machine cycles, the users are just stuck with it. So the ISV no longer even cares if users hate the software, it's more important to make it hard to remove than to make it worth keeping.

Con's previous exit

Posted Sep 1, 2009 22:29 UTC (Tue) by MarkWilliamson (guest, #30166) [Link] (2 responses)

My immediate reaction was that "storming off" seemed a bit unfair but looking at the e-mail in question I suppose it does constitute an angry exit.

The thing with that, though, was the context. It seemed like Con's patches were getting a rough ride for reasons that weren't entirely clear to me when following the discussion. There were some technical discussions around his patch but various other objections seemed somewhat undeserved. If I had to pick an example, it would be Linus's insistence that Con was denying the existence of a bug when Con was IIRC merely doubtful that the scheduler should effectively be hard coded to "nice" certain processes. A debate about whether it was pragmatically needed would have been sensible but that sadly didn't happen.

Given the situation, Con's exit didn't seem like the fit of pique that "storming off" usually implies but instead like a good developer who has been frustrated by an unclear situation after making a good faith effort to work with the community and get stuff people find useful into upstream - like we *always* say that they should. It was a bit stormy in the end but it did seem like he tried to make a go of it.

It's good to see Con is producing kernel code again. I've never used his patches but it is good to know that he's still producing them as lots of people seem to have benefited from them.

Con's previous exit

Posted Sep 1, 2009 22:41 UTC (Tue) by corbet (editor, #1) [Link] (1 responses)

Here's what I wrote about it at the time. The story is not so simple and, while I thought (and think) that Con's departure was a real loss, I also feel that he brought some of his problems onto himself. There is more to working with a community - especially a large community - than just posting patches.

Con's previous exit

Posted Sep 2, 2009 2:40 UTC (Wed) by MarkWilliamson (guest, #30166) [Link]

I think I might have been slightly misunderstanding the e-mail you linked actually - even when I read it the first time I thought it would be something about the scheduler argument whereas it seems to be about swap prefetch. Is it perhaps the case that Con had already decided to leave over the scheduler debacle, with the linked e-mail constituting a "sorry guys, I've already left the community" message when swap prefetch came up again in discussion?

When I read through them at the time I thought the discussions over the scheduler were badly handled by some developers and that the outcome for Con was not really deserved. The regression I remember him arguing over seems like a rather justified thing to discuss further since it was effectively a case of whether policy should be encoded in the kernel. On the other hand, my understanding of the swap prefetch situation was that there was a lack of hard evidence in favour of it, so the handling of that situation from upstream seemed far more balanced. I think it's fair enough not to tamper with a tricky subsystem unless there's a compelling motivation for it. I'm surprised that it wasn't possible to concoct useful benchmarks for the situation, really, since it didn't actually sound that esoteric.

I did remember the scheduler thing as the straw that broke the camel's back, though, rather than the swap prefetch patch?

Maybe this is all academic anyhow. If Con's doing stuff that's interesting to him and may benefit some users, there's no harm in that - plurality like this is a major benefit of the open source model, so I guess there's not really anything to be unhappy about!

Con Kolivas returns with a new scheduler

Posted Sep 1, 2009 23:42 UTC (Tue) by kjamc1982 (guest, #59655) [Link] (1 responses)

I am trying out the patch right now. I do not know if anything is really better
yet. What could I do to test it out? Run some benchmarks, or something else?

Interactivity tests

Posted Sep 2, 2009 10:23 UTC (Wed) by man_ls (guest, #15091) [Link]

Try running a CPU-intensive task while you do something that requires real time operation, and see if the patch helps. As a quick test: a kernel compilation, and watching a movie in mplayer with frame dropping disabled. I get "Your system is too SLOW to play this!" all the time, and that is with desktop preemption enabled. The kernel compilation itself should be faster if CPU time is shared more effectively between tasks, I think.

Another one: run a CPU-hogging task (such as this, but openssl speed should be enough) and a kernel compilation; time should be divided evenly between them. Something else which requires high interactivity: ab (apache benchmark) against your own machine. For the ultimate test: run kernel compilation, openssl speed, ab and mplayer in parallel and see how they fight it out.
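
For anyone who wants numbers rather than impressions from tests like these, a minimal sketch follows. It is not the latt tool mentioned elsewhere in this thread, only a rough stand-in: it sleeps for short intervals and records how late each wakeup arrives. The function names and default parameters are illustrative assumptions, and Python's own overhead adds noise, so only comparisons between runs on the same machine (idle versus during a kernel build, stock kernel versus patched) mean anything.

```python
import statistics
import time


def measure_wakeup_latency(interval_s=0.01, samples=100):
    """Sleep repeatedly and record how late each wakeup was, in microseconds."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(interval_s)
        elapsed = time.perf_counter() - start
        # Anything beyond the requested interval is wakeup delay plus overhead.
        overshoots.append((elapsed - interval_s) * 1e6)
    return overshoots


def summarize(overshoots):
    """Return (max, mean, sample standard deviation) of the overshoots."""
    return (
        max(overshoots),
        statistics.mean(overshoots),
        statistics.stdev(overshoots),
    )
```

Calling `summarize(measure_wakeup_latency())` once on an idle machine and once while the CPU-heavy jobs above are running gives two sets of figures to compare.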

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 2:31 UTC (Wed) by modernjazz (guest, #4185) [Link] (6 responses)

In my own tiny personal corner of the universe, the clear, slam-dunk
scheduler problem to solve is to preserve some bit of interactivity in
cases where your machine starts going into swap---enough, say, to allow
the user to click on the window close "x" in the upper right corner and
have it actually kill the process within a reasonable amount of time.
Currently, if I don't notice that a process is starting to swap within the
first 20 seconds or so, I basically never get my computer back until I hit
the power button, even if I'm willing to wait an hour (unless by good
fortune the OOM killer happens to nuke the process that is causing the
problem, which is a hit-and-miss occurrence). So in practice, one runaway
user-space process can easily crash the machine; not exactly what one
probably hopes from *nix. I understand that this is a somewhat challenging
problem, because the various bits of X and my desktop environment needed
to respond to my mouse click are spending all their time sitting on the
swap partition rather than executing in main memory.

But here's hoping anyway: should this patch help with that?

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 2:49 UTC (Wed) by MarkWilliamson (guest, #30166) [Link] (2 responses)

"maybe" ? ;-)

A CPU scheduler change *may* help directly in the case where the processes you care most about are not getting the CPU they need. Indirectly, if it gets the processes you care about to run more frequently then they should find their frequently-used memory areas being swapped out less. And that should also reduce the problems you're worried about.

But, as you rightly noted, it's a complex business with lots of processes talking to lots of others. The kernel doesn't have a way of knowing what memory a process will need next, it can only guess based on past behaviour. It's quite a simple problem if you think of it as "Interactive processes I care about" vs "Background processes". But what if an interactive process grows really large or spawns many children, or if you're simply running too many large interactive processes? Something has got to be sacrificed and it's not always going to be immediately obvious which interactive process ought to get the chop.

Keeping X and your window manager well supplied should be a given though. I wonder if anybody has experimented with putting them into the realtime scheduling class. It'd be a grim hack and I don't like the idea of it but ... could be interesting to see what effect it has!

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 5:10 UTC (Wed) by zlynx (guest, #2285) [Link] (1 responses)

It's a memory management problem and not really anything to do with the CPU scheduler. Although I suppose it could try to assign seconds-long time slices to processes that are swapping instead of milliseconds.

I had very good results when I used to run a Linux laptop by memlocking some critical daemons into RAM. Stuff like X and some Gnome daemons. I used a GDB script to attach to the process and force it to call mlockall with the MCL_CURRENT and MCL_FUTURE flags.
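
The locking call described above can be sketched from inside a process, rather than injected through GDB. This is a minimal illustration and not the script the comment refers to: it reaches `mlockall` through ctypes, the flag values are the ones used on x86 and ARM Linux (other architectures differ, so treat them as assumptions), and `lock_all_memory` is a hypothetical helper name. The call normally needs root, CAP_IPC_LOCK, or a generous RLIMIT_MEMLOCK to succeed.

```python
import ctypes
import ctypes.util
import os

# Values from <sys/mman.h> on x86 and ARM Linux. Assumption: some other
# architectures (e.g. PowerPC, SPARC) define different numbers.
MCL_CURRENT = 1  # lock pages that are mapped right now
MCL_FUTURE = 2   # also lock pages mapped later


def lock_all_memory(libc=None):
    """Ask the kernel to keep all pages of this process in RAM.

    Returns True on success and False on failure.
    """
    if libc is None:
        libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        print("mlockall failed:", os.strerror(ctypes.get_errno()))
        return False
    return True
```

Locking everything trades flexibility for latency: the pages can no longer be swapped out, so this only makes sense for a few small, critical processes.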

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 8:48 UTC (Wed) by modernjazz (guest, #4185) [Link]

Thanks for the tips, I'll have to give this a try.

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 11:57 UTC (Wed) by niner (subscriber, #26151) [Link] (1 responses)

Just use ulimits so processes cannot fill your swap. This immediately
takes care of that problem. There's also a sysrq key that invokes the OOM
killer. So if you notice a runaway process, you could use that while the
system still gets some work done.

Con Kolivas returns with a new scheduler

Posted Sep 4, 2009 2:38 UTC (Fri) by Tronic (guest, #59702) [Link]

Except that it is not really possible to prevent OOM with ulimits. The
limits are per-process and thus a larger number of processes can easily eat
all your RAM. Setting nproc and memory limits so that running out of RAM is
impossible is otherwise unfeasible.

This scenario is very real - just last week I managed to crash our master
server by accidentally running about 100 Valgrind-debugged web services at
the same time. The machine went completely unresponsive for a long time, so
finally we went to the server room to reset it the hard way (even the
console would not respond at all). Linux still doesn't handle OOM situations
properly, it seems.
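
The arithmetic behind this objection is simple enough to sketch. The figures in the usage note are hypothetical, chosen only to illustrate the point that a per-process cap multiplied by the number of allowed processes is what has to fit into memory; the function names are illustrative too.

```python
def worst_case_bytes(per_process_limit, max_processes):
    """Memory that can be claimed if every allowed process hits its own cap."""
    return per_process_limit * max_processes


def limits_prevent_oom(per_process_limit, max_processes, ram_plus_swap):
    """True only if even the worst case still fits into RAM plus swap."""
    return worst_case_bytes(per_process_limit, max_processes) <= ram_plus_swap


def largest_safe_limit(max_processes, ram_plus_swap):
    """Largest per-process cap (in bytes) that rules out exhaustion."""
    return ram_plus_swap // max_processes
```

With a hypothetical 4 GiB of RAM plus swap, a 512 MiB cap and 100 allowed processes, `limits_prevent_oom(512 * 2**20, 100, 4 * 2**30)` is false, and `largest_safe_limit(100, 4 * 2**30)` comes out at roughly 41 MiB per process, which is too small for many ordinary programs.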

Con Kolivas returns with a new scheduler

Posted Sep 17, 2009 10:05 UTC (Thu) by daenzer (subscriber, #7050) [Link]

Do you have any form of kernel preemption enabled? IME that helps a lot for interactivity vs. swap. But of course, there's only so much it can do in the face of a runaway process which eats memory like there's no tomorrow.

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 3:25 UTC (Wed) by k8to (guest, #15413) [Link]

Don Quixote rides again.

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 6:49 UTC (Wed) by sitaram (guest, #5959) [Link] (1 responses)

all this is fine, but...

I just found out this guy is an anesthetist. That's awesome! I've been in the IT field for 23+ years now and I'd find it hard to make that kind of contribution in my own field, leave alone some other field that I was not even trained in.

Wow. Double wow, actually...

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 15:36 UTC (Wed) by jzbiciak (guest, #5246) [Link]

I'm sure there's a light-hearted joke about his scheduler not looking at a task's recent sleeping behavior here. *chuckle*

Con Kolivas returns with a new scheduler

Posted Sep 2, 2009 20:06 UTC (Wed) by xorbe (guest, #3165) [Link]

linux scheduler is my #1 complaint, can't wait to take a rev of his work for a spin in a few weeks.

Con Kolivas returns with a new scheduler

Posted Sep 3, 2009 0:07 UTC (Thu) by smadu2 (guest, #54943) [Link] (4 responses)

New addition to the list of things that are not so good in Linux kernel/OS, 1. Sound 2. Graphics 3. Filesystem and now 4. Scheduler.

Con Kolivas returns with a new scheduler

Posted Sep 3, 2009 5:02 UTC (Thu) by k8to (guest, #15413) [Link] (3 responses)

Please don't complain about that which you don't understand.

The Sound and Graphics have arguable unpleasantries (though all platforms do).

The scheduler isn't optimal for some workloads, but it's also not bad. Compare it to the windows or Mac scheduler both of which regularly do really unpleasant things for both server and workstation workloads.

The filesystems are 100% fine.

Con Kolivas returns with a new scheduler

Posted Sep 3, 2009 10:17 UTC (Thu) by Cato (guest, #7643) [Link] (1 responses)

I don't agree the filesystems are currently 100% fine - I've lost a lot of data on ext3, most likely due to the default use of write caching in the drive (not a filesystem issue but the default is dangerous for use with most filesystems) combined with a lack of journal checksumming (added in ext4). Also, until btrfs is stable, we won't have disk block level checksums, as in ZFS, which is the one thing that really makes me want to use OpenSolaris.

See http://lwn.net/Articles/350072/ and in particular the University of Wisconsin paper linked within the thread - they looked at reiserfs, ext3 and JFS, and how they cope with disk failures.

Con Kolivas returns with a new scheduler

Posted Sep 3, 2009 11:53 UTC (Thu) by hppnq (guest, #14462) [Link]

From what I've read about your problems with filesystem X eating your data on platform Y, you are likely to be a victim on any combination of X and Y.

Just make proper backups and do not assume that any filesystem is going to keep your data, especially if you stress it enough.

Con Kolivas returns with a new scheduler

Posted Sep 3, 2009 17:02 UTC (Thu) by smadu2 (guest, #54943) [Link]

OK.

Con Kolivas returns with a new scheduler

Posted Sep 4, 2009 11:10 UTC (Fri) by realnc (guest, #60393) [Link] (10 responses)

I've just installed a bfs (204) patched kernel on my Core 2 Duo. I fired up LMMS and started a CPU hungry project and there's not a single sound skip/pop/crackle. Smooth. It was giving lots of sound artifacts (due to under-runs) before. At the same time I started playing notes in the Fluidsynth synthesizer (utilizing a 1.8GB sound font!), and there's virtually no lag between the key press and when the note actually plays anymore.

Then I started a kernel build (-j2), fired up mplayer on a 1080p HD video, and continued working in LMMS.

Still smooth. That was totally killing LMMS before (too much lag). I felt like I was using the MS Windows version of LMMS. I now realize that the current scheduler totally sucks for Desktops.

IMHO, what Desktop Linux needs desperately right now are more people like Kolivas. Dude, if you're reading this: thank you. I hope people will contribute to BFS. But I guess the kernel devs will not even take a look at it since it doesn't "scale". (The sound cracks/pops and lag in my music authoring tool won't "scale" either, but it seems they don't care very much; you see, LMMS is not MySQL. A 1% regression there and they start running to fix it. But who cares about me making music on Linux, right?)

Con Kolivas returns with a new scheduler

Posted Sep 6, 2009 10:50 UTC (Sun) by kragil (guest, #34373) [Link] (9 responses)

I have to agree completely!

On my EeePC Linux (Fedora) hangs, skips and freezes. Really annoying. XP and OSX86 don't. I really wish some Netbook distro would integrate this scheduler ASAP or Linus would come to his senses and make the scheduler pluggable. I don't need fair behavior or 4096 CPU support on my netbook. I want it to react to my commands and play my media like it was supposed to be played, but I guess servers don't need that and so most of the kernel devs just don't give a shit.

Con Kolivas returns with a new scheduler

Posted Sep 6, 2009 17:14 UTC (Sun) by jbh (guest, #494) [Link] (8 responses)

If your eeepc has an SSD, skips/freezes are more likely to come from IO stalls
than from the CPU scheduler. Particularly if it's the 16GB variant of
900/900a, those are known to have very poor performance. In that case there
are some mount options and other tunables that can make a large difference
(disabling barriers and so on, let me know if you need a full list).

Or is this a conventional hard drive model?

Con Kolivas returns with a new scheduler

Posted Sep 6, 2009 21:43 UTC (Sun) by kragil (guest, #34373) [Link] (7 responses)

Thank you.

It is a 901 with a fast 4GB SSD drive and a slow 8GB SSD. I have done some tweaking (Firefox ramdisk, ext4, noatime,noload) but maybe you have the right answers for my problems (I am using KDE4 on Fedora 11)

I will try 2.6.31 as soon as it is ready. I guess it might also help.

Con Kolivas returns with a new scheduler

Posted Sep 7, 2009 8:56 UTC (Mon) by jbh (guest, #494) [Link] (6 responses)

Right! You probably have most things covered. Make sure you use the fast
ssd for the system of course, and save the 8GB for data. Three more things:
- barrier=0 mount option
- elevator=deadline boot option
- tune2fs -o journal_data_writeback

The first one (mount -o barrier=0) is subjectively the one that makes the
most real difference. Ext4 has barriers enabled by default, as opposed to
ext3.

I have a slow-only 16GB 900. With your setup, it might be effective to put
all journals on the fast disk (external journal), but I guess that's if you
actually enjoy tweaking :)

Finally it's possible to replace that slow SSD with a faster one from
RunCore for example. But that doesn't explain why windows does better
(which surprises me a bit as I've heard others complain that windows is
very bad at least on the 16GB --- but maybe it's ok if it's installed on
the fast 4GB disk).

Con Kolivas returns with a new scheduler

Posted Sep 8, 2009 20:11 UTC (Tue) by realnc (guest, #60393) [Link] (3 responses)

You don't need an SSD. You need BFS. Cheaper and works wonders.

Con Kolivas returns with a new scheduler

Posted Sep 9, 2009 7:43 UTC (Wed) by jbh (guest, #494) [Link] (2 responses)

Huh? I'm not sure what you mean, I already have an SSD. That's the problem: it's very slow, random writes <100kB/s. And I strongly doubt that BFS can magically fix that.

What CAN fix it is a combination of: (i) write less, (ii) don't wait for writes. That's the point of the tuning. Nothing to do with process scheduling.

Con Kolivas returns with a new scheduler

Posted Sep 9, 2009 8:41 UTC (Wed) by realnc (guest, #60393) [Link] (1 responses)

This was related to the original comment by kragil you replied to. An SSD won't fix the problems described in this thread.

Con Kolivas returns with a new scheduler

Posted Sep 9, 2009 8:47 UTC (Wed) by jbh (guest, #494) [Link]

Ah, then we're in full agreement. An SSD will in fact *cause* the problems
described in this thread.

;-)

Con Kolivas returns with a new scheduler

Posted Sep 11, 2009 14:54 UTC (Fri) by SEMW (guest, #52697) [Link] (1 responses)

> - barrier=0 mount option
> - elevator=deadline boot option
> - tune2fs -o journal_data_writeback

Regarding the Deadline i/o scheduler, according to Wikipedia, "The kernel docs suggest this is the preferred scheduler for database systems, especially if you have TCQ aware disks, or any system with high disk performance.". Since an ssd on an eeepc is neither a database system, nor TCQ aware (TCQ being meaningless for SSDs), nor has high disk performance, what property makes it good for that workload?

Con Kolivas returns with a new scheduler

Posted Sep 11, 2009 15:45 UTC (Fri) by jbh (guest, #494) [Link]

Good question! The desired property here is mainly "not CFQ", since the
default CFQ scheduler seems to behave badly (wrt IO latency) with cheap
SSDs. See for example http://forum.eeeuser.com/viewtopic.php?id=23580 .

The noop scheduler is also sometimes recommended, but the linked thread
suggests it may suffer from IO starvation since it doesn't do any balancing
between processes.

[ it also recommends the following, which I haven't tried:
echo 1 > /sys/block/sda/queue/iosched/fifo_batch
]
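
Before changing the elevator it helps to check which one is active. A small sketch, assuming the usual sysfs layout in which `/sys/block/<device>/queue/scheduler` lists the available schedulers and marks the active one with square brackets; the function names and the `sda` default are illustrative.

```python
def active_scheduler(sysfs_text):
    """Return the bracketed name, e.g. 'cfq' for 'noop deadline [cfq]'."""
    for name in sysfs_text.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None


def read_scheduler(device="sda"):
    """Read the active I/O scheduler for a block device from sysfs."""
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return active_scheduler(f.read())
```

On a machine booted with `elevator=deadline`, `read_scheduler("sda")` would be expected to return `"deadline"`.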

Con Kolivas returns with a new scheduler

Posted Sep 9, 2009 15:19 UTC (Wed) by kmike (guest, #5260) [Link] (1 responses)

I'm kind of late to the party, but I was just reading the linked thread at LKML. Just check out these numbers (post by Jens Axboe):

BFS210 runs on the laptop (dual core intel core duo). With make -j4
running, I clock the following latt -c8 'sleep 10' latencies:

-rc9

        Max                17895 usec
        Avg                 8028 usec
        Stdev               5948 usec
        Stdev mean           405 usec

        Max                17896 usec
        Avg                 4951 usec
        Stdev               6278 usec
        Stdev mean           427 usec

        Max                17885 usec
        Avg                 5526 usec
        Stdev               6819 usec
        Stdev mean           464 usec

-rc9 + mike

        Max                 6061 usec
        Avg                 3797 usec
        Stdev               1726 usec
        Stdev mean           117 usec

        Max                 5122 usec
        Avg                 3958 usec
        Stdev               1697 usec
        Stdev mean           115 usec

        Max                 6691 usec
        Avg                 2130 usec
        Stdev               2165 usec
        Stdev mean           147 usec

-rc9 + bfs210

        Max                   92 usec
        Avg                   27 usec
        Stdev                 19 usec
        Stdev mean             1 usec

        Max                   80 usec
        Avg                   23 usec
        Stdev                 15 usec
        Stdev mean             1 usec

        Max                   97 usec
        Avg                   27 usec
        Stdev                 21 usec
        Stdev mean             1 usec

One thing I also noticed is that when I have logged in, I run xmodmap
manually to load some keymappings (I always tell myself to add this to
the log in scripts, but I suspend/resume this laptop for weeks at the
time and forget before the next boot). With the stock kernel, xmodmap
will halt X updates and take forever to run. With BFS, it returned
instantly. As I would expect.

So the BFS design may be lacking in the scalability end (which is
obviously true, if you look at the code), but I can understand the
appeal of the scheduler for "normal" desktop people.

Amazing difference. I'm tempted to try this thing out.
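
To put one number on that difference, the quoted runs can be averaged per kernel. The sketch below only restates the "Avg" and "Max" figures from the output above; the dictionary layout and the `summary` helper are illustrative, and three runs per kernel is far too few to draw firm statistical conclusions.

```python
from statistics import mean

# "Avg" and "Max" values in microseconds, copied from the three runs
# quoted above for each kernel.
runs = {
    "-rc9": {"avg": [8028, 4951, 5526], "max": [17895, 17896, 17885]},
    "-rc9 + mike": {"avg": [3797, 3958, 2130], "max": [6061, 5122, 6691]},
    "-rc9 + bfs210": {"avg": [27, 23, 27], "max": [92, 80, 97]},
}


def summary(kernel):
    """Return (mean of the per-run averages, worst maximum) for a kernel."""
    data = runs[kernel]
    return mean(data["avg"]), max(data["max"])


bfs_avg, _ = summary("-rc9 + bfs210")
for kernel in runs:
    avg, worst = summary(kernel)
    print(f"{kernel:14s} mean avg {avg:8.1f} usec  worst max {worst:6d} usec  "
          f"{avg / bfs_avg:6.1f}x BFS")
```

By these figures the stock -rc9 kernel's average latency is roughly 240 times the BFS figure, and the patched mainline kernel's roughly 128 times.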

Con Kolivas returns with a new scheduler

Posted Sep 9, 2009 20:28 UTC (Wed) by realnc (guest, #60393) [Link]

Ingo posted a new version of latt.c that when used, reports better values than BFS.

Normally I'd say this smells fishy since it comes from Ingo, but I'm sure other devs would have objected if he had modified latt.c in any malicious way.