
BFS vs. mainline scheduler benchmarks and measurements

From:  Ingo Molnar <mingo-AT-elte.hu>
To:  Con Kolivas <kernel-AT-kolivas.org>, linux-kernel-AT-vger.kernel.org
Subject:  BFS vs. mainline scheduler benchmarks and measurements
Date:  Sun, 6 Sep 2009 22:59:52 +0200
Message-ID:  <20090906205952.GA6516@elte.hu>
Cc:  Peter Zijlstra <a.p.zijlstra-AT-chello.nl>, Mike Galbraith <efault-AT-gmx.de>

hi Con,

I've read your BFS announcement/FAQ with great interest:

    http://ck.kolivas.org/patches/bfs/bfs-faq.txt

First and foremost, let me say that i'm happy that you are hacking 
the Linux scheduler again. It's perhaps proof that hacking the 
scheduler is one of the most addictive things on the planet ;-)

I understand that BFS is still early code and that you are not 
targeting BFS for mainline inclusion - but BFS is an interesting 
and bold new approach, cutting a _lot_ of code out of 
kernel/sched*.c, so it raised my curiosity and interest :-)

In the announcement and on your webpage you have compared BFS to 
the mainline scheduler in various workloads - showing various 
improvements over it. I have tried and tested BFS and ran a set of 
benchmarks - this mail contains the results and my (quick) 
findings.

So ... to get to the numbers - i've tested both BFS and the tip of 
the latest upstream scheduler tree on a testbox of mine. I 
intentionally didn't test BFS on any really large box - because you 
described its upper limit like this in the announcement:

-----------------------
|
| How scalable is it?
|
| I don't own the sort of hardware that is likely to suffer from 
| using it, so I can't find the upper limit. Based on first 
| principles about the overhead of locking, and the way lookups 
| occur, I'd guess that a machine with more than 16 CPUS would 
| start to have less performance. BIG NUMA machines will probably 
| suck a lot with this because it pays no deference to locality of 
| the NUMA nodes when deciding what cpu to use. It just keeps them 
| all busy. The so-called "light NUMA" that constitutes commodity 
| hardware these days seems to really like BFS.
|
-----------------------

I generally agree with you that "light NUMA" is what a Linux 
scheduler needs to concentrate on (at most) in terms of 
scalability. Big NUMA, 4096 CPUs is not very common and we tune the 
Linux scheduler for desktop and small-server workloads mostly.

So the testbox i picked fits into the upper portion of what i 
consider a sane range of systems to tune for - and should still fit 
into BFS's design bracket as well according to your description: 
it's a dual quad core system with hyperthreading. It has twice as 
many cores as the quad you tested on but it's not excessive and 
certainly does not have 4096 CPUs ;-)

Here are the benchmark results:

  kernel build performance:
     http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild.jpg     

  pipe performance:
     http://redhat.com/~mingo/misc/bfs-vs-tip-pipe.jpg

  messaging performance (hackbench):
     http://redhat.com/~mingo/misc/bfs-vs-tip-messaging.jpg  

  OLTP performance (postgresql + sysbench)
     http://redhat.com/~mingo/misc/bfs-vs-tip-oltp.jpg

Alas, as can be seen in the graphs, i cannot see any BFS 
performance improvements on this box.

Here's a more detailed description of the results:

| Kernel build performance
---------------------------

  http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild.jpg     

In the kbuild test BFS is showing significant weaknesses up to 16 
CPUs. On 8 CPUs utilized (half load) it's 27.6% slower. All results 
(-j1, -j2 ... -j15) are slower. The peak at 100% utilization at 
-j16 is slightly stronger under BFS, by 1.5%. The 'absolute best' 
result is sched-devel at -j64 with 46.65 seconds - the best BFS 
result is 47.38 seconds (also at -j64), making sched-devel 1.5% 
faster at that point.

| Pipe performance
-------------------

  http://redhat.com/~mingo/misc/bfs-vs-tip-pipe.jpg

Pipe performance is a very simple test: two tasks message each 
other via pipes. I measured 1 million such messages:

   http://redhat.com/~mingo/cfs-scheduler/tools/pipe-test-1m.c

The pipe test ran a number of them in parallel:

   for ((i=0;i<$NR;i++)); do ~/sched-tests/pipe-test-1m & done; wait

and measured elapsed time. This tests two things: basic scheduler 
performance and also scheduler fairness. (if one of these parallel 
jobs is delayed unfairly then the test will finish later.)

[ see further below for a simpler pipe latency benchmark as well. ]

As can be seen in the graph BFS performed very poorly in this test: 
at 8 pairs of tasks it had a runtime of 45.42 seconds - while 
sched-devel finished them in 3.8 seconds.

I saw really bad interactivity in the BFS test here - the system 
was starved for as long as the test ran. I stopped the tests at 8 
loops - the system was unusable and i was getting IO timeouts due 
to the scheduling lag:

 sd 0:0:0:0: [sda] Unhandled error code
 sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
 end_request: I/O error, dev sda, sector 81949243
 Aborting journal on device sda2.
 ext3_abort called.
 EXT3-fs error (device sda2): ext3_journal_start_sb: Detected aborted journal
 Remounting filesystem read-only

I measured interactivity during this test:

   $ time ssh aldebaran /bin/true
   real  2m17.968s
   user  0m0.009s
   sys   0m0.003s

A single command took more than 2 minutes.

| Messaging performance
------------------------

  http://redhat.com/~mingo/misc/bfs-vs-tip-messaging.jpg  

Hackbench ran better - but mainline sched-devel is significantly 
faster for smaller and larger loads as well. With 20 groups 
mainline ran 61.5% faster.

| OLTP performance
--------------------

http://redhat.com/~mingo/misc/bfs-vs-tip-oltp.jpg

As can be seen in the graph for sysbench OLTP performance 
sched-devel outperforms BFS on each of the main stages:

   single client load   (   1 client  -   6.3% faster )
   half load            (   8 clients -  57.6% faster )
   peak performance     (  16 clients - 117.6% faster )
   overload             ( 512 clients - 288.3% faster )

| Other tests
--------------

I also tested a couple of other things, such as lat_tcp:

  BFS:          TCP latency using localhost: 16.5608 microseconds
  sched-devel:  TCP latency using localhost: 13.5528 microseconds [22.1% faster]

lat_pipe:

  BFS:          Pipe latency: 4.9703 microseconds
  sched-devel:  Pipe latency: 2.6137 microseconds [90.1% faster]

General interactivity of BFS seemed good to me - except for the 
pipe test, where there was significant lag of over a minute. I 
think it's some starvation bug, not an inherent design property of 
BFS, so i'm looking forward to re-testing it with the fix.

Test environment: i used the latest BFS (205, then re-ran under 
208; the numbers are all from 208), and the latest mainline 
scheduler development tree from:

  http://people.redhat.com/mingo/tip.git/README

Commit 840a065 in particular. It's on a .31-rc8 base while BFS is 
on a .30 base - i will be able to test BFS on a .31 base as well 
once you release it. (But it doesn't matter much to the results - 
there weren't any heavy core kernel changes impacting these 
workloads.)

The system had enough RAM to have the workloads cached, and i 
repeated all tests to make sure it's all representative. 
Nevertheless i'd like to encourage others to repeat these (or 
other) tests - the more testing the better.

I also tried to configure the kernel in a BFS-friendly way: i used 
HZ=1000 as recommended, turned off all debug options, etc. The
kernel config i used can be found here:

  http://redhat.com/~mingo/misc/config

( Let me know if you need any more info about any of the tests i
  conducted. )

Also, i'd like to stress that i agree with the general goals 
described by you in the BFS announcement - small desktop systems 
matter more than large systems. We find it critically important 
that the mainline Linux scheduler performs well on those systems 
too - and if you (or anyone else) can reproduce suboptimal behavior 
please let the scheduler folks know so that we can fix/improve it.

I hope to be able to work with you on this, so please don't 
hesitate to send patches if you wish - and we'll also be following 
BFS for good ideas and code to adopt into mainline.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 3:13 UTC (Mon) by sbergman27 (guest, #10767) [Link] (17 responses)

Could Ingo really not find a more appropriate machine for testing BFS? It would be more interesting, I would think, to see its performance on a netbook doing netbook-like things. I don't think BFS was really intended for kernel compiles on an octo-core.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 3:41 UTC (Mon) by Baylink (guest, #755) [Link] (4 responses)

He said 'hyperthreaded'.

Doesn't that make it a 16-core?

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 4:05 UTC (Mon) by Tracey (guest, #30515) [Link]

Baylink asked about 16 core.

We are looking to build a relatively inexpensive machine using a motherboard that can take two quad cores. This is pretty much what Ingo did his tests on (i7s/i5s are hyperthreaded).

This machine will be used for compiling and desktop use.

There are wonderful people putting time and energy into the handhelds. Maybe these parties are talking about fixing two completely different things, then?

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 12:11 UTC (Mon) by nix (subscriber, #2304) [Link] (2 responses)

Yes, I'd say it does. The shared caches should affect scheduling decisions (but of course in BFS they don't :) ), and other shared resources (that is to say, pretty much all of them) would affect speed directly, but still you have to schedule 16 entities at once. They're just not symmetrical entities anymore. (In fact if it was dual-die you're into NUMA land, which means BFS is bound not to work well on it as it has no NUMA-awareness by design. I suspect my single-die quad Nehalem would work much better with it.)

[updated, quad core results] BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 12:28 UTC (Mon) by mingo (guest, #31122) [Link] (1 responses)

Today i've measured and posted single-socket non-NUMA quad-core results as well:

"[quad core results] BFS vs. mainline scheduler benchmarks and measurements "

As the graphs show, the quad results are similar to the 8-core results. So it wasn't NUMA or 16 cpus that made the difference.

Btw., you'd be wrong to treat an 8 core box with HyperThreading as a 16 core box. The physical resources are in essence that of an 8 core one - it's just more spreadable.

BFS should have no design disadvantage from HyperThreading, as siblings share the cache.

[updated, quad core results] BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 14:13 UTC (Mon) by nix (subscriber, #2304) [Link]

Interesting.

(With regard to hyperthreading you are of course right that the physical resources are those of the physical cores, but surely unless you are HT-aware you will get lower performance on an HT system than otherwise, because you won't know to e.g. schedule threads of the same process on the same physical core if possible, to maximize cache sharing. But I know you know all this as the current mainline scheduler does it :) )

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 7:03 UTC (Mon) by kragil (guest, #34373) [Link] (11 responses)

Ingo lives in another world. His ridiculously large jpgs killed my Netbook. He is obviously one of those detached from reality kernel devs Con is talking about.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 7:07 UTC (Mon) by quotemstr (subscriber, #45331) [Link] (10 responses)

I hope that's snark.

If not, well, it's ridiculous. Need everyone cater to the needs of your deliberately underpowered machine? Are you really claiming that using a high resolution for a graph intended for the consumption of a relatively small group of technical people constitutes a detachment from the real user base? Ingo can rightfully assume that anyone who reads LKML has the ability to resize a raster image.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 7:35 UTC (Mon) by k8to (guest, #15413) [Link]

"deliberately underpowered"?

...

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 7:58 UTC (Mon) by kragil (guest, #34373) [Link] (8 responses)

Underpowered?

I still think having 1 GB of RAM should be more than enough to look at pictures on the internet :P
But if you have a few apps running and starting to load such a large jpg (really stupid format for graphs btw) it will start to swap and essentially lock for hours for most users ( after a minute of trying I was able to crtl-alt-2 and kill the browser, hail to Fedoras responsiveness!)

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 9:58 UTC (Mon) by jospoortvliet (guest, #33164) [Link] (7 responses)

darn, I hadn't even noticed they were so big... Used konqi, middleclick on
em and in the tab gwenview automatically resized them to fit the window.
Only after reading this thread I had a closer look and saw their size...
Really huge indeed. I guess 3 gb ram saved me :D

fixed those JPGs

Posted Sep 7, 2009 12:42 UTC (Mon) by mingo (guest, #31122) [Link] (6 responses)

Oops - good point. I typoed '600' as '6000' and Gimp was too fast for me to notice.

I fixed all the jpgs - they now have standard size, 50K apiece, 1024 pixels width.

[ One more proof that i'd make a sucky web artist i guess ;-) ]

worse than it should be

Posted Sep 7, 2009 13:42 UTC (Mon) by job (guest, #670) [Link] (5 responses)

Let me explain what Firefox did with the initially large pictures: It allocated memory. That practically kills a normal PC with Linux today. My machine took four whole minutes until I could ctrl-alt-f1 and kill the offending process.

I understand VM pressure is high, but why can't normal apps get at least a small timeslice now and then even in these extreme situations?

It would be a bit discouraging to say the least if this was a desktop users first impression of Linux; that it "hangs" (sort of) if you click on a large picture in your web browser.

worse than it should be

Posted Sep 7, 2009 14:18 UTC (Mon) by mingo (guest, #31122) [Link] (1 responses)

What happens during big VM pressure rarely depends on the process scheduler. If you monitor your system during such situations you'll see there's plenty of idle CPU time - just nobody is able to make progress because everyone will be swapping around small fragments.

[ Or if there's a lot of CPU time used, it's all kswapd's ;-) ]

worse than it should be

Posted Sep 7, 2009 20:40 UTC (Mon) by alankila (guest, #47141) [Link]

compcache seems to really help with making progress. Even if you are swapping, it's so fast that you can actually use your system: start new terminals, run top, etc. Even if the task is in an allocation frenzy and ends up OOM-killed, it does so with relatively little disk activity.

I really have a love affair with compcache - to the point that I have given up all other types of swap and am now married to this single solution. It can also help with large images, especially those that are mostly single color. I imagine those pages compress very, very well...

worse than it should be

Posted Sep 7, 2009 15:42 UTC (Mon) by epa (subscriber, #39769) [Link] (2 responses)

Oh, the ironing. Maybe 'open huge image in Firefox' should be added to the set of kernel benchmarks? The measurement would be how long it takes for some other process to allocate and use a mere twenty megabytes while Firefox is thrashing around.

worse than it should be

Posted Sep 8, 2009 8:12 UTC (Tue) by mjthayer (guest, #39183) [Link] (1 responses)

I wonder why other processes have to suffer so much at all for Firefox's memory allocation? It ought to be possible (*) to limit the rate at which a process can cause pages belonging to other processes to be swapped out, and to ensure that the other processes never go below a certain threshold of physical pages (possibly a lower threshold for processes that are rarely used, but even they are probably in memory in the first place for a reason).

(*) Yes, I know someone still has to do it. If no one else does, and I don't get told at once on LWN why this is such a bad idea, perhaps I will have a look at if I ever have a free minute...

worse than it should be

Posted Sep 16, 2009 20:06 UTC (Wed) by oak (guest, #2786) [Link]

Put Firefox into a control group of its own and set a limit on the 
number of active pages it can have?
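
In 2009-era terms that suggestion could be sketched with the (then 
new) cgroup memory controller. The mount point, group name and 512M 
limit below are illustrative assumptions, not a tested recipe, and it 
all requires root:

```shell
# Hedged sketch of the "put Firefox in its own control group" idea;
# assumes the memory controller is compiled into the kernel.
mount -t cgroup -o memory none /dev/cgroup     # mount the memory controller
mkdir /dev/cgroup/firefox                      # a group just for Firefox
echo 512M > /dev/cgroup/firefox/memory.limit_in_bytes  # cap its pages
for pid in $(pidof firefox); do                # move Firefox processes in
    echo $pid > /dev/cgroup/firefox/tasks      # one PID per write
done
```

Once Firefox exceeds the limit, its own pages get reclaimed first 
instead of everyone else's.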

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 3:53 UTC (Mon) by Tracey (guest, #30515) [Link] (42 responses)

Ingo's message seemed very polite considering the history of all of this. Ingo also took the time to test and post the outcome of the tests - something the kernel devs nowadays seem to demand even more than patches.

I have a lot of respect for both Con and Ingo, and hope that something constructive happens from this.

I test and use kernels that are built using the real time patches from Ingo and Thomas. The primary reason for my interest is to get better multimedia response on desktop machines. This is something that Con seems to be trying to fix.

Ignoring the history of all of this, it seems that all of these people are working towards the same goals. I'm hoping that everyone involved can work together on this and not create the friction that the press likes to feed off so well.

My view is that if the kernel needs more than one scheduler to optimize it for both the desktop and the server, then it should be done. I feel that there is a bit too much stubbornness on everyone's part on this.

On the other hand, I've been using the real time patched kernels for years and know that they slowly make it into the mainline kernel; so maybe in the end a separate, low latency desktop scheduler won't be needed.

I just hope that these parties can work together, share ideas and code, and acknowledge what each is contributing.

A part of me feels that in the future most, if not all, of the real time tree will make it into the kernel, and a separate desktop scheduler won't be needed. If that happens, I hope that Con's work and the ideas that are incorporated into the kernel are acknowledged with all due respect.

I hope that Ingo is offering an olive branch here.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 4:34 UTC (Mon) by nash (guest, #50334) [Link] (41 responses)

If you follow up Con's response... I don't think we'll see any constructive discussions going on here. Which is a shame.

http://thread.gmane.org/gmane.linux.kernel/886319

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 4:52 UTC (Mon) by flewellyn (subscriber, #5047) [Link] (29 responses)

I begin to understand why Con's prior work was not included in mainline...

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 5:08 UTC (Mon) by tajyrink (subscriber, #2750) [Link] (27 responses)

I think Con's response was precisely reiterating what he was already accusing kernel devs of - that everything has to scale. And additionally using things like compiling the kernel and piping messages as "benchmarks".

How about perceived user experience blind tests when using Firefox on a netbook? (because it's hard to benchmark responsiveness to user interaction vs. completion time)

I think there would be demand for a new tool that tracks any user interaction with the time it takes to have a proper response. Probably not really generally doable, but a set of benchmarks that can be used to test this would be beneficial.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 5:32 UTC (Mon) by flewellyn (subscriber, #5047) [Link] (26 responses)

All good points, but Con's attitude was terrible. Ingo's really not, I don't think, trying to drive him away, or invade his space, or anything. Just trying to work with him.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 6:31 UTC (Mon) by nash (guest, #50334) [Link] (24 responses)

To be fair on Con, you could argue there was a bit of trolling going on Ingo's side, with the choice of hardware and the like.

However it was a troll with real numbers associated.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 6:38 UTC (Mon) by flewellyn (subscriber, #5047) [Link] (23 responses)

A dual quad-core system with hyperthreading? That's hardly a bad choice of hardware for the "desktop system" scale. That's the standard CPU setup for a Mac Pro, for instance.

Granted, it's the upper end of the "desktop system" scale, but he did say as much.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 7:51 UTC (Mon) by drag (guest, #31333) [Link] (20 responses)

There is not one single person I know that owns a dual-socket desktop.

It's just not a desktop machine. There is a very good reason for this... it dramatically increases the cost of the board and doubles the cost of the CPUs for pretty much no good reason. It's not going to benefit you in any way for surfing the internet or playing games or even processing media.

The only people who would benefit from a system like that is for compiling software, long render batch jobs, and the like. That is just not a typical desktop workload.

The mainstream desktop system is very obvious to me.

Core2Duo Intel laptop, dual core AMD desktop, single core Atom processor. Those are the cpus that you're going to see on a typical Linux system.

I know lots of people that own P4 machines, a few people still using P3 laptops, a bunch of Core2Duo laptops, and a bunch of people owning netbooks for various reasons (high mobility, secondary computer, regular laptops are too expensive, etc).

Dual-socket quad-core systems? That's just not the target audience for the most part.

----

That being said, I don't think it makes a big difference. Ingo's testing is probably going to reflect accurate performance for machines less powerful than that one. But I can't be sure about that. It would really have had more impact if the tests had been carried out on a dual core machine.

That, and the point of BFS is to make things friendlier and more interactive. That is hard to benchmark, and something that is very responsive to user input would probably be slightly less efficient overall, even though users would actually prefer it.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 7:58 UTC (Mon) by dlang (guest, #313) [Link] (15 responses)

today Intel is selling single socket systems with 6 real cores + hyperthreading (simulating 12 cores)

they have in their roadmap to be selling single socket systems with 8 real cores in less than a year.

so what today is a two socket 'business only' system is next summer's (or next christmas') power user system

just like about a year ago the only people with 8 cores were the high-end 4 socket systems, and the only people with 4 cores were dual socket server systems.

nowadays it's common for single socket systems to have 4 cores (+ hyperthreading)

Yes, he did pick a system at the high end of what BFS claims to support (and I would like to see how it fares with 4 or so cores in use), but at the same time, the benchmark numbers weren't a matter of a couple of percentage points of difference: on one benchmark the time went from < 4 seconds to > 40 seconds, 10x worse.

that doesn't mean that BFS is junk, just that it's not finished, but utterly dismissing (and ignoring) the results is not a good start for discussions.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 8:15 UTC (Mon) by kragil (guest, #34373) [Link] (10 responses)

Most of the machines in the real world are single cores.

Most of the machines sold today are dual cores ( real or only with HT like Atom ).

Most people still don't buy big desktops with quad-cores, they buy cheap laptops/netbooks.

It will take a long time before most computers sold have more than 16 cores, as the computers sold in the last 4 years are perfectly capable of doing everything a non-gamer/non-kernel-dev needs.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 8:44 UTC (Mon) by dlang (guest, #313) [Link] (1 responses)

actually, if you want to start playing the 'most of the computers are.. ' games

most of the computers in use in the world are 8 bit cpu's

most of the new computers sold each year are _still_ 8 bit cpu's (by a smaller margin than in prior years, true, but still the winner)

so by that argument, both linux and windows are completely irrelevant since neither of them will run on the majority of computers around or being sold.

what Con should have done was to respond that 16 (simulated) cores is too many for the current stage of BFS code, and to tell Ingo that with X cores it is still solidly in its sweet spot. Ingo could then go back and run the tests again to see what results he gets.

if with 4 cores his benchmarks still show the machine completely locking up, Con would then need to look at BFS to see why it's so bad for some workloads (which is exactly what he lambastes the kernel scheduler for)

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 9:43 UTC (Mon) by stijn (subscriber, #570) [Link]

Clearly the game is "most of the desktop computers are …". This makes the first half of your response rather moot (and detracts from the rest). Admittedly my own (this) response has little to offer except nitpicking, but I care about the particular nit where no effort is made to understand someone's position. It accounts for about 99.9% of flame wars.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 10:02 UTC (Mon) by xav (guest, #18536) [Link] (7 responses)

Most computers will shortly be smartphones running some kind of linux kernel.
And they'll be very picky about reactivity.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 10:24 UTC (Mon) by kragil (guest, #34373) [Link] (6 responses)

That is totally OK _with me_.

I think Linux has a lot of cruft that is only useful on the supercomputers/monster X-core machines that some kernel devs want to see in the future.

Optimising for smartphones/smartbooks/MIDs/netbooks is really needed, and the benchmarks should be very, very different - like response time to clicks under load, or frame drops while playing video, etc. At the moment the Linux desktop freezes and skips way too often.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 12:53 UTC (Mon) by mingo (guest, #31122) [Link] (5 responses)

at the moment the Linux desktop freezes and skips way too often.

We take such problems seriously - please post to lkml about this, with the scheduler maintainers (Peter Zijlstra and me) Cc:-ed.

We have many good tools that can get to the bottom of such skipping, if there are people willing to report problems, trace latencies and test patches.

Both Peter Zijlstra and i test on low-spec systems as well. I've got an 833 MHz Pentium-3 laptop that i (auto-)reboot into new -tip kernels about 10 times every day. Peter has a 1.2 GHz Pentium-mobile laptop for interactivity testing. My daily desktop is a dual-core box - not some big honking server machine.

But ... we can only fix the scheduler if you help out too and report your interactivity problems on lkml.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 13:20 UTC (Mon) by k3ninho (subscriber, #50375) [Link] (3 responses)

Are you in the process of testing BFS on your 'low end' PIII laptop?

How many people report bugs of stuttering, lockups and hangs anyway? I'd forgive you for thinking it's not a problem because the CFS and Deadline schedulers have been good for me and my home-use workload.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 14:36 UTC (Mon) by mingo (guest, #31122) [Link] (2 responses)

Are you in the process of testing BFS on your 'low end' PIII laptop?

Not likely - it took 8+ hours to do the quad core tests and a single kernel build iteration takes 1-2 hours on this box.

But that box is perfect for audio skipping problems. Right now it can play an mp3 stutter-free while a make -j3 job is running on it. That's roughly in line with what i'd expect from that box.

How many people report bugs of stuttering, lockups and hangs anyway? I'd forgive you for thinking it's not a problem because the CFS and Deadline schedulers have been good for me and my home-use workload.

We have on the order of one such bugreport per kernel cycle (3 months). They generally get fixed if they are reported and if the reporter reacts to feedback and further testing requests.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 18:57 UTC (Mon) by drag (guest, #31333) [Link]

Be sure to throw PulseAudio in there. :)

It'll end up being a required desktop component since it's the only system developed so far that can handle hotplugging audio devices _and_ network audio in an effective manner. This means on-the-fly audio configuration changes, which means that USB headsets for VoIP and gaming, bluetooth audio devices, and USB docking stations (etc.), which are now increasingly common, cannot be handled in a sane manner without PA's ability to do on-the-fly reconfiguration.

Then you'll need to do some graphical benchmarks. Maybe some of those things from Mesa or whatever. They're little things - just stuff that runs for a few seconds at a time. Those phoronix folks have their benchmark suite, and maybe that would be useful for you guys.

The point of interactivity, as I see it, is adapting to changing workloads. Playing an mp3 + doing a kernel compile is fairly static, and the system has time to adapt to it. The test should have a "peaky" workload with occasional high loads and whatnot.

Not that I experienced many problems with the modern kernel compiled with preemption enabled. At least nothing that stands out in my mind right now.

Measuring on down-to-earth hardware

Posted Sep 7, 2009 22:30 UTC (Mon) by man_ls (guest, #15091) [Link]

Not likely - it took 8+ hours to do the quad core tests and a single kernel build iteration takes 1-2 hours on this box.
Pity. It would be interesting to run your benchmarks on a PIII, even if it takes 5 days; or tune them to last less. Just about any current netbook would do too. Any takers?

As a socratic exercise: just what would it prove if BFS performed better than CFS? And then, what would we learn if the reverse happened and CFS bested BFS?

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 9, 2009 14:33 UTC (Wed) by k8to (guest, #15413) [Link]

I think the idea of 'normal users' going to LKML with their problems is unworkable. However, I am willing to give it a try with my next interactivity stall. I expect to give up rapidly if faced with derision or brush-off.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 8:43 UTC (Mon) by iive (guest, #59638) [Link] (3 responses)

I'm not CPU expert or kernel expert, so feel free to correct me.

However, I do have the feeling that hyperthreading is the reason for these suboptimal benchmarks. The BFS scheduler could have been made with the assumption that each core runs at the same speed, so it would finish X work in Y time on any core. With hyperthreading this is not true, as both threads share the same core. In general, CPUs have more computational units than can be used at any given moment, so the second h-thread is "lurking" behind and reusing the free units when the first h-thread cannot utilize them. This is why HT on the P4 gave only a 30% boost in the best case.

This could also explain why only some people with Intel CPU notice issues, while others don't.

I also wonder how many of the stock CFS heuristics are tuned for HT scheduling and how many special cases are there.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 18:13 UTC (Tue) by jzbiciak (guest, #5246) [Link] (2 responses)

I wonder if it might be a different effect. My dual dual-core Opteron box (4 CPUs across 2 chips) dynamically scales the frequency of the CPUs based on load.

What I don't know is the cost of doing so. That is, when it switches from 1GHz to 2.4GHz, yes, it got faster, but was there, say, a 1ms hitch between the two? Did that hitch affect both cores on that die or just one? If there was a cache-to-cache coherence transfer at the time, did it also experience that hitch?

These details could vary by processor platform, vendor and maybe even chipset and BIOS if the switch is effected via SMM or the like. A sloppier CPU scheduler that kept all the CPUs in the high-frequency state (or low frequency state) would eliminate these sorts of hitches, whereas one that kept the load more concentrated might experience more such hitches when the occasional background load spills onto the CPU that was left sleeping.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 9, 2009 10:08 UTC (Wed) by etienne_lorrain@yahoo.fr (guest, #38022) [Link]

I also see some strange behaviour on a no-name all-Intel dual-core portable PC: completely random stalls of 2-4 seconds during which the mouse pointer does not even move, with no load whatsoever and nothing in /var/log/messages.
This portable PC is cheap and "designed for the other OS", even though it was sold with nothing installed: the DMI information is blank, and the ACPI information does not look any better.
I tend to think it is an SMM problem rather than a scheduler problem: the crappy BIOS (which I cannot update, because there is no DMI name) does not like Linux, or was explicitly designed to give a bad experience. I would really like to be wrong here.
There was a time when Linux did not rely on the BIOS at all, but that is no longer true (SMM cannot be disabled, even under Linux; it is what handles the forced power-off when the On/Off button is held for more than 3 seconds).

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 10, 2009 22:23 UTC (Thu) by efexis (guest, #26355) [Link]

I believe this was more of an issue in the past than it is now; CPUs can ramp up their speed much more quickly than they used to. One problem, for example, was that higher CPU speeds require a higher voltage, which could stall the CPU while the voltage stepped up. Now the voltage is raised a split second before the frequency is ramped up, so there is no stall.

Otherwise it is all down to the CPU, with different models taking different amounts of time to change frequency. It can make sense to jump straight to the highest frequency when usage goes up and then slow down if needed (as the ondemand governor does), or to scale up step by step. Where responsiveness matters, you want to set a low load watermark, so the CPU is always running at, say, twice the speed you currently need; that way you always have headroom while you wait for the CPU to speed up (e.g. when load goes from 50% to 80%, the CPU speeds up to bring the load back down to 50%; only if load reaches 100% have you failed to speed up quickly enough). If you wish to conserve more power, you run the CPU at a speed closer to the load.

In Linux there are many tunables to play with to get the behaviour you want (/sys/devices/system/cpu/cpu?/cpufreq/<governor>). To see what's available on the Windows platform, there's a free download you can find by googling rmclock that positively spoils you for configuration options. There's no one rule that has to fit all; during boot the kernel will test transition speeds and set defaults accordingly.
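The watermark policy described above can be sketched in a few lines. This is only an illustration of the idea, not the actual ondemand governor code; the threshold, target, and frequency values are made up:

```python
def next_frequency(cur_freq_mhz, load, freq_steps,
                   up_threshold=0.80, target_load=0.50):
    """Pick the next CPU frequency for a measured load (0.0-1.0).

    Once load crosses the watermark (up_threshold), raise the frequency
    so the same amount of work would put the CPU back at target_load,
    leaving headroom; otherwise pick the slowest step that still keeps
    load under the watermark. freq_steps is sorted ascending, in MHz.
    """
    demand = cur_freq_mhz * load           # work being done, in "MHz" terms
    if load >= up_threshold:
        wanted = demand / target_load      # aim for plenty of headroom
    else:
        wanted = demand / up_threshold     # slowest step below the mark
    for f in freq_steps:                   # smallest sufficient step
        if f >= wanted:
            return f
    return freq_steps[-1]                  # already flat out


steps = [800, 1600, 2400]                  # hypothetical available steps
print(next_frequency(800, 0.9, steps))     # busy at 800 MHz -> 1600
print(next_frequency(2400, 0.2, steps))    # nearly idle at 2.4 GHz -> 800
```

The point of the `target_load` / `up_threshold` gap is exactly the headroom efexis describes: the CPU is deliberately kept faster than strictly necessary so brief load spikes land in slack rather than in a stall.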

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 8:14 UTC (Mon) by ketilmalde (guest, #18719) [Link] (1 responses)

> There is not one single person I know that owns a dual-socket desktop.

I have an old dual Pentium-II 450MHz. It's not actually in use anymore, though, so it probably doesn't count.

> That being said I don't think it would make a big deal.

I think it might - things like processor affinity are likely to matter a great deal more on multiple-socket systems than on merely multicore systems. Multicore chips typically come with a large, shared cache, so moving threads across cores isn't as costly as moving them across sockets.

From what I read, BFS doesn't even try to be NUMA-aware, so it doesn't seem unreasonable that it would perform quite differently on single- and multi-socket systems.

-k

BFS vs. mainline scheduler benchmarks and measurements

Posted Jun 8, 2010 13:22 UTC (Tue) by vonbrand (subscriber, #4458) [Link]

Way back when, I confiscated a dual Pentium Pro (200MHz) to use as a desktop machine in a class I was teaching... the machine was already old (I actually cannibalized two of them to get one working one).

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 13:34 UTC (Mon) by da4089 (subscriber, #1195) [Link] (1 responses)

At my office, everyone has an 8-core desktop.
At home, people tend to have single-socket, quad-core desktops.
Laptops are mostly dual-core, although the last two guys who bought one got quad-core, 17" monsters.

So, I think Ingo was reasonable in his choice of platform.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 0:18 UTC (Tue) by awalton (guest, #57713) [Link]

> At home, people tend to have single-socket, quad-core desktops.

I want to live at your home. We've bought 4 new home PCs in the past two years, including one just a month ago. They're all dual cores. Even my brand new laptop is dual core.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 10:30 UTC (Mon) by nye (subscriber, #51576) [Link] (1 responses)

>A dual quad-core system with hyperthreading? That's hardly a bad choice of hardware for the "desktop system" scale. That's the standard CPU setup for a Mac Pro, for instance.

I hope this doesn't sound trollish, but if you think that's even remotely realistic for even one percent of the PC user base, then you are living in a fantasy realm - or perhaps five years in the future.

I've never even *seen* a computer that powerful. A machine like that would cost *thousands*; nobody spends more than £500 on a computer unless they are a serious enthusiast who happens to be rolling in money. The average user, if they think about it at all, is only now thinking that it might be time to get one of those newfangled dual-core machines.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 10:34 UTC (Mon) by nye (subscriber, #51576) [Link]

Okay I realise that that probably did sound trollish, and it's been better covered upthread. My apologies.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 12:16 UTC (Mon) by fb (guest, #53265) [Link]

Yes, Con's response was outright rude. OTOH he had made clear that he wasn't interested in that kind of discussion.

However, Ingo was obviously going down the route of "lies & benchmarks". The point of Con's scheduler is low-end machines and responsiveness; Ingo posted benchmarks from a ridiculously high-end machine, measuring performance.

I just wish that Con had had the cool head to politely point out that if you ask the wrong question, you get the wrong answer.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 8:09 UTC (Mon) by sitaram (guest, #5959) [Link]

Maybe, but a couple of paragraphs into Ingo's email I did wonder how relevant the testbed/tests were to my normal workload.

Con's email merely confirmed my suspicions.

Too bad... I might now have to take off my "user" hat (distro supplied stuff only, nothing gets compiled locally, etc) and actually try BFS to see for myself.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 5:00 UTC (Mon) by MattPerry (guest, #46341) [Link] (10 responses)

I thought that Con's response was completely appropriate. The test machine that Ingo used, and the tests he performed, were not what BFS was designed for. Ingo is either being disingenuous or just didn't bother to read the FAQ. If Ingo wants to conduct a useful test, he should try the scheduler on a single processor, dual-core machine performing tasks that normal, non-programmer computer users would perform (music listening, web browsing, file copies, word processing, and so forth).

I understand Con's response

Posted Sep 7, 2009 5:46 UTC (Mon) by aorth (subscriber, #55260) [Link] (5 responses)

It's important to read the Gmane thread linked a few comments up. While I'm excited about Linux in the server/high-performance space, my Thinkpad only has one core, and I've seen a marked increase in responsiveness with Con's BFS. These are things which can't be benchmarked, but which make all the difference (like the time it takes my gmrun box to pop up when I hit Alt-F2 in Fluxbox) when using Linux on a desktop.

"Can't be benchmarked" – No.

Posted Sep 7, 2009 6:14 UTC (Mon) by quotemstr (subscriber, #45331) [Link] (4 responses)

The explanation is less technical and more psychological. What you're seeing is observer bias. See that poster on LKML who claimed that he was seeing improvements in sub-second lag in a 3D FPS (which is probably spinning at 100% CPU anyway)? That's precisely the kind of environment most susceptible to observer bias: a supposed small effect in a noisy signal like game latency.

I'll believe there's something to this "can't be benchmarked" nonsense when I see a double-blind experiment run that shows a statistically significant effect. As the old saying goes, "data is not the plural of anecdote".

"Can't be benchmarked" – No.

Posted Sep 7, 2009 6:46 UTC (Mon) by flewellyn (subscriber, #5047) [Link]

Perhaps there is a way to benchmark such things: make a test 3D program which plays multimedia onto a rotating 3D cube, and outputs the frame rate and other latency data on the screen. Run this, then start up some other things that contend for the scheduler, like some I/O (copying a large file?), some network traffic (pinging a host in the LAN?), and such. See how the 3D app holds up under such strain, by watching the numbers.

I don't know how well this would work, but it'd be a test of some kind.

"Can't be benchmarked" – No.

Posted Sep 7, 2009 13:09 UTC (Mon) by cesarb (subscriber, #6266) [Link] (1 responses)

The same poster mentioned frame drops in mplayer. That would be somewhat easy to convert into a benchmark (if mplayer does not output to the console the number of frames dropped, edit its source code to make it do so; then write an app to move the mplayer window around the screen pseudo-randomly and drop it back to the desktop, and see how many frames you can make it drop).

All the other examples mentioned by that poster sound like they could be benchmarkable with some coding effort. For instance, in the Doom 3 example, you would not measure the frame rate, but the frame jitter (record the time of the end of the "flush" call which actually pushes the image to the screen for each frame, subtract from the time for the previous frame, and see which is the highest difference and how uniform the differences are). Even if for some reason you cannot change the source code of your game, you can change the source code of the libraries it calls to do the "flush", or even interpose with LD_PRELOAD or something like it.

You could even measure the "input lag" in his sound example by building a hardware contraption which "presses a key" (by pretending to be a keyboard), listens to the analog audio output, and logs the time difference between the input and the output.

This all seems benchmarkable without the need for a double-blind test.
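The frame-jitter measurement described above (record the time of each frame's "flush", diff consecutive frames, look at the worst interval and how uniform the intervals are) can be sketched in a few lines. This is a self-contained illustration with made-up timestamps, not an actual interposer:

```python
def jitter_stats(timestamps):
    """Summarize frame jitter: given the time of each frame's flush
    (seconds), diff consecutive frames and report the mean interval,
    its standard deviation, and the worst (longest) interval."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    n = len(deltas)
    mean = sum(deltas) / n
    var = sum((d - mean) ** 2 for d in deltas) / n
    return mean, var ** 0.5, max(deltas)

# Made-up timestamps: a ~60 FPS stream with one 200 ms stall in the middle.
frames = [0.000, 0.017, 0.033, 0.233, 0.250, 0.267]
mean, stddev, worst = jitter_stats(frames)
print(f"mean {mean:.3f}s stddev {stddev:.3f}s worst {worst:.3f}s")
```

Applied at the flush call (via source modification or interposition), a single 200 ms stutter shows up immediately as the worst interval and a large standard deviation, even when the average frame rate still looks healthy - which is exactly why jitter, not FPS, is the number to watch.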

"Can't be benchmarked" – No.

Posted Sep 7, 2009 17:12 UTC (Mon) by cesarb (subscriber, #6266) [Link]

This is what I meant by interposing with LD_PRELOAD:

http://github.com/cesarb/glxswapbuffersmeasure/tree/master

This is a small quick-and-dirty library I just wrote which hooks into glXSwapBuffers via LD_PRELOAD and prints some statistics to stderr on exit.

An example of its output with everyone's favorite "benchmark" tool, glxgears, on an outdated distribution (thus an older kernel):

LD_PRELOAD=./glxswapbuffersmeasure.so glxgears
1142 frames in 5.0 seconds = 228.375 FPS
1035 frames in 5.0 seconds = 206.474 FPS
934 frames in 5.0 seconds = 186.540 FPS
glXSwapBuffers count: 3947, avg: 0.004757, variance: 0.000045, std dev: 0.006699, max: 0.204504

I did some moving of windows around to make it stutter a bit more, and the output from my test library shows it (200ms max latency, which corresponds to around 5 FPS). Note that the average time between glXSwapBuffers calls approximately matches glxgear's FPS printout.

It should be quite simple for someone who sees latency problems which seem to be cured by BFS to run the same 3D game with something like this library under both the mainline scheduler and BFS and see if it shows any differences in the output. Of course, the code I posted can be enhanced to gather better statistics (like a histogram of the latencies); I put the code under Creative Commons CC0 (a bit similar to "public domain").

"Can't be benchmarked" – No.

Posted Sep 7, 2009 13:49 UTC (Mon) by job (guest, #670) [Link]

I disagree. You could, for example, easily count the number of buffer underruns with pulseaudio when playing an mp3 at the same time as you compile the kernel. Then you could do the same thing with a movie.

These are the kinds of things normal users do when latency really counts. The problem is not measuring it, the problem is that nobody is really interested.

Morton's Fork

Posted Sep 7, 2009 6:35 UTC (Mon) by quotemstr (subscriber, #45331) [Link] (2 responses)

First of all, the burden of proof is on BFS advocates to provide a better test. Ingo's test was well-described and performed under reasonable conditions; Kolivas provided no comparably rigorous numbers. Your suggestion, to test what users actually use, puts kernel developers in an unreasonable dilemma. On the one hand, kernel developers can test the tasks that "users would perform", but because the numeric results of these tests are not easily measured, they are meaningless without an expensive, inconvenient double-blind satisfaction study. (And really, the onus is on BFS advocates to provide one if that's what it takes.)

On the other hand, kernel developers can use contrived tests like the pipe example that are easily quantified, but that only approximate user workloads. These tests can be improved, but one will always be able to claim that they don't measure what users "really" do. Either way, the claim that BFS is superior will have been made unfalsifiable and unscientific.

Morton's Fork

Posted Sep 7, 2009 12:57 UTC (Mon) by Lennie (subscriber, #49641) [Link] (1 responses)

Let's start with 'frames skipped' in mplayer or vlc or something.

Morton's Fork

Posted Sep 11, 2009 1:51 UTC (Fri) by Spudd86 (guest, #51683) [Link]

Ingo mentions further up that he does test exactly this on low-end machines.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 6:53 UTC (Mon) by bvdm (guest, #42755) [Link]

Did Ingo claim that he was testing BFS against the mainline scheduler for BFS's intended use cases? No.

"I'd guess that a machine with more than 16 CPUS would start to have less performance."

Even if you count hyper-threading as doubling the core count (which it does not really do), that machine has 16 logical CPUs, and in his FAQ Con claims that BFS should perform well up to 16 CPUs.

I realize that people can be very passionate about Linux, and that the scheduler, as something that critically affects the user experience, becomes important - but is all this emotion really necessary?

Well, this settles it for me

Posted Sep 7, 2009 6:47 UTC (Mon) by bvdm (guest, #42755) [Link] (28 responses)

Was it unreasonable for Ingo to respond? I think not. Con's announcement was widely reported and raised many questions. If something is said on the public square, surely anyone with an interest should be welcome to respond?

Did Ingo respond in an unreasonable way? No; his email was nothing but courteous, though written by someone who is evidently confident of his case.

Are Ingo's benchmarks unreasonable? No, only a fool would consider well-chosen benchmarks as completely worthless. Ingo did not attack BFS's use cases, only made a case for the mainline ones.

The bottom line is that Ingo made it clear that his concern is the mainline scheduler. He could have picked arbitrary benchmarks and run them on a netbook if he wanted to embarrass Con.

If you can't stand the heat, why go back into the kitchen?

Well, this settles it for me

Posted Sep 7, 2009 7:22 UTC (Mon) by yoshi314 (guest, #36190) [Link] (22 responses)

I've been getting the impression recently that core kernel devs are totally disconnected from the desktop world. Ingo's choice of hardware is not too odd, but it is still unrealistic for most desktop users (or at least where I live).

The benchmarks he ran are relevant to higher-end machines.

Con has a point, both about the choice of hardware and the selection of benchmarks; he thinks more like a desktop user.

But his attitude is awful. This might serve as a starting point for a re-enactment of the scheduler flame-wars.

Well, this settles it for me

Posted Sep 7, 2009 7:32 UTC (Mon) by bvdm (guest, #42755) [Link] (11 responses)

Firstly, Linux has a very small desktop presence, so not entirely optimizing for the desktop is a rational design decision, though "totally disconnected" is very hard to swallow.

Secondly, one hardly wants to re-implement something like a scheduler every year. Designing for the near and medium future creates stability. Anyway, you would be hard pressed to find a desktop machine without at least 2 cores these days.

As for netbooks, if interactivity (people keep posting about gaming FPS and high-def audio and high-res desktop experience) is such a concern, why are you using a netbook? A typical netbook has far fewer processes running.

I don't think there ever was a "war". It is a shame that so much unproductive drama was generated by someone who is evidently skillful at performing for the peanut gallery.

Well, this settles it for me

Posted Sep 7, 2009 7:44 UTC (Mon) by k8to (guest, #15413) [Link] (3 responses)

In conversations among engineers throughout silicon valley who have reason to push their code into or hack on Linux, the view of the LKML as unworkably hostile, short sighted, and unwilling to accept external ideas is now the norm. This is essentially a complete reversal from 10 years ago when people viewed it as relatively open and inviting.

Yes, this is subjective and I'm sure it has noise. I think it's also the truth.

Well, this settles it for me

Posted Sep 7, 2009 7:45 UTC (Mon) by k8to (guest, #15413) [Link]

I.e., I think this is the "totally disconnected" angle.

Well, this settles it for me

Posted Sep 7, 2009 8:03 UTC (Mon) by bvdm (guest, #42755) [Link] (1 responses)

You are a fortunate man to have the time and opportunity to traverse Silicon Valley so thoroughly :-p

But seriously, the only objective measures we have are the number of contributors and SLOC added, and both of these are still accelerating.

Now I would be astounded if the Linux kernel were the only technical project in the world without non-technical problems, but don't you think there are many other explanations for this change in perception? Such as, perhaps:

- That having your code included in the kernel has a much increased monetary benefit and is therefore more sought after
- That the existing kernel developers have increased in their experience and skill and that standards for acceptance are therefore higher today
- That the stature of being a core kernel developer has risen and that ego may be involved
- That many parts of the kernel are near-optimal, or at least very mature, and that it is sensible to value stability in those areas

And are the driver staging tree, and desktop and security advances such as KMS and SMACK, not counter-examples to what you are suggesting?

Well, this settles it for me

Posted Sep 7, 2009 18:24 UTC (Mon) by k8to (guest, #15413) [Link]

It's not the money.

It's likely to be driven by standards, but the contention is that these standards are often more arbitrary than useful.

Ego on the part of the maintainers is certainly involved. Among my contacts, ego on the part of the author has certainly not *risen* in the interim, although it may be high (I doubt it).

Stability has certainly become more prized.

Well, this settles it for me

Posted Sep 7, 2009 16:06 UTC (Mon) by einstein (guest, #2052) [Link] (1 responses)

> Firstly, Linux has a very small desktop presence,

I don't think that we desktop linux users are entirely happy with smug little comments like that. For us, linux is the only desktop presence and it looms large.

> so not entirely optimizing for the desktop is a rational design decision, though "totally disconnected" is very hard to swallow.

I think Con may have a valid point in questioning the one-size fits all paradigm. While it's an admirable goal to create a single kernel which runs optimally on everything from PDAs to supercomputing clusters, there may be too much of a divergence in performance profiles for that to be entirely practical.

As Linus has said, a desktop linux presence is vital to its viability, so optimizing desktop interactivity ought to be a very high priority.

Well, this settles it for me

Posted Sep 7, 2009 22:24 UTC (Mon) by mingo (guest, #31122) [Link]

As Linus has said, a desktop linux presence is vital to its viability, so optimizing desktop interactivity ought to be a very high priority.

It is. See for example this recent discussion on lkml. That discussion and those (non-trivial) patches were all about desktop latencies - and it's all part of v2.6.30 now.

Because I can

Posted Sep 7, 2009 22:20 UTC (Mon) by man_ls (guest, #15091) [Link] (4 responses)

As for netbooks, if interactivity (people keep posting about gaming FPS and high-def audio and high-res desktop experience) is such a concern, why are you using a netbook?
Because they are light and cute? You have not understood what "interactivity" means. People do not post about FPS or high-def audio per se, but about jitter, frame drops and audio skips. Those are really nasty when watching a movie or listening to music, and computers are said to be multitasking these days.

Because I can

Posted Sep 9, 2009 7:57 UTC (Wed) by gmaxwell (guest, #30048) [Link] (3 responses)

Jitter, frame drops, and audio skips are all *easily measurable*. Yet *none* of the advocacy of BFS that I've seen includes any measure of these things - only vague hand-waving about smoothness. Perhaps these people should color the edges of their disks with green markers... I hear it reduces jitter.

Meanwhile I do audio processing with a ~2ms processing interval using the mainline scheduler, thrashing the system, under high loads... and underruns are basically unheard of, at least after tossing the drivers and hardware that I determined were misbehaving (with measurements... imagine that!)

I don't doubt that there are genuine areas for improvement, even in the scheduler but it isn't going to get better without real measurements and some social skills superior to those of Hans Reiser.

Interactive benchmarks

Posted Sep 9, 2009 20:14 UTC (Wed) by man_ls (guest, #15091) [Link] (2 responses)

You are right, there are no benchmarks that show that BFS is good at interactivity. However I contend that such "hand-waving" is to be expected from an anaesthetist and a crowd of enthusiasts (and is not a bad thing at all). The real pity is that on lkml, a list full of high-flying engineers, nobody has been able to construct those benchmarks or do those measurements either. The best we have is a scheduler hacker posting odd benchmarks on esoteric hardware. No offense for Ingo, he was very respectful and had interesting data, but it was all biased:
we tune the Linux scheduler for desktop and small-server workloads mostly [...] what i consider a sane range of systems to tune for - and should still fit into BFS's design bracket as well according to your description: it's a dual quad core system with hyperthreading
And then repeating the measures on a quad-core machine, the best he has offered so far. It seems that, despite having an expressed focus on the desktop, a netbook and a few days for testing on it are out of reach.

As to the benchmarks, the first test was how fast he can build the kernel using n processes. Well, that only measures throughput; if each process is supposed to be interactive, it is not unreasonable to expect that they will be more easily interrupted and thus that the build will take longer. Then came a very artificial pipe-messaging test, followed by similarly contrived benchmarks -- which CFS has already been tuned to. So the "other side" (lkml) has not been able to produce anything better either to show that CFS is good at interactivity, by measuring skips and jitter, and I find this to be even more pitiful.

Interactive benchmarks

Posted Sep 9, 2009 23:40 UTC (Wed) by njs (subscriber, #40338) [Link] (1 responses)

> As to the benchmarks, the first test was how fast can he build the kernel using n processes.

To be fair, that benchmark is originally Con's, not Ingo's (Con's original announcement claims that "make -j4 on a quad core machine with BFS is faster than *any* choice of job numbers on CFS").

Interactive benchmarks

Posted Sep 10, 2009 9:52 UTC (Thu) by man_ls (guest, #15091) [Link]

More to the point: even when one side proposed invalid benchmarks, the other side was not able to come up with anything better. (And no, "beat them at their own benchmarks" is not a valid excuse; we are talking about engineering, not about marketing.)

Well, this settles it for me

Posted Sep 7, 2009 7:46 UTC (Mon) by andreashappe (subscriber, #4810) [Link] (9 responses)

Hi,

> i've been getting the impression recently that core kernel devs are totally
> disconnected from the desktop world. ingo's choice of hardware is not too
> odd, but still unrealistic for most desktop users (or at least where i
> live).

I bought a new desktop rig three months ago and paid a not unreasonable 1100 euro for a quad-core (+hyper-threading) i7 processor backed by 6GB of RAM.

I do not believe that Linux should target < 1000 Euro machines (at least not for mainline development). If there's use for another scheduler, Con can keep it out of tree (as he seems to intend to). When distributions pick it up it might even get into mainline. But his childish behaviour after Ingo benchmarked his patch (with a workload that was well within Con's use-case description) does not bode well. Not well at all.

cheers, Andreas

Well, this settles it for me

Posted Sep 7, 2009 8:03 UTC (Mon) by Cato (guest, #7643) [Link] (4 responses)

So the whole focus on netbooks is a waste of time, then? The majority of laptops and desktops these days cost less than 1000 Euros/USD - in fact when building a new dual-core desktop system for casual web surfing I found it hard to spend more than 400 euros, and the resulting system is far faster than really needed. And then there's the whole embedded space of course, and all the people introduced to Linux by putting it on PCs that are too old to run a recent Windows version well, or by turning an old PC into a small server.

Well, this settles it for me

Posted Sep 7, 2009 8:11 UTC (Mon) by andreashappe (subscriber, #4810) [Link] (3 responses)

> So the whole focus on netbooks is a waste of time, then?

I was talking about _mainline_. Pray read the rest of my post (where I mentioned out-of-tree patches). And AFAIK embedded systems often have out-of-tree patchsets for their architectures.

> The majority of laptops and desktops these days cost less than 1000 Euros/USD

If some new scheduler were added to the kernel, it would take 2-3 release cycles (at least)... by which time multi-core systems will be even more common than today.

cheers, Andreas

Well, this settles it for me

Posted Sep 7, 2009 15:45 UTC (Mon) by broonie (subscriber, #7078) [Link] (2 responses)

Embedded systems are using fewer and fewer non-mainline patches - essentially all the CPU vendors who don't have good mainline support are experiencing substantial pressure to sort that situation out sooner rather than later.

Well, this settles it for me

Posted Sep 7, 2009 16:01 UTC (Mon) by andreashappe (subscriber, #4810) [Link] (1 responses)

Wouldn't the situation be the same with an out-of-tree scheduler? If it reaped benefits, pressure for inclusion would build up.

cheers, Andreas

Well, this settles it for me

Posted Sep 7, 2009 16:25 UTC (Mon) by broonie (subscriber, #7078) [Link]

Yes, though Con's disinterest in that might be an issue.

Well, this settles it for me

Posted Sep 7, 2009 8:30 UTC (Mon) by sitaram (guest, #5959) [Link]

You must be channeling Marie Antoinette... :-)

You would not believe the number of people in India who still use P4s (and, God, even P3s sometimes). Far more than the Core 2 Duo kind, I rather suspect; maybe not in new purchases, but in total numbers. We don't throw away stuff so fast here anyway.

After reading your email I'm even more convinced that Ingo did not understand what Con was trying to say (*)

Sitaram

(*) ...or he did but didn't want to risk saying the sort of stuff you said ;-)

Well, this settles it for me

Posted Sep 7, 2009 9:13 UTC (Mon) by endecotp (guest, #36428) [Link] (2 responses)

> I do not believe that Linux should target < 1000 Euro machines

Maybe you're living on a different planet. The only time I've ever spent anything like that much was my first PC back in 1994 - a 66MHz 486.

Well, this settles it for me

Posted Sep 7, 2009 12:46 UTC (Mon) by pboddie (guest, #50784) [Link]

Maybe you're living on a different planet. The only time I've ever spent anything like that much was my first PC back in 1994 - a 66MHz 486.
Indeed. Although there can be good reasons for paying €1000 (or £1000) for a system, it's been a long time since anyone really had to. It reminds me of the "Killer PCs for £1500" idiocy the UK computing press used to run on the cover of their magazines every month back in the early-to-mid 1990s, and even at that time such dull retail summaries served the advertisers far more than they did the actual readership.

Well, this settles it for me

Posted Sep 7, 2009 16:07 UTC (Mon) by andreashappe (subscriber, #4810) [Link]

> Maybe you're living on a different planet.

Could be, I'm using it for coding and running statistics stuff mostly (while doing 'normal' video/music listening).

But that thing cost me around 1000 euro four months ago and would be under that by now... and will be fairly standard *before* a new scheduler could be added to mainline.

People experiencing performance or latency problems on existing hardware might be better off if they just *reported* their problems to the lkml. Ingo is quite responsive to feedback.

(Embedded usage differs... but that is something the market (tm) should be perfectly able to decide.)

Well, this settles it for me

Posted Sep 7, 2009 11:24 UTC (Mon) by rsidd (subscriber, #2582) [Link] (4 responses)

If you can't stand the heat, why go back into the kitchen?

Con did not go back into the kitchen. He was explicitly avoiding LKML. Ingo tried to pull him in. And posting graphs as 6001x4201 JPG files shows extraordinary cluelessness. Every graphing program I've seen supports vector formats like EPS or PDF, and if he must use JPG, he can at least choose a size that fits on screen -- or does he use a 6000x4200 resolution monitor?

I'm running a Core 2 duo laptop with 4 GB RAM, and most of the time I don't suffer interactivity issues. But on lesser machines it is a big problem. If Ingo doesn't use such machines, he should be quiet. Con's problem was not "performance", it was interactivity, and Ingo's benchmarks are basically beside the point.

Jens Axboe posted other benchmarks that sound more reasonable as measures of interactivity (which is Con's concern, not "performance"), and he is not happy with CFS, but he was not able to boot the BFS kernel.

Well, this settles it for me

Posted Sep 7, 2009 11:38 UTC (Mon) by bvdm (guest, #42755) [Link] (1 responses)

Con made a very public re-entry and raised many questions. Ignor had every right to respond. And he did so in a calm and admirable manner.

Your comments about the image size are just ad hominem, which I will ignore.

Have you read Ignor's email carefully or at all? He is clearly making the case that, whatever BFS's advantages on lower end machines may be (which he chose not to contest), CFS is still better suited for the mainline.

No-one is arguing that CFS is perfect, but I have a grave concern that Con is *again* pissing in the drinking well with his style of doing things.

Well, this settles it for me

Posted Sep 7, 2009 22:35 UTC (Mon) by man_ls (guest, #15091) [Link]

Ignor had every right to respond. [...] Have you read Ignor's email carefully or at all?
No, it's pronounced "Aye gnor". (Sorry, couldn't resist after the second mention.)

Well, this settles it for me

Posted Sep 7, 2009 13:05 UTC (Mon) by aigarius (subscriber, #7329) [Link]

The images are 1024px wide now.

Well, this settles it for me

Posted Sep 9, 2009 11:27 UTC (Wed) by liljencrantz (guest, #28458) [Link]

Ingo has said that the graph size was a user error, apologized and replaced them. Calling him «extraordinarily clueless» without knowing the facts is hostile and unjustified. Mistakes happen.

I agree that Ingo's choice of test machine and benchmarks is telling when it comes to what his priorities are - he gets paid to create software that runs well on big systems, and 8 CPUs probably looks small to him. No malice or stupidity involved, just a different perspective.

I think the ball is firmly in the BFS camp's court. Con won't and shouldn't deal with this, but any random BFS user with a bit of time could sit down and redo a set of benchmarks that _he_ feels is more relevant and use them as a counterpoint. Maybe compiling vim on an Atom CPU, as well as some measurements of dropped frames in mplayer while compiling? Latencies and stuttering may be hard to measure, but it is far from impossible. Something better than «it feels better when I shake my mouse» is needed.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 8:13 UTC (Mon) by sean.hunter (guest, #7920) [Link]

Well here we are again. Con does a bunch of development off list, makes a bunch of extravagant claims, goes crying to mamma when anyone actually tests his stuff. "Mingo is invading my private space by actually checking whether my claims are true". Cue hordes of Con fans posting "it seems faster on my xxxx when I do xxxx and xxx".

All this talk about kernel core devs being disconnected from the real world is just people misunderstanding the kernel dev cycle. The stuff mingo is developing is going to be in release distros in six months or so. Six months after that 16 hyperthreaded/8 physical cores on a desktop box is not at all going to be strange. We have some of those where I work.

If Con and his friends actually want to do something useful, they should:
a) listen carefully to criticism
b) do some careful benchmarking and post reproducible numbers
c) work with Ingo - he actually really knows what he's talking about

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 9:02 UTC (Mon) by DavidG (guest, #60628) [Link] (10 responses)

This isn't a typical desktop machine at all. But the real question of course is not how well these benchmarks run, but "Do you have support for smooth full-screen flash video yet?"

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 10:00 UTC (Mon) by tialaramex (subscriber, #21167) [Link] (9 responses)

Wrong part of the kernel. You want KMS / DRM, specifically screen flipping (to get all the changes from a frame onto the screen at the same time, and to ensure that moment isn't while the screen is being drawn).

But mostly you want high level graphics drivers (to accelerate video playback) and a decent Flash player (either from Adobe or reverse engineered by Gnash etc.) which aren't in the kernel at all.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 10:39 UTC (Mon) by DavidG (guest, #60628) [Link]

I was referring to the same XKCD comic (http://xkcd.com/619/) that encouraged Con to do this, but no, the scheduler plays a big role as well: with the greatest drivers and GPU you'd otherwise still end up with glitches in the sound and dropped frames in your video player.

I'd suggest two new benchmarks:
- One with basically an Ogg player playing music and a sound capture application that scans the stream for glitches and converts the number of glitches into some sort of benchmark score,
- a benchmark that detects dropped frames and sound glitches when playing videos and converts that into some sort of benchmark result.

And all under some sort of heavy load (e.g. make kernel, cpuburn). Perhaps these tests are already available, but I could not find tests that are designed to do this specifically under heavy load...
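The deadline-miss idea above can be sketched without any audio hardware at all: emulate a periodic "buffer refill" callback and count how often it wakes up too late. This is a hypothetical sketch, not an existing tool - the 10 ms period and 5 ms slack are arbitrary stand-ins for a real audio buffer size.

```python
# Toy glitch counter: a task that must run every `period_s` seconds,
# counting how often the scheduler wakes it too late. Under heavy CPU
# load, late wakeups are roughly what causes audible pops/dropped frames.
import time

def count_misses(period_s=0.010, duration_s=2.0, slack_s=0.005):
    misses = 0
    deadline = time.perf_counter() + period_s
    end = time.perf_counter() + duration_s
    while deadline < end:
        now = time.perf_counter()
        if now < deadline:
            time.sleep(deadline - now)   # sleep until the next "refill" deadline
        if time.perf_counter() > deadline + slack_s:
            misses += 1                  # woke up too late: one "glitch"
        deadline += period_s
    return misses

print("missed deadlines:", count_misses())
```

Run it once idle and once next to a `make -j` kernel build; the difference in miss counts is exactly the kind of number this subthread is asking for.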

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 12:05 UTC (Mon) by ldarby (guest, #41318) [Link] (7 responses)

Going slightly off topic, mplayer has always been able to play FLVs smoothly, so I don't see how you can call Adobe's Flash player "decent" - it constantly pegs the CPU at 100%, doing nothing more than mplayer, which uses 0-5%.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 14:41 UTC (Mon) by cortana (subscriber, #24596) [Link] (2 responses)

FYI, the reason Flash is so godawfully crap at playing video is that it actually *does* have to do a lot more than mplayer.

All mplayer has to do is decode video to YUV and then hand the picture off to the video card, which scales it to fit the screen, etc, in hardware.

Flash has to decode the video, then convert it to RGB, then combine it with other graphical elements created on the fly by a flash movie written by someone who has no idea how to program efficiently, then somehow get the result to the screen, and the methods for doing this for RGB data on Linux are buggy, inefficient, and hardware-dependent (when they even exist in the first place!)

Having said that, there is a great deal of room for improvement on the Flash side as well. One of the people who works for Adobe in porting the Flash player to Linux--a thankless task!--occasionally posts to a blog where he details interesting problems he runs into. Some interesting posts include:

http://blogs.adobe.com/penguin.swf/2008/05/
http://blogs.adobe.com/penguin.swf/2006/10/
http://blogs.adobe.com/penguin.swf/2006/09/#a001737

and another that I can't find right now where he writes about how the Windows version is accelerated by assembly code which has been optimised for years, but the Linux version can't because it's all locked away in Microsoft's assembler source format or something.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 15:30 UTC (Mon) by Cato (guest, #7643) [Link] (1 responses)

That point about the Microsoft assembler format is curious - surely it wouldn't be that hard to translate that format to GNU's? A small investment of time and a massive return if Adobe were to then use accelerated assembly code for Flash...

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 16:38 UTC (Mon) by cortana (subscriber, #24596) [Link]

Found it!

http://www.kaourantin.net/2005/08/porting-flash-player-to...

Now, the 'see this post for updated info' does say that this information is outdated, and that version 9 of the Flash Player on Linux is much faster than previous versions (which it was).

I still think there's huge room for improvement. But I understand that, while the Windows Flash Player developers spend time on optimizing, the Linux porters have to spend time making it work on a much wider variety of environments and with different versions of different libraries and graphics drivers and dealing with bugs in all of the above that are fixed in newer versions that end-users can't install because their distribution hasn't been updated for over a year, etc. etc.

For instance, Flash won't bother to use the GPU on my Intel-based laptop to accelerate rendering because:

$ glxinfo | grep 'client glx vendor'
client glx vendor string: SGI

and 'SGI' implies 'software rendering', which of course isn't true on my Intel-equipped laptop, but apparently when the developers tried to use a better method it would cause crashes on some distributions...

Details at <http://blogs.adobe.com/penguin.swf/2008/05/flash_uses_the...>.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 15:58 UTC (Mon) by mjg59 (subscriber, #23239) [Link] (3 responses)

Flash works in RGB space, whereas mplayer has the luxury of not needing to composite UI on top of the video and so can just dump YUV data into the hardware scaling engine. That's where most of the performance difference comes from.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 20:11 UTC (Mon) by alankila (guest, #47141) [Link] (2 responses)

Linux with its texture-from-pixmap extension should be able to support RGB scaling in hardware, only I spent a few days last summer trying to get this feature to work---and, after failing that, just getting some demos that should use this feature to work. That failed too. But: compiz works just fine, so the extension is probably okay. The X errors I received were so undebuggable that I just gave up trying to figure out what was wrong.

Once Linux, too, can take RGB surface and display it on screen with hardware acceleration, things will indeed be better for us. But here's me wishing: it should be easy, not just "maybe works if you happen to have the right mixture of everything installed and moon's phase is just right".

It's awful how Linux slowly conditions one to inferior experience. I had a friend visiting and he was genuinely surprised when he saw that I could press the full-screen button on youtube and it actually worked. He failed to notice that at that time I was actually running Firefox within Windows... ;-(

From end-user point of view, this problem can't die fast enough.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 11:58 UTC (Tue) by nye (subscriber, #51576) [Link] (1 responses)

> I had a friend visiting and he was genuinely surprised when he saw that I could press the full-screen button on youtube and it actually worked

I agree that performance isn't brilliant - in particular, the YouTube Flash applet becomes absurdly slow in full-screen when the controls are visible, though some other applets don't have that problem - but I'd be surprised if full-screen Flash video didn't work at all. In what way does it fail for you/him?

*My* problem with Flash is that Firefox appears to be aggressively single-threaded, and a few times a minute decides to peg the CPU for a second or so, so if I want to play Flash smoothly I have to use Opera - or Konqueror, or any non-Firefox browser really.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 23:59 UTC (Tue) by alankila (guest, #47141) [Link]

Well, what I get is that the full-screen button flashes a full-screen video for a frame or two, and invariably falls back to windowed mode. I'm not sure if that is the symptom for him, though.

It could also be another problem: right now mouse button clicks within the flash applets don't seem to register -- I have to start video playback by pressing space because clicking with the mouse somehow doesn't seem to go through. Especially pressing the full-screen button does absolutely nothing right now. *sigh*

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 9:43 UTC (Mon) by russell (guest, #10458) [Link] (7 responses)

Every one of Ingo's tests is about throughput, not latency. Unless Ingo defines latency as the amount of time available to make coffee while a kernel build completes :)

Would be great if Con, et al, spent some time developing benchmarks that put numbers to how a desktop feels. That's probably the only way these conversations will stop descending into flames.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 10:28 UTC (Mon) by jamesh (guest, #1159) [Link] (2 responses)

The pipe test looks like it is entirely about latency rather than throughput. It has two processes that wait on each other in an alternating fashion.

Scoring well on that benchmark will depend heavily on the average latency of the scheduler. Unfortunately it doesn't tell you the variance in the latency.

The benchmark Jens Axboe suggested (http://thread.gmane.org/gmane.linux.kernel/886319/focus=8...) might do the trick in measuring that though.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 10, 2009 4:01 UTC (Thu) by russell (guest, #10458) [Link] (1 responses)

The pipe test would work best if the scheduler gave each task sufficient time to fill or empty the pipe, depending on its role. It would suffer badly if the scheduler kept preempting those tasks to give some other task a go when it became runnable.

The pipe test is more about ordering producers and consumers. Not latency.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 10, 2009 5:18 UTC (Thu) by jamesh (guest, #1159) [Link]

In the pipe test, neither process is going to be able to fill the pipe buffer. Each process blocks on the other doing alternating reads and writes on the pipes with pretty much no work in between.

I guess it is possible that a scheduler could preempt the task between when the read returns and before it performs the write, but that seems unlikely.

My intuition is that performance would primarily depend on how quickly the scheduler gets round to run a process when it becomes unblocked, which is essentially a measure of average scheduling latency (and as I said before, this doesn't tell you much about the variance in that latency).
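For readers wondering what the pipe test actually looks like, here is a toy version of the idea being discussed (in the spirit of lmbench's lat_pipe, not the exact benchmark Ingo ran): two processes ping-pong a single byte over a pair of pipes, so the per-iteration time is dominated by how quickly the scheduler runs the peer once it unblocks.

```python
# Two processes alternate blocking reads/writes on a pipe pair; each
# round trip costs two wakeups, so the average time per iteration is
# essentially a measure of average scheduling latency.
import os, time

def pipe_pingpong(iters=2000):
    p2c_r, p2c_w = os.pipe()   # parent -> child
    c2p_r, c2p_w = os.pipe()   # child -> parent
    pid = os.fork()
    if pid == 0:                       # child: echo every byte back
        os.close(p2c_w); os.close(c2p_r)
        for _ in range(iters):
            os.write(c2p_w, os.read(p2c_r, 1))
        os._exit(0)
    os.close(p2c_r); os.close(c2p_w)
    t0 = time.perf_counter()
    for _ in range(iters):
        os.write(p2c_w, b"x")          # wake the child...
        os.read(c2p_r, 1)              # ...and block until it answers
    elapsed = time.perf_counter() - t0
    os.waitpid(pid, 0)
    return elapsed / iters             # seconds per round trip

print("avg round trip: %.1f us" % (pipe_pingpong() * 1e6))
```

As jamesh notes, this only measures the *average* wakeup latency; the variance (the thing that shows up as stutter) needs a different harness.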

Does the kernel scheduler even matter???

Posted Sep 7, 2009 11:05 UTC (Mon) by eru (subscriber, #2753) [Link] (3 responses)

It seems to me most of the desktop latencies stem from bloated desktop managers and other GUI software, and from bad graphics drivers. The kernel cannot fix these issues, no matter which scheduler it has. The question to ask is whether the user interface really has to relay events through about 100 levels of complex libraries every time the user performs some action.

Does the kernel scheduler even matter???

Posted Sep 7, 2009 11:21 UTC (Mon) by mjthayer (guest, #39183) [Link] (2 responses)

Still, if (and I haven't actually done any tests here) the kernel can fix this in a reasonable way, why not do it?

Does the kernel scheduler even matter???

Posted Sep 7, 2009 13:46 UTC (Mon) by mingo (guest, #31122) [Link] (1 responses)

For many things the process scheduler does matter.

For most desktop things, the IO scheduler, the file system and the VM have a far bigger role.

To measure latencies, there's the latencytop tool from Arjan that can give a first-level guess about what the main source of latencies is.

If it's IO latencies then blktrace can be used to pin them down more precisely.

If it's indeed the process scheduler that is causing latencies, the latency tracer can be used to pin down the reason more precisely. On the lowest level the function tracer can be used too, for harder cases. There's also a lot of built-in statistics, tracepoints and, in .31 based kernels, performance counters that can help pin down such bugs.

Does the kernel scheduler even matter???

Posted Sep 10, 2009 20:54 UTC (Thu) by ajb (subscriber, #9694) [Link]

The scheduler could still help more under conditions of VM stress. For example, on my netbook, which thrashes when you run firefox + anything, I literally run killall -STOP firefox-bin; killall -CONT other-app when I want to switch between them. This is a lot more convenient than quitting and restarting each app, which I would otherwise have to do. I imagine there might be some less manual way to achieve the same effect by building some more smarts into the scheduler/VM, possibly with help from userspace.
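The killall trick works because `-STOP`/`-CONT` send SIGSTOP and SIGCONT; a stopped process gets no CPU and its pages become easy eviction targets. A minimal sketch of the mechanism, demonstrated on a throwaway `sleep` child rather than a real browser:

```python
# Freeze a process with SIGSTOP and thaw it with SIGCONT - the same
# signals `killall -STOP` / `killall -CONT` deliver. The process state
# in /proc/<pid>/stat flips to 'T' (stopped) while frozen.
import os, signal, subprocess, time

child = subprocess.Popen(["sleep", "30"])
os.kill(child.pid, signal.SIGSTOP)        # freeze: no CPU from now on
time.sleep(0.1)                           # give the signal time to land

with open("/proc/%d/stat" % child.pid) as f:
    # format is "pid (comm) state ..."; grab the state field
    state = f.read().split(") ")[1].split()[0]
print("state after SIGSTOP:", state)      # 'T' = stopped

os.kill(child.pid, signal.SIGCONT)        # thaw it again
child.terminate()
child.wait()
```

A "less manual way" would just be a small daemon doing this to whichever window group lost focus, which is indeed userspace-side smarts rather than a scheduler change.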

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 10:35 UTC (Mon) by mb (subscriber, #50428) [Link] (2 responses)

These are the kernbench results for bfs-209 on a 4-way SMP.
BFS is about 8% faster in this test here, as far as I can tell.

mb@homer:~/linux/test/linux-2.6.30$ cat kernbench-cfs.log
Sun Sep 6 12:36:20 CEST 2009
2.6.30.5
Average Half load -j 3 Run (std deviation):
Elapsed Time 737.527 (0.39501)
User Time 2031.15 (0.38175)
System Time 168.603 (0.166233)
Percent CPU 298 (0)
Context Switches 85408 (878.425)
Sleeps 97395 (115.724)

Average Optimal load -j 4 Run (std deviation):
Elapsed Time 611.893 (0.462313)
User Time 2035.69 (4.97924)
System Time 169.565 (1.05992)
Percent CPU 329.333 (34.3259)
Context Switches 103275 (19596.7)
Sleeps 97440.5 (121.078)

mb@homer:~/linux/test/linux-2.6.30$ cat kernbench-bfs.log
Sun Sep 6 15:14:08 CEST 2009
2.6.30.5-bfs
Average Half load -j 3 Run (std deviation):
Elapsed Time 728.563 (0.448144)
User Time 2031.75 (0.640494)
System Time 171.72 (0.167033)
Percent CPU 302 (0)
Context Switches 35229 (6473.67)
Sleeps 113467 (118.95)

Average Optimal load -j 4 Run (std deviation):
Elapsed Time 563.54 (0.32078)
User Time 2039.38 (8.36488)
System Time 173.375 (1.81625)
Percent CPU 348 (50.3905)
Context Switches 79741.5 (48946.2)
Sleeps 108397 (5554.48)
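The "about 8%" figure can be checked against the Optimal load (-j 4) elapsed times quoted in the two logs:

```python
# Sanity check of the "~8% faster" claim from the kernbench logs above,
# using the -j 4 average elapsed times.
cfs_elapsed = 611.893   # 2.6.30.5 (mainline/CFS)
bfs_elapsed = 563.540   # 2.6.30.5-bfs
speedup = (cfs_elapsed - bfs_elapsed) / cfs_elapsed * 100
print("BFS faster by %.1f%%" % speedup)  # -> BFS faster by 7.9%
```

Note the much larger standard deviations on context switches in the BFS run, so single-run comparisons like this deserve some caution.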

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 12:56 UTC (Mon) by mingo (guest, #31122) [Link] (1 responses)

Mind testing the latest mainline scheduler as well?

You can find the latest scheduler development code in the latest -tip tree.

Thanks!

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 15:43 UTC (Mon) by mb (subscriber, #50428) [Link]

I'll test it soon, thanks!
I already downloaded and compiled it. I think I'll run the tests tomorrow.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 11:33 UTC (Mon) by ikm (guest, #493) [Link] (29 responses)

As far as I remember, Con's drama was that he could only "feel" the difference, not measure it. That's why selling his work to kernel devs has always been a hard bargain.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 11:41 UTC (Mon) by ikm (guest, #493) [Link] (28 responses)

And actually, yeah, Con's reply to this benchmark email summarizes it nicely:

"[..] lots of bullshit meaningless benchmarks showing how great cfs is
and/or how bad bfs is, along with telling people they should use these
artificial benchmarks to determine how good it is, demonstrating yet
again why benchmarks fail the desktop"

I'd add that Ingo's tests do look like they kinda miss the point. Coupled with his 6000x4000 jpeg graphs, they leave a strange impression.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 11:49 UTC (Mon) by bvdm (guest, #42755) [Link] (23 responses)

ikm: i don't think you should expect to convince the lwn.net audience with arguments suggesting Ingo Molnar's technical incompetence. Really.

everyone: can we raise the level of this debate a bit?

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 12:09 UTC (Mon) by kragil (guest, #34373) [Link] (10 responses)

Ingo is probably one of the best hackers on this planet, but that does not mean he is living in the same world as everyone else.

When I read:
"So the testbox i picked fits into the upper portion of what i
consider a sane range of systems to tune for - and should still fit
into BFS's design bracket as well according to your description:
it's a dual quad core system with hyperthreading."

Tune the scheduler for a 16-core machine? Thank you very much. I know nobody with more than a quad core, and those are spanking new.

And it is really really unfair to test a scheduler that aims to enhance interactivity for pure performance, on a system that is clearly at the upper limit of what the scheduler was designed for.

What I take from this discussion is that kernel devs live in a world where Intel's fastest chips in multi-socket systems are low end, and they will cater only to the enterprise bullcrap that pays their bills.

Despite what Linus says, Linux is not intended to be used on the desktop (at least not in the real world).

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 13:18 UTC (Mon) by aigarius (subscriber, #7329) [Link] (3 responses)

i7 has been around for what? A year already? 8 cores there. Benchmarking forward a couple years for kernel development is a reasonable assumption. Meanwhile, even people with quad-cores say that Ingo's tests are still showing the same results.

Con needs to show quantifiable tests so that the performance of different versions of schedulers can actually be compared. How can we know that a patch improves the code if there is no quantifiable number showing that conclusively?

Scientific approach, please. Insulting people does not win arguments in technical communities. Facts, tests and numbers do.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 16:13 UTC (Mon) by andreashappe (subscriber, #4810) [Link]

> i7 has been around for what? A year already? 8 cores there.

4 cores plus ht.

Still makes me smile when I see the htop output.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 7:48 UTC (Tue) by epa (subscriber, #39769) [Link]

It might help to see some numbers. Take Fedora's smolt data, which is from people who have clicked 'yes' when installing Fedora and have reported what hardware they use.

This shows that more than half of Fedora systems are dual-processor, with another 38% having a single CPU. So based on hardware that's in use now, a one- or two- processor test would be more reasonable. Of course it's useful to test on 16-processor monsters as well, but that is not the typical desktop and won't be for some time. (And by the time it is, all sorts of other assumptions will have changed too.)

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 8:34 UTC (Tue) by branden (guest, #7029) [Link]

Aigarius,

How about we bench based on the profiles of the machines people bring to Debconf?

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 13:37 UTC (Mon) by mingo (guest, #31122) [Link] (3 responses)

What I take from this discussion is that kernel devs live in a world where Intel's fastest chips in multi-socket systems are low end, and they will cater only to the enterprise bullcrap that pays their bills.

I certainly dont live in such a world and i use a bog standard dual core system as my main desktop. I also have a 833 MHz Pentium-3 laptop that i booted into a new kernel 4 times today alone:

  #0, d5f8b495, Mon_Sep__7_08_39_36_CEST_2009: 0 kernels/hour
  #1, b9e808ca, Mon_Sep__7_09_19_47_CEST_2009: 1 kernels/hour
  #2, b9e808ca, Mon_Sep__7_10_26_28_CEST_2009: 1 kernels/hour
  #3, b9e808ca, Mon_Sep__7_14_58_48_CEST_2009: 0 kernels/hour

  $ head /proc/cpuinfo 
  processor	: 0
  vendor_id	: GenuineIntel
  cpu family	: 6
  model		: 8
  model name	: Pentium III (Coppermine)
  stepping	: 10
  cpu MHz	: 846.242
  cache size	: 256 KB

  $ uname -a
  Linux m 2.6.31-rc9-tip-01360-gb9e808c-dirty #1178 SMP Mon Sep 7 22:38:18 CEST 2009 i686 i686 i386 GNU/Linux

And that test-system does that every day - today isnt a special day. Look at the build count: #1178. This means that i booted more than a thousand development kernels on this system already.

Now, to reply to your suggestion: for scheduler performance i picked the 8 core system because that's where i do scheduler tests: it allows me to characterise that system _and_ also allows me to characterise lower performance systems to a fair degree.

Check out the updated jpgs with quad-core results.

See how similar the single-socket quad results are to the 8-core results i posted initially? People who do scheduler development do this trick frequently: most of the "obvious" results can be downscaled as a ballpark figure.

(the reason for that is very fundamental: you dont see new scheduler limitations pop up as you go down with the number of cores. The larger system already includes all the limitations the scheduler has on 4, 2 or 1 core, and reflects those properties already so there's no surprises. Plus, testing is a lot faster. It took me 8 hours today to get all the results from the quad system. And this is right before the 2.6.32 merge window opens, when Linux maintainers like me are very busy.)

Certainly there are borderline graphs and also trickier cases that cannot be downscaled like that, and in general 'interactivity' - i.e. all things latency related come out on smaller systems in a more pronounced way.

But when it comes to scheduler design and merge decisions that will trickle down and affect users 1-2 years down the line (once it gets upstream, once distros use the new kernels, once users install the new distros, etc.), i have to "look ahead" quite a bit (1-2 years) in terms of the hardware spectrum.

Btw., that's why the Linux scheduler performs so well on quad core systems today - the groundwork for that was laid two years ago when scheduler developers were testing on quads. If we discovered fundamental problems on quads _today_ it would be way too late to help Linux users.

Hope this explains why kernel devs are sometimes seen to be ahead of the hardware curve. It's really essential, and it does not mean we are detached from reality.

In any case - if you see any interactivity problems, on any class of systems, please do report them to lkml and help us fix them.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 8:46 UTC (Tue) by kragil (guest, #34373) [Link] (2 responses)

Reading all your answers calmed me down a bit :) Thanks

I think our major disagreement here is the "look ahead".

I strongly believe that computers have reached the point where this relentless upgrade cycle should stop and has stopped. If you bought a P4 with HT and 1 GB in 2003, it is still perfectly capable of running the newest software 95% of desktop users need. Machines like that can turn 7 YEARS old soon. People will look for computers that use less energy and don't have moving parts that just break after a few years.
PCs will be like old TV sets and work for many many years (10 to 15 years). The software has to adapt. That is the "look ahead" I see, but I can understand why Red Hat plans for something different.

I think faster ARM,Mips and Atom CPUs are the architecture most desktop Linux kernels will run on and the relative percentage of X-core X86 monsters will decline (maybe even rapidly).

And no, I don't think Fedora's smolt data is any good here. Fedora users are technical people and are unlikely to run really old hardware like my sister's, for example.

I also don't think Linux will ever get problems with the fastest computers, its dominance in the HPC area will make sure of that.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 9:30 UTC (Tue) by mingo (guest, #31122) [Link] (1 responses)

And no, I don't think Fedora's smolt data is any good here. Fedora users are technical people and are unlikely to run really old hardware like my sister's, for example.

That's all fine, and i have a Fedora Core 6 box too, on very old hardware.

I wouldnt upgrade the kernel on it though - and non-technical users are even less likely to do that. Software and hardware form a single unit, and for similar reasons that it is hard to upgrade hardware, it is difficult to upgrade software as well. Yes, you pick up security fixes, etc. - but otherwise main components like the kernel tend to be cast in stone at install time. (And no, if you are reading this on LWN.net then your box probably does not qualify ;-)

Which means that most of the 4-year-old systems have a 4-year-old distribution on them, with a 4-year-old kernel. That kernel was developed 5 years ago and any deep scheduler decisions were made 6 years ago or even earlier.

So yes, i agree that the upgrade treadmill has to stop eventually, but _I_ cannot make it stop - i just observe reality and adapt to it. I see what users do, i see what vendors do and i try to develop the kernel in the best possible technical way, matching those externalities.

What i'm seeing right now as the scheduler and as the x86 co-maintainer is that the hardware side shows no signs of slowing down and that users who are willing to install new kernels show eagerness to buy shiny new hardware. Quads yesterday, six-cores today, octo-cores in a year or two.

Most of the new kernel installs goes to fresh new systems, so that's an important focus of the upstream kernel - and of any distribution maker. That is the space where we _can_ do something realistically and if we did something else we'd be ignoring our users.

I could certainly be wrong about all that in some subtle (or not so subtle) way - but right now the fact is that most of the bugreports i get against development code we release are filed on relatively new hardware.

That is natural to a certain degree - new hardware triggers new, previously unknown limitations and bottlenecks, and new hardware has its own problems too that gets mixed into kernel problems, etc. Old hardware is also already settled into its workload so there's little reason to upgrade an old, working box in general. There's also the built-in human excitement factor that shiny new hardware triggers on a genetic level ;-)

There's an easy way out though: please report bugs on old hardware and make old hardware count. The mainline kernel can only recognize and consider people who are willing to engage. The upstream kernel process is a fundamentally auto-tuning and auto-correcting mechanism and it is mainly influenced by people willing to improve code.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 9, 2009 11:41 UTC (Wed) by nix (subscriber, #2304) [Link]

Well, I'm a counterexample: I upgrade my hardware every decade, if that, but the kernels are normally as new as possible, because I'd like newish software, thanks, and that often likes new kernels. Further, everyone I know who isn't made of money and runs Linux does the same thing: they tend to run Fedora, recentish Ubuntu, or Debian testing, because non-enterprise users generally do not want to run enterprise distros because all the software on them is ancient, and non-enterprise distro kernels *do* get upgraded.

I suspect your argument is pretty much only true for corporate uses of Linux (i.e. 'just work with *this* set of software', as opposed to other uses which often involve installation of new stuff). But perhaps those are the only uses that matter to you...

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 16:19 UTC (Mon) by einstein (guest, #2052) [Link]

> Despite what Linus says, Linux is not intended to be used on the desktop (at least not in the real world).

Speak for yourself. I've been using linux on the desktop in the real world for years, as have a number of other people I know, your snarky little jabs notwithstanding.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 19:38 UTC (Mon) by leoc (guest, #39773) [Link]

> Despite what Linus says Linux is not intended to be used on the desktop (at least not in the real world).

For a system not intended to be used in the "real world" it is doing pretty well considering it has around 1/4 the market share of OS X.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 12:22 UTC (Mon) by ikm (guest, #493) [Link] (11 responses)

> i don't think you should expect to convince the lwn.net audience with arguments suggesting Ingo Molnar's technical incompetence. Really.

I expect everyone can draw their own conclusions. I've made mine. Ingo's a nice guy, but I don't think he's measuring the right things here. But how are you going to measure things like:
  • mplayer using OpenGL renderer doesn't drop frames anymore when dragging and dropping the video window around in an OpenGL composited desktop
  • Composite desktop effects like zoom and fade out don't stall for sub-second periods of time while there's CPU load in the background
  • LMMS (a tool utilizing real-time sound synthesis) does not produce "pops", "crackles" and drops in the sound during real-time playback due to buffer under-runs
  • Games like Doom 3 and such don't "freeze" periodically for small amounts of time (again for sub-second amounts) when something in the background grabs CPU time
Those are things a person has reported as a followup on the thread in question. Do you think he was lying?

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 12:41 UTC (Mon) by bvdm (guest, #42755) [Link] (1 responses)

Do you have a point other than that the current scheduler is not perfect? We all knew that. And Ingo invited Con to help improve it. So you don't really have a point at all, do you?

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 12:59 UTC (Mon) by ikm (guest, #493) [Link]

Go troll elsewhere. Thank you.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 13:11 UTC (Mon) by mingo (guest, #31122) [Link] (2 responses)

But how are you going to measure things like:

* mplayer using OpenGL renderer doesn't drop frames anymore when dragging and dropping the video window around in an OpenGL composited desktop

* Composite desktop effects like zoom and fade out don't stall for sub-second periods of time while there's CPU load in the background

* LMMS (a tool utilizing real-time sound synthesis) does not produce "pops", "crackles" and drops in the sound during real-time playback due to buffer under-runs

* Games like Doom 3 and such don't "freeze" periodically for small amounts of time (again for sub-second amounts) when something in the background grabs CPU time

This is a list of routine interactivity problems that we track down and address. In the past few years we've got extensive infrastructure built up in the mainline kernel that allows their measurement and allows us to eliminate them.

A good place to start would be to try the latency tracing suggestions from Frederic Weisbecker on lkml:

Such properties of the desktop are measured routinely (sometimes easily - sometimes it needs quite a bit of work) - so please report them and help out tracking them down.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 14:02 UTC (Mon) by ikm (guest, #493) [Link] (1 responses)

Yay, that's a start. I hope this can go somewhere eventually. Clearly it's the interactivity issues Con has always been after, not the bulk workloads. With a way to measure and quantify those issues and scenarios, something might get going somewhere.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 21:53 UTC (Mon) by mingo (guest, #31122) [Link]

You might want to try latencytop. We added the instrumentation for that after the CFS merge - to make it easier to prove/report scheduler (and other) latencies.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 16:21 UTC (Mon) by lacostej (guest, #2760) [Link] (5 responses)

> But how are you going to measure things like:

Can't these tools detect when they hang/stall?

Can't we modify them to report the issues in a known format (or to a third-party daemon) and use those tools as tests?

I mean if I was Con, that's the first thing I would do: create a measurable suite of tests.

Instead of talking of feelings, we would talk about measurable things. It's not like we're talking about usability. Even usability can be tested up to some degree.

So, can't we elevate the debate?

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 18:35 UTC (Mon) by hppnq (guest, #14462) [Link] (4 responses)

I mean if I was Con, that's the first thing I would do: create a measurable suite of tests.

Actually, he did that: you may find interbench interesting. It was used to produce Con's performance statistics. Also, see this 2002 interview with Con, discussing his earlier effort ConTest and scheduler benchmarking in general.

The challenge, it seems, is to get scheduler developers to agree on what constitutes a normal workload on normal systems tuned in normal ways.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 21:45 UTC (Mon) by mingo (guest, #31122) [Link] (3 responses)

The challenge, it seems, is to get scheduler developers to agree on what constitutes a normal workload on normal systems tuned in normal ways.

There's not much disagreement really. Everyone agrees that interactivity problems need to be investigated and fixed - it's as simple as that. We have a lot of tools to do just that, and things that get reported to us we try to get fixed.

In practice, interactivity fixes rarely get in the way of server tunings - and if they do, the upstream kernel perspective was always for desktop/latency tunings to have precedence over server/throughput tunings.

I'm aware that the opposite is being claimed, but that does not make it a fact.

Try a simple experiment: post a patch to lkml with Linus Cc:-ed that blatantly changes some tunable to be more server friendly (double the default latency target or increase some IO batching default) at the expense of desktop latencies. My guess is that you'll see a very quick NAK.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 8:01 UTC (Tue) by hppnq (guest, #14462) [Link] (2 responses)

We have a lot of tools to do just that, and things that get reported to us we try to get fixed.

Ah, my point is that you claim to compare apples to apples while you use different tools than Con to compare the performance of the BFS and CFS schedulers. It is entirely possible that I missed the comparison of benchmarking tools, of course, and I'm not saying that you or Con should choose any particular tool: I am simply observing there is a difference.

But, looking at the interbench results, I cannot help but think that it would have been better if Con had used some other benchmarks as well: one could drive a truck through those standard deviations.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 8:48 UTC (Tue) by mingo (guest, #31122) [Link] (1 responses)

Ah, my point is that you claim to compare apples to apples while you use different tools than Con to compare the performance of the BFS and CFS schedulers. It is entirely possible that I missed the comparison of benchmarking tools, of course, and I'm not saying that you or Con should choose any particular tool: I am simply observing there is a difference.

Well, the reason i spent 8+ hours for each round of testing is because i threw a lot of reliable and relevant benchmarks/workloads at the schedulers. Most of those were used by Con too in the past for scheduler work he did so it's not like he never runs them or disagrees with them on some fundamental basis - he just chose not to test them on BFS this time around. Sysbench comes from FreeBSD for example, hackbench was written many years ago to test chat server latencies/throughput, and kbuild, lat_tcp and lat_pipe are well-known as well, etc.

Basically i applied a wide spectrum of tests that _I_ find useful to build a picture about how good a scheduler is, and posted the results. (I wanted to find the strong spot of BFS - which in turn would be a weak spot of the mainline scheduler.)

So i tested what i was curious about (basic latency in four tests, throughput and scalability in two other tests) - others can test what they are curious about - testing these schedulers is not that hard, it's not like i have a monopoly on posting scheduler comparisons ;-)

But, looking at the interbench results, I cannot help but think that it would have been better if Con had used some other benchmarks as well: one could drive a truck through those standard deviations.

The inherent noise in the interbench numbers does not look particularly good - and i found that too in the past. But it's still a useful test, so i'm not dissing it - it's just very noisy in general. I prefer low noise tests as i want to be able to stand behind them later on. When i post benchmarks they get a lot of scrutiny, for natural reasons, so i want sound results. You wont find many (any?) measurements from me in the lkml archives that were discredited later.

Also, on the theoretical angle, i dont think there's much to be won on the interactivity front either: the mainline scheduler has a fixed deadline (/proc/sys/kernel/sched_latency_ns) which you can tune down if you wish to and it works hard to meet that latency goal for every task. If it doesn't then that's a bug we want to fix, not some fundamental design weakness.

But ... theory is one thing and practice is another, so it always makes sense to walk the walk and keep an open mind about all this.

So what we need now are bugreports and testers willing to help us. These kinds of heated discussions about the scheduler are always useful as the attention on the scheduler increases and we are able to fix bugs that don't get reported otherwise - so i'm not complaining ;-)

For latency characterisation and debugging we use the latency tests i did post (pipe, messaging, etc.), plus to measure a live desktop we use latencytop, latency tracer, the 'perf' tool, etc.

So there's plenty of good tools, plenty of well-known benchmarks, plenty of good and reliable data, and a decade old kernel policy that desktop latencies have a precedence over server throughput - and the scheduler developers are eager to fix all bugs that get reported.

Let me note here that based on these 100+ comment discussions here on LWN and on Slashdot as well, we only got a single specific latency bugreport against the upstream scheduler in the past 24 hours. So there's a lot of smoke, a lot of wild claims and complaints - but little actionable feedback from real Linux users right now.

So please, if you see some weirdness that is suspected to be caused by the scheduler then please post it to lkml. (Please Cc: Peter Zijlstra and me as well on any email.) I'm sure the scheduler is not bug-free and i'm sure there's interactivity bugs to fix as well, so dont hesitate to help out.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 11:45 UTC (Tue) by hppnq (guest, #14462) [Link]

Thanks for clarifying! Not only do I appreciate all those hours of developing and testing wonderful software, I also like it a lot that you take the time to comment about it here at LWN. :-)

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 11:58 UTC (Mon) by yoshi314 (guest, #36190) [Link] (3 responses)

"I'd add that Ingo's tests do look like they kinda miss the point. Coupled
with his 6000x4000 jpeg graphs, they leave a strange impression."

well, if HT quad-core is just his testbox, operating on huge jpeg files is
probably not an issue on his real workstation.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 13:03 UTC (Mon) by mingo (guest, #31122) [Link] (1 responses)

Note, i fixed the jpegs - it was a silly mistake - sorry about that.

[ I did notice the image viewing suckiness on my dual-core laptop but blamed it on firefox ;-) ]

Anyway, i've fixed the jpegs and i've re-done and posted the measurements on a quad too in this lkml post, and the results are similar to those from the dual quad.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 16:18 UTC (Mon) by smurf (subscriber, #17840) [Link]

Note, i fixed the jpegs - it was a silly mistake - sorry about that.

Sure. However, your task now is to write at least twenty lines of "JPEGs are good for pictures. And nothing else. Really.", and remember to use PNGs next time. Thanks.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 16:52 UTC (Mon) by foom (subscriber, #14868) [Link]

> well, if HT quad-core is just his testbox, operating on huge jpeg files is
> probably not an issue on his real workstation.

Well, they weren't a problem on my 2-year-old mac laptop either (in Safari). I didn't actually notice that they were huge except that it was rather slow to download. Maybe Firefox's image rendering library needs some optimization work too. :)

<Summary> The debate thus far

Posted Sep 7, 2009 13:01 UTC (Mon) by bvdm (guest, #42755) [Link] (2 responses)

Con: Hello, I'm back! I'm still just as passive-aggressive as before, but I've got some new ideas!

Mingo: That's interesting. Here are some benchmark graphs for use cases that are measurable. You are welcome to suggest others and send patches.

Con: You've invaded my personal space! I don't want a considered response, all I wanted was an audience!

The Intertubes: Yah! We can get all worked up about things nobody can measure and hardly understand ourselves! All over again!

Lwn.net trolls: Here is an ad hominem attack on that fool noob Igno whats-his-name. He cannot even get his images size right! LOL

Mingo: Here are some more graphs for fewer cores. Still looks the same.

<Summary> The debate thus far

Posted Sep 7, 2009 14:26 UTC (Mon) by jospoortvliet (guest, #33164) [Link] (1 responses)

hehe, priceless. Now wait for the trolls to descend upon your ass :D

I do think Con regularly does very cool things with his scheduler work,
btw. He did kick start (ok, inspire) the work which led to our current
default scheduler, remember...

<Summary> The debate thus far

Posted Sep 7, 2009 15:45 UTC (Mon) by bvdm (guest, #42755) [Link]

You are of course right. Which makes it all the more tragic that his lack of agreeableness robbed him of his due recognition and continued contribution.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 15:07 UTC (Mon) by paragw (guest, #45306) [Link] (37 responses)

So what happened to pluggable schedulers? I recall there was plugsched by Peter Williams up until 2.6.22 but not sure where it went from there.

It sounds like a one-scheduler-fits-all approach may not be the right one - or am I mistaken and CFS is doing well for all desktop, server and in-between workloads? If it is then it makes pluggable schedulers less attractive.

However it still would be good to be able to do sched=server, sched=desktop, sched=netbook (lol!) type things. I think the scheduler code will also be definitely simplified if it is given a definite objective as opposed to the dance it has to do right now making sure everyone is happy. We could even do sillier things on the desktop by feeding the desktop scheduler a list of processes and its descendants to award more interactivity to - no matter what happens in the background, it can put the memory hogs and CPU hogs to rest and allow me to click on the windows etc.

/me goes digging plugsched on google.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 18:39 UTC (Mon) by niner (subscriber, #26151) [Link] (20 responses)

"We could even do sillier things on the desktop by feeding the desktop scheduler a list of
processes and its descendants to award more interactivity to - no matter what happens
in the background"

But you can do that already! In fact, you ought to have been able to do that for
decades. What you want is just the simple nice and renice commands. Works for any
list of processes you want and their descendants. No need to hardcode names into a
scheduler.

I keep wondering why people seem to have completely forgotten about nice values and
instead expect the scheduler to guess what are the important processes for them, when
they can simply tell it.
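To make that concrete - a quick sketch of the usual workflow (ffmpeg is just a stand-in here for any CPU hog):

```shell
# Start a background encode at low priority (unprivileged users can only
# lower priority, i.e. raise the nice value).
nice -n 19 ffmpeg -i in.mkv out.mkv &

# Push an already-running process further down after the fact.
renice -n 19 -p $(pgrep ffmpeg)

# Children inherit niceness, so a whole pipeline can be niced in one go;
# `nice` with no arguments prints the current value (10 here, assuming
# the shell started at niceness 0).
nice -n 10 sh -c 'nice'
```

Negative values (boosting priority) need root, or a suitably raised rlimit.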

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 20:37 UTC (Mon) by roskegg (subscriber, #105) [Link] (9 responses)

Because nice and renice don't affect interactivity issues as much as you would think they would.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 21:35 UTC (Mon) by mingo (guest, #31122) [Link] (8 responses)

Because nice and renice don't affect interactivity issues as much as you would think they would.

What do you mean?

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 5:12 UTC (Tue) by realnc (guest, #60393) [Link] (7 responses)

Meaning that even if you nice 19 every other process, mplayer will still battle for CPU time with the compositor and drop frames and skip sound as soon a composite effect kicks in.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 6:10 UTC (Tue) by mingo (guest, #31122) [Link] (6 responses)

Does it behave in an anomalous way for you? What would you expect it to do and what does it do for you currently?

I.e. the default behavior is that if both compiz and mplayer are running (and both are a single thread) then they should get 50%/50% of a single CPU - or be nicely on separate CPUs on dual-core. (with an added twist that Xorg generally tends to get some amount of CPU time as well when compiz is active - plus whatever other app that is generating X output.)

If that's not enough then nice levels come into play.

You can indeed renice up - but you can also renice down - so you can set mplayer to nice -5 for example.

Nice levels work according to a very simple rule: if you set mplayer to nice -1, it will get 55% of CPU time, compiz gets 45% of time. Yet another nice level and it's 60% versus 40%. It goes up by roughly 5% with every nice level - so nice -5 should get you 75%/25%, nice -10 gives you 90% CPU time and 10% CPU time for compiz.

More tasks can modify this behavior - but this is the general principle. If this does not work like that for you, please report it as a scheduler bug on lkml.
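Those 55/75/90 figures fall out of CFS's geometric weight table: each nice level is worth roughly 1.25x the next, and CPU share is proportional to weight. A back-of-the-envelope sketch (approximating, not reading, the kernel's actual prio_to_weight[] table):

```python
# Approximate CFS behavior: weight halves roughly every three nice levels
# (a ~1.25x ratio per level); CPU time is split in proportion to weight.
def weight(nice):
    return 1024 / (1.25 ** nice)

def share(nice_a, nice_b=0):
    """Fraction of one CPU that task A gets when competing with task B."""
    wa, wb = weight(nice_a), weight(nice_b)
    return wa / (wa + wb)

for n in (-1, -5, -10):
    print(f"nice {n:3d} vs nice 0: {share(n):.0%}")
# nice  -1 vs nice 0: 56%
# nice  -5 vs nice 0: 75%
# nice -10 vs nice 0: 90%
```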

Note, you can set negative nice levels as an ordinary user as well. There's an rlimit for it (and PAM support): see the 'nice' attribute in /etc/security/limits.conf - you can set it per user.
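For instance, a limits.conf entry allowing a (hypothetical) user 'alice' to renice her own processes down to -10 could look like:

```
#<domain>  <type>  <item>  <value>
alice      -       nice    -10
```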

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 7:31 UTC (Tue) by epa (subscriber, #39769) [Link] (4 responses)

The thing is, nice levels mostly affect total throughput, but what needs improvement is latency. A 50-50 split between two tasks sounds ideal, but that only makes sense if they are both CPU-bound tasks. In the case of compiz and mplayer, the first spends most of its time blocking on user input, and the second doesn't need much CPU time (probably a lot less than 50% on a modern system) but it does need to respond quickly and not be blocked for too long. 'nice' doesn't really address these issues.

(Also I think that 'nice' won't help you if one process starts thrashing the memory and swapping; another process, even if nominally at a lower niceness level, will be heavily slowed down.)

When nice lets you specify desired maximum latencies, as well as just throughput, then it will be a suitable way to get good desktop performance.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 9:44 UTC (Tue) by mingo (guest, #31122) [Link] (3 responses)

The thing is, nice levels mostly affect total throughput, but what needs improvement is latency.

That's exactly what the upstream scheduler does. The upstream scheduler considers latency goals in a nice-level normalized way. See the wakeup_gran() function in kernel/sched_fair.c:


  static unsigned long
  wakeup_gran(struct sched_entity *curr, struct sched_entity *se)
  {
          unsigned long gran = sysctl_sched_wakeup_granularity;
          [...]
          if (unlikely(se->load.weight != NICE_0_LOAD))
                  gran = calc_delta_fair(gran, se);
          [...]
  }

See the calc_delta_fair() - that is the nice level normalizer. Plus-reniced tasks will get longer latencies - minus-reniced tasks will get shorter wakeup latencies.
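A sketch of what that normalization does - assuming calc_delta_fair() scales by NICE_0_LOAD/weight, an approximate 1.25x-per-level weight table, and a 5 ms base granularity picked purely for illustration:

```python
# Assumed semantics: calc_delta_fair(gran, se) ~= gran * NICE_0_LOAD / weight.
NICE_0_LOAD = 1024

def weight(nice):
    # Approximation of the kernel's weight table (~1.25x per nice level).
    return NICE_0_LOAD / (1.25 ** nice)

def effective_wakeup_gran_ns(nice, base_ns=5_000_000):
    """Lead a waking task at `nice` must build up before it preempts."""
    return base_ns * NICE_0_LOAD / weight(nice)

# A minus-reniced wakee (larger weight) gets a smaller granularity, so it
# preempts sooner and sees shorter wakeup latencies; a plus-reniced wakee
# gets a larger one and waits longer.
for n in (-5, 0, 5):
    print(f"nice {n:+d}: ~{effective_wakeup_gran_ns(n) / 1e6:.1f} ms")
# nice -5: ~1.6 ms
# nice +0: ~5.0 ms
# nice +5: ~15.3 ms
```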

If this does not work for you then that's a bug, please report it in that case.

Note that you can tune the basic kernel latency goals/deadlines via two dynamic sysctls: sched_wakeup_granularity_ns and sched_latency_ns. Lower those and you'll get a snappier desktop - at the expense of some throughput.

You can set these in /etc/sysctl.conf to make the settings permanent. (and please report it to us if a new setting improves some workload in a dramatic way - we constantly re-tune the upstream default as well, to make for a snappier desktop.)

(Note that for forced preemption (CPU bound tasks) HZ is a lower limit - but otherwise it's tunable in a finegrained way. So if you want to set the latency targets down to 1 millisecond, say, you may want to change from HZ=250 to HZ=1000.)
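As a sketch, a persistent desktop-leaning tuning in /etc/sysctl.conf might look like this (the values are purely illustrative, not recommendations):

```
# Shorter scheduler deadlines: snappier desktop, at some throughput cost.
kernel.sched_latency_ns = 1000000
kernel.sched_wakeup_granularity_ns = 500000
```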

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 10:22 UTC (Tue) by epa (subscriber, #39769) [Link] (2 responses)

Thanks for the info. I was still thinking of classic UNIX nice values. It would be even better if you could specify some units for the latency - Linux is not a hard real-time system but nonetheless users might want to say 'maximum latency 10ms for this process' as a best-effort goal and something to benchmark against. Do any distributions come with an appropriate set of nice values built in?

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 11:58 UTC (Tue) by mingo (guest, #31122) [Link] (1 responses)

One difference is that nice levels are relative - that way "nice +5" makes relative sense from within a nice +10 workload. Latency values tend to be absolute. Relative makes more conceptual sense IMO - as workloads are fundamentally hierarchical and a sub-workload of some larger workload might not be aware of the larger entity it is running in.

Also, a practical complication is that there's not much of a culture of setting latencies and it would take years to build them into apps and to build awareness.

Also, latencies are hardware dependent and change with time. 100 msecs on an old box is very different from 100 msecs on a newer box.

Maybe for media apps it would make sense to specify some sort of deadline (a video app if it wants to display at fixed frequency, or an audio app if it knows its precise buffering hard limit) - but in practice these apps tend to not even know their precise latency target. For example the audio pathway could be buffered in the desktop environment, in the sound server and in the kernel too.

Nor would it solve much: most of the latencies that people notice and which cause skipping/dropped-frames etc. are bugs, they are unintended and need fixing.

Nevertheless this has come up before and could be done to a certain degree. I still hope that we can just make things behave by default, out of box, without any extra tweaking needed.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 12:41 UTC (Tue) by epa (subscriber, #39769) [Link]

I agree that relative niceness levels make the most sense in a batch processing environment or in a 'lightly interactive' environment such as a Unix shell, where it should respond quickly when you type 'ls', but there is no firm deadline.

I think they make a bit less sense for multimedia applications or even ordinary desktop software (where users nowadays expect smooth scrolling and animations). You are right that in the Unix world there isn't much culture of setting quantifiable targets for latency or CPU use; we are accustomed to mushy 'niceness' values, where setting a lower niceness somehow makes it go faster, but only the most greybearded of system administrators could tell you exactly how much.

One reason to specify a latency target in milliseconds is just to have something quantifiable. A lot of discussions on LKML and elsewhere about scheduling seem to suffer from a disconnect between one side running benchmarks such as kernel compiles, which give hard numbers but aren't typical of desktop usage, and another side who just talk in qualitative terms about how much faster it 'feels'.

I expect that if a 'max latency' option were added to the kernel and it did almost nothing at all to start with, it would still provide a framework for improvements to take place - a latency of 110ms when 100ms was requested could now be a quantifiable performance regression, and people can benchmark their kernel against a promised performance target rather than just trying to assess how it feels. You yourself have provided such a latency benchmark - the 'load enormous JPEG in Firefox' test suite. :-)

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 19:12 UTC (Tue) by realnc (guest, #60393) [Link]

Does it behave in an anomalous way for you? What would you expect it to do and what does it do for you currently?

It does behave "anomalous." A simple example would be mplayer (or any other video player) or an OpenGL app "hanging" for a bit while I leave my mouse over the clock in the systray. This brings up details about the current time (what day it is, month, etc) in a "bells and whistles" pop-up that doesn't just pop up out of the blue but slowly fades in using transparency. It is for the duration of this compositing effect (which actually doesn't even need that much CPU power) that mplayer stalls, barks and drops frames.

Now imagine how bad things can seem when virtually a crapload of actions (opening menus, switching desktops, moving windows, etc, etc) results in frame skipping, sound stuttering, mouse pointer freezing, etc. They perform well, that's not the problem. The problem is that due to the skips and lag, they *seem* to be sluggish. Not in a dramatic way, but still annoying. I was actually quite used to Linux behaving like that. But after applying the BFS patch, Linux joined the list of "smooth GUI" OSes (alongside OS X and MS Vista/7). That's how a desktop should feel like. Frankly, I never quite suspected the kernel to be at fault here, but rather the applications themselves. But after seeing BFS solving all those problems, it seems the kernel can be at fault for such things.

The Android folks also confirmed that their devices ran much more fluid and responsive after they loaded a custom firmware on them with a BFS-patched kernel. Folding users claim increased folding performance which doesn't interfere with their GUI anymore. This can't be coincidence.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 20:55 UTC (Mon) by paragw (guest, #45306) [Link] (9 responses)

"I keep wondering why people seem to have completely forgotten about nice values and instead expect the scheduler to guess what are the important processes for them, when they can simply tell it."

Did that ever work satisfactorily in practice though? If it did, why are people still cranking out different schedulers for desktops?

Thing is usability wise we have come further on a Linux desktop and I guess people are starting to expect the OS to do the right thing without them having to do work and make decisions. (About Xorg renice - what about its clients - every time I start a program, should I renice it if it is a Xorg client? If we instead had the desktop scheduler boost interactivity for all Xorg client programs - that makes it very easy for the user.)

And I was saying we can afford to do such silly things in the desktop scheduler if its sole objective was interactivity. If one scheduler has to do interactivity and throughput and whatnot, it quickly becomes complex and thus ineffective. With a pluggable scheduler, for one thing we could simplify a lot of code, and for another we could let people choose what fits their needs best.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 3:55 UTC (Tue) by fest3er (guest, #60379) [Link]

When I was going to do something that might get away from me (fork lots of
processes doing something), I would try to remember to 'nice --20 sh' in
another window. Because I would often have the boundary condition wrong
and would generate 20 000 to 40 000 processes running BTTW. That one
nice'd shell would save me almost every time. I've done this on my AT&T
UNIXPC, and systems running SysV/68, SysV/88, Irix, SunOS[345], Linux,
BeOS, BSD, and others.

There have been times in the past when nice'ing the X server improved
performance on my single-proc PIII-866; for 5-10 years now, only two or
more CPUs let X run smoothly.

There have been times in the past when nothing would smooth out the
choppiness of the EXT2/3 driver under heavy R/W load, whether I had two
PII-266's or a PIII-866. I solved that problem by switching to ReiserFS.

In recent years and on two completely different systems, I've noticed a
tendency for the kernel to do weird things with the PS/2 drivers (system
slows down, gets choppy, and even silently resets). This last time, I
pulled the plugs for the PS/2 ports and the system returned to normal.
(The chipset fan was overworking itself, so I had *some* clue where to
look.)

There can be many reasons why a system is 'choppy', and it's not always
the scheduler. Sometimes it's the interrupt handler dealing with some
device that's gone haywire. Sometimes it's the block layer not doing disk
I/O very nicely or a server process being very inefficient. Sometimes it's
an application that's gone braindead. And if a scheduler can be developed
that smooths out the choppiness in single- and dual-core systems, great!
Go for it! An older single-CPU system may never be fast, but it ought to
run smoothly under normal user operations.

The scheduler has gotten better over the past 15 years. And it will
continue to improve. But apps have to improve as well and not always
assume the 'system' will take care of everything.

As Ingo says, 8-core systems aren't mainstream. But they will be. Perhaps
Con is looking to improve today's mainstream systems, not tomorrow's. Is
this apples v. oranges? Or ain't it? Mayhap never the twain shall meet.
But all parties involved should strive to keep the discourse civil and
positive.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 6:39 UTC (Tue) by mingo (guest, #31122) [Link] (7 responses)

Did that ever work satisfactorily in practice though?

Yes. (See my other post about nice levels in this discussion.) If it does not it's a bug and needs to be reported to lkml.

There's also the /proc/sys/kernel/sched_latency_ns control in the upstream scheduler - that is global and if you set that to a very low value like 1 msec:

    echo 1000000 > /proc/sys/kernel/sched_latency_ns
you'll get very fine-grained scheduling. This tunable has been upstream for 7-8 kernel releases already.

If it did, why are people still cranking out different schedulers for desktops?

Primarily because it's fun to do. Also, in no small part because it's much easier to do than to fix an existing scheduler (with all its millions of current users and workloads) :-)

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 12:30 UTC (Tue) by i3839 (guest, #31386) [Link] (6 responses)

Weird, I don't see /proc/sys/kernel/sched_latency_ns. After reading
the code it's clear it depends on CONFIG_SCHED_DEBUG, any reason for
that? It has nothing to do with debugging and the code saved is minimal.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 12:37 UTC (Tue) by mingo (guest, #31122) [Link] (5 responses)

Please send a patch - i think we could make it generally available, and the other granularity options too. CONFIG_SCHED_DEBUG defaults to y and most distros enable it (alongside CONFIG_LATENCYTOP).

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 9, 2009 8:42 UTC (Wed) by realnc (guest, #60393) [Link] (1 responses)

I've tried those tweaks. They don't really help much.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 10, 2009 9:53 UTC (Thu) by mingo (guest, #31122) [Link]

Thanks for testing it. It would be helpful (to keep reply latency low ;-) to move this to email and Cc: lkml.

You can test the latest upstream scheduler development tree via:

http://people.redhat.com/mingo/tip.git/README

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 9, 2009 11:50 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

I thought CONFIG_LATENCYTOP had horrible effects on the task_struct size and people were being encouraged to *disable* it as a result?

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 10, 2009 9:56 UTC (Thu) by mingo (guest, #31122) [Link]

It shouldn't have too big a cost unless you are really RAM constrained (read: running a 32 MB system or so). So it's a nice tool if you want to see a general categorization of latency sources in your system.

latencytop is certainly useful enough that several distributions enable it by default. It has a size impact on the task struct, but otherwise the runtime cost should be near zero.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 10, 2009 19:35 UTC (Thu) by i3839 (guest, #31386) [Link]

I'll try to send a patch against tip later this week, not feeling too well at the moment.

pluggable schedulers vs. tunable schedulers

Posted Sep 8, 2009 9:51 UTC (Tue) by mingo (guest, #31122) [Link] (15 responses)

So what happened to pluggable schedulers?

In fact, wouldn't it be even cooler technically to have a scheduler that you could tune either for low-latency desktop workloads or for server-oriented throughput workloads? And this could all be done runtime, without rebooting the kernel.

Some easy runtime tunable parameter in /proc/sys/kernel/ that sets the expected preemption deadline of tasks. So on a server you could tune it to 100 msecs, on a desktop you could tune it to 5 msecs - all with the same scheduler.

No reboots needed, only a single scheduler needs to be maintained, only a single scheduler needs bugfixes - and improvements to both workloads will flow into the same scheduler codebase so server improvements will indirectly improve the desktop scheduler and vice versa.

Sounds like a nice idea, doesn't it?

pluggable schedulers vs. tunable schedulers

Posted Sep 8, 2009 13:59 UTC (Tue) by paragw (guest, #45306) [Link] (14 responses)

No reboots needed, only a single scheduler needs to be maintained, only a single scheduler needs bugfixes - and improvements to both workloads will flow into the same scheduler codebase so server improvements will indirectly improve the desktop scheduler and vice versa. Sounds like a nice idea, doesn't it?

Well no, I don't think so. My line of thinking was that making one scheduler balance the arbitrary needs of multiple workloads leads to complexity and suboptimal behavior.

If we had a nice modular scheduler interface that allows us to load a scheduler at runtime or choose which scheduler to use at boot time or runtime that would solve the complexity problem and it will work well for the workloads it was designed for. As a bonus I will not have to make decisions on values of tunables - we can make the particular scheduler implementation make reasonable assumptions for the workload it was servicing.

And if you ask me I will take 5 different code modules that each do one simple thing rather than taking 1 code module that tries to achieve 5 different things at once.

After all, if we can have multiple IO schedulers why cannot we have multiple selectable CPU schedulers? Are there technical limitations or complexity issues that make us not want to go to pluggable schedulers?

pluggable schedulers vs. tunable schedulers

Posted Sep 8, 2009 14:01 UTC (Tue) by paragw (guest, #45306) [Link] (13 responses)

[ Gaah - Here is a better looking copy of the above comment ]

No reboots needed, only a single scheduler needs to be maintained, only a single scheduler needs bugfixes - and improvements to both workloads will flow into the same scheduler codebase so server improvements will indirectly improve the desktop scheduler and vice versa. Sounds like a nice idea, doesn't it?

Well no, I don't think so. My line of thinking was that making one scheduler balance the arbitrary needs of multiple workloads leads to complexity and suboptimal behavior.

If we had a nice modular scheduler interface that allows us to load a scheduler at runtime or choose which scheduler to use at boot time or runtime that would solve the complexity problem and it will work well for the workloads it was designed for. As a bonus I will not have to make decisions on values of tunables - we can make the particular scheduler implementation make reasonable assumptions for the workload it was servicing.

And if you ask me I will take 5 different code modules that each do one simple thing rather than taking 1 code module that tries to achieve 5 different things at once.

After all, if we can have multiple IO schedulers why cannot we have multiple selectable CPU schedulers? Are there technical limitations or complexity issues that make us not want to go to pluggable schedulers?

pluggable schedulers vs. tunable schedulers

Posted Sep 9, 2009 16:31 UTC (Wed) by martinfick (subscriber, #4455) [Link] (12 responses)

If we had a nice modular scheduler interface that allows us to load a scheduler at runtime or choose which scheduler to use at boot time or runtime that would solve the complexity problem and it will work well for the workloads it was designed for. As a bonus I will not have to make decisions on values of tunables - we can make the particular scheduler implementation make reasonable assumptions for the workload it was servicing.

How does moving your tunable to boot time make it less of a tunable?

pluggable schedulers vs. tunable schedulers

Posted Sep 9, 2009 23:10 UTC (Wed) by paragw (guest, #45306) [Link] (11 responses)

How does moving your tunable to boot time make it less of a tunable?

Where did I say move the tunable to boot time? I said the particular modular scheduler can make reasonable assumptions that are best for the objective it is trying to meet - low latency for Xorg and its clients for example at the expense of something else (throughput) on the desktop systems.

pluggable schedulers vs. tunable schedulers

Posted Sep 10, 2009 9:50 UTC (Thu) by mingo (guest, #31122) [Link] (10 responses)

Note that what you propose is not what has been proposed on lkml under 'pluggable schedulers' before - that effort (PlugSched) was a build time / boot time scheduler selection approach.

Your model raises a whole category of new problems. For example under what model would you mix these pluggable schedulers on the same CPU? Add a scheduler of schedulers? Or can a CPU have only one pluggable scheduler defined at a time?

Also, how is this different from having per workload parameters in a single scheduler? (other than being inherently more complex to implement)

pluggable schedulers vs. tunable schedulers

Posted Sep 10, 2009 11:57 UTC (Thu) by paragw (guest, #45306) [Link]

[ Warning - long winded thoughtlets follow ]

About PlugSched - since it was boot-time selectable, it could do what I was proposing, just not at runtime (which is no big deal really). And I wasn't suggesting mixing schedulers per CPU. My thought was to have one CPU scheduler exactly as we have it today - selectable either at boot time or, depending on how complex it would be to implement, at runtime.

If we talk about CFS as it is in mainline - I think its objective of being completely fair is a noble one on paper, but it does not work well on desktops with workloads that demand an interactivity bias in favor of only a certain set of apps. As many people have reported, CFS causes movie skips and does worse than BFS for interactivity.

I am not saying the problems with CFS are 100% due to it being completely fair by design, but it is not hard to imagine that trying to be fair to all tasks will, in itself, not be enough for mplayer to keep playing a movie without skips when there are enough processes and not enough CPUs. If it favored running mplayer it would not be completely fair - unless we also started renicing the processes, which, if you think about it, is fundamentally broken from a usability standpoint unless it was made fully automatic, which in turn is impossible without user involvement. (A desktop user is simply not going to renice every desktop process he works on, and then one has to select what gets more interactivity bonus apart from Xorg - now the browser, later the mail client, etc.; you get the idea. I explain more problems with nice a little further down.)

Now if we think about the CPU(s) as a finite resource - if people start running more tasks than there are CPUs it becomes clear that a bunch of tasks have to be scheduled less frequently and given less time slice than a bunch of other tasks if we are to maintain interactivity. (In Windows for example - one can set a scheduler switch that either favors foreground tasks (desktop workload) or background (server) tasks.)

So if we were to build a scheduler with the sole goal of low latency for interactive processes, we would not have to worry about throughput in that scheduler. I.e. no conflicting goals, so less complexity and better results.

Then one can think of a per-process flag which Xorg and its clients can set, telling the desktop scheduler when the process window is foreground and interactive (when it is the topmost window or when a window needs user input); the scheduler then ensures it meets its goal of giving that process enough CPU resources to keep it running smoothly. This would solve the ugly problem of making the scheduler guess which process is interactive, needs user input, or needs an interactivity boost so that the desktop feels responsive to the user.

In my opinion, giving a scheduler conflicting goals while also making it guess which processes to boost simply does not work: the scheduler does not have enough data to know for sure which process needs the most interactivity at any given point in time - at least it is not straightforward to make that guess reliably every time without any hint from the applications themselves.

Similarly for servers we could simplify CFS to make sure it remains completely fair and goes after throughput and latency comes second.

The benefit of having two schedulers is of course that users can choose the one that does what they need - interactivity or fairness. So if someone complains that their desktop is jerky when running a make -j128 kernel build, we can tell them to use the desktop scheduler and stop worrying about kernel build times if they are also going to play a movie at the same time. And people needing fairness can go with CFS - and we can tell them to stop complaining about desktop jerkiness when running kernel builds, as long as it is not anomalously jerky, i.e. it is being completely fair, per the goal.

We then also keep complexity in each scheduler to minimum without penalizing server workloads with interactivity logic and desktop workloads with fairness logic.

In short, the point I am trying to make is that doing all things in one scheduler as we do today - without any notion of which process needs user interaction or which process needs to be boosted to make the desktop feel more interactive - is never going to be a 100% success for all parties. (Correct me if I am wrong, but I don't think we have any separate treatment for multimedia applications - they are just another process from the scheduler's PoV, and that fails when there are another 128 runnable processes that need to run on vastly fewer than 128 CPUs.)

This means the scheduler needs to be biased toward the apps the user cares most about - and nice does not work as long as it is a static, one-time, user-controlled thing. I don't want my browser to be nice -10 all the time; if it is minimized and not being used I want it at nice +5, and instead have mplayer in the foreground nice'd to -5. Who decides what amount of nice, relative to the other nice'd processes, is sufficient for mplayer to play without skipping? We need something absolute there, unlike nice: if a multimedia application is playing in the foreground, it gets all the resources it needs, no matter what. That, IMHO, is the key to making desktop users happy.

pluggable schedulers vs. tunable schedulers

Posted Sep 12, 2009 7:50 UTC (Sat) by trasz (guest, #45786) [Link] (8 responses)

Just do what Solaris does - schedulers are pieces of code that calculate thread priorities. This way you can assign different schedulers to different processes.

pluggable schedulers vs. tunable schedulers

Posted Sep 12, 2009 8:37 UTC (Sat) by mingo (guest, #31122) [Link] (7 responses)

That does not answer the fundamental questions though.

Who schedules the schedulers? What happens if multiple tasks are on the same CPU with different 'schedulers' attached to them? For example a Firefox process scheduled by BFS and Thunderbird scheduled by CFS. How would it behave on the same CPU for it to make sense?

Really, i wish people who are suggesting 'pluggable schedulers!!!' spent five minutes thinking through the technical issues involved. They are not trivial.

Programming the kernel isn't like LEGO where you can combine bricks physically and have a nice fire station in addition to your police car ;-)

pluggable schedulers vs. tunable schedulers

Posted Sep 12, 2009 8:46 UTC (Sat) by trasz (guest, #45786) [Link] (2 responses)

Let me repeat - in Solaris, schedulers are the parts of code that calculate priorities. They don't do other things - specifically, they don't switch threads. You don't have to schedule them in any way - just switch threads conforming to the priorities calculated by the schedulers.

And if you don't like this approach, you could still do what FreeBSD has been doing for several years now - implement schedulers changeable at compile time.

pluggable schedulers vs. tunable schedulers

Posted Sep 12, 2009 9:00 UTC (Sat) by mingo (guest, #31122) [Link]

Let me repeat - in Solaris, schedulers are the parts of code that calculate priorities. They don't do other things - specifically, they don't switch threads. You don't have to schedule them in any way - just switch threads conforming to the priorities calculated by the schedulers.

That's not pluggable schedulers. It's one scheduler with some flexibility in calculating priorities. The mainline Linux scheduler has something like that too btw: we have 'scheduling classes' attached to each process. See include/linux/sched.h::struct sched_class.

And if you don't like this approach, you could still do what FreeBSD has been doing for several years now - implement schedulers changeable at compile time.

It's not about me 'liking' anything. My point is that i've yet to see a workable model for pluggable schedulers. (I doubt that one can exist - but i have an open mind about it and i'm willing to be surprised.)

Compile-time is not a real pluggable scheduler concept - that would be multiple schedulers acting _at once_. See the example i cited: being able to set Firefox to BFS and Thunderbird to CFS.

Compile-time (plus boot time) schedulers is what the PlugSched patches did for years.

pluggable schedulers vs. tunable schedulers

Posted Sep 12, 2009 12:24 UTC (Sat) by nix (subscriber, #2304) [Link]

But you still have to figure out which processes get their priorities
decided by which 'schedulers' (it is not very useful to jump into a Linux
discussion assuming that the terminology used is that of some other
kernel's development community, btw).

pluggable schedulers vs. tunable schedulers

Posted Sep 12, 2009 14:44 UTC (Sat) by paragw (guest, #45306) [Link] (3 responses)

I don't really understand it when you say the technical issues involved in designing pluggable schedulers are "not trivial", since you already mentioned that PlugSched did just that prior to CFS.

It might be a terminology difference that is getting in the way - when I say "pluggable" I imply choice more than anything else. In other words it would be perfectly OK for the scheduler to be selectable only at compile and boot time and not at runtime just like PlugSched was.

We are advertising a completely fair scheduler that will do all things (ponies included ;) for everybody, but no one has so far explained HOW, fundamentally - at the conceptual level, at the design level - we are going to ensure that when resources get scarce (2 CPU cores, 130 runnable processes - mostly CPU-heavy jobs, plus one mplayer playing video and another process encoding audio) we give a sufficient, continuous CPU share to mplayer, the audio encoder and the desktop as a whole, so that it feels fluid to the user without the user having to play nice games.

Making it even simpler, asking the same question differently - what logic in the current scheduler will hand out the most resources to mplayer, the audio encoding process and the desktop window manager (switching between windows needs to be fluid as well) when the user is interacting with them? You can say the scheduler will be completely fair and give an equal chunk to every process, but desktop users get pissed if that means mplayer is going to skip - not enough CPUs and lots of processes to run.

In other words - if I hand $100 to a charity and ask them to be completely fair by distributing the amount equally, and 200 people turn up for help, the charity does the fair thing and gives 50c to everyone - without considering that 3 of the 200 badly needed at least $2 so they could not only eat but also buy their medicine and stay alive. That would be an unfair result in the end. So the charity has to have some notion of bias toward the most needy, and for that it needs to figure out who the most needy are.

The point I am trying to make is we need to have a scheduler that is both completely fair (server workloads) and desktop friendly and these conflicting objectives can only be met by having 2 different user selectable schedulers. The desktop scheduler can get into the details of foreground and background Xorg and non-Xorg, multimedia vs. non-multimedia processes and fight hard to keep the desktop fluid without bothering about the background jobs taking longer or bothering about scaling to 1024 CPUs. The CFS scheduler can stay fair and moderately interactive and scalable as it is and server people can select it.

So again, why do we not want to bring PlugSched back and have the user select BFS or CFS or DS (Desktop Scheduler) at compile or boot time? If we do want CFS to do everything while being fair - I don't think we have explained on paper how it would ensure desktop interactivity without having a notion of what constitutes the desktop. We have to question the CFS goals/design/implementation if we are to go by the reports that, after substantial development, interactivity issues with CFS still remain. (Please don't say the nice word - I have explained already that it doesn't work well in practice.) If it turns out that it is hard to meet conflicting goals well, or that we need to add more complexity to CFS to meet those conflicting goals even in "most" workloads - it is still prudent to ask why not just have 2 different schedulers, each with one non-conflicting goal?

pluggable schedulers vs. tunable schedulers

Posted Sep 12, 2009 15:28 UTC (Sat) by mingo (guest, #31122) [Link] (1 responses)

What i believe you are missing relates to the very first question i asked: wouldn't it be better if a scheduler had nice runtime tunables that achieved the same?

Your original answer was (in part and way up in the discussion):

If we had a nice modular scheduler interface that allows us to load a scheduler at runtime or choose which scheduler to use at boot time or runtime that would solve the complexity problem and it will work well for the workloads it was designed for. As a bonus I will not have to make decisions on values of tunables - we can make the particular scheduler implementation make reasonable assumptions for the workload it was servicing.

What you are missing is that 'boot time' or 'build time' schedulers (i.e. what PlugSched did in essence) are build time / boot time tunables. A complex one but still a knob as far as the user is concerned.

Furthermore they are worse tunables than nice runtime tunables. They inconvenience the user and they inconvenience the distro. Flipping to another scheduler would force a reboot. Why do that?

For example, it does not allow the example i suggested: to run Firefox under BFS while Thunderbird under another scheduler.

So build-time/boot-time pluggable schedulers have various clear usage disadvantages, and there are also various things they cannot do.

So if you want tunability then i cannot understand why you are arguing for the technically worse solution - for a build time or boot time solution - versus a nice runtime solution.

pluggable schedulers vs. tunable schedulers

Posted Sep 12, 2009 18:48 UTC (Sat) by paragw (guest, #45306) [Link]

Surely one single tunable (I want the desktop scheduler, for example, in the case of PlugSched) is better - i.e. less complex - from the user's standpoint than having to figure out, say, 5 complex numerical things such as granularity and whatnot?

Or do we have one single tunable for CFS that makes it desktop friendly? If it does have such a knob, then the next and most important question is how well it works for desktops. From the reports, I think we are still some way from claiming excellent "automatic" interactivity for desktops. Note that I am excluding the nicing games and making the user do a complex dance of figuring out how to make his/her desktop interactive. I am sure you agree that does not work well.

To your point, if we have to have one tunable for the CFS scheduler to make it desktop friendly - essentially a single knob (like sched=desktop in the PlugSched case) - it is easy to see how that would fail to work satisfactorily for all desktop workloads. For one thing, unless the user messes with the nice level of each process that he/she opens, minimizes, closes or brings to the foreground (which is out of the question from a usability standpoint), the scheduler has no way to distinguish a foreground process from a background one; it has no way of distinguishing mplayer from the desktop window manager from some system daemon going bad and eating CPU.

For another, the scheduler has no reliable way to know which processes it needs to favor. The window manager and the process owning the foreground window need to be registered with the scheduler as foreground processes; each minimized window needs to be registered as background. Then, as long as the window manager and the process owning the foreground window are not runnable, everyone else gets CPU. Multimedia applications need to be registered with the scheduler as such - automatically - so that mplayer always gets CPU when it needs it, even favoring it over the window manager and the process of another foreground window if there is only one available CPU. Until this coordination happens, I think we will remain some way from achieving great desktop interactivity that works for most desktop workloads.

Then the question would be that do we want to put all this "only needed on desktop" complexity into the completely fair scheduler or do we want to keep both separate. That is sort of a secondary question - the first question is how do we get the desktop to hint the scheduler as to which processes the user is actively interacting with, which ones are the ones he/she is likely to interact with (minimized windows) and then the scheduler favoring those accordingly - that ought to solve the interactivity problems in an automatic fashion.

[ Windows has this notion of distinguishing between "Programs" (which are running desktop applications) and background services (things without desktop interaction) and in its default configuration on the desktop it favors "Programs" and on Servers it favors "Background services" (Web Server service for e.g.). And it certainly helps interactivity. It can do this because it can distinguish between what is a desktop application and which is foreground or background and what is a non-desktop, background application.]

pluggable schedulers vs. tunable schedulers

Posted Sep 12, 2009 18:31 UTC (Sat) by khc (guest, #45209) [Link]

I already have a compile time way to select scheduler:

patch -p1 < 2.6.31-sched-bfs-211.patch

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 4:07 UTC (Tue) by asdlfiui788b (guest, #58839) [Link] (1 responses)

Hi, wouldn't M:N threading fix most of this?

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 9:52 UTC (Tue) by mingo (guest, #31122) [Link]

Hm, what would M:N threading fix and how would it fix it?

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 19:37 UTC (Tue) by maxbg (guest, #60713) [Link] (1 responses)

Hello, first post here :)

I would really like to find out the actual algorithm BFS uses. Reading the patches gives me little information, as I am not a kernel hacker (yet :).
I know it uses a global runqueue for all CPUs ... and that it does not measure sleep time. What are those deadlines?
How does it differ from SD and RSDL?

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 21:04 UTC (Tue) by ikm (guest, #493) [Link]

I think you can just try asking Con nicely.


Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds