
Interview with Con Kolivas (APC)

Posted Jul 25, 2007 11:49 UTC (Wed) by jbouzane (guest, #43125)
In reply to: Interview with Con Kolivas (APC) by jwb
Parent article: Interview with Con Kolivas (APC)

I didn't believe what you said, so I set out to prove it myself. I was actually rather surprised at the results, though I still think you're exaggerating.

The labels on the left say which IO scheduler was used. The first column below is the time taken for oowriter to start by itself. The next column is oowriter startup time with dd running, copying a 4 GB DVD image from /dev/sda to /dev/sdc (SATA). The third column is the same test, except with the dd process running at nice -n +19.

Scheduler        oowriter alone   with dd   with dd (nice +19)
Anticipatory          13s           154s          135s
CFQ                   11s           200s           90s

Note that for all tests over 95 seconds or so, the dd was run in a loop because it completed in about that amount of time.

The machine is an Intel Core 2 Duo 2.4 GHz with 2 GB of RAM and 2 Seagate 400 GB hard drives, on a 2.6.21.5 SMP kernel with preemption (including BKL) and RT mutexes. All tests were run after clearing the page cache using /proc/sys/vm/drop_caches.
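
In script form, the procedure described above comes down to roughly the following. This is my reconstruction, not the poster's actual commands, and note that a raw dd onto /dev/sdc will destroy that disk's contents:

    #!/bin/sh
    # Reconstruction of the test: cold cache, background dd load,
    # then time an oowriter startup.

    # Start each run with a cold page cache.
    sync
    echo 3 > /proc/sys/vm/drop_caches

    # Background I/O load: copy ~4 GB between the two SATA disks,
    # looping so the load outlasts the slower oowriter startups.
    # Drop the "nice -n 19" prefix to get the middle column above.
    ( while :; do
        nice -n 19 dd if=/dev/sda of=/dev/sdc bs=1M count=4096
      done ) &
    loadpid=$!

    # Startup was presumably timed until the Writer window appeared;
    # "time" only approximates that, since oowriter keeps running.
    time oowriter

    kill $loadpid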

The same tests with xterm instead of OpenOffice give:

Scheduler        xterm alone   with dd   with dd (nice +19)
Anticipatory          4s          25s           22s
CFQ                   5s          50s           50s

So I think your 5-minute claim is wrong. However, I do see 10x slowdowns, sometimes 20x with CFQ.



pluggable I/O schedulers, pluggable CPU schedulers

Posted Jul 25, 2007 13:00 UTC (Wed) by mingo (guest, #31122)

So I think your 5-minute claim is wrong. However, I do see 10x slowdowns, sometimes 20x with CFQ.

Yes - and note that such problems are one of the reasons why some of the Linux IO subsystem maintainers are today (partly) regretting that they exposed pluggable I/O schedulers to user-space.

The IO subsystem now has two reasonably good IO schedulers: AS and CFQ, but the maintainers would like to use CFQ exclusively. The problem is, they cannot do that anymore: some key apps are still running best on AS and there's not enough user pressure to move this issue. So it will be very hard to create one good I/O scheduler that will work well out of the box, without the user having to tweak anything. (And most users and admins don't tweak their systems.) Pluggable I/O schedulers did help development and prototyping, though.
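
For reference, the user-space exposure being discussed here is the per-queue elevator selector in sysfs; on a 2.6.21-era kernel it looks roughly like this (sda is just an example device):

    # List the compiled-in I/O schedulers; the active one is in brackets.
    cat /sys/block/sda/queue/scheduler
    # e.g.: noop [anticipatory] deadline cfq

    # Switch sda's queue to CFQ at runtime.
    echo cfq > /sys/block/sda/queue/scheduler

    # The default can also be chosen at boot with the elevator=
    # kernel parameter, e.g. elevator=cfq.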

In the CPU scheduler space the same question and maintenance issue come up, but at tenfold magnitude: unlike disks, CPUs are a fundamentally more "shared" and a fundamentally more "stateless" resource (despite caching) for which we have to offer robust multitasking. Disks store information in a much more persistent way, workloads are bound to particular disks in a much more persistent way (than tasks are bound to CPUs), and disks are also a lot less parallel due to the fundamental physical limitations of rotating platters.

The default CPU scheduler has to be good enough for all purposes, and we don't want to splinter our technology into too many "this app works best with that scheduler on that hardware and with that kernel config" niches.

So the technological case for pluggable CPU schedulers was never truly strong, and it's even weaker today now that we've got direct experience with pluggable I/O schedulers.

[Sidenote: I think Con made a small mistake in equating CFS's modularization to PlugSched: the modularization within CFS is of a fundamentally different type: it modularizes the scheduling policies, which are already a distinct part of the ABI (SCHED_FIFO, SCHED_RR, SCHED_OTHER, SCHED_BATCH, SCHED_IDLE). This was a nice internal cleanup to the scheduler. PlugSched never did that; it was always submitted as an additional complication that allows build- and boot-time switching to a completely different CPU scheduler, not as a cleanup to the already pretty complex scheduler code. I have recently suggested to the current PlugSched maintainer (Peter Williams) to rework PlugSched along similar lines - that will result in a lot cleaner approach.]
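
For context, those policies are already selectable per task from user space; with chrt from util-linux the choice looks roughly like this (a sketch - the commands being scheduled are placeholders, and option spellings can vary between util-linux versions):

    # Run a compile job under SCHED_BATCH (batch priority must be 0).
    chrt --batch 0 make -j4

    # Run a latency-critical task under SCHED_FIFO at RT priority 50.
    chrt --fifo 50 ./my_audio_app

    # Show the policy and priority of an existing process.
    chrt -p 1234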

pluggable I/O schedulers, pluggable CPU schedulers

Posted Jul 25, 2007 18:37 UTC (Wed) by flewellyn (subscriber, #5047)

This may be a foolish question, but never let it be said that I'm not willing to ask those.

Would it be possible to merge the I/O schedulers into one, and then expose the different behaviors as config knobs? Y'know, with "AS" and "CFQ" as particular groups of settings?

Or is that just as bad if not worse?

pluggable I/O schedulers, pluggable CPU schedulers

Posted Jul 25, 2007 19:55 UTC (Wed) by mingo (guest, #31122)

Would it be possible to merge the I/O schedulers into one, and then expose the different behaviors as config knobs? Y'know, with "AS" and "CFQ" as particular groups of settings? Or is that just as bad if not worse?

Your suggestion makes sense, and I think it would likely result in fundamentally better code that gives us one codebase, but it still doesn't give us an IO scheduler that does the right thing no matter what we throw at it (the user would still have to turn that knob). So in that sense it would be little change from the current state of affairs.
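
For concreteness, a per-scheduler version of such knobs already exists: each elevator exports its tunables under sysfs. The names below are examples from a 2.6.21-era kernel and may differ between versions; they tune one scheduler rather than merging AS and CFQ into a single one:

    # Tunables of whatever scheduler is currently active on sda:
    ls /sys/block/sda/queue/iosched/
    # CFQ exposes e.g. slice_idle, quantum, back_seek_max;
    # AS exposes e.g. antic_expire, read_expire, write_expire.

    # Example: stop CFQ from idling between requests of one process.
    echo 0 > /sys/block/sda/queue/iosched/slice_idle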

The problem is not primarily the kernel-internal code duplication - we can handle such things pretty well; the kernel is nearly 8 million lines of code now and growing fast. We've got 3 SLAB implementations, and that's not a problem because the choice was never made user (and application) visible.

The problem is the externally visible imperfection of the kernel's behavior, and end-users learning to depend on it. If we try to remove a knob we promised to users at some stage, those who are affected negatively complain (and rightfully so).

It should also be seen in perspective: these issues are not the end of the world, our I/O schedulers are pretty damn good already, and modularity is not an issue at all compared to some other kernel problems we are facing - but when code is submitted to an otherwise well-working subsystem, these little factors are what make or break a particular patch's upstream acceptance.

Interview with Con Kolivas (APC)

Posted Jul 25, 2007 16:22 UTC (Wed) by malor (guest, #2973)

I noticed a similar problem with OSX on the Mac Pro; when running a dd, the system slows to a dead stop. dd consumes all available I/O and the system essentially stops responding until it's done. And while I don't have dd on this Windows box, I've noticed that the system gets very, very slow when VMWare is creating an image.

They're different systems, but they're both using Intel chipsets, and because of that, I'm wondering if it might be something about Intel's SATA controllers. My Athlons never did this; they maintained much better responsiveness under load.

I also found that the Mac stayed usable if I didn't give dd a blocksize. Up to some amount, which I think was 256 but don't remember for sure, it maintained decent performance. It started slowing at X+1 (I think 257), and got worse linearly up until 512. It didn't get any worse after that, but it's hard to get any worse than total system lockout.
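
The knob being varied there is dd's bs= option; with no bs= given, dd works in 512-byte blocks. A rough way to repeat the experiment follows - the device path is a placeholder, and the bare bs values are bytes, which may not match whatever units were actually used:

    # Baseline: no explicit blocksize (dd then uses 512-byte blocks).
    dd if=/dev/disk1 of=/dev/null count=200000

    # Vary the blocksize and watch interactive responsiveness while
    # each copy runs.  The same count at a larger bs moves more data,
    # so compare responsiveness, not dd's own runtime.
    for bs in 128 256 257 384 512; do
        echo "trying bs=$bs"
        dd if=/dev/disk1 of=/dev/null bs=$bs count=200000
    done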

I haven't tested blocksizes under Windows, but between seeing both OSX and Windows do the same no-response-under-heavy-IO thing, and hearing your story, I'm wondering if Intel is doing something dumb with I/O.

If you're not on Intel, of course, that blows that idea out of the water. :)

Interview with Con Kolivas (APC)

Posted Jul 26, 2007 11:58 UTC (Thu) by rwmj (subscriber, #5474)

Well, I get similar problems on my Athlon machines :-(

I suspect that the problem may lie with SATA itself. It certainly feels much worse than IDE did, but again just subjectively - I haven't got any hard figures.

Rich.

