The CFQ "low latency" mode
The new mode (initially called "desktop" before being renamed "low_latency") is enabled by default; it can be adjusted by setting the iosched/low_latency attribute associated with each block device in sysfs. When set, some of the delays for "synchronous operations" (reads, generally) no longer happen. The result should be more responsive I/O and, one would hope, happier users.
Note: please see the comments for a description of this change which is more, um, accurate. Your editor blames the Death Flu that his kids brought home.
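For readers who want to try the knob themselves, here is a minimal Python sketch of toggling the sysfs attribute mentioned above. The `set_low_latency` helper name is my own invention; the attribute path follows the article's `iosched/low_latency` description, and writing it requires root and CFQ active on the device.

```python
from pathlib import Path

def set_low_latency(device, enabled, sysfs_root="/sys/block"):
    """Toggle CFQ's low_latency knob for one block device.

    Writes "1" or "0" to <sysfs_root>/<device>/queue/iosched/low_latency.
    Needs root privileges, and CFQ must be the active scheduler.
    """
    knob = Path(sysfs_root) / device / "queue" / "iosched" / "low_latency"
    knob.write_text("1\n" if enabled else "0\n")
    return knob

# Shell equivalent: echo 0 > /sys/block/sda/queue/iosched/low_latency
```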
Posted Oct 8, 2009 7:11 UTC (Thu)
by axboe (subscriber, #904)
[Link] (1 responses)
The low_latency knob doesn't impact delays or merging. One of the key aspects of getting low latency for a series of operations (like starting your firefox while other IO is happening) is actually making sure we get the delays right. Take the classic case of a reader vs. a writer: the writer's dirtying speed will greatly exceed the writeback speed, so we always have tons of dirty pages waiting to be written. The normal reader, however, issues dependent reads, each serialized behind the previous one: when one read finishes, another will be issued by the reader very shortly. Achieving good throughput and latency for the reader in CFQ is accomplished by briefly waiting for another IO when one has completed. In CFQ, this is called idling.
The two primary changes in behaviour for CFQ in -rc3 are letting seeky IO also idle, even if the hardware does command queuing (as most does these days), and limiting the damage that async IO can do while sync IO is also happening. With the 'low_latency' knob switched on, CFQ will only slowly build up a queue depth of async writes. This greatly helps reduce the impact that a writer will have on system interactiveness: when we miss a sync idle window only slightly, the amount of async writeback already sent to the device is limited by the time since that last sync IO.
The end result is that the desktop experience should be less impacted by background IO activity. It's also worth mentioning that the 'low_latency' setting defaults to on.
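The idling argument above can be sketched with back-of-the-envelope arithmetic. This toy model (my own, not CFQ's actual accounting; the millisecond figures are illustrative) compares a dependent reader's per-read cost when the scheduler idles briefly versus when a queued async write is dispatched into the gap:

```python
def per_read_cost(seek_ms, service_ms, think_ms, idle=True):
    """Toy cost of completing one dependent read while async writes are queued.

    With idling, the disk waits out the reader's short think time and the
    next read hits with no seek.  Without idling, a queued write is
    dispatched in the gap, paying a seek away and a seek back.
    """
    if idle:
        return service_ms + think_ms                        # stay in place
    return service_ms + seek_ms + service_ms + seek_ms      # write sneaks in

print(per_read_cost(8, 1, 1, idle=True))    # 2 ms per read
print(per_read_cost(8, 1, 1, idle=False))   # 18 ms per read
```

With these illustrative numbers, a brief idle wait makes each dependent read nine times cheaper than letting writeback steal the head between reads.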
Posted Oct 8, 2009 12:44 UTC (Thu)
by Yenya (subscriber, #52846)
[Link]
I had a pretty bad experience with CFQ on my FTP server (ftp.linux.cz, SW RAID-5 over 8x 1TB SATA drives): the resync of the array with CFQ took about 3 days, with overall system responsiveness being pretty bad. With the deadline iosched (which is what I am running now) it takes less than a day, and even then the system latency for things like typing commands in an ssh session is good (read: no noticeable change against the fully reconstructed array).
-Yenya
Posted Oct 8, 2009 23:20 UTC (Thu)
by giraffedata (guest, #1954)
[Link] (7 responses)
"Normally the scheduler will try to delay many new I/O requests for a short time in the hope that they can be joined with other requests which may come shortly thereafter. This behavior will minimize disk seeks and maximize I/O request size, so it is clearly good for throughput."
No, that's not clear at all. Minimizing disk seeks and maximizing I/O request size is clearly good for disk efficiency -- minimizing disk utilization for a given workload -- but for throughput to be meaningful, utilization has to be about 100%. When that's the case, I/O backs up into the Linux I/O queue, so no extra delays are necessary in order to join requests with other requests.
You just can't improve throughput by deliberately letting the disk sit idle.
Posted Oct 9, 2009 7:12 UTC (Fri)
by Yenya (subscriber, #52846)
[Link] (6 responses)
In fact, you can. Think of avoiding some seeks by issuing sequential operations with shorter-than-a-seek-time delays in between.
Posted Oct 9, 2009 15:48 UTC (Fri)
by giraffedata (guest, #1954)
[Link] (5 responses)
You'll have to be more specific.
It sounds like you're talking about a strategy for improving response time for a bursty workload, whereas throughput is meaningful only for a non-bursty unlimited supply of work.
Posted Oct 12, 2009 7:56 UTC (Mon)
by Yenya (subscriber, #52846)
[Link] (4 responses)
Imagine two streams of dependent sequential reads, A and B, in different areas of the disk. The pattern
[A]-pause-[A]-seek-[B]-pause-[B]-seek-[A]-pause-[A]-...
can be faster than the "no-pause" variant of
[A]-seek-[B]-seek-[A]-seek-[B]-seek-...
whenever the pause is shorter than the seek. It is not a bursty workload or a response-time-critical workload. It is an "unlimited supply of work" batch workload by my definition. And it has higher throughput with the pauses added than without them.
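The claim is easy to check with arithmetic. A toy model (function names and timings are illustrative, not from the thread): serving two reads per stream visit with a short pause, versus strictly alternating streams with a seek after every read:

```python
def with_pauses(pairs, read_ms, pause_ms, seek_ms):
    """[A]-pause-[A]-seek-... : each visit does read, pause, read, then one
    seek to the other stream; `pairs` is the number of two-read visits."""
    return pairs * (2 * read_ms + pause_ms + seek_ms)

def without_pauses(pairs, read_ms, seek_ms):
    """[A]-seek-[B]-seek-... : every single read is followed by a seek."""
    return pairs * 2 * (read_ms + seek_ms)

# 100 visits, 1 ms reads, 2 ms pauses, 8 ms seeks:
print(with_pauses(100, 1, 2, 8))     # 1200 ms
print(without_pauses(100, 1, 8))     # 1800 ms
```

As long as the pause is shorter than the seek it replaces, the paused schedule finishes the same batch of work sooner.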
Posted Oct 13, 2009 0:22 UTC (Tue)
by giraffedata (guest, #1954)
[Link] (1 responses)
OK, I'll buy that. Letting the disk sit idle can improve the throughput capacity for a limited workload like that (limited not because there are times when there is no work available, but because there are only two streams and each apparently doesn't want to have more than one I/O at a time outstanding).
What I was thinking is that when people ask about disk throughput (capacity), it's usually on a system that drives the disk a lot harder than that -- i.e. the disk's basic capacity is in question. That means requesters throw large amounts of I/O at the disk, and the speed is then determined by how quickly the disk can move the I/Os through. In the A-B scenario you describe, I would ask about the disk's response time, not its throughput, because it's the waiting for a response that governs the speed of this system.
Posted Oct 13, 2009 18:06 UTC (Tue)
by dlang (guest, #313)
[Link]
remember that seeks are _expensive_; in the time one avoided seek would take, you can transfer a LOT of data.
so throughput optimizations like this can be relevant to the disk's total response capabilities.
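To put rough numbers on how expensive a seek is (the 8 ms seek time and 100 MB/s streaming rate below are illustrative, not measurements from the thread):

```python
def kb_forgone_per_seek(seek_ms, kb_per_s):
    """KB of sequential transfer given up during one seek (integer toy math)."""
    return seek_ms * kb_per_s // 1000

# An 8 ms seek on a drive streaming 100 MB/s (100,000 KB/s):
print(kb_forgone_per_seek(8, 100_000))   # 800 KB lost per avoided-able seek
```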
Posted Oct 15, 2009 14:50 UTC (Thu)
by guest (guest, #2027)
[Link] (1 responses)
Anyway, if you have such workloads and you do *not* pause, what happens? You perform the first seek to A, read A, seek to B, read B, and in the meantime more requests for A have arrived. If it's only one, you still seem to be fast enough despite seeking - just seek back to A and go on. If the seek takes too long, multiple requests should have been queued already, and you can coalesce them and handle them with one seek.
IMHO, letting a disk stay idle when there's work to do is wrong!
Posted Oct 15, 2009 17:57 UTC (Thu)
by efexis (guest, #26355)
[Link]
"letting a disk stay idle when there's work to do is wrong!"
Except that the disk is as good as idle while it's seeking... you can't read or write while that's happening.
I already know this to be true; I come across it on a server I part-manage, which tries to schedule backups for several sites all at once. The disk thrashes, the system grinds to a halt, and the backups take forever to finish.
So I wrote a small bash script: when the load gets high, it sends a STOP signal to all the backup processes, then sends just one of them a CONT signal, so only that one process is running. Every few seconds it STOPs that task and CONTs a different one. The backups complete in a much shorter time, and system responsiveness is much better while they run. Why? Because the heads don't move away from the current reader's position as often, even though each backup does the same issue-read, wait, process, issue-read, wait, process read pattern.
So, with the amount of time the drive spends seeking reduced, the SAME drive is able to complete the SAME amount of work in LESS time, with LESS effect on the rest of the system.
This is just a fact: it's real, it works, and as counter-intuitive as it may sound to you, the numbers don't lie.
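The STOP/CONT rotation described above was a bash script; here is a minimal Python sketch of the same idea. Everything here (the function name, the injectable `kill` parameter for testing, leaving the rotation interval to the caller) is my own construction, not the original script:

```python
import os
import signal

def rotate(pids, kill=os.kill):
    """Round-robin generator: freeze every backup process, resume one.

    Each next() stops all pids with SIGSTOP, resumes one with SIGCONT,
    and yields the pid now running; the caller sleeps between steps to
    set the rotation interval.  `kill` is injectable for testing.
    """
    while True:
        for active in pids:
            for pid in pids:
                kill(pid, signal.SIGSTOP)   # freeze everyone
            kill(active, signal.SIGCONT)    # let exactly one run
            yield active
```

Serializing the jobs this way trades concurrency for far fewer seeks, which is exactly the effect described in the comment above.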