
The CFQ "low latency" mode

Posted Oct 12, 2009 7:56 UTC (Mon) by Yenya (subscriber, #52846)
In reply to: The CFQ "low latency" mode by giraffedata
Parent article: The CFQ "low latency" mode

No, you can also increase the _throughput_ (= the number of sectors handled in a given, large, period of time) by adding pauses shorter than the seek time. To be more specific, let's have two readers, A and B, each reading from its own part of the disk, [A] and [B] respectively. For the sake of simplicity, let's assume that two subsequent operations within area [A] or within area [B] do not require a seek and are fast, while a read from area [A] followed by a read from area [B] requires a seek, which is much slower. Then, from the throughput point of view, it is definitely better to issue the operations in the following order:

[A]-pause-[A]-seek-[B]-pause-[B]-seek-[A]-pause-[A]-...

than the "no-pause" variant of

[A]-seek-[B]-seek-[A]-seek-[B]-seek-...

It is not a bursty workload or a response-time-critical workload. It is an "unlimited supply of work" batch workload by my definition. And it has higher throughput with the pauses added than without them.
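
A back-of-the-envelope model makes this concrete (a minimal Python sketch; the 8 ms seek, 0.5 ms read, and 2 ms pause are made-up illustrative numbers, not measurements):

    # Toy model of the two schedules above; all timings are invented.
    SEEK, READ, PAUSE = 8.0, 0.5, 2.0   # milliseconds
    N = 1000                            # requests, split evenly between A and B

    # [A]-seek-[B]-seek-[A]-... : every request pays a full seek.
    no_pause = N * (READ + SEEK)

    # [A]-pause-[A]-seek-[B]-... : two same-area requests per seek, at the
    # cost of one pause spent waiting for the second request to arrive.
    with_pause = (N / 2) * (2 * READ + PAUSE + SEEK)

    for name, total in (("no pause", no_pause), ("with pause", with_pause)):
        print(f"{name:10}: {total:6.0f} ms total, {N / total * 1000:4.0f} req/s")

With these numbers the paused schedule finishes in 5500 ms versus 8500 ms, roughly 180 req/s against 120, precisely because each 2 ms pause is cheaper than the 8 ms seek it avoids.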



The CFQ "low latency" mode

Posted Oct 13, 2009 0:22 UTC (Tue) by giraffedata (guest, #1954)

OK, I'll buy that. Letting the disk sit idle can improve the throughput capacity for a limited workload like that (limited not because there are times when no work is available, but because there are only two streams and each apparently never has more than one I/O outstanding at a time).

What I was thinking is that when people ask about disk throughput (capacity), it's usually on a system that drives the disk a lot harder than that -- i.e. the disk's basic capacity is in question. That means requesters throw large amounts of I/O at the disk and the speed is then determined by how quickly the disk can move the I/Os through. In the A-B scenario you describe, I would ask about the disk's response time, not its throughput, because it's the waiting for a response that governs the speed of this system.

The CFQ "low latency" mode

Posted Oct 13, 2009 18:06 UTC (Tue) by dlang (guest, #313)

It's not necessarily as different as you are making it out to be.

Remember that seeks are _expensive_: in the time one avoided seek would have taken, you can transfer a LOT of data.

So throughput optimizations like this can be relevant to the total disk response capabilities.
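
To put rough numbers on that (illustrative figures only; the 8 ms seek and 100 MB/s transfer rate are assumptions, not measurements):

    # Data that could have been streamed during one avoided seek.
    seek_ms = 8.0          # assumed average seek time
    transfer_mb_s = 100.0  # assumed sequential transfer rate
    print(f"{transfer_mb_s * seek_ms / 1000:.1f} MB per avoided seek")
    # -> 0.8 MB per avoided seek

Nearly a megabyte of payload per avoided seek, which is why batching same-area requests can raise total throughput.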

The CFQ "low latency" mode

Posted Oct 15, 2009 14:50 UTC (Thu) by guest (guest, #2027)

That only works if both readers issue requests without waiting for results. That's not how programs usually work: if they issue a read request, they wait for it to complete before sending the next read.

Anyway, if you have such workloads and you do *not* pause, what happens? You perform the first seek to A, read A, seek to B, read B, and in the meantime more requests for A have arrived. If it's only one, you still seem to be fast enough despite seeking: just seek back to A and go on. If the seek takes too long, multiple requests should have been queued already, and you can coalesce them and handle them with one seek.

IMHO, letting a disk stay idle when there's work to do is wrong!

The CFQ "low latency" mode

Posted Oct 15, 2009 17:57 UTC (Thu) by efexis (guest, #26355)

"letting a disk stay idle when there's work to do is wrong!"

Except that the disk is as good as idle while it's seeking... you can't read or write while it's happening.

I already know this to be true; I come across it on a server I partly manage, which tries to schedule backups for several sites all at once. The disk thrashes, the system grinds to a halt, and the backups take forever to finish.

So I wrote a small bash script: when the load gets high, it sends a STOP signal to all the backup processes, then sends just one of them a CONT signal, so only that one process is running. Every few seconds it STOPs that task and CONTs a different one. The backups complete in a much shorter time, and system responsiveness is much better while they run. Why? Because the heads don't move away from the current reader's position as often, even though each process does the same issue-read, wait, process, issue-read, wait, process read pattern. So, with the amount of time the drive spends seeking reduced, the SAME drive is able to complete the SAME amount of work in LESS time, with LESS effect on the rest of the system.

This is just a fact: it's real and it works. However counter-intuitive it may sound to you, the numbers don't lie.
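
A rough sketch of that rotation loop (the original is a bash script not shown here; this Python paraphrase assumes a hypothetical "site-backup" process name and a 5-second turn, and leaves out the load-average check):

    import os, signal, subprocess, time

    def backup_pids():
        # The "site-backup" match pattern is hypothetical.
        out = subprocess.run(["pgrep", "-f", "site-backup"],
                             capture_output=True, text=True)
        return [int(p) for p in out.stdout.split()]

    turn = 0
    while True:
        pids = backup_pids()
        if not pids:
            break                        # all backups have finished
        for pid in pids:
            os.kill(pid, signal.SIGSTOP) # freeze every backup...
        # ...then let exactly one run for a few seconds, round-robin.
        os.kill(pids[turn % len(pids)], signal.SIGCONT)
        turn += 1
        time.sleep(5)

Only one reader is ever generating I/O, so the heads stay put, which is the whole effect described above.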

