
What happened to disk performance in 2.6.39


Posted Feb 2, 2012 20:45 UTC (Thu) by dlang (✭ supporter ✭, #313)
In reply to: What happened to disk performance in 2.6.39 by alankila
Parent article: What happened to disk performance in 2.6.39

The problem is the queue size.

If the queue is not large enough, you can't fit enough requests into it to have them available to combine later.

If the queue is too large, a new process making a request will not get its request serviced until everything ahead of it in the queue has been processed (unless you have some fairness mechanism that avoids putting the new process's request at the end of the queue).
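A minimal sketch of the merging side of this trade-off: a bounded queue that folds a new request into a queued one when the two are contiguous on disk. The `Request`/`MergingQueue` names and the `max_len` cap are hypothetical, and real block-layer merging also has to respect request size limits and I/O direction; the point is only that a request can be merged only if its neighbour is still sitting in the queue.

```python
from dataclasses import dataclass


@dataclass
class Request:
    start: int  # first sector
    count: int  # number of sectors

    @property
    def end(self) -> int:
        return self.start + self.count


class MergingQueue:
    """Toy request queue (hypothetical) that merges contiguous requests."""

    def __init__(self, max_len: int):
        self.max_len = max_len  # assumed cap on queued (unmerged) requests
        self.pending: list[Request] = []

    def submit(self, req: Request) -> bool:
        """Queue req, merging it if possible. Returns False if the queue is full."""
        for queued in self.pending:
            if queued.end == req.start:  # back merge: req continues queued
                queued.count += req.count
                return True
            if req.end == queued.start:  # front merge: req precedes queued
                queued.start = req.start
                queued.count += req.count
                return True
        if len(self.pending) >= self.max_len:
            return False  # no room: the caller has to wait for the queue to drain
        self.pending.append(req)
        return True
```

With max_len=2, a request for sectors 8-15 still merges after 0-7 and 100-107 have been queued, because it needs no new slot, but a fourth unrelated request is refused until something drains.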

I don't like the concept of plugging, but it seems to be a hack that tends to work.

As an example:

In rsyslog, when the ability to process multiple messages from the queue at once was added (so that, for example, multiple messages could be inserted into a database in a single transaction), we discussed delaying the pull of the first message from the queue to give the queue a chance to build up several messages that would then be handled more efficiently in one pass. We decided not to do this because the process ended up being self-regulating.

If the messages arrive slowly enough, they are handled one at a time.

If the messages arrive faster than this, some messages queue up while the prior messages are being handled, and then the backlog gets processed at one time (up to a limit).

This is very good for latency, but the trade-off is that the output is doing far more work than it would need to do if the work was batched more. As the load builds up, it will ramp up the utilisation of the output in the most inefficient mode (one message at a time), and then when it saturates the output, it will become more efficient to process more messages while keeping the output at max utilisation.
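The self-regulating behaviour described above can be modelled with a toy simulation. This is not rsyslog code: `simulate` and `drain_batches` are hypothetical, the model assumes each batch occupies the output for a fixed `service_time` ticks regardless of its size (i.e. per-transaction overhead dominates), and it never waits for more messages before starting a batch.

```python
from collections import deque


def drain_batches(queue: deque, max_batch: int) -> list:
    """Take whatever is waiting right now (up to max_batch); never wait for more."""
    batch = []
    while queue and len(batch) < max_batch:
        batch.append(queue.popleft())
    return batch


def simulate(arrivals: list[int], service_time: int, max_batch: int) -> list[int]:
    """arrivals[t] is the number of messages arriving at tick t.

    Assumed cost model: every batch keeps the output busy for service_time
    ticks, whatever its size. Returns the size of each batch handled.
    """
    queue = deque()
    sizes = []
    busy_until = 0
    for tick, n in enumerate(arrivals):
        queue.extend(range(n))  # message payloads don't matter here
        if tick >= busy_until and queue:
            batch = drain_batches(queue, max_batch)
            sizes.append(len(batch))
            busy_until = tick + service_time
    return sizes
```

Sparse arrivals such as [1, 0, 0, 1, 0, 0, 1] give batches of 1, while steady arrivals of 3 per tick with service_time=2 give batches of 3 and then 6: the batch size grows only because the output is busy, capped by max_batch.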

Networks have the same type of problem (the too-large-buffer situation is what's called bufferbloat). The answer there seems to be to put in a more complex queuing engine (SFQ seems to be the winner right now) that prioritizes packets from new or sparse connections ahead of heavy connections.
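For reference, the core of SFQ-style scheduling is small: hash each flow into one of a fixed number of buckets and serve the non-empty buckets round-robin, one packet at a time, so a sparse flow's packet waits behind at most one packet per busy bucket instead of behind a bulk flow's whole backlog. A simplified sketch: real SFQ also perturbs its hash periodically and enforces queue limits (both omitted), and the CRC32 hash and `ToySFQ` name are arbitrary choices for illustration.

```python
import zlib
from collections import deque


class ToySFQ:
    """Simplified stochastic fairness queueing: hashed buckets served round-robin."""

    def __init__(self, n_buckets: int = 16):
        self.buckets = [deque() for _ in range(n_buckets)]
        self.active = deque()  # indices of non-empty buckets, in service order

    def bucket_of(self, flow: str) -> int:
        return zlib.crc32(flow.encode()) % len(self.buckets)

    def enqueue(self, flow: str, packet) -> None:
        idx = self.bucket_of(flow)
        if not self.buckets[idx]:
            self.active.append(idx)  # newly active bucket joins the back of the ring
        self.buckets[idx].append(packet)

    def dequeue(self):
        if not self.active:
            return None
        idx = self.active.popleft()
        packet = self.buckets[idx].popleft()
        if self.buckets[idx]:
            self.active.append(idx)  # still backlogged: go around again
        return packet
```

If a bulk flow has five packets queued and an interactive flow (in a different bucket) then sends one, the interactive packet goes out second rather than sixth.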

I wonder if a similar approach could work for disk I/O? If this would allow for significantly larger queue sizes without the latency problems that usually come with large queues, it may give almost the same long-term effect of plugging, without the problems that plugging introduces.



What happened to disk performance in 2.6.39

Posted Feb 3, 2012 4:54 UTC (Fri) by raven667 (subscriber, #5198)

> Networks have the same type of problem

It would be interesting to see more sharing of notes between the network and disk I/O subsystems, because some of the problems they solve are broadly similar. I/O throughput and contention behaviors are a matter of science, and I'm sure they share a lot of math.

What happened to disk performance in 2.6.39

Posted Feb 20, 2012 23:35 UTC (Mon) by jmm82 (guest, #59425)

Networking does have this same concept built into TCP: Nagle's algorithm.
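A simplified sketch of the rule from RFC 896: while any sent data is unacknowledged, small writes are held and coalesced, and a segment goes out only when a full MSS has accumulated or everything outstanding has been ACKed. The `NagleSender` class and its `sent` list are illustrative only; a real TCP stack also deals with windows, timers, and the TCP_NODELAY opt-out.

```python
class NagleSender:
    """Toy sender applying Nagle's coalescing rule (no windows or timers)."""

    def __init__(self, mss: int):
        self.mss = mss
        self.buffer = b""  # data written by the application but not yet sent
        self.unacked = 0   # bytes sent but not yet acknowledged
        self.sent = []     # segments "put on the wire", for inspection

    def _flush(self) -> None:
        # Send if we have a full segment, or if nothing is outstanding.
        while self.buffer and (len(self.buffer) >= self.mss or self.unacked == 0):
            seg, self.buffer = self.buffer[: self.mss], self.buffer[self.mss :]
            self.sent.append(seg)
            self.unacked += len(seg)

    def write(self, data: bytes) -> None:
        self.buffer += data
        self._flush()

    def ack(self, nbytes: int) -> None:
        self.unacked -= nbytes
        self._flush()
```

With mss=4, writing b"a", b"b", b"c" sends b"a" immediately, holds b"bc" until that first byte is ACKed, and then sends it as a single segment. Like plugging, this trades the latency of small writes for fewer, larger units of work.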

What happened to disk performance in 2.6.39

Posted Feb 3, 2012 13:05 UTC (Fri) by alankila (subscriber, #47141)

Plugging or not, I'm pretty sure there are still queues involved just the same. Reading the other links in this article, it seems that plugging goes away as soon as the system determines that it has any work in its internal queues to do, therefore it's strictly a "first request" optimization. In any case, it doesn't seem to improve throughput (because it gets disabled) and worsens latency (because it delays first request service time), so it sounds useless to me in every case.
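A toy model of plugging as described in this comment, to make the "first request" point concrete. `PluggedQueue`, `plug_limit`, and `in_flight` are hypothetical names, not the kernel's block-layer API: requests are held only while the device is idle and are released, sorted, when the plug is pulled (explicitly or after `plug_limit` requests); once the device has work in flight, new requests bypass the plug entirely.

```python
class PluggedQueue:
    """Hypothetical model of queue plugging; not the Linux block-layer API."""

    def __init__(self, plug_limit: int):
        self.plug_limit = plug_limit  # assumed: auto-unplug after this many held requests
        self.held = []                # requests waiting behind the plug (sector numbers)
        self.device = []              # requests handed to the device, in dispatch order
        self.in_flight = 0            # dispatched requests not yet completed

    def submit(self, sector: int) -> None:
        if self.in_flight > 0:
            # Device already busy: a queue exists anyway, so no plug is applied.
            self._dispatch([sector])
            return
        self.held.append(sector)      # idle device: hold the request (latency cost)
        if len(self.held) >= self.plug_limit:
            self.unplug()

    def unplug(self) -> None:
        batch, self.held = sorted(self.held), []
        self._dispatch(batch)

    def _dispatch(self, sectors: list) -> None:
        self.device.extend(sectors)
        self.in_flight += len(sectors)

    def complete(self, n: int = 1) -> None:
        self.in_flight -= n
```

With plug_limit=3, the first two submissions sit in `held` until the third arrives, then all three go out sorted; a fourth request submitted while those are still in flight is dispatched immediately.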

Disk schedulers already use their own variant of fair queueing. AFAIK, CFQ gives every process its chance to do some disk transactions when its turn comes, which is fairly similar to SFQ: SFQ arranges outbound network traffic into a number of pre-existing queues and submits the head element of each queue in turn (if any), giving all flows a fairly equal chance to progress.

What happened to disk performance in 2.6.39

Posted Feb 4, 2012 21:04 UTC (Sat) by giraffedata (subscriber, #1954)

I've always been a fierce opponent of queue plugging. I'm not saying there's no case where it's good, but everywhere I see it, it's based on the misconception that capacity matters when you're not using it all. I'm talking about the principle that a 10,000 liter tank is no better than a 5,000 liter tank for an application that never stores more than 2,000 liters.

Sending small scattered I/Os to a disk drive is not a problem as long as the drive is keeping up with it, and if the drive isn't, then your queue is building up anyway, without a plug.

I've seen plugging used to overcome a defect in the thing serving the queue wherein it improperly speed-matches. I think this is what's going on with the network "buffer bloat" issue. I saw it more simply in a disk storage server that thought it was doing its client a favor by accepting I/Os as fast as the client could send them and sticking them in a buffer, then passing them one by one, FIFO, to the disk arms. The server was essentially lying and saying it had capacity when it was really overloaded.

This was fixed with queue plugging in the client, but later fixed better just by making the client send ahead enough work to overwhelm the server's buffer and make it admit that it couldn't keep up.

dlang, in your defense of an application of queue plugging:

> the trade-off is that the output is doing far more work than it would need to do if the work was batched more

You omit an important factor: what is wrong with the output doing more work than it otherwise would? In many cases, that doesn't make any difference.

What happened to disk performance in 2.6.39

Posted Feb 6, 2012 2:48 UTC (Mon) by dlang (✭ supporter ✭, #313)

I agree that most of the time it really doesn't matter that the resource is working a little harder than it would need to be. But that is the only justification I can see for plugging.

the resource being busier can make it take more power.

the resource being busier could cause added latency for a new request.

there are probably other ways that the resource being busier can cost, even if it's not completely maxed out.

but overall I agree that these are probably not significant in almost all conditions.

I think the biggest problem is that large queues have not been handled sanely in the past, which has made "large queue" == "high latency" in many people's minds.

what's needed is a large queue to gather possible work, but then smart management of that queue.

In the case of disk I/O, that smart management means combining related requests that are in the queue but not next to each other, prioritizing reads over writes (except for writes with sync dependencies), elevator reordering, etc.
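A minimal sketch of two of those policies together, assuming a hypothetical `Req` record with a sector number, a read/write flag, and a `sync` flag for writes that something is waiting on: reads and sync writes are dispatched before other writes, and each class is ordered by a one-way elevator sweep (C-SCAN) starting from the current head position. Real schedulers also use deadlines so background writes can't starve; that is omitted here.

```python
from dataclasses import dataclass


@dataclass
class Req:
    sector: int
    is_read: bool
    sync: bool = False  # hypothetical: a write that a reader or fsync is waiting on


def dispatch_order(pending: list[Req], head: int) -> list[Req]:
    """Reads and sync writes first, then other writes; C-SCAN order within each class."""

    def sweep(reqs: list[Req]) -> list[Req]:
        # Serve everything at or beyond the head in ascending order,
        # then wrap around to the lowest sectors.
        ahead = sorted((r for r in reqs if r.sector >= head), key=lambda r: r.sector)
        behind = sorted((r for r in reqs if r.sector < head), key=lambda r: r.sector)
        return ahead + behind

    urgent = [r for r in pending if r.is_read or r.sync]
    background = [r for r in pending if not (r.is_read or r.sync)]
    return sweep(urgent) + sweep(background)
```

With the head at sector 40, reads at 10 and 90, a sync write at 70 and an ordinary write at 50 come out as 70, 90, 10, 50.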

If you have a RAID array, it can mean trying to schedule work so that different spindles can be working at the same time.

If you have an SSD or RAID array, it can mean trying to do things in larger blocks (matching stripe size and alignment, or eraseblock size and alignment).
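The boundary arithmetic behind stripe alignment can be sketched as follows, assuming a simple RAID-0 layout where consecutive `chunk`-sector chunks go round-robin across `n_disks` spindles. `split_by_stripe` is a hypothetical helper; with `n_disks=1` and `chunk` set to an SSD's eraseblock size, the same split shows how many eraseblocks a write touches.

```python
def split_by_stripe(start: int, count: int, chunk: int, n_disks: int) -> list[tuple[int, int, int]]:
    """Split a logical request (in sectors) at chunk boundaries of an assumed
    RAID-0 layout. Returns (disk, sector_on_disk, length) pieces."""
    pieces = []
    pos, end = start, start + count
    while pos < end:
        chunk_no, offset = divmod(pos, chunk)
        length = min(chunk - offset, end - pos)  # stop at the next chunk boundary
        disk = chunk_no % n_disks                # chunks rotate across spindles
        disk_sector = (chunk_no // n_disks) * chunk + offset
        pieces.append((disk, disk_sector, length))
        pos += length
    return pieces
```

An aligned 256-sector request at sector 0 (chunk=128, n_disks=4) becomes two full-chunk pieces on disks 0 and 1, which can proceed in parallel; the same request starting at sector 64 becomes three pieces on disks 0, 1 and 2, two of them partial chunks.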

In the case of network buffers, it means prioritizing interactive, traffic-management, and blocking packets ahead of bulk transfers, and dropping packets that you aren't going to be able to get through before they are worthless (which includes not just bulk-transfer packets that would arrive after a retry has already been sent, but also VoIP packets that have been delayed too much).
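The "drop what will arrive too late" part can be sketched as a queue that gives each packet its own maximum useful age and discards expired packets at dequeue time instead of spending link time on them. `DeadlineQueue` and the per-packet `max_age` are assumptions for illustration; this sketch is plain FIFO and does no prioritization. Real AQMs such as CoDel also look at how long packets sit in the queue, but use a different drop rule.

```python
from collections import deque


class DeadlineQueue:
    """FIFO that drops packets whose (hypothetical) per-packet deadline has passed."""

    def __init__(self):
        self.q = deque()   # (packet, absolute deadline)
        self.dropped = []  # packets discarded as too late to be useful

    def enqueue(self, pkt, now: float, max_age: float) -> None:
        self.q.append((pkt, now + max_age))

    def dequeue(self, now: float):
        while self.q:
            pkt, deadline = self.q.popleft()
            if now <= deadline:
                return pkt
            self.dropped.append(pkt)  # stale: don't waste the link on it
        return None
```

Enqueue a VoIP packet with max_age=0.05 and a bulk packet with max_age=5.0 at time 0; a dequeue at time 0.2 drops the VoIP packet and returns the bulk one.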

As processors get faster compared to the I/O, it becomes possible to spend more effort in smart queue management while still keeping up with the I/O channel.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds