What happened to disk performance in 2.6.39

Posted Feb 4, 2012 21:04 UTC (Sat) by giraffedata (subscriber, #1954)
In reply to: What happened to disk performance in 2.6.39 by dlang
Parent article: What happened to disk performance in 2.6.39

I've always been a fierce opponent of queue plugging. I'm not saying there's no case where it's good, but everywhere I see it, it's based on the misconception that capacity matters when you're not using it all. I'm talking about the principle that a 10,000 liter tank is no better than a 5,000 liter tank for an application that never stores more than 2,000 liters.

Sending small scattered I/Os to a disk drive is not a problem as long as the drive is keeping up with it, and if the drive isn't, then your queue is building up anyway, without a plug.
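The point can be illustrated with a toy model. This is only a sketch under simplifying assumptions (fixed arrival and service rates per tick, no plugging at all); the function name and parameters are made up for illustration.

```python
def queue_depths(arrivals_per_tick, service_per_tick, ticks):
    """Return the queue depth at the end of each tick for a simple unplugged queue."""
    depth = 0
    history = []
    for _ in range(ticks):
        depth += arrivals_per_tick              # new requests arrive
        depth -= min(depth, service_per_tick)   # the drive serves what it can
        history.append(depth)
    return history


# Drive keeps up: the queue stays empty, so there is nothing to batch or reorder.
keeping_up = queue_depths(arrivals_per_tick=3, service_per_tick=4, ticks=5)

# Drive falls behind: the queue grows by itself, with no plug needed.
falling_behind = queue_depths(arrivals_per_tick=5, service_per_tick=4, ticks=5)
```

In the second case the backlog that a plug would have created artificially shows up anyway, and it is available for merging and reordering.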

I've seen plugging used to overcome a defect in the thing serving the queue wherein it improperly speed-matches. I think this is what's going on with the network "buffer bloat" issue. I saw it more simply in a disk storage server that thought it was doing its client a favor by accepting I/Os as fast as the client could send them and sticking them in a buffer, then passing them one by one, FIFO, to the disk arms. The server was essentially lying and saying it had capacity when it was really overloaded.

This was fixed with queue plugging in the client, but later fixed better just by making the client send ahead enough work to overwhelm the server's buffer and make it admit that it couldn't keep up.

dlang, in your defense of an application of queue plugging:

"the trade-off is that the output is doing far more work than it would need to do if the work was batched more"

you omit an important factor: what is wrong with the output doing more work than it otherwise would? In many cases, that doesn't make any difference.

What happened to disk performance in 2.6.39

Posted Feb 6, 2012 2:48 UTC (Mon) by dlang (subscriber, #313)

I agree that most of the time it really doesn't matter that the resource is working a little harder than it would need to be. But that is the only justification I can see for plugging.

the resource being busier can make it take more power.

the resource being busier could cause added latency for a new request.

there are probably other ways that the resource being busier can cost, even if it's not completely maxed out.

but overall I agree that these are probably not significant in almost all conditions.

I think the biggest problem is that large queues have not been handled sanely in the past, which has made "large queue" == "high latency" in many people's minds.

what's needed is a large queue to gather possible work, but then smart management of that queue.

In the case of disk I/O that smart management has to do with combining work that's in the queue, but not together, prioritizing reads over writes (except for writes with sync dependencies), elevator reordering, etc.
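A rough sketch of those three ideas, not how any real kernel I/O scheduler is written. The `Request` type and function names are hypothetical, only exactly adjacent requests are merged, and writes with sync dependencies are ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class Request:
    sector: int      # starting sector
    count: int       # number of sectors
    is_write: bool


def merge_adjacent(requests):
    """Combine same-direction requests whose sector ranges touch end to start."""
    merged = []
    for req in sorted(requests, key=lambda r: (r.is_write, r.sector)):
        last = merged[-1] if merged else None
        if (last is not None and last.is_write == req.is_write
                and last.sector + last.count == req.sector):
            last.count += req.count   # extend the previous request
        else:
            merged.append(Request(req.sector, req.count, req.is_write))
    return merged


def dispatch_order(requests, head=0):
    """Reads before writes; within each group, sweep upward from the head, then wrap."""
    merged = merge_adjacent(requests)   # already sorted by sector within each group

    def sweep(reqs):
        ahead = [r for r in reqs if r.sector >= head]
        behind = [r for r in reqs if r.sector < head]
        return ahead + behind

    reads = [r for r in merged if not r.is_write]
    writes = [r for r in merged if r.is_write]
    return sweep(reads) + sweep(writes)


queue = [
    Request(100, 8, True),
    Request(8, 8, False),
    Request(0, 8, False),
    Request(108, 8, True),
    Request(500, 8, False),
]
order = dispatch_order(queue, head=4)
```

Here five queued requests become three dispatched ones. None of this is possible unless the requests sit in the queue together for a while, which is the argument for a large but well-managed queue.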

If you have a raid array it can mean trying to schedule work so that different spindles can be working at the same time.

if you have an SSD or raid array, it can mean trying to do things in larger blocks (stripe size and alignment, eraseblock size and alignment)
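The alignment part comes down to simple arithmetic. As a sketch, assuming a single fixed block size (a stripe or eraseblock, in whatever units the offsets use) and a hypothetical helper name, a request can be cut so that no piece crosses a block boundary:

```python
def split_on_boundaries(offset, length, block):
    """Split [offset, offset + length) into pieces that never cross a multiple of block."""
    pieces = []
    end = offset + length
    while offset < end:
        next_boundary = (offset // block + 1) * block
        piece_end = min(end, next_boundary)
        pieces.append((offset, piece_end - offset))
        offset = piece_end
    return pieces


# A 100-unit write starting at offset 48 with a 64-unit block:
pieces = split_on_boundaries(48, 100, 64)
```

Only the middle piece covers a whole block; the two partial pieces at the ends are the ones a queue manager would like to combine with neighbouring work before sending them down.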

In the case of network buffers, it has to do with prioritizing interactive, traffic management, and blocking packets ahead of bulk transfers, and with dropping packets that you aren't going to be able to get through before they are worthless (which is not just bulk transfers that will get there after a retry has already been sent, but also VoIP packets that have been delayed too much).
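A minimal sketch of that kind of queue, assuming just two traffic classes and a per-packet deadline supplied by the caller. The class and constant names are invented for illustration; real queueing disciplines are considerably more involved.

```python
import heapq
import itertools

INTERACTIVE, BULK = 0, 1   # lower value is served first


class SmartQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # keeps FIFO order within a priority class
        self.dropped = 0

    def enqueue(self, payload, priority, deadline):
        heapq.heappush(self._heap, (priority, next(self._seq), deadline, payload))

    def dequeue(self, now):
        """Return the next useful packet, discarding any whose deadline has passed."""
        while self._heap:
            _priority, _seq, deadline, payload = heapq.heappop(self._heap)
            if deadline < now:
                self.dropped += 1       # too late to be worth sending
                continue
            return payload
        return None


q = SmartQueue()
q.enqueue("bulk-1", BULK, deadline=10.0)
q.enqueue("voip-1", INTERACTIVE, deadline=1.0)
q.enqueue("voip-2", INTERACTIVE, deadline=5.0)

first = q.dequeue(now=2.0)    # voip-1 is already stale and gets dropped
second = q.dequeue(now=2.0)
```

The queue can be as deep as you like without hurting latency for interactive traffic, because position in the queue is decided by priority and deadline rather than arrival order.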

As processors get faster compared to the I/O, it becomes possible to spend more effort in smart queue management while still keeping up with the I/O channel.

Copyright © 2018, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds