I agree that most of the time it really doesn't matter that the resource is working a little harder than it would need to be. But that is the only justification I can see for plugging.
The resource being busier can make it draw more power.
The resource being busier could add latency for a new request.
There are probably other ways that the resource being busier can cost, even if it's not completely maxed out.
But overall I agree that these are probably not significant in almost all conditions.
I think the biggest problem is that large queues have not been handled sanely in the past, which has made "large queue" == "high latency" in many people's minds.
What's needed is a large queue to gather possible work, but then smart management of that queue.
In the case of disk I/O, that smart management has to do with combining requests that are in the queue but not yet merged, prioritizing reads over writes (except for writes with sync dependencies), elevator reordering, etc.
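As a toy sketch (not real kernel code), the three tricks above might look like this; the request format is a hypothetical dict, just for illustration:

```python
# Toy sketch of "smart" disk-queue management: merge contiguous requests,
# serve reads (and sync writes) before ordinary writes, and dispatch in
# elevator (ascending-sector) order.

def manage_queue(requests):
    """requests: list of dicts with 'op' ('read'/'write'), 'sector',
    'count', and an optional 'sync' flag. Returns dispatch order."""
    # Elevator reordering: sort by starting sector.
    ordered = sorted(requests, key=lambda r: r["sector"])

    # Merge requests that turn out to be contiguous and the same op.
    merged = []
    for r in ordered:
        if (merged
                and merged[-1]["op"] == r["op"]
                and merged[-1]["sector"] + merged[-1]["count"] == r["sector"]):
            merged[-1]["count"] += r["count"]
        else:
            merged.append(dict(r))

    # Reads and sync writes jump ahead of ordinary writes.
    urgent = [r for r in merged if r["op"] == "read" or r.get("sync")]
    bulk = [r for r in merged if not (r["op"] == "read" or r.get("sync"))]
    return urgent + bulk
```

A real elevator sweeps relative to the current head position rather than sorting from sector zero, but the gathering-then-reordering idea is the same.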
If you have a RAID array, it can mean trying to schedule work so that different spindles can be working at the same time.
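A minimal sketch of that idea, assuming a simple striped layout (the stripe size and spindle count here are made-up parameters):

```python
# Toy sketch: with an assumed striped RAID layout, bucket requests by the
# spindle their sector lands on, then dispatch round-robin so that all
# spindles can be busy at once.

STRIPE_SECTORS = 256  # assumed sectors per stripe unit
SPINDLES = 4          # assumed number of drives

def interleave(requests):
    """requests: list of dicts with a 'sector' key."""
    buckets = [[] for _ in range(SPINDLES)]
    for r in requests:
        spindle = (r["sector"] // STRIPE_SECTORS) % SPINDLES
        buckets[spindle].append(r)

    # Round-robin across spindles until all buckets drain.
    out = []
    i = 0
    while any(buckets):
        if buckets[i % SPINDLES]:
            out.append(buckets[i % SPINDLES].pop(0))
        i += 1
    return out
```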
If you have an SSD or RAID array, it can mean trying to do things in larger blocks (stripe size and alignment, eraseblock size and alignment).
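The alignment part can be sketched as splitting a byte range at the device-unit boundaries, so each piece maps to whole eraseblocks or stripes (the 128 KiB block size here is an assumption, not a universal value):

```python
# Toy sketch: split a byte range into chunks that never straddle an
# assumed erase-block (or stripe) boundary.

ERASE_BLOCK = 128 * 1024  # assumed 128 KiB eraseblock/stripe size

def aligned_chunks(offset, length, block=ERASE_BLOCK):
    """Return (offset, length) pairs covering the range, split at each
    multiple of `block`."""
    chunks = []
    end = offset + length
    while offset < end:
        # The next block boundary bounds this chunk's length.
        boundary = (offset // block + 1) * block
        nxt = min(boundary, end)
        chunks.append((offset, nxt - offset))
        offset = nxt
    return chunks
```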
In the case of network buffers, it has to do with prioritizing interactive, traffic-management, and blocking packets ahead of bulk transfers, and dropping packets that you aren't going to be able to get through before they are worthless (which is not just bulk transfers that will arrive after a retry has already been sent, but also VoIP packets that have been delayed too much).
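The drop-before-transmit idea can be sketched as tagging each queued packet with a deadline and discarding anything already past it at dequeue time (a simplification; real AQM schemes like CoDel work from sojourn-time statistics rather than per-packet deadlines):

```python
# Toy sketch: before transmitting, drop queued packets whose deadline has
# passed (e.g., VoIP frames delayed past their playout time).

from collections import deque

def dequeue_next(queue, now):
    """queue: deque of (deadline, payload) pairs. Returns the next still-
    useful payload, or None, silently discarding expired packets."""
    while queue:
        deadline, payload = queue.popleft()
        if deadline > now:
            return payload
        # Packet is already worthless; sending it would waste bandwidth
        # and delay everything queued behind it.
    return None
```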
As processors get faster compared to the I/O, it becomes possible to spend more effort in smart queue management while still keeping up with the I/O channel.