You don't normally have to wait for the next request. You may, however, have to wait for a group of requests to complete before issuing some of the requests after that.
Explicit dependencies would be the most flexible, of course, but the problem is that arbitrary dependencies are difficult for a block layer to process. A barrier, on the other hand, simply means that all requests issued before the barrier (within some context) are executed before any request issued after the barrier (within the same context).
That can be inefficient if an (upper layer) barrier applies to an entire block device, because the block layer then cannot re-order or merge requests across the barrier, even requests that do not depend on it. One way to solve that problem is with I/O thread identifiers and barriers specific to an I/O thread. That is not quite as flexible as explicit request-level dependencies, but it is much easier to implement, and solves the same problem.
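To make the difference concrete, here is a rough sketch in Python (a toy model, not block layer code). Every name in it (`assign_epochs`, `respects_barriers`, the "journal" and "data" thread labels, the sector numbers) is made up for illustration. Each request is tagged with its I/O thread and with the number of barriers that thread has issued so far, and a dispatch order is acceptable as long as those counts never go backwards within a single thread.

```python
from collections import defaultdict

BARRIER = "barrier"  # hypothetical marker for a barrier in the request stream


def assign_epochs(stream):
    """Tag each write with (thread, epoch, sector).

    stream is a list of (thread, op) pairs, where op is either a sector
    number or BARRIER. A thread's epoch counts the barriers that thread
    has issued so far; barriers from other threads do not affect it.
    """
    epoch = defaultdict(int)
    tagged = []
    for thread, op in stream:
        if op == BARRIER:
            epoch[thread] += 1
        else:
            tagged.append((thread, epoch[thread], op))
    return tagged


def respects_barriers(order):
    """True if, within every thread, epochs never decrease in this order."""
    last = {}
    for thread, ep, _sector in order:
        if ep < last.get(thread, 0):
            return False
        last[thread] = ep
    return True


if __name__ == "__main__":
    # The journal thread issues a barrier; the data thread does not.
    per_thread = assign_epochs([
        ("journal", 10), ("data", 500),
        ("journal", BARRIER),
        ("journal", 11), ("data", 501),
    ])
    # The same requests, but with the barrier applied to the whole device.
    device_wide = assign_epochs([
        ("dev", 10), ("dev", 500),
        ("dev", BARRIER),
        ("dev", 11), ("dev", 501),
    ])
    # Sorting by sector stands in for an elevator that wants 500 and 501 adjacent.
    print(respects_barriers(sorted(per_thread, key=lambda r: r[2])))
    print(respects_barriers(sorted(device_wide, key=lambda r: r[2])))
```

With the barrier scoped to one thread, the sector-sorted order is allowed, so the two data writes can sit next to each other and be merged. With the device-wide barrier, the same order is rejected, because sector 500 (issued before the barrier) would be dispatched after sector 11 (issued after it).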
The method described in this article requires filesystems to implement their own concept of I/O threads without passing I/O thread identity information down to the block layer. Instead, the filesystem waits for the requests from that upper layer I/O thread to complete and issues write cache flushes. All modern journalled filesystems have at least one identifiable I/O thread - the one associated with serialization of the journal commits. In this case the I/O thread is just implicit instead of explicit, and the lower layer doesn't know anything about it.
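As a userspace analogy only (this is not how a filesystem talks to the block layer), the same "wait for completion, then flush, then continue" pattern looks roughly like the sketch below. The function name `commit_transaction` and the single-file journal layout are assumptions made for the example. It also assumes that `os.fsync` both waits for the writes and flushes the device write cache, which in practice depends on the filesystem and the device configuration.

```python
import os


def commit_transaction(path, journal_blocks, commit_record):
    """Append journal blocks, make them durable, then append the commit record.

    The ordering is enforced by the caller waiting, not by any barrier
    passed down to a lower layer: the commit record is not even issued
    until the first fsync has returned.
    """
    with open(path, "ab") as f:
        for block in journal_blocks:
            f.write(block)
        f.flush()               # push Python's buffer to the kernel
        os.fsync(f.fileno())    # wait for the journal blocks (assumed to flush the cache too)

        f.write(commit_record)
        f.flush()
        os.fsync(f.fileno())    # make the commit record itself durable
```

Only the journal writer waits here. Other writers are never told about the ordering point, which is the sense in which the I/O thread stays implicit and invisible to the layer below.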