At the extreme high end of PCIe SSDs, a system trying to do lots of small (4k) reads with high parallelism will be limited by having any queue locking at all. Running without a request queue remains an attractive option for these devices.
Another future improvement to watch out for is MSI-X interrupts. With MSI-X, it is possible to statically assign an interrupt to a single CPU core in such a way that an I/O retirement could interrupt the originating CPU directly; over about 600K IOPS it becomes important to spread out the interrupt/retirement workload as much as possible.