the problem is that when you submit all this I/O, the system has no way of knowing whether you mean
"do all of these, and minimize the overall time" or
"I need all of these to make progress at the same time, even if it means taking longer overall."
current algorithms tend to assume the second: they try to split the available I/O bandwidth between all the requests. since this ends up causing lots of seeks, it hurts on traditional rotational media when there are massive numbers of parallel requests.
a small amount of parallelism helps by giving the drive something to do when it would otherwise be idle; however, once you pass the saturation point it hurts, because it adds additional seeks as the system jumps from one set of requests to the next.
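one way to see roughly where that saturation point sits on a given device is to time the same batch of reads at different concurrency levels. below is a small illustrative sketch (Python, my own file names and sizes, not anything from a real benchmark suite) — note that freshly written temp files will usually be served from the page cache, so on most systems this mainly demonstrates the measurement approach; to see the seek penalty on rotational media you'd need files larger than RAM or a dropped cache.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

def make_test_files(directory, count=4, size=1 << 20):
    """Create `count` files of `size` bytes each to read back."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"testfile{i}.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        paths.append(path)
    return paths

def read_all(paths, workers):
    """Read every file with `workers` concurrent readers; return elapsed seconds."""
    def read_one(path):
        with open(path, "rb") as f:
            # read in 64 KiB chunks until EOF
            while f.read(1 << 16):
                pass
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(read_one, paths))
    return time.perf_counter() - start

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = make_test_files(d)
        # on a spinning disk with uncached data, expect times to improve
        # with a little parallelism, then degrade past the saturation point
        for workers in (1, 2, 4):
            print(f"{workers} concurrent reader(s): {read_all(paths, workers):.3f}s")
```

the interesting part is the shape of the curve, not the absolute numbers: the point where adding readers stops helping is the saturation point for that device and workload.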
this is the same sort of thing that makes hyperthreading anywhere from a noticeable benefit to a mild loss depending on the workload.