I think the 'IOPS' in the graphs above are a result of the benchmarking. So from 2 to 4 threads, the ext4 benchmark result goes up a little, but the number of I/Os hitting the disks explodes tenfold.
This means that ext4 starts to produce inefficient I/O patterns with multiple threads, while XFS is better at combining the I/Os.
Compare it to the CPU load while running a disk benchmark: you want that to be as low as possible relative to the throughput you get.
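
To make that comparison concrete, here is a rough sketch of the ratio I have in mind: device-level I/Os divided by the benchmark result, per thread count. The function name `device_ios_per_op` and every number in `samples` are made up for illustration; they are not read from the graphs, so plug in your own measurements.

```python
def device_ios_per_op(device_iops: float, benchmark_ops_per_sec: float) -> float:
    """Return how many device-level I/Os are issued per benchmark operation.

    Lower is better: it means the filesystem combines requests well.
    """
    if benchmark_ops_per_sec <= 0:
        raise ValueError("benchmark_ops_per_sec must be positive")
    return device_iops / benchmark_ops_per_sec


# Hypothetical measurements: threads -> (I/Os hitting the disks per second,
# operations per second reported by the benchmark). Illustrative values only.
samples = {
    2: (1_000.0, 500.0),
    4: (10_000.0, 550.0),
}

cost = {threads: device_ios_per_op(dev, bench) for threads, (dev, bench) in samples.items()}

for threads in sorted(cost):
    print(f"{threads} threads: {cost[threads]:.1f} device I/Os per benchmark op")
```

If that ratio jumps when you add threads while the benchmark result barely moves, the extra disk activity is overhead rather than useful work.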