Most of the metadata reads come from tracking the extents. All the backrefs we store for each extent get expensive in these workloads. We're definitely reducing this, starting with just using bigger btree blocks. Longer term, if that doesn't resolve things, we'll make an optimized extent record just for the metadata blocks.
I only partially agree on the crcs. The intel crc32c optimizations do make it possible for a reasonably large server to scale to really fast storage. But the part where we hand IO off to threads introduces enough latency to notice in some benchmarks on fast SSDs.
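For reference, this is the checksum being computed. A minimal software sketch of crc32c (the Castagnoli polynomial, reflected form 0x82F63B78) is below; the kernel's fast path uses the SSE4.2 crc32 instruction on Intel hardware rather than a lookup table like this, so this is only an illustration of what gets computed per block:

```python
# Table-driven crc32c (Castagnoli), for illustration only.
# The in-kernel Intel-optimized path uses the SSE4.2 crc32 instruction instead.

def _make_table():
    # Build the reflected-polynomial lookup table for one byte at a time.
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = ((c >> 1) ^ 0x82F63B78) if (c & 1) else (c >> 1)
        table.append(c)
    return table

_TABLE = _make_table()

def crc32c(data: bytes, crc: int = 0) -> int:
    # Standard reflected CRC: init and final xor with all-ones.
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

print(hex(crc32c(b"123456789")))  # standard CRC-32C check value: 0xe3069283
```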
Also, since we have to store the crc for each 4KB block, we do end up tracking much more metadata on the file with crcs on (this is a much bigger factor than the computation time).
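The metadata overhead is easy to estimate back-of-the-envelope. Assuming one 4-byte crc32c value per 4KB block (a simplification; the real on-disk csum items have per-item header overhead too), the checksum metadata grows linearly with file size:

```python
# Rough estimate of checksum metadata for a file, assuming one 4-byte
# crc32c per 4KB data block. Real csum tree items carry extra per-item
# overhead, so this is a lower bound, not actual Btrfs accounting.
BLOCK_SIZE = 4096  # bytes covered by each checksum
CSUM_SIZE = 4      # crc32c is 4 bytes

def csum_metadata_bytes(file_size: int) -> int:
    """Approximate checksum metadata (in bytes) for a file of file_size bytes."""
    blocks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE  # round up to whole blocks
    return blocks * CSUM_SIZE

# A 1 TiB file carries roughly 1 GiB of checksum metadata (a 1/1024 ratio).
print(csum_metadata_bytes(1 << 40))  # 1073741824 bytes = 1 GiB
```

That 1/1024 ratio is why huge-file workloads feel the cost of crcs mostly as extra metadata to read and track, not as computation time.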
With all of that said, there's no reason Btrfs with crcs off can't be as fast as XFS for huge files on huge arrays. Today, though, XFS has decades of practice and infrastructure in those workloads.