There are several issues that I can see. The big one is that BTRFS metadata trees grow very large, and as they grow they get slower because it takes more IO to reach any given piece of metadata in the filesystem. When you have a metadata tree containing 150GB of metadata (what the 8-thread benchmarks I was running ended up with), finding things can take some time and burn a lot of IO and CPU.
This shows up with workloads like directory traversal - BTRFS needs a lot more dependent reads than XFS or ext4 to get the directory data out of the tree, and so is significantly slower at such operations. Whether that can be fixed is an open question. Rebalancing (expensive) and larger btree block sizes (a mkfs option) are probably ways of reducing the impact of this problem, but metadata tree growth itself can't be avoided.
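Some back-of-envelope numbers show why the btree block size matters for dependent reads. Everything here other than the 150GB figure is an illustrative assumption, not a measured btrfs internal:

```python
import math

# Back-of-envelope sketch (assumed constants, not measured btrfs
# internals) of why larger metadata block sizes reduce the number of
# dependent reads per lookup: bigger nodes hold more keys, so the
# tree is shallower.
KEY_SIZE = 32  # assumed bytes per interior key + block pointer

def tree_height(metadata_bytes, node_size):
    """Levels a cold lookup must traverse, i.e. dependent reads."""
    fanout = node_size // KEY_SIZE
    leaves = max(1, metadata_bytes // node_size)
    return 1 + math.ceil(math.log(leaves, fanout))

META = 150 * 1024**3  # the 150GB of metadata from the 8-thread runs
for node_size in (16 * 1024, 64 * 1024):
    levels = tree_height(META, node_size)
    print(node_size // 1024, "KiB nodes ->", levels, "levels")
```

Under these assumptions, 64KiB nodes shave a level off the tree compared to 16KiB nodes - one fewer dependent, probably-uncached read per lookup, which adds up quickly across a traversal doing millions of them.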
Another problem is that as a BTRFS filesystem ages, it becomes fragmented due to all the COW that is done. Sequential read IO performance will degrade over time as the data gets spread more widely. Indeed, as the filesystem fills from the bottom up, the distance between where file data was first written and where the next COW block is written will increase. Hence, on spinning rust, read seek times will also increase, because the physical distance between logically sequential data grows as the filesystem ages. Automatic defrag is the usual way to fix this, but that can be expensive if it occurs at the wrong time...
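A toy allocator model makes the effect concrete (pure illustration, none of this is btrfs code): a contiguously written file accumulates extents as random COW overwrites land at an ever-rising allocation frontier.

```python
import random

# Toy simulation of COW fragmentation: a file written contiguously,
# then randomly overwritten.  Each overwrite is COWed to the current
# allocation frontier further up the device, so contiguous runs of
# the file shrink and the extent count climbs.
random.seed(42)

FILE_BLOCKS = 10_000
location = list(range(FILE_BLOCKS))  # block i starts at device block i
frontier = FILE_BLOCKS               # allocator hands out blocks upward

for _ in range(3_000):               # random COW overwrites
    i = random.randrange(FILE_BLOCKS)
    location[i] = frontier           # new copy lands at the frontier
    frontier += 1

# Count extents: runs of logically adjacent blocks that are also
# physically adjacent.  More extents -> more seeks per sequential read.
extents = 1 + sum(1 for a, b in zip(location, location[1:]) if b != a + 1)
print("extents after aging:", extents)
```

One contiguous extent turns into thousands, and every extent boundary is a potential seek on a "sequential" read.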
Then there is the amount of IO that BTRFS does - for a COW filesystem that is supposed to be able to do sequential write IO, it does an awful lot of small writes and a lot of seeks. Indeed, the limiting factor in all my testing was that BTRFS rapidly became IOPS bound at about 6000 IOPS - sometimes even on single-threaded workloads. Part of that is the RAID1 metadata, but even when I turned that off it still drove the disk much harder than XFS and was IOPS bound more than half the time. I'm sure this is fixable to some extent, but I'd suggest there's a lot of work to be done here, because it ties into the transaction reservation subsystem and how it drives writeback.
[ As an aside, that was one of the big changes I talked about for XFS - making metadata writeback scale. In most cases for XFS, that is driven by the transaction reservation subsystem, just as it is in BTRFS. It's not a simple problem to solve :/ ]
The last thing I'll mention briefly, because I've already said some stuff about it, is the scalability of the data transformation algorithms in BTRFS. There is already considerable effort going into reducing the overhead of transformations, but the problem may not be solvable for everyone - you can only make compression/CRCs/etc so fast while using only so much memory.
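The bound is easy to state (all throughput numbers below are made-up illustrative figures, not benchmarks of any real codec):

```python
# Sketch of why transformation cost caps throughput no matter how the
# rest of the filesystem is tuned: with one compression/checksum
# pipeline per core, aggregate bandwidth is bounded by either the CPUs
# or the device, whichever is lower.
def max_throughput(cores, per_core_mb_s, device_mb_s):
    """Best-case MB/s through the transformation stage."""
    return min(cores * per_core_mb_s, device_mb_s)

# e.g. 4 cores doing ~300 MB/s of compression each, 2 GB/s device:
print(max_throughput(4, 300, 2000), "MB/s (CPU bound)")
# 16 cores against the same device:
print(max_throughput(16, 300, 2000), "MB/s (device bound)")
```

Once you are on the CPU-bound side of that min(), faster storage buys you nothing - only faster algorithms or more cores do, and both have limits.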
I could keep going, but this will give you an idea of some of the problems that are apparent from the scalability testing I was doing....