The linked paper is the best explanation I've seen for why largish buffers are needed at all.
One thing I wonder, though: would it make sense for TCP implementations to space out multiple-segment sends if those sends are the result of a congestion window increase or a received ack that does not increase the advertised receive window size?
The idea is that, if the receiver stopped reading for a while (due to load, for example), then it would ack a segment and decrease its window. When it catches up, the window will increase and the sender should send as much data as will fit.
On the other hand, if the connection is starting up or if the receiver only acks full windows, there's no real benefit to sending the full cwnd all at once, since it's unlikely to reach the receiver any faster than it would if the data were spaced out. Spacing the data out would improve latency for competing flows.
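
To make the heuristic concrete, here is a rough Python sketch of the decision I have in mind. Everything in it is hypothetical: `Trigger`, `send_offsets`, and the choice to spread segments evenly over one RTT are my own assumptions for illustration, not how any real TCP stack does it.

```python
from enum import Enum, auto


class Trigger(Enum):
    """Why the sender is suddenly allowed to send several segments (hypothetical)."""
    CWND_GROWTH = auto()    # the congestion window grew
    ACK_SAME_RWND = auto()  # an ack arrived without a larger advertised window
    RWND_OPENED = auto()    # an ack advertised a larger receive window


def send_offsets(n_segments, trigger, rtt):
    """Return the delay in seconds before each of n_segments is sent."""
    if n_segments <= 0:
        return []
    if trigger is Trigger.RWND_OPENED:
        # The receiver was the bottleneck and has caught up:
        # send everything back to back.
        return [0.0] * n_segments
    # Assumption: otherwise spread the segments evenly across one RTT,
    # so the burst does not sit in a shared queue ahead of other flows.
    gap = rtt / n_segments
    return [i * gap for i in range(n_segments)]
```

For example, `send_offsets(4, Trigger.CWND_GROWTH, 1.0)` spaces four segments a quarter of an RTT apart, while the same call with `Trigger.RWND_OPENED` sends all four immediately.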