There's a better solution than falling back to buffered io

Posted Nov 12, 2025 17:41 UTC (Wed) by koverstreet (✭ supporter ✭, #4296)
Parent article: The intersection of unstable pages and direct I/O

you just _bounce_ - drastically cheaper, especially considering that if you're checksumming you're touching the buffer anyways

There's a better solution than falling back to buffered io

Posted Nov 12, 2025 18:59 UTC (Wed) by nickodell (subscriber, #125165) [Link] (4 responses)

Could you clarify why this is better than going through the page cache?

Hellwig's initial patch says

>This series tries to address this by falling back to uncached buffered I/O. Given that this requires an extra copy it is usually going to be a slow down, especially for very high bandwidth use cases, so I'm not exactly happy about.

I assume the bounce buffer also requires an additional copy, so what makes it faster than the approach here?

There's a better solution than falling back to buffered io

Posted Nov 12, 2025 19:26 UTC (Wed) by koverstreet (✭ supporter ✭, #4296) [Link] (1 responses)

Buffered IO is only fast when IO is mostly staying in cache. When it's not, it's _significantly_ slower; you incur all the overhead of walking and managing the page cache radix tree, background eviction, background writeback - doing all that stuff asynchronously is significantly more costly when you're not amortizing it by letting things stay in cache.

O_DIRECT tends to be used where the application knows the pagecache is not going to be useful - ignoring what the application communicated and using buffered IO is a massive behavioral change and just not a good idea.

If you just bounce IOs, the only extra overhead you're paying for is allocating/freeing bounce buffers (generally quite fast, thanks to percpu freelists), and the memcpy - which, as I mentioned, is noise when we're already touching the data to checksum.

And you only have to pay for that on writes. Reads can be normal zero copy O_DIRECT reads: if you get a checksum error, and the buffer is mapped to userspace (i.e. might have been scribbled over), you retry it with a bounce buffer before treating it like a "real" checksum error.

(This is all what bcachefs has done for years).
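
To make the shape of that concrete, here is a toy userspace sketch of "bounce on write, retry reads through a bounce buffer on checksum mismatch". It is not bcachefs code or any kernel API: `Device`, `write_bounced`, `read_with_retry` and the `scribble` hook are hypothetical names invented for illustration, and `zlib.crc32` stands in for whatever checksum the filesystem really uses.

```python
import zlib


class Device:
    """Toy block device (hypothetical): maps block number -> (data, checksum)."""

    def __init__(self):
        self.blocks = {}


def write_bounced(dev, blkno, user_buf):
    # Copy first, so the checksum and the data that reaches the device come
    # from the same stable snapshot even if user_buf is modified afterwards.
    bounce = bytes(user_buf)
    dev.blocks[blkno] = (bounce, zlib.crc32(bounce))


def read_with_retry(dev, blkno, user_buf, scribble=None):
    data, csum = dev.blocks[blkno]

    # Fast path: read straight into the caller's buffer (the "zero copy" case).
    user_buf[:] = data
    if scribble is not None:
        scribble(user_buf)  # simulate userspace touching the buffer mid-read
    if zlib.crc32(user_buf) == csum:
        return "direct"

    # Mismatch: the caller's buffer may have been scribbled over, so retry
    # into a private bounce buffer before calling it a real checksum error.
    bounce = bytearray(data)
    if zlib.crc32(bounce) != csum:
        raise IOError("checksum error on block %d" % blkno)
    user_buf[:] = bounce
    return "bounced"
```

The only extra work on the write side is the one copy; reads pay nothing unless the first verification fails.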

There's a better solution than falling back to buffered io

Posted Nov 12, 2025 21:05 UTC (Wed) by Wol (subscriber, #4433) [Link]

Wasn't that benchmarked a few years back? Somebody disabled caching on file copies, and especially with larger files the uncached copy was maybe ten times faster?

Cheers,
Wol

There's a better solution than falling back to buffered io

Posted Nov 12, 2025 19:28 UTC (Wed) by Paf (subscriber, #91811) [Link] (1 responses)

At least for small pages, much of the cost of using the page cache is actually in setup, insertion, and removal from the cache, *not* in memory allocation or data copying. Depending on the use case and hardware, simple bounce buffers can be *much* faster. And they can be done in parallel across many threads without conflicting on the tree locking for the page cache. (Even a read is a tree insert unless the data is already present.)

This becomes proportionally less true with large folios, though, probably to the point of not really being true for larger sizes, since the overhead is spread over much more data.

There's a better solution than falling back to buffered io

Posted Nov 12, 2025 19:36 UTC (Wed) by koverstreet (✭ supporter ✭, #4296) [Link]

IOPs keep going up, though. Forcing everything through the buffered IO paths is just crippling.

There's a better solution than falling back to buffered io

Posted Nov 12, 2025 19:35 UTC (Wed) by quwenruo_suse (subscriber, #124148) [Link] (1 responses)

I doubt it. I tried both bouncing and falling back to buffered IO on btrfs, and saw no observable difference.

The huge performance drop in my previous observations was caused by an unoptimized checksum implementation (kvm64 has no hardware-accelerated CRC32C).

There's a better solution than falling back to buffered io

Posted Nov 12, 2025 19:57 UTC (Wed) by koverstreet (✭ supporter ✭, #4296) [Link]

and you thought that test would be representative...?