
BIO vs DIO

Posted Jun 6, 2024 15:10 UTC (Thu) by Paf (subscriber, #91811)
Parent article: Measuring and improving buffered I/O

I don't think switching from writeback to write-through would have a huge impact; it would be great to see some numbers on this. In my experience the issues are around the mapping lock for getting pages in and out of one file - that's really rough, a few GiB/s max - or the LRU management for many files, which can reach tens of GiB/s.

FWIW, the file system I work on - Lustre - is moving toward a "hybrid" model where larger buffered I/Os are redirected to direct I/O via an internal bounce buffer, which satisfies the alignment requirement. Since there's no cache, the required copy and allocation for that buffer can be multithreaded, and the performance results are excellent - it can hit 20 GiB/s from one user thread and scales when adding threads:
https://www.depts.ttu.edu/hpcc/events/LUG24/slides/Day2/L...
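
As a rough userspace sketch of that dispatch (not Lustre's actual code; the threshold, the two-descriptor setup, and the function name are all made up for illustration):

    /* Hypothetical sketch of the hybrid model: large, well-formed writes
     * are copied into a page-aligned bounce buffer and submitted as
     * direct I/O; everything else stays on the buffered path. */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define PAGE_SIZE        4096
    #define HYBRID_THRESHOLD (1 << 20)  /* illustrative 1 MiB cutoff */

    /* fd_buffered and fd_direct are two descriptors on the same file,
     * the latter opened with O_DIRECT. */
    static ssize_t hybrid_write(int fd_buffered, int fd_direct,
                                const void *buf, size_t len, off_t off)
    {
        void *bounce;
        ssize_t ret;

        /* Small, short, or misaligned I/O stays buffered. */
        if (len < HYBRID_THRESHOLD || len % PAGE_SIZE || off % PAGE_SIZE)
            return pwrite(fd_buffered, buf, len, off);

        if (posix_memalign(&bounce, PAGE_SIZE, len) != 0)
            return pwrite(fd_buffered, buf, len, off);  /* fall back */

        /* This copy is what satisfies O_DIRECT's memory-alignment rule;
         * in the real feature it is spread across multiple threads. */
        memcpy(bounce, buf, len);
        ret = pwrite(fd_direct, bounce, len, off);
        free(bounce);
        return ret;
    }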



BIO vs DIO

Posted Jun 6, 2024 16:04 UTC (Thu) by Wol (subscriber, #4433)

> I don't think switching from writeback to write-through would have a huge impact; it would be great to see some numbers on this. In my experience the issues are around the mapping lock for getting pages in and out of one file - that's really rough, a few GiB/s max - or the LRU management for many files, which can reach tens of GiB/s.

I don't know where I remember this from, but somebody did some tests on large file copies. By turning off the normal Linux cache, or something like that, I think the actual copy sped up by an order of magnitude. And system responsiveness during the copy didn't take anything like the usual hit, either.

Makes sense in a way - by disabling Linux's habit of stashing everything in the cache, you're not thrashing the memory subsystem.

Cheers,
Wol

BIO vs DIO

Posted Jun 6, 2024 16:08 UTC (Thu) by Paf (subscriber, #91811)

Hmm, "turning off the normal Linux cache" is underspecified - some folks mean "forcing a form of direct IO", which is totally different than "using write through". Write through can just mean "we put it in the page cache and forced it out", which isn't expected to help that much, since the same basic locking behavior is maintained.

BIO vs DIO

Posted Jun 6, 2024 17:02 UTC (Thu) by joib (subscriber, #8541)

Does the "Hybrid I/O" mean that it does some small buffered I/O for the head and tail of the data, and then nicely aligned direct I/O for the middle part?

BIO vs DIO

Posted Jun 7, 2024 17:25 UTC (Fri) by Paf (subscriber, #91811)

No, unfortunately, because misalignment doesn't work like that. (That's what I hoped for when I started looking at it...)

Alignment means "byte N in this page of memory is byte N in a block on disk".

So let's say you want to do I/O from a 1 MiB malloc, and this 1 MiB buffer starts at 100 bytes into a page.

*Every* byte in the I/O is 100 bytes off, which means there's no 1-to-1 mapping between pages in memory and pages on disk. So you have to shift them *all*: allocate a 1 MiB buffer and copy everything into it.

This is the same for the page cache, FWIW. It has to shift *everything*.

But since you can do the copying in parallel, it's still *really* fast.
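
A quick way to see the arithmetic for the 100-byte-offset example above, as a sketch:

    /* Each on-disk page wants bytes [n*4096, (n+1)*4096) of the buffer;
     * with the buffer starting 100 bytes into a page, that range always
     * straddles two memory pages, so no page can be handed to the device
     * as-is: every byte has to move. */
    #include <stdio.h>

    #define PAGE 4096UL

    int main(void)
    {
        unsigned long start = 100;  /* buffer offset into its first page */

        for (unsigned long n = 0; n < 3; n++) {
            unsigned long first = start + n * PAGE;
            unsigned long last  = first + PAGE - 1;
            printf("disk page %lu: spans memory pages %lu and %lu\n",
                   n, first / PAGE, last / PAGE);
        }
        return 0;
    }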

BIO vs DIO

Posted Jun 7, 2024 8:13 UTC (Fri) by Homer512 (subscriber, #85295)

My day job is streaming 20+ Gbit/s to disk for hours. One of my applications has 8 U.2 SSDs (3.7 GB/s each) being fed from 2x100 Gbit/s connections. Buffered I/O absolutely does not work for this.

If we could have this happen automatically, it would help so much. For example, we could go back to using standard file formats without having to reimplement them. Right now I'm using a custom TIFF writer for one application simply because there is no way to get libtiff to do direct I/O. Same for things like HDF5.

Heck, I can't even use cp or rsync for backups at the moment, since rsync, for example, won't go faster than 800 MB/s on that server.

BIO vs DIO

Posted Jun 7, 2024 9:57 UTC (Fri) by Wol (subscriber, #4433)

This sounds like you'd benefit massively from that half-remembered article of mine.

Exactly the use case - shifting a huge volume of data that is going to be written to disk, and that's it. What you do NOT want is Linux sticking it in the disk cache "just in case". And I think the speedup was (low) orders of magnitude.

I don't think it was even buffered I/O - if you can get Linux to disable the disk cache on that machine, see what results you get with standard tiff, cp, etc.
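
One standard way to get that effect without going all the way to direct I/O is posix_fadvise(POSIX_FADV_DONTNEED); a minimal sketch (function name made up, error handling omitted):

    #include <fcntl.h>
    #include <unistd.h>

    /* Stream data out without letting it pile up in the page cache. */
    static void write_and_drop(int fd, const char *buf, size_t len, off_t off)
    {
        pwrite(fd, buf, len, off);
        fdatasync(fd);              /* pages must be clean before they... */
        posix_fadvise(fd, off, len, /* ...can actually be dropped */
                      POSIX_FADV_DONTNEED);
    }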

Cheers,
Wol

BIO vs DIO

Posted Jun 7, 2024 17:29 UTC (Fri) by Paf (subscriber, #91811)

Yeah, that sort of thing is exactly what this is for: applications that can't or won't be modified to use DIO, or where you don't know your I/O pattern in advance. Libraries like HDF5 are exactly the sort of thing this is targeting, though honestly it's aimed pretty broadly.

Some of the marketing folks at a Lustre vendor put together something showing improvements in PyTorch checkpointing, for example.

It's something that would presumably be useful in other file systems too, but Lustre is out of tree, so it'll stay within Lustre for now. (It's GPLv2, just out of tree.)

If someone working in upstream wants to copy the idea, I won't object. (Would like credit, of course!)

