BIO vs DIO
Posted Jun 6, 2024 15:10 UTC (Thu) by Paf (subscriber, #91811)
Parent article: Measuring and improving buffered I/O
FWIW, the file system I work on, Lustre, is moving towards a “hybrid” model where larger buffered IOs are redirected to direct IO via an internal bounce buffer, which satisfies the alignment requirement. Since there’s no cache involved, the allocation and copy for that buffer can be multithreaded, and the performance results are excellent: it can hit 20 GiB/s from one user thread and scales when adding threads:
https://www.depts.ttu.edu/hpcc/events/LUG24/slides/Day2/L...
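As an illustration of the hybrid idea described above, here is a minimal user-space sketch in Python. It is not Lustre's implementation: the names `route`, `stage`, and `make_bounce_buffer`, and the 1 MiB cutoff `DIO_THRESHOLD`, are hypothetical choices made for the example. It only shows the two moving parts: picking a path by I/O size, and staging the caller's data in a page-aligned bounce buffer (an anonymous `mmap`, which always starts on a page boundary).

```python
import mmap

PAGE = mmap.PAGESIZE
DIO_THRESHOLD = 1 << 20  # hypothetical cutoff (1 MiB); not a Lustre value


def route(nbytes, threshold=DIO_THRESHOLD):
    """Pick the I/O path: small requests stay buffered, large ones go direct."""
    return "direct" if nbytes >= threshold else "buffered"


def make_bounce_buffer(nbytes):
    """Allocate a page-aligned buffer, rounded up to a whole number of pages."""
    padded = max(PAGE, -(-nbytes // PAGE) * PAGE)
    return mmap.mmap(-1, padded)  # anonymous mappings start on a page boundary


def stage(data):
    """Copy arbitrarily aligned caller data into an aligned bounce buffer."""
    buf = make_bounce_buffer(len(data))
    buf[:len(data)] = data
    return buf
```

In a real direct-I/O path the staged buffer would then be handed to a write on a file opened with `O_DIRECT`; that step is left out here because its alignment rules depend on the file system and device.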
Posted Jun 6, 2024 16:04 UTC (Thu) by Wol (subscriber, #4433)
I don't know where I remember this from, but somebody ran some tests on large file copies. With the normal Linux cache turned off, or something like that, I think the actual copy sped up by an order of magnitude. And system responsiveness during the copy didn't take anything like the usual hit, either.
Makes sense in a way: by disabling Linux's habit of stashing everything in the cache, you're not thrashing the memory subsystem.
Cheers,
Wol
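The comment above does not say how the cache was turned off in those tests. One way to approximate "copy without filling the page cache" from user space on Linux is to tell the kernel to drop each chunk once it has been written. The sketch below assumes that approach; `copy_without_caching` is a hypothetical helper, and `os.posix_fadvise` with `POSIX_FADV_DONTNEED` is only a hint that the kernel may ignore.

```python
import os


def _drop_cache(fd, offset, length):
    """Hint that this byte range will not be reused; failures are harmless."""
    try:
        os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)
    except OSError:
        pass  # advisory only: some file systems reject the call


def copy_without_caching(src_path, dst_path, chunk=8 * 1024 * 1024):
    """Copy a file chunk by chunk, asking the kernel to drop each chunk after use."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    offset = 0
    try:
        while True:
            data = os.read(src, chunk)
            if not data:
                break
            view = memoryview(data)
            while len(view) > 0:  # os.write may write fewer bytes than asked
                written = os.write(dst, view)
                view = view[written:]
            os.fsync(dst)  # dirty pages cannot be dropped until written back
            _drop_cache(src, offset, len(data))
            _drop_cache(dst, offset, len(data))
            offset += len(data)
    finally:
        os.close(src)
        os.close(dst)
    return offset  # total bytes copied
```

The `fsync` per chunk trades some throughput for keeping the cache footprint bounded; a larger `chunk` reduces that cost.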
Posted Jun 6, 2024 16:08 UTC (Thu) by Paf (subscriber, #91811)
Posted Jun 6, 2024 17:02 UTC (Thu) by joib (subscriber, #8541)
Posted Jun 7, 2024 17:25 UTC (Fri) by Paf (subscriber, #91811)
Alignment means "byte N in this page of memory is byte N in a block on disk".
So let's say you want to do I/O from a 1 MiB malloc, and this 1 MiB buffer starts at 100 bytes into a page.
*Every* byte in the IO is 100 bytes off, and that means there's not a 1-to-1 mapping between pages in memory and pages on disk. So you have to shift them *all*: allocate a 1 MiB aligned buffer and copy everything to it.
This is the same for the page cache, FWIW. It has to shift *everything*.
But since you can do the copying in parallel, it's still *really* fast.
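The 100-byte example above can be sketched as follows. This is an illustration only, not Lustre code: `page_offset` and `copy_to_bounce` are hypothetical names, and the chunk size and worker count are arbitrary. The source is a 1 MiB view that starts 100 bytes into a page-aligned mapping; it is copied chunk by chunk into a page-aligned bounce buffer by a thread pool. In CPython the interpreter lock limits how much these copies really overlap, so this shows the structure of the parallel copy rather than its speed.

```python
import mmap
from concurrent.futures import ThreadPoolExecutor

PAGE = mmap.PAGESIZE


def page_offset(buffer_start, i):
    """Offset within its page of byte i of a buffer starting at buffer_start."""
    return (buffer_start + i) % PAGE


def copy_to_bounce(src, chunk=256 * 1024, workers=4):
    """Copy a bytes-like source into a page-aligned buffer, one chunk per task."""
    n = len(src)
    padded = max(PAGE, -(-n // PAGE) * PAGE)
    bounce = mmap.mmap(-1, padded)  # anonymous mapping: starts on a page boundary
    dst = memoryview(bounce)

    def copy_chunk(start):
        end = min(start + chunk, n)
        dst[start:end] = src[start:end]  # chunks are disjoint, so no locking

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(copy_chunk, range(0, n, chunk)))  # list() surfaces errors
    dst.release()
    return bounce


# A 1 MiB buffer that starts 100 bytes into a page, as in the example above.
backing = mmap.mmap(-1, 2 * 1024 * 1024)
backing.write(bytes(range(256)) * 8192)
unaligned = memoryview(backing)[100:100 + (1 << 20)]
aligned = copy_to_bounce(unaligned)
```

After the copy, byte N of `aligned` sits at offset N from a page boundary, which is the 1-to-1 mapping the direct-I/O path needs.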
Posted Jun 7, 2024 8:13 UTC (Fri) by Homer512 (subscriber, #85295)
If we could have this happen automatically, it would help so much. For example, we could go back to using standard file format libraries without having to reimplement them. Right now I'm using a custom TIFF file writer for one application simply because there is no way of getting libtiff to do direct IO. Same for things like HDF5.
Heck, I can't even use cp or rsync for backups at the moment, since rsync, for example, does not go faster than 800 MB/s on that server.
Posted Jun 7, 2024 9:57 UTC (Fri) by Wol (subscriber, #4433)
Exactly the use case: shifting a huge volume of data that is going to be written to disk and that's it. What you do NOT want is Linux sticking it in the disk cache "just in case". And I think the speedup was (low) orders of magnitude.
I don't think it was even buffered I/O. If you can get Linux to disable the disk cache on that machine, see what results you get with standard libtiff, cp etc.
Cheers,
Wol
Posted Jun 7, 2024 17:29 UTC (Fri) by Paf (subscriber, #91811)
Some of the marketing folks at a Lustre vendor put together something showing improvements in PyTorch checkpointing, for example.
It's something that would presumably be useful in other file systems too, but Lustre is out of tree, so it'll stay within Lustre for now. (It's GPLv2, just out of tree.)
If someone working in upstream wants to copy the idea, I won't object. (Would like credit, of course!)