
BIO vs DIO

Posted Jun 7, 2024 8:13 UTC (Fri) by Homer512 (subscriber, #85295)
In reply to: BIO vs DIO by Paf
Parent article: Measuring and improving buffered I/O

My day job is streaming 20+ Gbit/s to disk for hours. One of my applications has 8 U.2 SSDs (3.7 GB/s each) being fed from 2x100 Gbit connections. Buffered IO absolutely does not work for this.

If we could have this happen automatically, it would help so much. For example, we could go back to using standard file formats without having to reimplement them. Right now I'm using a custom TIFF file writer for one application simply because there is no way of getting libtiff to do direct IO. The same goes for things like HDF5.

Heck, I can't even use cp or rsync for backups at the moment; rsync, for example, does not go faster than 800 MB/s on that server.
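
For readers who have not written direct IO by hand, here is a rough sketch of what the "custom writer" route tends to involve. It is not the code described in the comment above, just a minimal illustration under stated assumptions: the 4096-byte BLOCK alignment is a guess (the real requirement depends on the device and filesystem), and write_direct and _write_chunks are hypothetical helper names. It falls back to ordinary buffered writes when the filesystem rejects O_DIRECT with EINVAL.

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed alignment; the real requirement depends on device and filesystem


def _write_chunks(path, data, chunk_size, extra_flags):
    """Write data in block-padded chunks from a page-aligned staging buffer."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | extra_flags, 0o644)
    buf = mmap.mmap(-1, chunk_size)  # anonymous mapping, so it is page-aligned
    try:
        with memoryview(buf) as mv, memoryview(data) as src:
            for off in range(0, len(src), chunk_size):
                piece = src[off:off + chunk_size]
                n = len(piece)
                padded = -(-n // BLOCK) * BLOCK  # round up to a whole block
                mv[:n] = piece
                mv[n:padded] = bytes(padded - n)  # zero-fill the padding
                if os.write(fd, mv[:padded]) != padded:
                    raise OSError(errno.EIO, "short write")
        os.ftruncate(fd, len(data))  # cut the padding off the final block
    finally:
        buf.close()
        os.close(fd)


def write_direct(path, data, chunk_size=1 << 20):
    """Write data with O_DIRECT if possible. Returns True if direct IO was used."""
    if chunk_size <= 0 or chunk_size % BLOCK:
        raise ValueError("chunk_size must be a positive multiple of BLOCK")
    o_direct = getattr(os, "O_DIRECT", 0)  # not defined on every platform
    if o_direct:
        try:
            _write_chunks(path, data, chunk_size, o_direct)
            return True
        except OSError as e:
            if e.errno != errno.EINVAL:  # EINVAL: O_DIRECT or alignment rejected
                raise
    _write_chunks(path, data, chunk_size, 0)  # plain buffered fallback
    return False
```

The aligned staging buffer, the block-sized padding and the final truncate are exactly the details that libraries such as libtiff or HDF5 do not handle for you, which is why doing this automatically underneath unmodified applications is attractive.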



BIO vs DIO

Posted Jun 7, 2024 9:57 UTC (Fri) by Wol (subscriber, #4433)

This sounds like you'd benefit massively from that half-remembered article of mine.

Exactly the use case - shifting a huge volume of data that is going to be written to disk and that's it. What you do NOT want is Linux sticking it in the disk cache "just in case". And I think the speedup was (low) orders of magnitude.

I don't think it was even buffered I/O - if you can get Linux to disable the disk cache on that machine, see what results you get with standard TIFF, cp, etc.

Cheers,
Wol
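
As an aside on keeping written data out of the disk cache: one commonly used per-file approach, short of direct IO, is to write normally, flush, and then tell the kernel the cached pages are no longer needed. The sketch below is only an illustration of that idea and not the mechanism from the half-remembered article; write_without_caching and the 8 MiB flush_every threshold are made-up names and values, and posix_fadvise is advisory, so the kernel is free to ignore it.

```python
import os


def write_without_caching(path, chunks, flush_every=8 << 20):
    """Buffered write that periodically flushes and drops its own cached pages.

    Returns the total number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    total = 0
    pending = 0  # bytes written since the last flush
    try:
        for chunk in chunks:
            view = memoryview(chunk)
            while view:  # os.write may write less than requested
                n = os.write(fd, view)
                view = view[n:]
                total += n
                pending += n
            if pending >= flush_every:
                # Dirty pages cannot be dropped, so push them to disk first.
                os.fdatasync(fd)
                os.posix_fadvise(fd, 0, total, os.POSIX_FADV_DONTNEED)
                pending = 0
        os.fdatasync(fd)
        # A length of 0 means "from the offset to the end of the file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

This still pays for the copy through the page cache, so it limits cache pollution rather than removing the buffered IO overhead; it would not be expected to match direct IO at the data rates discussed in this thread.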

BIO vs DIO

Posted Jun 7, 2024 17:29 UTC (Fri) by Paf (subscriber, #91811)

Yeah, that sort of thing is exactly what this is for: applications that can't or won't be modified to use DIO, or where you don't know your IO pattern in advance. Libraries like HDF5 are exactly the sort of thing this is targeting, though honestly it's aimed pretty broadly.

Some of the marketing folks at a Lustre vendor put together something showing improvements in PyTorch checkpointing, for example.

It's something that would presumably be useful in other file systems too, but Lustre is out of tree, so it'll stay within Lustre for now. (It's GPLv2, just out of tree.)

If someone working in upstream wants to copy the idea, I won't object. (Would like credit, of course!)


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds