Fixing asynchronous I/O, again
Posted Jan 15, 2016 12:02 UTC (Fri) by andresfreund (subscriber, #69562)
In reply to: Fixing asynchronous I/O, again by pbonzini
Parent article: Fixing asynchronous I/O, again
I rather doubt that. With a decent PCIe-attached enterprise SSD you can do a *lot* of flushes/sec, but to actually utilize the hardware you always need several writes in progress in parallel. While you probably need several submission threads for full utilization (ideally one per actual core), a thread pool large enough to keep the required number of writes in flight at the same time introduces too much context switching.
At the moment you can't even really utilize the actual potential of "prosumer" SSDs for random-write workloads. Sequential I/O is fine because it's quickly bottlenecked by the bus anyway. But if you are, e.g., an RDBMS (my corner), and you want to efficiently flush victim pages from an in-memory buffer back to disk, you'll quickly end up bottlenecked on latency.
Obviously this is only really interesting for rather I/O-intensive workloads.
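(To make the submission pattern concrete: not PostgreSQL code, just a minimal sketch using Linux native AIO via libaio, which is one way a single thread can keep many writes in progress at once. It assumes an fd opened with O_DIRECT and aligned buffers; QD and BLKSZ are illustrative, error handling is omitted, and you'd link with -laio.)

    /* One submitting thread keeps QD writes in flight at once,
     * instead of parking QD pool threads in blocking pwrite() calls.
     * Assumes fd was opened with O_DIRECT. */
    #include <libaio.h>
    #include <stdlib.h>

    #define QD    32                /* illustrative queue depth */
    #define BLKSZ 8192              /* one 8 kB block per write */

    static void write_blocks(int fd, long nblocks)
    {
        io_context_t ctx = 0;
        struct iocb cbs[QD], *cbp[QD];
        struct io_event evs[QD];
        void *bufs[QD];

        io_setup(QD, &ctx);
        for (int i = 0; i < QD; i++)
            posix_memalign(&bufs[i], 4096, BLKSZ);

        for (long next = 0; next < nblocks; ) {
            /* Build a batch of up to QD writes... */
            long batch = 0;
            while (batch < QD && next < nblocks) {
                io_prep_pwrite(&cbs[batch], fd, bufs[batch],
                               BLKSZ, next * BLKSZ);
                cbp[batch] = &cbs[batch];
                batch++; next++;
            }
            /* ...submit them with one syscall so `batch` writes are
             * in progress in parallel, then reap all completions. */
            io_submit(ctx, batch, cbp);
            io_getevents(ctx, batch, batch, evs, NULL);
        }
        io_destroy(ctx);
    }

A real implementation would top the queue back up as individual completions arrive rather than draining a whole batch each time, but the point stands: one thread, many writes in flight, no per-write context switch.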
> I would like to see numbers
Fair enough.
> # of ops per second on *real-world* usecases
I can only speak from the PostgreSQL corner here, but 50-100k 8192-byte dirty blocks written back per second is easily achievable. At that point, in my testing, we're bottlenecked on sync_file_range(SYNC_FILE_RANGE_WRITE) latency, because it starts blocking quite soon. (Note that we do a separate fsync for actual durability later; the sync_file_range is just to keep the amount of work done by fsync bounded.)
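(The pattern looks roughly like this; again not PostgreSQL's actual code, just a minimal sketch, with BLKSZ and FLUSH_AFTER as illustrative values.)

    /* Write back dirty blocks, handing each chunk to the kernel with
     * sync_file_range() so the final fsync() has a bounded amount of
     * dirty data left to flush. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    #define BLKSZ       8192
    #define FLUSH_AFTER 32      /* start writeback every 32 blocks */

    static void write_back(int fd, const char *blocks, long nblocks)
    {
        for (long i = 0; i < nblocks; i++) {
            pwrite(fd, blocks + i * BLKSZ, BLKSZ, i * BLKSZ);
            if ((i + 1) % FLUSH_AFTER == 0)
                /* Initiate async writeback of the last chunk.  This
                 * is the call that starts blocking once the device's
                 * request queue fills up. */
                sync_file_range(fd, (i + 1 - FLUSH_AFTER) * BLKSZ,
                                FLUSH_AFTER * BLKSZ,
                                SYNC_FILE_RANGE_WRITE);
        }
        fsync(fd);              /* the actual durability point, later */
    }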
> CPU utilization for kernel workqueue vs. userspace threadpool, etc.) before committing to a large change such as asynchronous system calls.
To some degree that does require a decent kernelspace implementation in a usable state for comparison.
Fixing asynchronous I/O, again
Posted Jan 21, 2016 18:22 UTC (Thu) by Wol (subscriber, #4433)
My reaction entirely. For a database server, it's all very well saying "it won't make much of an improvement overall", but if it's applicable to 90% of the workload of a dedicated server, then it's going to make one heck of a difference to that server.
And if those dedicated servers are a class where they are typically under heavy load, then this becomes a pretty obvious scalability issue - it bites when heavy-duty hardware is under heavy load - so the option of "throwing hardware at the problem" is not available ...
Cheers,
Wol