Costa: Designing a Userspace Disk I/O Scheduler for Modern Datastores: the Scylla example (Part 1)

Posted Apr 19, 2016 12:09 UTC (Tue) by rnsanchez (guest, #32570)
In reply to: Costa: Designing a Userspace Disk I/O Scheduler for Modern Datastores: the Scylla example (Part 1) by jospoortvliet
Parent article: Costa: Designing a Userspace Disk I/O Scheduler for Modern Datastores: the Scylla example (Part 1)

The kernel's I/O scheduler is supposed to be one-size-fits-all. Under heavy workloads, it is common to run into conflicts over which metrics matter (i.e., what to prioritize when the world is collapsing), and the tunables are of little help. On top of that, cancelling asynchronous I/O is rather troublesome: it is not impossible, just not fast enough once you put enough pressure on the entire I/O subsystem. Also, it is a good scheduler for throughput, NOT latency.

The Scylla developers know their workload better than the kernel's I/O scheduler does, so it makes sense for them to schedule on their own (according to whatever metrics are critical at a specific point in time), and then submit to the kernel only what they really need, when they need it.
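
To make the idea concrete, here is a rough sketch of what "schedule on your own, then submit what you need" could look like. It is not Scylla's actual scheduler. The class name `UserspaceIOQueue`, the `max_in_flight` limit, and the "lower number means more urgent" priority convention are all assumptions made for illustration. Requests wait in a userspace priority queue, only a bounded number are handed to the kernel at once, and anything still waiting can be cancelled cheaply because the kernel never saw it.

```python
import heapq
import itertools


class UserspaceIOQueue:
    """Hold requests in userspace; hand at most max_in_flight to the kernel.

    Hypothetical sketch: request ids are assumed to be unique, and the
    caller is assumed to do the real submission for each dispatched id.
    """

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self._heap = []                # entries: (priority, seq, request_id)
        self._seq = itertools.count()  # tie-breaker: FIFO within a priority
        self._queued = set()           # ids still waiting in userspace

    def enqueue(self, request_id, priority):
        # Lower number = more urgent (assumed convention).
        heapq.heappush(self._heap, (priority, next(self._seq), request_id))
        self._queued.add(request_id)

    def cancel(self, request_id):
        # Cheap only while the request has not been handed to the kernel.
        if request_id in self._queued:
            self._queued.discard(request_id)
            return True
        return False

    def dispatch(self):
        """Return the ids that should be submitted to the kernel right now."""
        batch = []
        while self._heap and self.in_flight < self.max_in_flight:
            _, _, request_id = heapq.heappop(self._heap)
            if request_id not in self._queued:
                continue  # cancelled while waiting; drop it silently
            self._queued.discard(request_id)
            self.in_flight += 1
            batch.append(request_id)
        return batch

    def complete(self):
        # Call once for each request the kernel reports as finished.
        self.in_flight -= 1


q = UserspaceIOQueue(max_in_flight=2)
q.enqueue("compaction-read", priority=5)
q.enqueue("query-read-1", priority=0)
q.enqueue("query-read-2", priority=0)
q.cancel("query-read-2")
print(q.dispatch())  # ['query-read-1', 'compaction-read']
```

The point of the sketch is that the priority decision and the cancellation both happen before the kernel is involved, so the application can change its mind about what matters without fighting the kernel's queue.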