
The balance between features and performance in the block layer

Posted Nov 8, 2021 12:14 UTC (Mon) by k3ninho (subscriber, #50375)
Parent article: The balance between features and performance in the block layer

>optimising the [anything] for maximum [single statistic] at the expense of everything else
That sounds like premature optimisation -- but the point I'd rather make here is about whole-context optimisation, where we must make a habit of improving the system as a whole. That matters especially when you're optimising one measure without questioning the assumption that it's the best proxy for all the other things you're not taking into account.

K3n.



The balance between features and performance in the block layer

Posted Nov 8, 2021 19:45 UTC (Mon) by jezuch (subscriber, #52988)

This, and Amdahl's law. I guess at this point they're not increasing performance, but reducing overhead. In an increasingly marginal way.

The balance between features and performance in the block layer

Posted Nov 9, 2021 1:38 UTC (Tue) by NYKevin (subscriber, #129325)

The purpose of an operating system is not to score well on benchmarks. Even a whole suite of numbers is not necessarily dispositive. If the OS can't do what the user wants it to do, then performance is wholly irrelevant.

The balance between features and performance in the block layer

Posted Nov 11, 2021 9:54 UTC (Thu) by wtarreau (subscriber, #51152)

It's more complicated: Linux suffers from being everyone's OS, and everyone has different use cases and priorities. For some it's useless without performance, and for others it's useless without new features.

In haproxy we're facing this dilemma all the time, but we try to stay reasonable. We know that users want features, and we try to group slow operations in slow paths, or to implement bypass mechanisms. Sometimes the cost of checking one flag is okay but not two or three, so we arrange to group them under the same mask and use a slow path to test each of them individually. Other times we have high-level checks that decide which path to take, with some partially redundant code, which is more of a pain but occasionally needed. And we always try to keep in mind that saved performance is not just there to present numbers, but also leaves more room to welcome new features at zero cost. For sure it's never pleasant to work 3 months to save 5% and see those 5% disappear 3 months later, but if we're back to the previous performance numbers in exchange for a nice feature improvement, well, it's not that bad.
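
To make the flag-grouping idea concrete, here is a minimal sketch in C (the flag names and structure are hypothetical illustrations, not actual haproxy code): a single mask test keeps the common case on the fast path, and only the rarely-taken slow path examines each flag individually.

    #include <stdint.h>

    /* Hypothetical feature flags, grouped so the common case needs a
     * single mask test instead of one branch per feature. */
    #define F_TRACE      0x01u
    #define F_COMPRESS   0x02u
    #define F_AUDIT      0x04u
    #define F_SLOW_MASK  (F_TRACE | F_COMPRESS | F_AUDIT)

    static void handle_slow(uint32_t flags)
    {
        /* Rarely taken: only here is each flag tested individually. */
        if (flags & F_TRACE)    { /* emit trace record */ }
        if (flags & F_COMPRESS) { /* compress payload  */ }
        if (flags & F_AUDIT)    { /* write audit entry  */ }
    }

    void handle_request(uint32_t flags)
    {
        /* One test covers all the rare features. */
        if (flags & F_SLOW_MASK)
            handle_slow(flags);

        /* ... fast path continues here ... */
    }

The common case pays for exactly one extra branch no matter how many optional features have been added under the mask.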

One thing that developers tend to forget is that doing nothing can be extremely fast, but in the real world there are more operations surrounding what they've optimized. A saving that doubles performance in isolation has in fact only cut that component's overhead in half, and once deployed in the field a lot more overhead surrounds the part they removed, so the saving ends up being only a few percent. That's what I'm often trying to explain: "in practice nobody runs at this level of performance due to other factors, so the loss will be much lower".
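
As a rough worked example (the fractions below are made-up illustration numbers, not measurements): if the optimized code accounts for 10% of the total time in the field and its cost is halved, the end-to-end gain is only about 5%.

    #include <stdio.h>

    int main(void)
    {
        double optimized_fraction = 0.10; /* assumed share of total time in the optimized code */
        double local_speedup      = 2.0;  /* "doubled performance" of that code in isolation */

        /* Amdahl's law: the rest of the work is untouched. */
        double new_total = (1.0 - optimized_fraction)
                         + optimized_fraction / local_speedup;

        printf("end-to-end speedup: %.2fx (about %.0f%% faster)\n",
               1.0 / new_total, (1.0 / new_total - 1.0) * 100.0);
        return 0;
    }

With these numbers it prints an end-to-end speedup of about 1.05x, i.e. roughly 5% faster despite the code in question having "doubled" in performance.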

The balance between features and performance in the block layer

Posted Nov 15, 2021 5:20 UTC (Mon) by NYKevin (subscriber, #129325)

Of course performance matters. Half my job is (after the pager goes off) figuring out why we can't serve 99% of our RPCs within X milliseconds. But surprisingly often, the answer turns out to be "because the client asked us to do something inherently expensive, and we're lumping it in with the cheap requests," and so we end up changing the monitoring rather than improving the hot path (i.e. we change our definition of "good performance" to exclude the expensive operations, or to give them additional time).

