From: Andrew Morton <akpm-AT-linux-foundation.org>
To: Jens Axboe <axboe-AT-kernel.dk>
Subject: Re: [PATCH 00/14][V5] Introduce io.latency io controller for cgroups
Date: Mon, 2 Jul 2018 14:47:57 -0700
Message-ID: <20180702144757.0138984124f97bf0b8f8de31@linux-foundation.org>
Cc: Josef Bacik <josef-AT-toxicpanda.com>, kernel-team-AT-fb.com, linux-block-AT-vger.kernel.org, hannes-AT-cmpxchg.org, tj-AT-kernel.org, linux-kernel-AT-vger.kernel.org, linux-fsdevel-AT-vger.kernel.org

On Mon, 2 Jul 2018 15:41:48 -0600 Jens Axboe <axboe@kernel.dk> wrote:
> On 7/2/18 3:26 PM, Andrew Morton wrote:
> > On Fri, 29 Jun 2018 15:25:28 -0400 Josef Bacik <josef@toxicpanda.com> wrote:
> >
> >> This series adds a latency based io controller for cgroups. It is based on the
> >> same concept as the writeback throttling code, which is watching the overall
> >> total latency of IOs in a given window and then adjusting the queue depth of
> >> the group accordingly. This is meant to be a workload protection controller, so
> >> whoever has the lowest latency target gets the preferential treatment with no
> >> thought to fairness or proportionality. It is meant to be work conserving, so
> >> as long as nobody is missing their latency targets the disk is fair game.
> >>
> >> We have been testing this in production for several months now to get the
> >> behavior right and we are finally at the point that it is working well in all of
> >> our test cases. With this patch we protect our main workload (the web server)
> >> and isolate out the system services (chef/yum/etc). This works well in the
> >> normal case, smoothing out weird requests-per-second (RPS) dips that we would see
> >> when one of the system services would run and compete for IO resources. This
> >> also works incredibly well in the runaway task case.
> >>
> >> The runaway task use case is where we have some task that slowly eats up all of
> >> the memory on the system (think a memory leak). Previously this sort of
> >> workload would push the box into a swapping/oom death spiral that was only
> >> recovered by rebooting the box. With this patchset and proper configuration of
> >> the memory.low and io.latency controllers we're able to survive this test with
> >> at most a 20% dip in RPS.
> >
> > Is this purely useful for spinning disks, or is there some
> > applicability to SSDs and perhaps other storage devices? Some
> > discussion on this topic would be useful.
> >
> > Patches 5, 7 & 14 look fine to me - go wild. #14 could do with a
> > couple of why-we're-doing-this comments, but I say that about
> > everything ;)
>
> I want to queue this up for 4.19 shortly - is the above an acked-by? Andrewed-by?
> Which do you prefer? :-)
Quacked-at-by: Andrew
Hannes's acks are good. Feel free to add mine as well ;)
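The cover letter quoted above describes the controller only in prose: watch the latency of a group's IOs over a window, and when a group with a tighter latency target misses it, cut the queue depth of groups with looser targets; when nobody is missing, let depths grow back so the disk stays "fair game". The following is a minimal, hypothetical sketch of that feedback loop in C, written for illustration only. The names (`struct lat_group`, `record_io`, `end_window`, `MAX_QD`), the use of mean latency per window, and the halve-on-miss / add-one-on-success policy are all assumptions; this is not the blk-iolatency code from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cgroup state (not the kernel's actual structures). */
struct lat_group {
    const char *name;
    uint64_t target_us;     /* latency target in microseconds */
    uint64_t window_sum_us; /* summed completion latency in this window */
    uint64_t window_nr;     /* number of IOs completed in this window */
    unsigned int qd;        /* currently allowed queue depth */
};

#define MAX_QD 128u /* assumed upper bound on queue depth */

/* Account one completed IO against its group's current window. */
static void record_io(struct lat_group *g, uint64_t lat_us)
{
    g->window_sum_us += lat_us;
    g->window_nr++;
}

/* Nonzero if the group's mean latency this window exceeded its target. */
static int missed_target(const struct lat_group *g)
{
    if (g->window_nr == 0)
        return 0; /* no IO, nothing missed */
    return g->window_sum_us / g->window_nr > g->target_us;
}

/*
 * End-of-window adjustment (assumed AIMD-style policy):
 *  - find the strictest (smallest) target among groups that missed;
 *  - halve the queue depth of every group with a looser target, so the
 *    lowest-target group gets preferential treatment;
 *  - if nobody missed, grow every group by one (work conserving).
 * Window counters are reset for all groups afterwards.
 */
static void end_window(struct lat_group *groups, size_t n)
{
    assert(groups != NULL || n == 0);

    uint64_t strictest_miss = UINT64_MAX;
    for (size_t i = 0; i < n; i++)
        if (missed_target(&groups[i]) && groups[i].target_us < strictest_miss)
            strictest_miss = groups[i].target_us;

    for (size_t i = 0; i < n; i++) {
        struct lat_group *g = &groups[i];
        if (strictest_miss == UINT64_MAX) {
            if (g->qd < MAX_QD)
                g->qd++;                       /* everyone happy: scale up */
        } else if (g->target_us > strictest_miss) {
            g->qd = g->qd > 1 ? g->qd / 2 : 1; /* throttle looser groups */
        }
        g->window_sum_us = 0;
        g->window_nr = 0;
    }
}
```

With a web-server group at a 1000 us target and a system-services group at 20000 us, a window in which the web server averages 3000 us halves only the system-services group's depth; a following window where both meet their targets lets both depths creep back up. The asymmetric policy (fast back-off, slow recovery) is one common choice for this kind of controller, picked here for simplicity rather than taken from the patches.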