
Re: [PATCH tip/core/rcu 4/5] sys_membarrier: Add expedited option

From:  "Paul E. McKenney" <paulmck-AT-linux.vnet.ibm.com>
To:  Peter Zijlstra <peterz-AT-infradead.org>
Subject:  Re: [PATCH tip/core/rcu 4/5] sys_membarrier: Add expedited option
Date:  Tue, 25 Jul 2017 12:36:12 -0700
Message-ID:  <20170725193612.GW3730@linux.vnet.ibm.com>
Cc:  linux-kernel-AT-vger.kernel.org, mingo-AT-kernel.org, jiangshanlai-AT-gmail.com, dipankar-AT-in.ibm.com, akpm-AT-linux-foundation.org, mathieu.desnoyers-AT-efficios.com, josh-AT-joshtriplett.org, tglx-AT-linutronix.de, rostedt-AT-goodmis.org, dhowells-AT-redhat.com, edumazet-AT-google.com, fweisbec-AT-gmail.com, oleg-AT-redhat.com

On Tue, Jul 25, 2017 at 08:53:20PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 25, 2017 at 10:17:01AM -0700, Paul E. McKenney wrote:
> 
> > > munmap() TLB invalidate is limited to those CPUs that actually ran
> > > threads of their process, while this is machine wide.
> > 
> > Or those CPUs running threads of any process mapping the underlying file
> > or whatever.
> 
> That doesn't sound right. munmap() of a shared file only invalidates
> this process's map of it.
> 
> Swapping a file page otoh will indeed touch the union of cpumasks over
> all processes mapping that page.

There are a lot of variations, to be sure.  For whatever it is worth,
the original patch that started this uses mprotect():

https://github.com/msullivan/userspace-rcu/commit/04656b4...
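
For readers unfamiliar with the trick, here is a minimal sketch of how
mprotect() can be used to provoke IPIs from userspace.  It is not taken
from the commit linked above, and the names mprotect_barrier_init() and
mprotect_barrier() are hypothetical.  It assumes that revoking write
permission on a dirtied page makes the kernel flush the TLB on the other
CPUs running threads of the process, which is an implementation side
effect rather than a documented ordering guarantee.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper state: one private page used only to provoke flushes. */
static void *barrier_page;
static size_t barrier_page_size;

/* Map the dummy page once at startup.  Returns 0 on success, -1 on error. */
int mprotect_barrier_init(void)
{
	long sz = sysconf(_SC_PAGESIZE);
	void *p;

	if (sz <= 0)
		return -1;
	p = mmap(NULL, (size_t)sz, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	barrier_page = p;
	barrier_page_size = (size_t)sz;
	return 0;
}

/*
 * Dirty the page, revoke write permission, then restore it.  The
 * downgrade is the step assumed to trigger the TLB shootdown IPIs.
 * Returns 0 on success, -1 if either mprotect() call fails.
 */
int mprotect_barrier(void)
{
	assert(barrier_page != NULL);	/* mprotect_barrier_init() must run first */

	*(volatile char *)barrier_page = 1;
	if (mprotect(barrier_page, barrier_page_size, PROT_READ))
		return -1;
	if (mprotect(barrier_page, barrier_page_size, PROT_READ | PROT_WRITE))
		return -1;
	return 0;
}
```

A caller would invoke mprotect_barrier_init() once and then call
mprotect_barrier() wherever it wants the other threads' CPUs disturbed,
which is exactly the sort of on-demand IPI under discussion here.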

> > And in either case, this can span the whole machine.  Plus
> > there are a number of other ways for users to do on-demand full-system
> > IPIs, including any number of ways to wake up large numbers of CPUs,
> > including from unrelated processes.
> 
> Which are those? I thought we significantly reduced those with the nohz
> full work. Most IPI uses now first check if a CPU actually needs the IPI
> before sending it IIRC.

If the task being awakened is higher priority than the task currently
running on a given CPU, that CPU still gets an IPI, right?  Or am I
completely confused?

> > But I do plan to add another alternative that is limited to threads of
> > the running process.  I will be carrying both versions to enable those
> > who have been bugging me about this to do testing.
> 
> Sending IPIs to mm_cpumask() might be better than expedited, but I'm
> still hesitant. Just because people want it doesn't mean it's a good
> idea. We need to weigh this against the potential for abuse.
> 
> People want userspace preempt disable, no matter how hard they want it,
> they're not getting it because it's a completely crap idea.

Unlike userspace preempt disable, in this case we get the abuse anyway
via existing mechanisms, as in they are already being abused.  If we
provide a mechanism for this purpose, we at least have the potential
for handling the abuse, for example:

o	"Defanging" sys_membarrier() on systems that are sensitive to
	latency.  For example, this patch can be defanged by booting
	with the rcupdate.rcu_normal=1 kernel boot parameter, which
	causes requests for expedited grace periods to instead use
	normal grace periods.

o	Detecting and responding to abuse.  For example, perhaps if there
	are more than (say) 50 expedited sys_membarrier()s within a given
	jiffy, the excess sys_membarrier()s are non-expedited.

o	Batching optimizations allow large numbers of concurrent requests
	to be handled with fewer grace periods -- and both normal and
	expedited grace periods already do exactly this.

This horse is already out, so trying to shut the gate won't be effective.

							Thanx, Paul






Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds