|From:||Andrea Arcangeli <aarcange-AT-redhat.com>|
|To:||Peter Zijlstra <a.p.zijlstra-AT-chello.nl>|
|Subject:||Re: [RFC][PATCH 00/26] sched/numa|
|Date:||Mon, 19 Mar 2012 15:07:01 +0100|
|Cc:||Avi Kivity <avi-AT-redhat.com>, Linus Torvalds <torvalds-AT-linux-foundation.org>, Andrew Morton <akpm-AT-linux-foundation.org>, Thomas Gleixner <tglx-AT-linutronix.de>, Ingo Molnar <mingo-AT-elte.hu>, Paul Turner <pjt-AT-google.com>, Suresh Siddha <suresh.b.siddha-AT-intel.com>, Mike Galbraith <efault-AT-gmx.de>, "Paul E. McKenney" <paulmck-AT-linux.vnet.ibm.com>, Lai Jiangshan <laijs-AT-cn.fujitsu.com>, Dan Smith <danms-AT-us.ibm.com>, Bharata B Rao <bharata.rao-AT-gmail.com>, Lee Schermerhorn <Lee.Schermerhorn-AT-hp.com>, Rik van Riel <riel-AT-redhat.com>, Johannes Weiner <hannes-AT-cmpxchg.org>, linux-kernel-AT-vger.kernel.org, linux-mm-AT-kvack.org|
On Mon, Mar 19, 2012 at 02:26:31PM +0100, Peter Zijlstra wrote:
> On Mon, 2012-03-19 at 14:04 +0100, Andrea Arcangeli wrote:
> > If you boot with memcg compiled in, that's taking an equivalent amount
> > of memory per-page.
> >
> > If you can bear the memory loss when memcg is compiled in even when
> > not enabled, you sure can bear it on NUMA systems that have lots of
> > memory, so it's perfectly ok to sacrifice a bit of it so that it
> > performs like not-NUMA but you still have more memory than not-NUMA.
> >
> I think the overhead of memcg is quite insane as well. And no I cannot
> bear that and have it disabled in all my kernels.
>
> NUMA systems having lots of memory is a false argument, that doesn't
> mean we can just waste tons of it, people pay good money for that
> memory, they want to use it.
>
> In fact, I know that HPC people want things like swap-over-nfs so they
> can push infrequently running system crap out into swap so they can get
> these few extra megabytes of memory. And you're proposing they give up
> ~100M just like that?

If they run 20% faster, they will absolutely give up the 100M. You may
want to check how many gigabytes they swap... going through the mess
of swap-over-nfs to swap _only_ ~100M would be laughable. If they push
several gigabytes to swap, ok, but then 100M more or less won't matter.

If you intend to prove the AutoNUMA design isn't ok, do not complain
about the memory use per page, do not complain about the pagetable
scanner, only complain about the cost of the NUMA hinting page fault
in the presence of virt and vmexits. That is frankly my only slight
concern; it largely depends on hardware, and not enough benchmarking
has been done to give it a green light yet. I am optimistic, though,
because in the worst case the NUMA hinting page fault frequency should
be reduced for tasks that have an mmu notifier attached, and that in
turn have secondary MMUs and higher page fault costs.

The pagetable scanner and the memory use will be absolutely ok.
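For scale, the "~100M" figure discussed above can be related to a per-page cost with simple arithmetic. The following is only a minimal sketch: the helper name `per_page_overhead_bytes` is hypothetical, and the 4 KiB page size, 32 GiB of RAM, and 12 bytes of metadata per page used in the note below are illustrative assumptions, not numbers taken from this thread or from the AutoNUMA patches.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): total memory consumed by
 * fixed-size per-page metadata.
 *
 * ram_bytes      - total RAM in bytes (assumed input)
 * page_size      - size of one page in bytes (assumed, e.g. 4096)
 * bytes_per_page - metadata bytes kept for every page (assumed value)
 */
static uint64_t per_page_overhead_bytes(uint64_t ram_bytes,
                                        uint64_t page_size,
                                        uint64_t bytes_per_page)
{
    /* Number of whole pages that fit in RAM. */
    uint64_t nr_pages = ram_bytes / page_size;

    /* Every page pays the same fixed metadata cost. */
    return nr_pages * bytes_per_page;
}

/*
 * Same overhead expressed in parts per million of total RAM, which makes
 * it easy to compare against a claimed speedup.
 */
static uint64_t per_page_overhead_ppm(uint64_t page_size,
                                      uint64_t bytes_per_page)
{
    return bytes_per_page * 1000000ULL / page_size;
}
```

Under those assumed inputs, `per_page_overhead_bytes(32ULL << 30, 4096, 12)` gives 100663296 bytes, i.e. 96 MiB, which is about 0.29% of RAM. The ratio depends only on the page size and the per-page cost, so it stays constant as the machine grows.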
Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds