Re: [RFC][PATCH 00/26] sched/numa

From:  Andrea Arcangeli <>
To:  Peter Zijlstra <>
Subject:  Re: [RFC][PATCH 00/26] sched/numa
Date:  Mon, 19 Mar 2012 15:07:01 +0100
Message-ID:  <>
Cc:  Avi Kivity <>, Linus Torvalds <>, Andrew Morton <>, Thomas Gleixner <>, Ingo Molnar <>, Paul Turner <>, Suresh Siddha <>, Mike Galbraith <>, "Paul E. McKenney" <>, Lai Jiangshan <>, Dan Smith <>, Bharata B Rao <>, Lee Schermerhorn <>, Rik van Riel <>, Johannes Weiner <>

On Mon, Mar 19, 2012 at 02:26:31PM +0100, Peter Zijlstra wrote:
> On Mon, 2012-03-19 at 14:04 +0100, Andrea Arcangeli wrote:
> > If you boot with memcg compiled in, that's taking an equivalent amount
> > of memory per-page.
> > 
> > If you can bear the memory loss when memcg is compiled in even when
> > not enabled, you sure can bear it on NUMA systems that have lots of
> > memory, so it's perfectly ok to sacrifice a bit of it so that it
> > performs like not-NUMA but you still have more memory than not-NUMA.
> > 
> I think the overhead of memcg is quite insane as well. And no I cannot
> bear that and have it disabled in all my kernels.
> NUMA systems having lots of memory is a false argument, that doesn't
> mean we can just waste tons of it, people pay good money for that
> memory, they want to use it.
> In fact, I know that HPC people want things like swap-over-nfs so they
> can push infrequently running system crap out into swap so they can get
> these few extra megabytes of memory. And you're proposing they give up
> ~100M just like that?

If they run 20% faster, they will absolutely give up the 100M.

You may want to check how many gigabytes they swap... going through
the mess of swap-over-nfs to swap _only_ ~100M would be laughable. If
they push to swap several gigabytes ok, but then 100M more or less
won't matter.

If you intend to prove the AutoNUMA design isn't ok, do not complain about
the memory use per page, do not complain about the pagetable scanner,
only complain about the cost of the NUMA hinting page fault in the
presence of virt and vmexits. That is frankly my only slight concern:
it largely depends on hardware, and not enough benchmarking has
been done to give it a green light yet. I am optimistic though, because
in the worst case the NUMA hinting page fault frequency should be
reduced for tasks that have an mmu notifier attached, and therefore
have secondary MMUs and higher page fault costs.

Pagetable scanner and memory use will be absolutely ok.

