improve the first percpu chunk allocation

From:  Tejun Heo <tj@kernel.org>
To:  mingo@elte.hu, rusty@rustcorp.com.au, tglx@linutronix.de, x86@kernel.org, linux-kernel@vger.kernel.org, hpa@zytor.com, jeremy@goop.org, cpw@sgi.com, nickpiggin@yahoo.com.au, ink@jurassic.park.msu.ru
Subject:  [PATCHSET x86/core/percpu] improve the first percpu chunk allocation
Date:  Tue, 24 Feb 2009 12:11:31 +0900
Message-ID:  <1235445101-7882-1-git-send-email-tj@kernel.org>

Hello, all.

This patchset improves the first percpu chunk allocation.  The problem
is that the dynamic percpu area allocation maps the whole percpu area
into the vmalloc area using 4k mappings, which adds a considerable
amount of TLB pressure.

This patchset modularizes the first percpu chunk allocation and uses
different allocation schemes to optimize TLB usage.

* On !NUMA, the first chunk is allocated directly using
  alloc_bootmem() thus adding no TLB pressure whatsoever.

* On NUMA, the first chunk is remapped using large pages and whatever
  is left in the large page is given back to the bootmem allocator.
  This makes each cpu use an additional large TLB entry for the first
  chunk, but that is still much better than using many 4k TLB entries.
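
As a rough illustration of the trade-off between 4k mappings and a
large page remap, the sketch below counts the mappings one cpu needs
for its share of the first chunk and the tail of the large page that
could be handed back.  It is not code from the patchset: the helper
names are hypothetical, and the 4 KiB and 2 MiB page sizes are
assumptions (the large page size depends on the paging mode in use).

```c
/* Assumed page sizes, for illustration only. */
#define SMALL_PAGE_SIZE (4UL * 1024)
#define LARGE_PAGE_SIZE (2UL * 1024 * 1024)

/* Number of pages of size page_size needed to cover bytes (rounds up). */
static unsigned long pages_needed(unsigned long bytes, unsigned long page_size)
{
	return (bytes + page_size - 1) / page_size;
}

/* Distinct 4k mappings (and thus potential TLB entries) per cpu unit. */
static unsigned long tlb_entries_4k(unsigned long unit_bytes)
{
	return pages_needed(unit_bytes, SMALL_PAGE_SIZE);
}

/* Bytes left past the unit in its large page(s), i.e. what a remapping
 * scheme could return to the boot-time allocator. */
static unsigned long large_page_leftover(unsigned long unit_bytes)
{
	return pages_needed(unit_bytes, LARGE_PAGE_SIZE) * LARGE_PAGE_SIZE
		- unit_bytes;
}
```

For a hypothetical 64 KiB unit, tlb_entries_4k() gives 16 small
mappings, whereas a single large page covers the unit and leaves
2 MiB - 64 KiB unused.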

This patchset contains the following ten patches.

  0001-percpu-fix-pcpu_chunk_struct_size.patch
  0002-bootmem-clean-up-arch-specific-bootmem-wrapping.patch
  0003-bootmem-reorder-interface-functions-and-add-a-missi.patch
  0004-vmalloc-add-align-to-vm_area_register_early.patch
  0005-x86-update-populate_extra_pte-and-add-populate_ex.patch
  0006-percpu-remove-unit_size-power-of-2-restriction.patch
  0007-percpu-give-more-latitude-to-arch-specific-first-ch.patch
  0008-x86-separate-out-setup_pcpu_4k-from-setup_per_cpu.patch
  0009-x86-add-embedding-percpu-first-chunk-allocator.patch
  0010-x86-add-remapping-percpu-first-chunk-allocator.patch

0001 fixes a bug introduced by an earlier patch.  0002-0006 prepare for
better first chunk allocation.  0007 makes percpu allocator
initialization more flexible.  0008-0010 modularize the x86 first chunk
allocation and add better allocation schemes.

This patchset is available in the following git tree.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git tj-percpu

Diffstat follows.

 arch/alpha/mm/init.c             |    2 
 arch/avr32/Kconfig               |    2 
 arch/x86/Kconfig                 |    2 
 arch/x86/include/asm/mmzone_32.h |   43 ----
 arch/x86/include/asm/pgtable.h   |    3 
 arch/x86/kernel/setup_percpu.c   |  373 ++++++++++++++++++++++++++++++++++-----
 arch/x86/mm/init_32.c            |   13 +
 arch/x86/mm/init_64.c            |   75 ++++---
 include/linux/bootmem.h          |   36 +--
 include/linux/percpu.h           |   39 +++-
 include/linux/vmalloc.h          |    2 
 mm/bootmem.c                     |   14 +
 mm/percpu.c                      |  178 +++++++++++++-----
 mm/vmalloc.c                     |   11 -
 14 files changed, 607 insertions(+), 186 deletions(-)

Thanks.

--
tejun