
2003 Kernel Summit: NUMA management

This article is part of LWN's 2003 Kernel Developers' Summit coverage.
This session, led by Andi Kleen, was titled "NUMA APIs," but talked little about APIs. The real question was how processes should be managed on NUMA systems.

NUMA architectures are, by definition, multiprocessor systems which are organized into nodes of one or more processors. Each node includes a certain amount of local memory. That memory is accessible across the entire NUMA system, but accesses from the local node are much faster than accesses from farther away. The NUMA architecture creates some interesting challenges for operating systems; those which allocate their memory properly will perform better than the others.

There are, according to Andi, three basic allocation techniques which may be used in NUMA systems:

  • "Membind" locks process allocations to a specific node. Keeping all process memory on the local node should yield better performance, as long as other references to that memory remain local as well. The membind scheme can lead to allocation failures, however, if the node runs low on local memory.

  • "Home node" is like membind, except that non-local memory can be allocated if local memory is not available.

  • "Interleaved" allocations rotate through the nodes, allocating memory from each. In some situations (shared memory regions, perhaps), interleaved allocation schemes can yield better performance.
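The difference between the three policies is easiest to see in a toy model. The sketch below is an illustration only, not kernel code and not any API discussed at the session: the names `Node`, `alloc_membind`, `alloc_home_node`, and `Interleaver` are hypothetical, memory is counted in whole pages, and the home-node fallback simply picks the lowest-numbered node with free pages (a real system would presumably prefer the nearest node).

```python
class Node:
    """A NUMA node with a fixed number of free local pages (toy model)."""

    def __init__(self, node_id, free_pages):
        self.node_id = node_id
        self.free_pages = free_pages

    def take_page(self):
        """Consume one free page; return False if the node is exhausted."""
        if self.free_pages == 0:
            return False
        self.free_pages -= 1
        return True


def alloc_membind(nodes, node_id):
    """Membind: allocate strictly from one node, failing when it is full."""
    if nodes[node_id].take_page():
        return node_id
    raise MemoryError(f"node {node_id} is out of local memory")


def alloc_home_node(nodes, home_id):
    """Home node: prefer the home node, but fall back to another node."""
    if nodes[home_id].take_page():
        return home_id
    # Assumption: fall back in node-number order, ignoring distance.
    for node in nodes:
        if node.node_id != home_id and node.take_page():
            return node.node_id
    raise MemoryError("no free memory on any node")


class Interleaver:
    """Interleaved: rotate through the nodes, skipping exhausted ones."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.next_index = 0

    def alloc(self):
        # Try each node at most once, starting where the last call stopped.
        for _ in range(len(self.nodes)):
            node = self.nodes[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.nodes)
            if node.take_page():
                return node.node_id
        raise MemoryError("no free memory on any node")
```

In this model, `nodes` is a list indexed by node number. With one free page on node 0, a second `alloc_membind(nodes, 0)` call raises `MemoryError`, while a second `alloc_home_node(nodes, 0)` call quietly returns a page from another node; that is exactly the trade-off between the first two policies.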

Different policies work better in different situations; the real question is how those policies should be specified and controlled. One could set up a scheme where each process virtual memory area (VMA) has its own allocation policy. Linus, however, is opposed to the idea of applications controlling how their own memory is allocated. In his view, this sort of decision is best left to system administrators, rather than having each application try to optimize its own performance on NUMA systems. There is also, it seems, a shortage of publicly available numbers on just how much performance benefit is to be had from complicated NUMA allocation schemes.

NUMA is an issue that will not go away, but more work must be done before we'll see how Linux will support NUMA in the future.



NUMA and Mosix

Posted Jul 22, 2003 15:00 UTC (Tue) by hazelsct (subscriber, #3659) [Link]

Mosix, or Mosix-like clustering with "shared" memory and process migration, would seem to be a special case of NUMA.

Are there plans to use pluggable transport layers (e.g. TCP/IP, Infiniband, etc.) in the Linux NUMA architecture, to bring these scalability algorithms to a next-generation "Mosix" based on NUMA? It would seem to simplify application development if the same user-space process affinity calls could be used, rather than having one set for NUMA and another for Mosix.

Mosix and the Linux kernel

Posted Jul 22, 2003 16:05 UTC (Tue) by StevenCole (guest, #3068) [Link]

From this Linux Journal Interview with Dr. Moshe Bar:
LJ: openMosix is tightly integrated with Linux. In fact, they benefit each other quite a bit. Is there any plan to merge openMosix into the official Linux kernel tree? How about porting to another platform, such as *BSD, Mac or maybe Windows?

MB: I don't want to merge openMosix into the kernel. I have talked about it with Linus, Alan Cox, Ingo Molnar and others, and I feel it is best to keep the two separate. I would love to port openMosix or part of its functionality to Windows. But I will wait for Microsoft to approach us with a proposal.

It's too bad that the interviewer didn't follow up and ask why Dr. Bar was taking that position. Sure, he said it was best to keep the two separate, but that's the same thing as not merging. The real question is why not? Does he not want to scare off Microsoft by having it merged into the Linux kernel?

Mosix and the Linux kernel

Posted Jul 29, 2003 12:19 UTC (Tue) by daniel (subscriber, #3181) [Link]

"MB: I don't want to merge openMosix into the kernel. I have talked about it with Linus, Alan Cox, Ingo Molnar and others, and I feel it is best to keep the two separate. I would love to port openMosix or part of its functionality to Windows. But I will wait for Microsoft to approach us with a proposal."

That is the nice thing about open source: Moshe Bar or anyone else cannot unilaterally keep this work out of mainline. (And incidentally, I do not see how a Windows port would in any way benefit the open source community.)

Clustering is going to take off over the next few years, and even this year we're going to see the rise of home clusters. That means lots of new people will be using Mosix. When the time is right, it will go into mainline by popular demand. IMHO, the time is not yet right; it would be better to wait a while for more widespread usage, to allow the API to mature.

Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds