
Re: [discuss] Re: x86_64: 2.6.14-rc4 swiotlb broken

From:  Yasunori Goto <y-goto-AT-jp.fujitsu.com>
To:  Linus Torvalds <torvalds-AT-osdl.org>
Subject:  Re: [discuss] Re: x86_64: 2.6.14-rc4 swiotlb broken
Date:  Tue, 18 Oct 2005 11:29:18 +0900
Cc:  Muli Ben-Yehuda <mulix-AT-mulix.org>, Andi Kleen <ak-AT-suse.de>, discuss-AT-x86-64.org, Ravikiran G Thirumalai <kiran-AT-scalex86.org>, Andrew Morton <akpm-AT-osdl.org>, linux-kernel-AT-vger.kernel.org, tglx-AT-linutronix.de, shai-AT-scalex86.org, clameter-AT-engr.sgi.com, muli-AT-il.ibm.com, jdmason-AT-us.ibm.com

Hello, Linus-san.

> NOTE! Even if the machine has 4GB or more of memory, it's entirely likely 
> that the quick "use NODE(0)" hack will work fine. 
> 
> Why? Because the bootmem memory should still be allocated low-to-high by 
> default, which means that as long as NODE(0) has _enough_ memory in the 
> DMA range, we should be ok.
> 
> So I _think_ the simple one-liner NODE(0) patch is sufficient, and should 
> work (and is a lot more acceptable for 2.6.14 than switching the node 
> ordering around yet again, or doing bigger surgery on the bootmem code).
> 
> So the only thing that worried me (and made me ask whether there might be 
> machines where it doesn't work) is if some machines might have their high 
> memory (or no memory at all) on NODE(0). It does sound unlikely, but I 
> simply don't know what kind of strange NUMA configs there are out there.
> 
> And I'm definitely only interested in machines that are out there, not 
> some theoretical issues.

On the IA64 machine we are building, node 0 might not have any low
memory; another node can have the low memory instead.

This can happen when a whole node is hot-added.
For example, please imagine the following case.

1) Initially, the firmware remembers that the node with pxm = 1 has
   the low memory.

                 node 0             node 1 
               +--------------+  +-----------+
               |  pxm = 1     |  | pxm = 2   |
               |  low memory  |  |           |
               +--------------+  +-----------+


2) If a node is hot-added at pxm = 0 (the firmware decides the pxm
   from the physical location), the new node will be node 2.

  node 2          node 0          node 1 
+-----------+  +--------------+  +-----------+
| pxm = 0   |  |  pxm = 1     |  | pxm = 2   |
|           |  |  low memory  |  |           |
+-----------+  +--------------+  +-----------+

3) If the user reboots the machine, Linux assigns node ids in pxm order.
   But the firmware still remembers which node has the low memory.
   So node 0 will not have any low memory.

  node 0          node 1          node 2
+-----------+  +--------------+  +-----------+
| pxm = 0   |  |  pxm = 1     |  | pxm = 2   |
|           |  |  low memory  |  |           |
+-----------+  +--------------+  +-----------+
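
The three steps above can be modelled with a small sketch. This is a
toy model in Python, not kernel code. It assumes only what the steps
describe: node ids are assigned in ascending pxm order at boot, and a
hot-added node takes the next free node id. The function and field
names (assign_ids_at_boot, hot_add, low_memory) are hypothetical.

```python
# Toy model of the three steps above; not kernel code.
# Each node is described by its firmware proximity domain (pxm) and
# by whether it holds the low memory.

def assign_ids_at_boot(nodes):
    """Assumption: at boot, node ids follow ascending pxm order."""
    ordered = sorted(nodes, key=lambda n: n["pxm"])
    return {node_id: n for node_id, n in enumerate(ordered)}

def hot_add(id_map, node):
    """Assumption: a hot-added node takes the next free node id."""
    new_map = dict(id_map)
    new_map[max(new_map) + 1] = node
    return new_map

# Step 1: two nodes at boot, low memory on the pxm = 1 node.
boot_nodes = [
    {"pxm": 1, "low_memory": True},
    {"pxm": 2, "low_memory": False},
]
step1 = assign_ids_at_boot(boot_nodes)

# Step 2: a node is hot-added at pxm = 0.
step2 = hot_add(step1, {"pxm": 0, "low_memory": False})

# Step 3: reboot; ids are reassigned in pxm order.
step3 = assign_ids_at_boot(list(step2.values()))

for name, id_map in (("step 1", step1), ("step 2", step2), ("step 3", step3)):
    layout = ", ".join(
        f"node {i}: pxm={n['pxm']}{' (low memory)' if n['low_memory'] else ''}"
        for i, n in sorted(id_map.items())
    )
    print(f"{name}: {layout}")
```

In this model the low memory stays with pxm = 1 throughout, while the
node id attached to it changes from 0 to 1 across the reboot.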

So the simple "use NODE(0)" hack is not enough for our machine.
If "use NODE(0)" is chosen, the kernel must sort the pgdat list and
node ids by memory address. I think the hot-add code would become
a bit messy as a result.
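
One possible reading of why sorting by memory address complicates hot
add is sketched below, again as a toy Python model rather than kernel
code. The start addresses are made up for illustration and are not
taken from any real machine; ids_by_address is a hypothetical helper.
If node ids must always follow address order, a node hot-added between
two existing address ranges forces a running node to change its id.

```python
# Toy model: node ids kept in ascending start-address order.
# All addresses below are hypothetical.
GB = 1 << 30

def ids_by_address(nodes):
    """Map pxm -> node id, numbering nodes by ascending start address."""
    ordered = sorted(nodes, key=lambda n: n["start"])
    return {n["pxm"]: node_id for node_id, n in enumerate(ordered)}

running = [
    {"pxm": 1, "start": 0 * GB},   # holds the low memory
    {"pxm": 2, "start": 8 * GB},
]
before = ids_by_address(running)

# A node is hot-added at pxm = 0, assumed here to sit at 4 GB.
hot_added = {"pxm": 0, "start": 4 * GB}
after = ids_by_address(running + [hot_added])

# Running nodes whose id would have to change to keep address order.
renumbered = sorted(pxm for pxm in before if before[pxm] != after[pxm])

print("before:", before)
print("after: ", after)
print("live nodes renumbered (by pxm):", renumbered)
```

With these made-up addresses node 0 always holds the low memory, but
the pxm = 2 node has to move from id 1 to id 2 while it is in use.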

Thanks.

-- 
Yasunori Goto 



