Re: [discuss] Re: x86_64: 2.6.14-rc4 swiotlb broken
[Posted October 18, 2005 by corbet]
| From: |
| Yasunori Goto <y-goto-AT-jp.fujitsu.com> |
| To: |
| Linus Torvalds <torvalds-AT-osdl.org> |
| Subject: |
| Re: [discuss] Re: x86_64: 2.6.14-rc4 swiotlb broken |
| Date: |
| Tue, 18 Oct 2005 11:29:18 +0900 |
| Cc: |
| Muli Ben-Yehuda <mulix-AT-mulix.org>, Andi Kleen <ak-AT-suse.de>,
discuss-AT-x86-64.org, Ravikiran G Thirumalai <kiran-AT-scalex86.org>,
Andrew Morton <akpm-AT-osdl.org>, linux-kernel-AT-vger.kernel.org,
tglx-AT-linutronix.de, shai-AT-scalex86.org, clameter-AT-engr.sgi.com,
muli-AT-il.ibm.com, jdmason-AT-us.ibm.com |
| Archive-link: |
| Article,
Thread
|
Hello. Linus-san.
> NOTE! Even if the machine has 4GB or more of memory, it's entirely likely
> that the quick "use NODE(0)" hack will work fine.
>
> Why? Because the bootmem memory should still be allocated low-to-high by
> default, which means that as logn as NODE(0) has _enough_ memory in the
> DMA range, we should be ok.
>
> So I _think_ the simple one-liner NODE(0) patch is sufficient, and should
> work (and is a lot more acceptable for 2.6.14 than switching the node
> ordering around yet again, or doing bigger surgery on the bootmem code).
>
> So the only thing that worried me (and made me ask whether there might be
> machines where it doesn't work) is if some machines might have their high
> memory (or no memory at all) on NODE(0). It does sound unlikely, but I
> simple don't know what kind of strange NUMA configs there are out there.
>
> And I'm definitely only interested in machines that are out there, not
> some theoretical issues.
In our making IA64 machine node 0 might not have any low-memory, and
another node can have low-memory instead.
This cause comes from hotplug whole of one node.
For example, please imagine following case.
1) In this case, firmware remembers pxm 1's node has low memory.
node 0 node 1
+--------------+ +-----------+
| pxm = 1 | | pxm = 2 |
| low memory | | |
+--------------+ +-----------+
2) If one node is hot-added at pxm = 0 (pxm is decided from physical
locate by firmware.), new node will be node 2.
node 2 node 0 node 1
+-----------+ +--------------+ +-----------+
| pxm = 0 | | pxm = 1 | | pxm = 2 |
| | | low memory | | |
+-----------+ +--------------+ +-----------+
3) If user reboots the machine, Linux decides node id from pxm's order.
But firmware still remembers which node has low memory.
So, node 0 will not have any low memory.
node 0 node 1 node 2
+-----------+ +--------------+ +-----------+
| pxm = 0 | | pxm = 1 | | pxm = 2 |
| | | low memory | | |
+-----------+ +--------------+ +-----------+
So, just "use NODE(0)" is not enough hack for our machine.
If "use NODE(0)" is selected, kernel must sort pgdat link and
node id by memory address. I think that hot add code will be a
bit messy instead.
Thanks.
--
Yasunori Goto
(
Log in to post comments)