I already tried home nodes in 2.4. They didn't work.
There was also another implementation from NEC. They saw some success
on very large systems -- with large NUMA factors -- but they did
poorly on the more common low-NUMA-factor two- and four-socket servers.
The reason is that on these systems not using a core is always much
worse than using remote memory. And if you give the scheduler too
many conflicting inputs it starts making erratic decisions, schedules
poorly, and no longer keeps all cores busy.
This is worst on dynamic workloads; for more static workloads it's
not quite as bad.
A better approach is some form of automatic migration, e.g. as
implemented by Lee Schermerhorn: http://permalink.gmane.org/gmane.linux.kernel.numa/590
This can actually fix up imbalances and also allow some other
optimizations. Unfortunately it also doesn't work for all workloads,
so it would need to be an optional knob.
Posted Apr 6, 2011 20:45 UTC (Wed) by riel (subscriber, #3142)
The solutions we tried in the past seem to be "big hammer" style solutions that are fairly rigid about what the kernel is allowed to do.
I want to see how little change we can get away with, and still get a decent performance improvement. A home node would only be the node that memory allocations start on, and that the process is preferentially run on; the CPU scheduler does need to be able to run processes elsewhere temporarily.
Only when a node is permanently overloaded is it time to move some tasks elsewhere and eventually migrate over some of their memory (maybe with Lee's patches, or something based on them).
My plan is to start small and only add things as needed, trying to stay away from a large, complete & heavy plan.
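To make the idea above concrete, here is a rough sketch of a "soft" home-node policy: prefer the home node, never leave a core idle just to stay there, and only re-home a task after it has been away for a while. This is an illustration only, not the actual patches; the names `pick_node` and `should_rehome`, the idle-core counts, and the threshold of 3 are all made up for the example.

```python
def pick_node(home, idle_cores):
    """Pick the node to run on (hypothetical policy, not kernel code).

    idle_cores[n] is the number of idle cores on node n.
    Prefer the home node, but run elsewhere rather than wait
    while another node has a core sitting idle.
    """
    if idle_cores[home] > 0:
        return home
    # Home is busy: fall back to the node with the most idle cores.
    best = max(range(len(idle_cores)), key=lambda n: idle_cores[n])
    # Nothing idle anywhere: queue on the home node.
    return best if idle_cores[best] > 0 else home


def should_rehome(recent_nodes, home, threshold=3):
    """Return True once the task has run away from home for at least
    `threshold` consecutive recent placements (threshold is an assumption)."""
    away = 0
    for node in reversed(recent_nodes):
        if node == home:
            break
        away += 1
    return away >= threshold
```

With two nodes, `pick_node(0, [0, 2])` sends the task to node 1 for now, and `should_rehome([0, 1, 1, 1], 0)` says it may be time to move the home node (and eventually some memory) as well.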
Home nodes
Posted Apr 6, 2011 23:13 UTC (Wed) by andikleen2 (guest, #52506)
Actually the home nodes patches were quite simple.
Good luck reinventing the flat tire.
Home nodes
Posted Apr 6, 2011 23:51 UTC (Wed) by martinfick (subscriber, #4455)
While I have no idea if you are right or not, don't you think that attempting to improve where others have failed is potentially worthy of many tries? Especially if there is no fundamental proof that something won't work? And even more so when someone is attempting to solve a real unsolved problem?
The analogy to reinventing the wheel is inappropriate, since in the case of a working solution, it is a waste of time to reinvent it. But, in the case of failures, "reinventing it" (and potentially no longer failing), should be praised, not ridiculed, no? (again, with the proof caveat above, and even then some... proofs can sometimes be disproved)
Home nodes
Posted Apr 7, 2011 20:49 UTC (Thu) by cmccabe (guest, #60281)
Are there any tools out there to show Linux programmers a timeline of when their threads have been migrated between CPUs?
I know that valgrind can show you cache misses, but I'm not really aware of any tools that can display where the scheduler has put your threads over time. Maybe LTTng?
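Short of a real tracer, a crude approximation is to poll the "processor" field (field 39 in proc(5)) of /proc/self/task/<tid>/stat and count how often it changes. The sketch below assumes Linux and Python 3; the helper names are invented for this example, and polling will miss any migrations that happen between samples.

```python
import time


def cpu_from_stat(stat_line):
    """Extract the 'processor' field (field 39) from a /proc stat line."""
    # comm (field 2) may contain spaces or ')', so split on the last ')'.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 39 sits at index 36.
    return int(rest[36])


def count_migrations(cpus):
    """Count how often consecutive samples landed on different CPUs."""
    return sum(1 for prev, cur in zip(cpus, cpus[1:]) if prev != cur)


def sample_thread_cpus(tid, samples=50, interval=0.01):
    """Poll /proc/self/task/<tid>/stat and return the CPU seen at each sample."""
    path = f"/proc/self/task/{tid}/stat"
    seen = []
    for _ in range(samples):
        with open(path) as f:
            seen.append(cpu_from_stat(f.read()))
        time.sleep(interval)
    return seen
```

Calling `sample_thread_cpus(threading.get_native_id())` from the thread of interest (Python 3.8 or later) and passing the result to `count_migrations` gives a rough migration count for that thread.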
Tools
Posted Apr 7, 2011 23:02 UTC (Thu) by corbet (editor, #1)
perf timechart can generate some nice output which shows thread migration.
Tools
Posted Apr 8, 2011 14:01 UTC (Fri) by sbohrer (subscriber, #61058)
I personally much prefer to use kernelshark for this.