From: Linus Torvalds <torvalds-AT-linux-foundation.org>
To: Ingo Molnar <mingo-AT-elte.hu>
Subject: Re: [git pull] cpus4096 fixes
Date: Sun, 27 Jul 2008 13:15:26 -0700 (PDT)
Cc: Linux Kernel Mailing List <linux-kernel-AT-vger.kernel.org>,
    Andrew Morton <akpm-AT-linux-foundation.org>,
    Mike Travis <travis-AT-sgi.com>,
    Rusty Russell <rusty-AT-rustcorp.com.au>
On Sun, 27 Jul 2008, Ingo Molnar wrote:
> Please pull the latest cpus4096-fixes git tree from:
No. Not without explanations.
Quite frankly, this "fix" looks like a huge stinking pile of sh*t.
I can't follow that thread on lkml.org (horrible web interface with
hard-to-follow threading), and I'm too lazy to bother to look in my lkml
email archives, but whoever said
"The simple version is just a static array of [NR_CPUS] cpumask_t's."
and then implemented this piece of shit is a complete and utter moron.
I'm sorry, but guys, I really expect people to have better taste than
this, and also expect people to be able to _think_ better than this.
Am I right, and all you want is NR_CPUS constant bitmasks that have just a
single bit set in each (for that single CPU)?
And am I further right, and you are so STUPID that you cannot see that you
can share all the zero words?
In other words, on a 64-bit architecture, you only ever need 64 of these
arrays - with a different bit set in ONE SINGLE WORD (with enough zero
words around it so that you can create any bitmask by just offsetting in
that big array). And then you just put enough zeroes around it that you
can point _every_single_cpumask_ to be one of those things.
So when you have 4k CPU's, instead of having 4k arrays (of 4k bits each,
with one bit set in each array - 2MB memory total), you have exactly 64
arrays instead, each 8k bits in size (64kB total).
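The arithmetic behind those two figures can be checked with a small sketch. The function names and parameters here are hypothetical, chosen only for illustration; the numbers in the email correspond to nr_cpus = 4096 and bits_per_long = 64.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: bytes needed for one full bitmap per CPU,
 * each bitmap being nr_cpus bits wide.
 */
static size_t naive_bytes(size_t nr_cpus)
{
	assert(nr_cpus % 8 == 0);
	return nr_cpus * (nr_cpus / 8);
}

/*
 * Hypothetical helper: bytes needed for bits_per_long shared bitmaps,
 * each 2 * nr_cpus bits wide (the "8k bits" case for 4k CPUs).
 */
static size_t shared_bytes(size_t nr_cpus, size_t bits_per_long)
{
	assert(nr_cpus % 8 == 0);
	return bits_per_long * (2 * nr_cpus / 8);
}
```

With 4096 CPUs, naive_bytes(4096) is 4096 * 512 = 2097152 bytes (2MB), and shared_bytes(4096, 64) is 64 * 1024 = 65536 bytes (64kB).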
And then you just point cpumask(n) to the right position (which you can
calculate dynamically). Once you have the right arrays, getting
"cpumask(n)" ends up being something like
	static const cpumask_t *cpumask_of_cpu(int cpu)
	{
		/* Get the array with the right bit set */
		unsigned long *p = array[cpu % BITS_PER_LONG];

		/* Offset it so that it's in the right word */
		p += (NR_CPUS-cpu)/BITS_PER_LONG;

		/* Return it as a pointer to a cpumask_t */
		return (const cpumask_t *) p;
	}
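A self-contained sketch of the same overlapping idea follows, for readers who want to try it outside the kernel. It is not the code that was eventually merged. Several things in it are assumptions made for illustration: the names cpu_bit_rows, init_rows, cpumask_words_of_cpu and all_masks_ok are hypothetical, each row is assumed to be 2 * NR_CPUS bits long with its single set bit in the middle word, and the function returns a plain pointer to unsigned long words rather than a cpumask_t so that it stays portable C.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NR_CPUS        4096	/* assumed build-time CPU limit */
#define BITS_PER_LONG  (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS     (NR_CPUS / BITS_PER_LONG)

/*
 * One row per bit position within a word. Each row is 2 * MASK_WORDS
 * words long and all zero, except for one bit in word MASK_WORDS
 * (assumed layout, set up by init_rows()).
 */
static unsigned long cpu_bit_rows[BITS_PER_LONG][2 * MASK_WORDS];

static void init_rows(void)
{
	for (size_t bit = 0; bit < BITS_PER_LONG; bit++)
		cpu_bit_rows[bit][MASK_WORDS] = 1UL << bit;
}

/* Return MASK_WORDS words in which only the bit for `cpu` is set. */
static const unsigned long *cpumask_words_of_cpu(unsigned int cpu)
{
	assert(cpu < NR_CPUS);

	/* Pick the row with the right bit set */
	const unsigned long *p = cpu_bit_rows[cpu % BITS_PER_LONG];

	/* Slide the window so the set word lands at index cpu / BITS_PER_LONG */
	p += MASK_WORDS - cpu / BITS_PER_LONG;
	return p;
}

/* Check every CPU: its mask must have exactly its own bit set. */
static int all_masks_ok(void)
{
	init_rows();
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
		const unsigned long *mask = cpumask_words_of_cpu(cpu);

		for (size_t w = 0; w < MASK_WORDS; w++) {
			unsigned long want = 0;

			if (w == cpu / BITS_PER_LONG)
				want = 1UL << (cpu % BITS_PER_LONG);
			if (mask[w] != want)
				return 0;
		}
	}
	return 1;
}
```

In this layout the offset is computed as MASK_WORDS - cpu / BITS_PER_LONG, dividing before subtracting, so that all CPUs sharing a word get the same window position. Every window stays inside its row because the offset ranges from 1 to MASK_WORDS.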
And once you're not being a total idiot about wasting memory that is just
filled with a single bit in various different places, you don't need all
those games to re-create the arrays in some dense format, because they're
already going to be dense enough. If you compile a kernel for up to 4k
CPU's, "wasting" that 64kB of memory is a non-issue (especially since by
doing this "overlapping" trick you probably get better cache behaviour).
Ok, so now that I've insulted you and your pets (they're ugly!), show me
wrong, and then call me a d*ckhead. ("Linus - you're a d*ckhead, and you
didn't understand the problem, so you're a _stupid_ d*ckhead. And my
pet may be ugly, but yours _smells_ bad!").
Or say "Uh, yeah, we're morons, and here's the much better patch, and we
won't do that again".