
Quotes of the week

Hey, all my other theories made sense too.. They just didn't work.

But as Edison said: I didn't fail, I just found three other ways to not fix your bug.

-- Linus Torvalds

To be honest I think 4K stack simply has to go. I tend to call it "russian roulette" mode.

It was just an old workaround for a very old buggy VM that couldn't free 8K pages, and the VM is a lot better at that now. And the general trend is to more complex code everywhere, so 4K stacks become more and more hazardous. It was a bad idea back then and is still a bad idea, getting worse and worse with each MLOC being added to the kernel each year.

-- Andi Kleen


What's the benefit of 4K stacks anyway?

Posted Apr 16, 2010 18:29 UTC (Fri) by pr1268 (guest, #24648) [Link] (2 responses)

Okay, call me ignorant (or simply unenlightened). What is the benefit of 4K stacks? Why was 8K chosen way back in the day, and what benefits, real or mythical, do 4K stacks have over 8K (and vice versa)? Pointers to Web pages and discussion threads are surely appreciated (and I'll go look at the kernel documentation for some insight). Thanks!

What's the benefit of 4K stacks anyway?

Posted Apr 16, 2010 18:57 UTC (Fri) by corbet (editor, #1) [Link]

Less memory use, of course, but the real reason is that order-1 (8K) pages are harder to allocate reliably in the kernel. As memory gets more fragmented, contiguous pairs can be hard to find. That situation has improved considerably, but fragmentation can still cause fork() to fail if 8K pages are in use.
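(For the curious, here is a minimal sketch of the allocation involved. __get_free_pages() is the kernel's real low-level page allocator, but the wrapper function and its name are hypothetical; this is an illustration, not the actual fork() path.)

    /*
     * Hypothetical illustration: a new task's kernel stack comes from the
     * page allocator.  Order 0 is a single 4K page; order 1 is two
     * physically contiguous pages (8K).  Under fragmentation the order-1
     * request can fail even when plenty of individual free pages remain,
     * which is how fork() ends up failing.
     */
    static unsigned long alloc_thread_stack(void)
    {
    #ifdef CONFIG_4KSTACKS
        return __get_free_pages(GFP_KERNEL, 0);  /* any free page will do */
    #else
        return __get_free_pages(GFP_KERNEL, 1);  /* needs two adjacent pages */
    #endif
    }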

What's the benefit of 4K stacks anyway?

Posted Apr 22, 2010 4:33 UTC (Thu) by rhcoe (guest, #1101) [Link]

Each process has a stack space allocated for running in kernel mode. Having a 4K stack means that it only requires one page of memory per process instead of two.

The difficulty, from a kernel developer's point of view, is ensuring that no code path ever overruns the one page of stack.

The argument against the larger 8K stack is that, for most workloads, no process usually needs more than 4K of stack, so the second page is just wasted, unused space.
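(Roughly how the 32-bit x86 kernels of that era expressed the choice, paraphrased from memory rather than quoted exactly; note that the thread_info structure lived at the low end of the same region, so an overrun silently corrupted it.)

    /* Paraphrase of the old i386 definition, selected by CONFIG_4KSTACKS: */
    #ifdef CONFIG_4KSTACKS
    #define THREAD_SIZE    (4096)    /* one page: stack plus thread_info */
    #else
    #define THREAD_SIZE    (8192)    /* two contiguous pages */
    #endif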

4K stack to go

Posted Apr 18, 2010 21:44 UTC (Sun) by man_ls (guest, #15091) [Link] (1 responses)

Pity that some kernel devs (at least Kleen) think that small stacks have to go. I would have thought that the right thing to do would be to ensure that 4K stacks are enough in any circumstances. Bad recursion algorithms, for instance, can kill 8K as readily as 4K, so avoiding recursion altogether and giving a hard limit to driver developers looks (to this uninformed non-kernel dev) like good practice.

Maybe keeping the stack within bounds all the time is too hard a problem, but it is a pity to see kernel devs yield to complexity and go back to bigger stacks. We all pay the memory costs, and in some places it really hurts.
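(To illustrate the recursion point with a made-up example, taken from no real driver: the same traversal can be written recursively, where every level of depth costs another stack frame, or iteratively, where the pending work lives on the heap and stack usage stays constant.)

    #include <stdlib.h>

    struct node {
        int value;
        struct node *left, *right;
    };

    /*
     * Recursive version: each level of the tree costs another stack frame,
     * so a deep (or pathologically unbalanced) tree can exhaust a small,
     * fixed-size stack.
     */
    long sum_recursive(const struct node *n)
    {
        if (!n)
            return 0;
        return n->value + sum_recursive(n->left) + sum_recursive(n->right);
    }

    /*
     * Iterative version: the pending work is kept in a heap-allocated
     * worklist, so stack usage stays constant no matter how deep the
     * tree is.
     */
    long sum_iterative(const struct node *root)
    {
        long total = 0;
        size_t cap = 64, top = 0;
        const struct node **work = malloc(cap * sizeof(*work));

        if (!work)
            return -1;                    /* allocation failed */
        if (root)
            work[top++] = root;

        while (top) {
            const struct node *n = work[--top];

            total += n->value;
            if (top + 2 > cap) {          /* grow the worklist on the heap */
                const struct node **nw;

                cap *= 2;
                nw = realloc(work, cap * sizeof(*work));
                if (!nw) {
                    free(work);
                    return -1;
                }
                work = nw;
            }
            if (n->left)
                work[top++] = n->left;
            if (n->right)
                work[top++] = n->right;
        }
        free(work);
        return total;
    }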

4K stack to go

Posted Apr 24, 2010 19:50 UTC (Sat) by efexis (guest, #26355) [Link]

This is a hard limit for driver developers... you don't get away with overflowing your kernel-mode stack. If the overflow shows up on the driver developer's machine, then great; but if their system is fine and it's the particular combination of things that goes into running your machine that overflows the stack, then it's you who gets bitten.

Outside of the embedded world you'd be hard pressed to find a case where one single 4K page of RAM per running process (or kernel thread) could really be described as "paying the memory price"; for most workloads the amount is tiny. The problem was/is finding two pages next to each other, which is required when running in kernel mode, for some reason. As much as it would be nice to save 4K per process, if you've got the memory (which you probably have), it seems somewhat ridiculous not to be able to use some of it to ensure a stable running system, without having to sacrifice things like kernel-mode preemption in places where allowing the kernel to be preempted by a higher-priority task requires stack space that you cannot guarantee, despite having plenty of free RAM.

So yeah... two sides 'n all that :-)
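(As an aside: there is a way to see how close a thread has come to the edge. The sketch below shows only the idea behind the kernel's CONFIG_DEBUG_STACK_USAGE high-water mark, not the kernel's actual code: if the stack region starts out zero-filled, the first nonzero word above its low end tells you how deep the stack has ever grown.)

    #include <stddef.h>

    /*
     * Illustrative only: estimate how many bytes at the low end of a
     * downward-growing stack were never written.  Assumes the region was
     * zero-filled when the thread was created, which is the same
     * assumption the kernel's debug option makes.
     */
    size_t stack_bytes_never_used(const unsigned long *stack_low_end,
                                  size_t stack_size)
    {
        size_t i, words = stack_size / sizeof(unsigned long);

        for (i = 0; i < words; i++)
            if (stack_low_end[i] != 0)
                break;                /* first word the stack ever reached */

        return i * sizeof(unsigned long);
    }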

Quotes of the week

Posted Apr 23, 2010 18:03 UTC (Fri) by jd (guest, #26381) [Link] (1 responses)

PCI allows 4K transfers in a single transaction. Thus, a 4K page can be handled in hardware atomically. Remote memory transfers (such as RDMA) also use 4K block sizes. I'm not certain, but I think most of the other standard memory hardware is also geared to 4K pages. If you can always read/insert from the very start of a block to the very end, it's convenient. No offsets to worry about, no special inserts, no mess, no fuss.

As others have noted, 4K pages mean easier (though more resource-intensive) memory management, with fewer of the complexities that go with juggling larger contiguous allocations.

It might actually be nice to have a multi-page-size option, where some memory uses 4K pages (specifically to exploit atomic operation benefits) and some memory uses a larger page size (for when the bottleneck is in the VM, not the bus or hardware).

Quotes of the week

Posted Apr 24, 2010 19:30 UTC (Sat) by efexis (guest, #26355) [Link]

Unless I'm mistaken, this is only kernel-mode stack space, which is why it has to be contiguous, so it'd never be used for IO. A process will generally spend most of its time running in userland, where its stack isn't affected by these issues (as its pages can be scattered in real memory). Upon requesting some IO, the kernel-mode stack might just contain a pointer to the address where the IO is to be read to, and the address of where to jump back to in userland after the IO call has finished, etc. This is of course very highly simplified (e.g., it's more likely to be pointers to memory structures which contain lists of pointers to read to, etc.), but I believe it explains the fundamentals.
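(A toy version of that simplified picture, with every name invented for illustration; the real syscall path is of course far more involved. The point is only that the kernel-mode stack holds a handful of pointers and small locals while the call is in flight, never the data being transferred.)

    /*
     * All names here are hypothetical; this only illustrates how little
     * state needs to sit on the kernel-mode stack during a read() call.
     */
    struct toy_io_request {
        void          *user_buffer;   /* where the result should land in userland */
        unsigned long  length;        /* how many bytes were requested */
        unsigned long  file_offset;   /* where in the file to start reading */
    };

    long toy_sys_read(struct toy_io_request *req)
    {
        long copied = 0;              /* a few bytes of locals, nothing more */

        /*
         * ... wait for the data, then copy it straight into
         * req->user_buffer; the payload itself never has to fit
         * within the 4K (or 8K) kernel stack ...
         */
        return copied;
    }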

