2.6.37 merge window, part 1

Posted Oct 28, 2010 14:55 UTC (Thu) by nix (subscriber, #2304)
In reply to: 2.6.37 merge window, part 1 by i3839
Parent article: 2.6.37 merge window, part 1

Because you don't need to run very complex stacks at all to exceed 4K. IIRC, NFS-served XFS can break the limit: it doesn't require much.

Posted Oct 28, 2010 20:34 UTC (Thu) by i3839 (guest, #31386)

Well, then they should fix the stacking happening at all, and make XFS and NFS less stack hungry, instead of pushing 8K stacks and pretending the problem doesn't exist. If there is a stack shortage then 8K might not be enough either. Or let XFS and NFS and others select 8K stacks.

What I dislike is that they take away the 4K stack option altogether.

Posted Oct 29, 2010 3:01 UTC (Fri) by nevets (subscriber, #11875)

You want to see how much stack is being used in the kernel?

Just run the stack_tracer (if enabled).

# mount -t debugfs nodev /sys/kernel/debug
# echo 1 > /proc/sys/kernel/stack_trace_enabled
# cat /sys/kernel/debug/tracing/stack_trace
        Depth    Size   Location    (43 entries)
        -----    ----   --------
  0)     4064     112   __slab_alloc+0x38/0x3f1
  1)     3952      80   kmem_cache_alloc+0x82/0x103
  2)     3872      16   mempool_alloc_slab+0x15/0x17
  3)     3856     144   mempool_alloc+0x5e/0x110
  4)     3712      16   scsi_sg_alloc+0x48/0x4a [scsi_mod]
  5)     3696     112   __sg_alloc_table+0x62/0x103
 38)      768     320   load_elf_binary+0x8a6/0x174a
 39)      448      96   search_binary_handler+0xc0/0x24d
 40)      352     112   do_execve+0x1d0/0x2ba
 41)      240      64   sys_execve+0x43/0x5a
 42)      176     176   stub_execve+0x6a/0xc0

I just about hit 4K immediately after enabling it. That first number is the stack depth (4064 bytes). That is 42 calls deep. This also shows the stack size of each function (the Size field).
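
The Depth and Size columns can be cross-checked mechanically. Below is a minimal Python sketch, assuming the output format shown in the trace above. The helper names (parse_stack_trace, largest_frames, depths_consistent) are hypothetical and not part of any kernel tool, and SAMPLE holds only the eleven entries quoted in this comment.

```python
import re

# The eleven entries quoted above (entries 6-37 were not included in the comment).
SAMPLE = """\
        Depth    Size   Location    (43 entries)
        -----    ----   --------
  0)     4064     112   __slab_alloc+0x38/0x3f1
  1)     3952      80   kmem_cache_alloc+0x82/0x103
  2)     3872      16   mempool_alloc_slab+0x15/0x17
  3)     3856     144   mempool_alloc+0x5e/0x110
  4)     3712      16   scsi_sg_alloc+0x48/0x4a [scsi_mod]
  5)     3696     112   __sg_alloc_table+0x62/0x103
 38)      768     320   load_elf_binary+0x8a6/0x174a
 39)      448      96   search_binary_handler+0xc0/0x24d
 40)      352     112   do_execve+0x1d0/0x2ba
 41)      240      64   sys_execve+0x43/0x5a
 42)      176     176   stub_execve+0x6a/0xc0
"""

# Matches: index) depth size symbol+offset/length (a trailing [module] is ignored)
ENTRY_RE = re.compile(r"^\s*(\d+)\)\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_stack_trace(text):
    """Return a list of (index, depth, size, function) tuples."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            index, depth, size, location = match.groups()
            entries.append(
                (int(index), int(depth), int(size), location.split("+")[0])
            )
    return entries


def largest_frames(entries, count=3):
    """Return the entries with the biggest per-function stack usage."""
    return sorted(entries, key=lambda entry: entry[2], reverse=True)[:count]


def depths_consistent(entries):
    """Check that each frame's size equals the depth drop to the next entry."""
    by_index = {entry[0]: entry for entry in entries}
    for index, depth, size, _ in entries:
        following = by_index.get(index + 1)
        if following is not None and depth - following[1] != size:
            return False
    return True


entries = parse_stack_trace(SAMPLE)
print("deepest point:", max(entry[1] for entry in entries), "bytes")
print("columns consistent:", depths_consistent(entries))
for index, depth, size, function in largest_frames(entries):
    print(f"{size:5d} bytes  {function}")
```

On a live system the same functions could be fed the contents of /sys/kernel/debug/tracing/stack_trace instead of SAMPLE.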

Posted Oct 29, 2010 15:35 UTC (Fri) by nix (subscriber, #2304)

Yeah, it was really libata that broke this camel's back. It pulls in the SCSI midlayer for virtually everything: a good idea, because this stuff really *is* SCSI-like, but it makes the call stacks a good bit deeper.

(I stopped using 4kstacks a few years ago when I figured out that it was the cause of my hard lockups when running executables over NFS. That was pre-libata...)

Posted Oct 30, 2010 19:00 UTC (Sat) by i3839 (guest, #31386)

Well, I'd argue that a call stack of 42 is insane and should never happen, but who am I... In such cases any guarantees are off and it's time to actually detect and prevent stack shortage. And not with tracing and such, but a guard page or something.

Posted Oct 31, 2010 12:44 UTC (Sun) by nix (subscriber, #2304)

Uh, a guard page? That would make your 4K stack equivalent to 8K again, only you couldn't use half of it. Not so terribly useful, I think.

Guard pages only make sense if the guarded data is generally much bigger than a page.

Posted Oct 31, 2010 23:16 UTC (Sun) by i3839 (guest, #31386)

Well, I mean reserving a guard page in the virtual address space, not allocating a physical page for it. It would cause a page fault, so I guess it can't work when interrupts are disabled, but the rest of the time it should work now that interrupt handlers have their own stack. Unless I'm missing something.

Posted Nov 1, 2010 0:18 UTC (Mon) by nix (subscriber, #2304)

Hm, yeah, that would work, I think... kernel stacks are physically contiguous, but I don't see an obvious reason why they couldn't have a merely-virtually-contiguous unmapped guard page. (There probably is a reason, or they'd have done it.)

Posted Nov 1, 2010 10:05 UTC (Mon) by i3839 (guest, #31386)

Well, I'm pretty sure the kernel doesn't want a virtually mapped stack, so extending it could get a bit tricky. All in all it might not be worth the complexity compared to just using an 8kB stack.

The main advantage of a 4kB stack is not the saving of one page, but the added pressure to keep bloat down. Things like 42 nested function calls are just not good to have.

nevets, I think you could post that trace as a bug somewhere. :-/

Posted Nov 1, 2010 10:29 UTC (Mon) by dlang (subscriber, #313)

I thought that the big advantage of the 4K stack was the ability to allocate a single page instead of needing to allocate a pair of contiguous pages (an order 0 allocation instead of an order 1 allocation), greatly reducing the problem of memory fragmentation.

Posted Nov 1, 2010 17:33 UTC (Mon) by i3839 (guest, #31386)

The chance that you can't allocate two contiguous pages is fairly small. We're talking about the stack here, so it's one allocation per task, which isn't much. Fragmentation is more of a problem for allocations bigger than order 1, for allocations that may not fail, and for very frequent allocations. The task stack is none of those, so it's fine.
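
To make the order 0 versus order 1 distinction discussed above concrete, here is a small Python sketch of the arithmetic. It assumes 4 KiB pages (as on x86) and that an order-n allocation is 2^n physically contiguous pages; allocation_order and stack_memory are hypothetical helper names used only for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86


def allocation_order(stack_bytes, page_size=PAGE_SIZE):
    """Smallest order such that (page_size << order) >= stack_bytes."""
    pages = -(-stack_bytes // page_size)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


def stack_memory(num_tasks, stack_bytes):
    """Total bytes of kernel stack for num_tasks tasks, one stack each."""
    return num_tasks * stack_bytes


for size in (4096, 8192):
    print(f"{size}-byte stack: order {allocation_order(size)} allocation")

# Extra memory spent on 8K stacks compared to 4K stacks, for 1000 tasks.
extra = stack_memory(1000, 8192) - stack_memory(1000, 4096)
print("extra bytes for 1000 tasks:", extra)
```

A 4K stack is a single page (order 0), while an 8K stack needs two adjacent pages (order 1), which is where the fragmentation concern comes from.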

Posted Nov 7, 2010 11:24 UTC (Sun) by kevinm (guest, #69913)

I wonder, now that interrupt context uses its own stack, whether the task stacks couldn't be vmalloc()ed?
