Virtually mapped stacks 2: thread_info strikes back
Larger stacks
Posted Jun 30, 2016 8:22 UTC (Thu) by rwmj (subscriber, #5474)
Posted Jun 30, 2016 14:06 UTC (Thu) by corbet (editor, #1)
I've seen no talk of increasing the stack size, but allocating them from the vmalloc() area would certainly remove the biggest impediment to doing so.
Posted Jul 4, 2016 13:12 UTC (Mon) by nix (subscriber, #2304)
Posted Jul 4, 2016 17:35 UTC (Mon) by luto (guest, #39314)
Also, what happens if allocation fails?
Posted Jul 5, 2016 16:07 UTC (Tue) by nix (subscriber, #2304)
I do suspect that recovery from double faults is, at best, very lightly tested, if at all, so even if it works on one CPU it might well fail on another. Ah well :(
(On allocation failure, you obviously kill the process just like you would on a detected stack overflow in the soon-to-be-current world. That's no different from userspace.)
Posted Jul 5, 2016 21:55 UTC (Tue) by nybble41 (subscriber, #55106)
Userspace has the kernel to clean up things like locks that were held at the time the process was killed. Who cleans up when a kernel thread dies unexpectedly due to an asynchronous out-of-memory condition? You can't just terminate a kernel thread at any arbitrary point, but you can't resume the thread either without extending the stack.
Stack overflow is a programming error; you know how much stack you allocated and—barring unbounded recursion, variable-length arrays, or alloca()—can statically calculate how much you need (in the worst case) to complete a given function call. Stack overflow in a kernel thread could thus be treated as a bug with the potential to halt the system. An out-of-memory condition resulting from delayed allocation of stack pages is an entirely different matter. That can occur at any time, depending on the amount of memory available.
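To make that distinction concrete, here is a minimal userspace-style C sketch (illustrative only, not kernel code): the first function's worst-case stack usage is a compile-time constant, while the second's depends on a runtime value, so no static bound exists.

    #include <stddef.h>

    /* Worst-case stack usage is fixed: one 256-byte buffer plus frame
       overhead, all known at compile time. */
    void bounded(void)
    {
            char buf[256];
            (void)buf;      /* ... use buf ... */
    }

    /* A variable-length array defeats static analysis: the frame size
       depends on n, which is known only at run time. */
    void unbounded(size_t n)
    {
            char buf[n];
            (void)buf;
    }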
Posted Jul 6, 2016 20:18 UTC (Wed) by nix (subscriber, #2304)
I was hoping we could *get*, not unbounded, but arbitrarily deep recursion: in complex situations like stacks of filesystems it can be very hard to compute in advance how much space might be needed, and possibly impractical to allocate it for every task just in case. What you're saying here is that system administrators can cause kernel bugs by stacking filesystems. That's the situation we have now, but it's very far from ideal: there is no conceptual reason why you shouldn't be able to stack the things fifty or five million deep (though obviously performance would start to suck!).
Posted Jul 6, 2016 21:30 UTC (Wed) by nybble41 (subscriber, #55106)
There are other ways to achieve the same result. One approach would be to employ "trampoline" functions, though this only works if one filesystem can hand off to another without any further involvement: rather than call the other filesystem directly, you return a transformed version of the request, unwinding the stack, and a top-level iterative function passes the transformed request on to the other filesystem. When applicable, this approach can handle any number of nested filesystems in constant stack space.
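A minimal sketch of the trampoline idea, with every name hypothetical rather than a real kernel interface: each layer's handler either completes the request or retargets it at the next layer and returns, and a top-level loop redispatches it, so stack usage stays constant regardless of nesting depth.

    #include <stdbool.h>

    struct fs_request;

    /* Hypothetical filesystem layer: its handler either completes the
       request or rewrites it for the next layer down and returns. */
    struct filesystem {
            void (*handle)(struct fs_request *req);
    };

    struct fs_request {
            struct filesystem *target;   /* layer to handle it next */
            int status;                  /* result, valid once done is set */
            bool done;                   /* no further handoff needed */
    };

    /* Top-level iterative dispatcher: each handler's frame is unwound
       before the next layer runs, so nesting never consumes extra stack. */
    int fs_dispatch(struct fs_request *req)
    {
            while (!req->done)
                    req->target->handle(req);
            return req->status;
    }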
Another approach, which is more compatible with existing code, would be to explicitly extend the stack before calling into the other filesystem. Allocation failure at this point could be handled safely, like any other out-of-memory condition:
    if (!ensure_minimum_stack(8192))
            return -ENOMEM;
    nested_filesystem_call();
The problem was in extending the stack implicitly through page faults, where allocation cannot be allowed to fail, not the basic concept of having an extendable stack.
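Purely as an illustration of what such a helper might look like (ensure_minimum_stack() is the hypothetical name used above, and try_extend_stack() plus the two accessor functions are invented here; none of these is a real kernel API):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical accessors: the current stack pointer and the lowest
       currently mapped address of this thread's stack. */
    extern uintptr_t current_stack_pointer_sketch(void);
    extern uintptr_t current_stack_base_sketch(void);

    /* Hypothetical: map more pages below the stack; may fail. */
    extern bool try_extend_stack(size_t bytes);

    /* Succeed if enough room remains below the stack pointer (stacks
       grow downward); otherwise try to extend the mapping. Failure is
       an ordinary error the caller can turn into -ENOMEM, rather than
       an unrecoverable page fault. */
    static bool ensure_minimum_stack(size_t needed)
    {
            uintptr_t sp   = current_stack_pointer_sketch();
            uintptr_t base = current_stack_base_sketch();

            if (sp - base >= needed)
                    return true;
            return try_extend_stack(needed - (sp - base));
    }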