LWN: Comments on "4K stacks for everyone?" http://lwn.net/Articles/150580/ This is a special feed containing comments posted to the individual LWN article titled "4K stacks for everyone?". 4K stacks for everyone? http://lwn.net/Articles/160225/rss 2005-11-16T10:08:35+00:00 DiegoCG I strongly disagree - this patch has been in Fedora Core for 2 years. Developers are not pushing this out of their ass. The total stack space is actually bigger due to interrupt stacks. Also, it improves scalability.<br> <p> Apparently the main issues (XFS, etc.) have been fixed, so I'd say that 4KB stacks have actually improved code quality...<br> <p> <p> Larger pages are interesting, but they also bring more fragmentation; also, lots of not-so-old x86 CPUs can't handle pages &gt; 4KB very well, AFAIK<br> VM for device drivers? http://lwn.net/Articles/152467/rss 2005-09-20T20:19:39+00:00 renox <font class="QuotedText">&gt;There was a time when virtual address space was in abundant supply and we just worried about real memory, but today the reverse is often true.</font><br> <p> Well, only on 32-bit CPUs. The suggestion of adding a guard page seems very valid to me, even if only for 64-bit CPUs: fewer crashes, or at least 'controlled crashes', are always better.<br> NDISWrapper and 4K stacks http://lwn.net/Articles/151897/rss 2005-09-15T15:36:02+00:00 Luyseyal Agreed. Most ndiswrapper users seem to be wireless card users who won't be expecting stellar performance in any case.<br> <p> -l<br> <p> Conexant drivers from Linuxant also require >4K http://lwn.net/Articles/151822/rss 2005-09-15T05:25:04+00:00 thleemhuis <font class="QuotedText">&gt; * I cannot use the NTFS drivers from Livna. </font><br> <p> You can easily rebuild the kernel-module-ntfs.srpm from livna. See:<br> <a rel="nofollow" href="http://rpm.livna.org/kernel-modules.html">http://rpm.livna.org/kernel-modules.html</a><br> VM for device drivers?
http://lwn.net/Articles/151431/rss 2005-09-13T16:07:15+00:00 giraffedata The kernel can and does detect page faults in kernel space. When the kernel tries to dereference a null pointer, the oops you see is due to the page fault. The same thing would work with the invalid page after the end of the stack (that's called a "guard page"). <p> The earlier comment really meant that the kernel is not set up to handle a page fault in a virtual memory fashion -- i.e. do a pagein and continue as if nothing had happened. <p> But, unfortunately, the guard page has the same problem as 8K stacks -- requires an extra 4K per thread of kernel virtual memory address space and requires 2 contiguous virtual pages. There was a time when virtual address space was in abundant supply and we just worried about real memory, but today the reverse is often true. VM for device drivers? http://lwn.net/Articles/151371/rss 2005-09-13T01:17:11+00:00 mcelrath Dirth isn't a word. The word you wanted is dearth. (back at cha)<br> <p> Anyway, one really needs an oops or panic if the kernel stack is overflowed. A previous poster said page faults in kernel space aren't detectable. You are proposing a 4k page at the end of the stack to check if the stack has overflowed. If there are no page faults in kernel space, then one has to check the stack-overflow page on every process switch? That seems expensive.<br> <p> Then on the other hand this overflow page can probably be only one physical page, shared among all processes (and an oops or panic if ANY process writes it), and if a page fault isn't possible then the task switch could just do an<br> <p> if(stack_overflow[0] != STACK_OVERFLOW_PATTERN) { oops }<br> <p> e.g. just check the first byte. So overall it costs 1 cmp per task switch and 4k. Seems much better than silent stack overflows, and the possible security flaws that might come from them too...<br> 4K stacks for everyone? 
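mcelrath's one-compare check above can be made concrete with a small user-space sketch; the struct layout, pattern value, and names are invented for illustration, and this is not how any real kernel lays out its stacks:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_SIZE 4096
#define STACK_OVERFLOW_PATTERN 0xdeadbeefu  /* invented magic value */

/* Toy "kernel stack": on x86 the stack grows downward, so the canary
 * sits at the low end, where an overflow would clobber it first. */
struct toy_stack {
    uint32_t canary;
    unsigned char space[STACK_SIZE - sizeof(uint32_t)];
};

static void toy_stack_init(struct toy_stack *s)
{
    s->canary = STACK_OVERFLOW_PATTERN;
}

/* The one-compare check proposed for every task switch: nonzero
 * means the canary was overwritten, i.e. the stack overflowed. */
static int toy_stack_overflowed(const struct toy_stack *s)
{
    return s->canary != STACK_OVERFLOW_PATTERN;
}
```

Later kernels adopted a similar trick: a magic value at the end of each task's stack that debug options can check, at the cost mcelrath estimates (one compare per switch). Unlike a guard page, it only detects an overflow after the fact, and only if the overflow happened to write through the canary.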
http://lwn.net/Articles/151183/rss 2005-09-11T01:15:52+00:00 giraffedata <blockquote> But this just begs the question...does the kernel really have no means of detecting or handling stack overflows? </blockquote> <p> It doesn't beg any question. It just raises one. "Beg" means "evade." VM for device drivers? http://lwn.net/Articles/151182/rss 2005-09-11T01:14:13+00:00 giraffedata The kernel uses virtual memory address translation today, and that's all that's required for this kind of stack overflow protection. Make the stack one page and the page immediately after it invalid. A process tries to overflow the stack, and it gets oopsed. <p> However, I believe the point of 4K stacks is that there is a dirth of kernel virtual memory address space, so having 4K of usable addresses plus 4K of unusable ones negates the point of the 4K stack change. <p> To be robust like other OSes in this area, we'd have to go to some complex system with multiple kernel address spaces, and that probably would bring with it a pageable kernel. How exactly would you handle a stack overflow in kernel space? http://lwn.net/Articles/151181/rss 2005-09-11T01:04:59+00:00 giraffedata In most cases, an oops is easy, and significantly more graceful than what we have now. In some cases (e.g. the process scheduler), an oops isn't possible, but a panic is still more graceful than what we have. Conservative Automatic Stack Size Check http://lwn.net/Articles/151168/rss 2005-09-10T21:03:09+00:00 joern FYI: Function pointers are not that hard to follow. See<br> <a href="http://wh.fh-wedel.de/~joern/quality.pdf">http://wh.fh-wedel.de/~joern/quality.pdf</a><br> How exactly would you handle a stack overflow in kernel space? http://lwn.net/Articles/151129/rss 2005-09-10T10:15:33+00:00 pkolloch I think it is not that easy to deal gracefully with stack overflows, even if they get detected. What do you do if the process scheduler triggers a stack overflow? Disable it?<br> <p> Even for device drivers the general case gets tricky
[besides the fact that at least the device in question would have to stop working]<br> NDISWrapper and 4K stacks http://lwn.net/Articles/151051/rss 2005-09-09T18:57:42+00:00 Duncan NDISWrapper moving to user space seems the clear best choice here. Not only does it solve its problem with 4k kernel stacks, it also moves proprietary, even proprietary MSWormOS, drivers out of kernel space into more sanely protected user space. This sounds like exactly the sort of solution the kernel devs would prefer, because it gets those "black-box binary-only things" out of the kernel, greatly simplifying and securing things. <br> <br> The cost is of course speed... Mode transfers between user space and kernel space take time, so proprietary-only MSWormOS-based drivers will be slower. Somehow, I don't see that as being a big issue either, since the view will be that it encourages transferring to more Linux-friendly hardware, always seen as a good thing. <br> <br> Duncan <br> <br> 4K stacks for everyone? http://lwn.net/Articles/151033/rss 2005-09-09T17:09:38+00:00 Ross Actually the kernel is what expands the userspace stack, not the C library. If you think about it, it is expanded even in a program which doesn't call any functions in the C library.<br> Conservative Automatic Stack Size Check http://lwn.net/Articles/150944/rss 2005-09-09T09:29:16+00:00 pkolloch Hmmm, you are right; I knew I had been naive, but I couldn't see what I missed.<br> <p> From what I saw of the VFS, it's a shame that it is not expressed in an object-oriented fashion. That could at least limit the number of candidates. Maybe one could provide some annotations?<br> <p> But I can well imagine that concepts like unionfs in particular, which wrap other file systems, could in principle be wrapped around each other infinitely. You would have to make up some clever notation to tell the stack analyzer that this really isn't possible.
(If there is even such a check ;) ) Or is it done in some clever fashion where the wrapped and the wrapper are not called in a nested fashion, but in some kind of chaining way, for exactly the purpose of saving stack space?<br> <p> [Disclaimer: Again, I have no real clue about the kernel source, so I hope my assumptions are not totally off the mark.]<br> VM for device drivers? http://lwn.net/Articles/150931/rss 2005-09-09T04:36:44+00:00 xoddam <font class="QuotedText">&gt; gcc/libc can allocate more stack pages for userspace programs </font><br> <font class="QuotedText">&gt; if needed, but why not the kernel? </font><br> <br> A defining characteristic of kernel-space programming is that you don't get the benefits of implicit memory protection. Everything has to be done explicitly by the kernel itself. <br> <br> It's possible in principle to give kernel-space tasks virtual memory support, but it would open a big can of worms. If you want deep recursion, do it in userspace. <br> <br> As things stand, the kernel-space page map is never changed implicitly, and rarely explicitly. The prospect of giving kernel tasks their own VM maps, with holes to fit new pages which are to be faulted in (from where?) when the stack overflows, is nightmarish! Performance and maintainability are much, much more important than a growable stack. <br> <br> If stack usage in kernel space isn't demonstrably finite, then the code is broken. The best solution is explicit management of the resources (e.g. using a queue) so that the stack size ceases to be an issue. <br> <br> Conservative Automatic Stack Size Check http://lwn.net/Articles/150920/rss 2005-09-09T01:58:13+00:00 sethml Clever idea, but you missed a case that's hard to deal with: calling through function pointers. The kernel uses function pointers extensively, especially for device drivers.
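xoddam's advice above, explicit management of resources (e.g. a queue) instead of deep recursion, can be illustrated with a small sketch: a tree walk whose pending nodes live in a fixed array rather than in nested stack frames, so stack usage stays constant regardless of tree depth. The types and the bound are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int value;
    struct node *left, *right;
};

#define MAX_PENDING 64   /* arbitrary bound chosen for the sketch */

/* Iterative traversal: the work that a recursive walk would keep in
 * stack frames is held in an explicit array instead. A real kernel
 * user would have to handle the array-full case rather than silently
 * dropping nodes as this toy does. */
static int sum_tree(struct node *root)
{
    struct node *pending[MAX_PENDING];
    int top = 0, sum = 0;

    if (root)
        pending[top++] = root;
    while (top > 0) {
        struct node *n = pending[--top];
        sum += n->value;
        if (n->left && top < MAX_PENDING)
            pending[top++] = n->left;
        if (n->right && top < MAX_PENDING)
            pending[top++] = n->right;
    }
    return sum;
}
```

The stack cost is now one frame plus a fixed buffer, which can be allocated off-stack entirely; that is the sense in which explicit queues make stack size "cease to be an issue".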
I suspect the case mentioned involving RAID involves calling through quite a few levels of function pointers. Figuring out the maximum possible call stack depth, even very conservatively, is probably pretty difficult, and the conservative answer is probably "infinite", because there are pathways you could construct that would recurse, even if that never happens in practice.<br> 4K stacks for everyone? http://lwn.net/Articles/150924/rss 2005-09-09T01:06:05+00:00 mcelrath I noticed the other day while configuring my 2.6.13 kernel that there is now an experimental option to use register arguments for function calls. I imagine this should seriously reduce stack usage. Perhaps a default 4k stack should require register arguments.<br> <p> But this just begs the question...does the kernel really have no means of detecting or handling stack overflows? That just seems like bad design. Can't the stack be set up so that if it is over-written it will trigger a page fault, and the kernel could handle it? gcc/libc can allocate more stack pages for userspace programs if needed, but why not the kernel?<br> Conexant drivers from Linuxant also require >4K http://lwn.net/Articles/150866/rss 2005-09-08T18:58:25+00:00 rahulsundaram <p> Well, it's not really a question of speed. Many kernel updates fix security issues, and custom kernels will have to be rebuilt every time. For filesystems, FUSE has recently been merged into the upstream kernel and will be included in 2.6.14. So it might be possible to use a pure user-space solution which wouldn't break with every new kernel update.<br> <p> Rahul<br> Binary graphics drivers? http://lwn.net/Articles/150861/rss 2005-09-08T18:25:34+00:00 flewellyn As I recall, the nVidia graphics drivers were fixed to work with 4k stacks at around the time 2.6.7 came out. Either .7 or .8. It's been a while, though.
I have been running with 4k stacks and the nVidia driver for several kernel versions without incident.<br> Conexant drivers from Linuxant also require >4K http://lwn.net/Articles/150858/rss 2005-09-08T17:56:34+00:00 astrand On my laptop with the ICH6 chipset, which is running FC4, I'm using the HSF softmodem driver from Linuxant. After many kernel panics, I found out that this driver doesn't work with 4K stacks. I replaced the kernel with a version from Linuxant and now things work, but some drawbacks remain:<br> <p> * I won't receive any kernel updates via "yum"<br> <p> * I cannot use the NTFS drivers from Livna. <br> <p> To sum up, this problem has been tiresome and frustrating. I don't care if the kernel is a few percent slower, as long as things work...<br> <p> Binary graphics drivers? http://lwn.net/Articles/150820/rss 2005-09-08T14:27:39+00:00 alspnost Yes - NVidia, at least, fixed their driver to work with 4k stacks back at the time. I used it for many months on my old x86 system before switching to AMD64.<br> Binary graphics drivers? http://lwn.net/Articles/150809/rss 2005-09-08T13:00:48+00:00 sbergman27 Although I am currently running x86_64, which uses 8k stacks, I have run the same machine on i386 Fedora (4k stacks) with the NVidia drivers with no problem.<br> Conservative Automatic Stack Size Check http://lwn.net/Articles/150798/rss 2005-09-08T11:42:54+00:00 farnz <a href="http://www.gccsummit.org/2005/2005-GCC-Summit-Proceedings.pdf#page=99">The paper starts on page 99 of the proceedings PDF</a>. I've not found it split out separately, and the PDF file is quite large (around 1.7MB). Binary graphics drivers? http://lwn.net/Articles/150786/rss 2005-09-08T10:57:28+00:00 NAR I seem to remember that there was a warning in the kernel configuration option about breaking binary-only modules such as the NVidia and ATI drivers if the stack is only 4k. Was this problem fixed?
<P> <CENTER>Bye,NAR</CENTER> Conservative Automatic Stack Size Check http://lwn.net/Articles/150784/rss 2005-09-08T09:53:18+00:00 pkolloch After a moderate amount of web searching, I could find the abstract of the presentation, but not the paper itself. Any pointers? <br> <br> BTW, I did not say that it is "easy" for the general case, but for the kernel, without dynamic stack allocations and recursion. And OK, I was probably naive and will agree that it is probably also difficult for this special case ;) But it is both feasible and desirable. I hope Olivier Hainque will be successful in his quest and that his work will be applied to the kernel. <br> <br> <font class="QuotedText">&gt; TBH I'd expect that kernel developers' own hunches would be as reliable. </font><br> <br> And predict which variables are stored in registers and which on the stack, while considering all call paths? No, I think humans would miss a lot of special cases on that one. Additionally, hardly anyone would actually endeavor to do this for anything but some core functions. Am I wrong? <br> Conservative Automatic Stack Size Check http://lwn.net/Articles/150779/rss 2005-09-08T09:12:26+00:00 nix <blockquote> Most stack allocation should be easily statically determinable </blockquote> Some static determination is possible, but it is neither easy nor reliable (nor can it ever be reliable in the general case), and the error bars are large. See Olivier Hainque's paper in the GCC 2005 Summit proceedings for a pile of info on this. <p> TBH I'd expect that kernel developers' own hunches would be as reliable. Conservative Automatic Stack Size Check http://lwn.net/Articles/150767/rss 2005-09-08T08:27:48+00:00 pkolloch <font class="QuotedText">&gt; The current proposal is sheer madness.
The developers have NO IDEA what the maximum kernel stack usage is, and no way of determining it.</font><br> <p> Then the current state is desperate as well: they don't have a clue whether the current stack size limit is sufficient. Your dynamic stack size check would be a step in the right direction, but:<br> <p> Most stack allocation should be easily statically determinable (with only small conservative over-approximations). Things like alloca (if there is a kernel equivalent), or any other means of growing the stack by a dynamically computed amount, are trickier. However, these should be avoided anyway if stack conservation has such a priority.<br> <p> At least conceptually, computing a call graph with conservative stack usage annotations should be fairly easy (using existing code in GCC). In the absence of recursion, one could easily determine the largest stack size in use. And again, if you value the stack size so much, you should not use recursion. (Well, there might be valid use cases with a known maximal recursion depth of 3 or so, which might be hard for machines to check statically; even then, you would need something slightly smarter than plain call graphs.)<br> <p> Without such an automatic check, I pretty much agree with you.<br> <p> [Disclaimer: I have basically no clue about the kernel source except what I read occasionally on this page.]<br> <p> 4K stacks for everyone? http://lwn.net/Articles/150755/rss 2005-09-08T07:02:04+00:00 jwb A friend relayed this excellent suggestion: instead of causing great pain among users of ndiswrapper, raid, cryptoloop, xfs, nfs, lustre, and a great many other kernel features, why not accelerate the move to 16KiB soft pages on x86? Then the stack could be kept in a single softpage, with the last 4KiB hardware backing page left unallocated. That leaves 12KiB for the stack, and a reliable means of determining when the stack overflows.
In addition, you get all the other efficiency benefits of larger pages.<br> <p> The current proposal is sheer madness. The developers have NO IDEA what the maximum kernel stack usage is, and no way of determining it. Those who are proposing mandatory 4KiB stacks are just crossing their fingers and saying "fuckit, it seems to run on my laptop." That's not a very modern method of software development, especially when the only beneficiaries are a couple of large [elided] customers with over-threaded Java apps.<br> 4K stacks for everyone? http://lwn.net/Articles/150751/rss 2005-09-08T05:43:03+00:00 wtogami NDISWrapper actually requires 16K stacks for reliable operation. 8K isn't always enough.<br>
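The conservative call-graph analysis debated in the "Conservative Automatic Stack Size Check" subthread can be sketched as follows: given per-function frame costs and a recursion-free call graph, a function's worst-case stack usage is its own frame plus the maximum over its callees. The graph and frame sizes below are made up for illustration; a real analyzer would take per-frame sizes from the compiler (GCC later gained a -fstack-usage option for exactly that part), and function pointers and recursion are what make the real problem hard, as the thread notes.

```c
#include <assert.h>

#define NFUNCS 4
#define MAX_CALLEES 3

/* Invented per-function stack frame costs, in bytes. */
static const int frame_cost[NFUNCS] = {128, 256, 64, 512};

/* Callee lists, -1-terminated: function 0 calls 1 and 2;
 * 1 and 2 each call 3; 3 is a leaf. This must be a DAG. */
static const int callees[NFUNCS][MAX_CALLEES] = {
    {1, 2, -1}, {3, -1, -1}, {3, -1, -1}, {-1, -1, -1}
};

/* Worst-case stack usage of f: its own frame plus the deepest callee.
 * The analyzer itself recurses, which is fine on the host; on a cyclic
 * graph (i.e. kernel code with recursion) the answer would be
 * unbounded, which is exactly the "conservatively infinite" case the
 * comments describe. */
static int max_stack(int f)
{
    int worst = 0;
    for (int i = 0; i < MAX_CALLEES && callees[f][i] >= 0; i++) {
        int c = max_stack(callees[f][i]);
        if (c > worst)
            worst = c;
    }
    return frame_cost[f] + worst;
}
```

On this toy graph the deepest path from function 0 is 0 -> 1 -> 3, costing 128 + 256 + 512 = 896 bytes; a kernel-wide version of this number is what one would compare against the 4K limit.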