LWN: Comments on "Per-system-call kernel-stack offset randomization" https://lwn.net/Articles/816085/ This is a special feed containing comments posted to the individual LWN article titled "Per-system-call kernel-stack offset randomization". en-us Thu, 25 Sep 2025 14:54:45 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Per-system-call kernel-stack offset randomization https://lwn.net/Articles/817524/ https://lwn.net/Articles/817524/ ghane Works on Ubuntu 20.04 too <pre>
sanjeev@T450s-disco:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu Focal Fossa (development branch)
Release: 20.04
Codename: focal
sanjeev@T450s-disco:~$ groups
sanjeev adm disk lp dialout cdrom sudo dip plugdev lpadmin sambashare docker
sanjeev@T450s-disco:~$ dmesg |tail -5
[ 7010.888825] wlp3s0: authenticated
[ 7010.890961] wlp3s0: associate with 38:d5:47:80:24:c4 (try 1/3)
[ 7010.891989] wlp3s0: RX AssocResp from 38:d5:47:80:24:c4 (capab=0x11 status=0 aid=4)
[ 7010.893546] wlp3s0: associated
[ 7010.913151] IPv6: ADDRCONF(NETDEV_CHANGE): wlp3s0: link becomes ready
sanjeev@T450s-disco:~$
</pre> Wed, 15 Apr 2020 12:34:09 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816911/ https://lwn.net/Articles/816911/ zdzichu Actually there's even a sysctl file: <code>/proc/sys/kernel/dmesg_restrict</code>. It can be toggled at any time. Mon, 06 Apr 2020 16:42:25 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816848/ https://lwn.net/Articles/816848/ tao <div class="FormattedComment"> Debian, at the very least, restricts dmesg by default. 
Ubuntu, on the other hand, doesn't (at least not 18.04 and 19.04; I don't have any newer systems to test on).<br> </div> Mon, 06 Apr 2020 13:27:13 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816327/ https://lwn.net/Articles/816327/ geuder <div class="FormattedComment"> Thanks for your reply; this is indeed a very interesting use case. A strict interpretation of regulation would require us to use that at my work every day.<br> <p> So you are saying developers have root on their workstation, the daemon is running on their workstation, but still the developer cannot prevent that audit record from being written to the correct, persistent, and unmodifiable log for every usage of the credentials?<br> <p> In practice, before worrying about root getting kernel stack addresses, we would need to solve much more fundamental problems in user space, such as preventing root from copying and modifying the daemon, or from having the audit records written to a wrong location where an auditor will not find them. Do you have a pointer to the overall design of such a system?<br> </div> Tue, 31 Mar 2020 05:44:13 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816308/ https://lwn.net/Articles/816308/ simcop2387 <div class="FormattedComment"> You don't have to give it root; just give it CAP_SYSLOG, which, if it's a tool to gather diagnostic information, would probably be needed anyway.<br> <p> <p> </div> Mon, 30 Mar 2020 21:32:01 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816300/ https://lwn.net/Articles/816300/ mjg59 <div class="FormattedComment"> Sure. As an example: the credentials required for a user to be able to access production resources from a development workstation may be held in a daemon that generates an audit record for every access. If that secret can be extracted, it can be used without generating that audit trail and could potentially be copied to a different machine. 
LSM policy can be written to prevent root from being able to interact with that daemon, but that's not helpful if it's relatively straightforward for root to get code execution in the kernel.<br> </div> Mon, 30 Mar 2020 20:01:15 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816293/ https://lwn.net/Articles/816293/ madscientist <div class="FormattedComment"> I expect there will be repercussions. For example, we have a daemon that runs on systems that can be asked to retrieve diagnostic information about a system, and dmesg output is often a critical aspect of that (for example, determining if processes were killed due to OOM, or hardware issues, etc.) Of course, we do not want such a daemon to have to run as root.<br> <p> Restricting access to important system information to root will just provide incentive to give root access to more things, which seems like an anti-pattern to me.<br> <p> If dmesg output is really a security issue then of course something needs to be done, but some careful thought is appropriate.<br> </div> Mon, 30 Mar 2020 19:07:44 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816292/ https://lwn.net/Articles/816292/ jimi <div class="FormattedComment"> Ah - thank you for enlightening me.<br> <p> So I'm left wondering, why not set the default to y? At least one distro runs with this restricted with no ill effects. 
What are the reasons to not restrict?<br> </div> Mon, 30 Mar 2020 17:58:08 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816291/ https://lwn.net/Articles/816291/ zdzichu <div class="FormattedComment"> It's an upstream kernel option:<br> <p> config SECURITY_DMESG_RESTRICT<br> bool "Restrict unprivileged access to the kernel syslog"<br> default n<br> help<br> This enforces restrictions on unprivileged users reading the kernel<br> syslog via dmesg(8).<br> <p> It has been there for over 9 years.<br> </div> Mon, 30 Mar 2020 17:14:38 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816290/ https://lwn.net/Articles/816290/ jimi <div class="FormattedComment"> I would guess that little would break. I base this guess on the fact that Slackware does not allow non-root access to dmesg and has not for a long time. If Slackware can find a way to restrict dmesg without breaking things, surely others can as well.<br> <p> jimi@black:~&gt; dmesg<br> dmesg: read kernel buffer failed: Operation not permitted<br> <p> </div> Mon, 30 Mar 2020 17:09:29 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816235/ https://lwn.net/Articles/816235/ geuder <div class="FormattedComment"> <font class="QuotedText">&gt; a bunch of cases </font><br> <p> Would you mind refreshing my memory on this? I once listened to a kernel lockdown presentation by you, but not working on those questions on a regular basis must have led to too low a refresh rate, I am sorry.<br> <p> The second paragraph of <a href="https://mjg59.dreamwidth.org/50577.html">https://mjg59.dreamwidth.org/50577.html</a> mentions keeping secrets secret from root. Can you give an example of what such a secret would be, and where it would come from, such that root cannot already access it without executing code?<br> <p> TPMs are one way to keep private keys secret from root; even the kernel doesn't have them. 
Of course they are not applicable everywhere, so I don't intend to doubt that there are more use cases.<br> </div> Mon, 30 Mar 2020 11:17:02 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816234/ https://lwn.net/Articles/816234/ geuder <div class="FormattedComment"> You are correct, dmesg still seems to be open everywhere. I must have mixed that up with the experience that journalctl does not show the system journal unless you add yourself to the appropriate group, and that /var/log/* files are increasingly protected.<br> <p> So what would break if we protect /dev/kmsg? Reading random text messages doesn't look like a desirable design for any purpose. Except for systemd-journald of course, but that runs as root already.<br> </div> Mon, 30 Mar 2020 10:53:06 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816231/ https://lwn.net/Articles/816231/ mjg59 <div class="FormattedComment"> There's a bunch of cases where you don't want root to have arbitrary code execution in the kernel, so there's still a benefit in preventing root from knowing this.<br> </div> Mon, 30 Mar 2020 07:44:40 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816230/ https://lwn.net/Articles/816230/ gutschke <div class="FormattedComment"> I just checked on four completely different Linux systems that I could quickly log into. All four of them run a different distribution. Some are old-ish, others are bleeding edge. All of them allow non-privileged users to invoke "dmesg". I don't doubt that this can be restricted. 
But I suspect that it has unexpected side effects and that's why distributions don't do so by default.<br> <p> If 90%+ of all userland doesn't restrict access to kernel messages, then maybe it is a good idea for the kernel to assume that this type of data is available to an attacker.<br> </div> Mon, 30 Mar 2020 06:12:57 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816227/ https://lwn.net/Articles/816227/ gutschke <p>Picture open-coding <code>alloca()</code> using pseudo-code like this:</p><pre>new_stack = old_stack - (rand() &amp; 0x3F);</pre><p>That would do your stack address randomization, and it would give you 6 bits of randomness, as <code>(0x3F + 1 = 1 &lt;&lt; 6)</code>. But now you need to follow up with the alignment that the x86 ABI requires. Let's mask out any low-order bits that violate the ABI:</p><pre>new_stack = (old_stack - (rand() &amp; 0x3F)) &amp; ~0xF;</pre><p>But that's roughly equivalent to writing:</p><pre>new_stack = old_stack - (rand() &amp; 0x30);</pre><p>Actually, if you look really closely, the transformation isn't exactly correct and sometimes results in a value that is off by 0x10; but let's not worry about that for now. Fixing that would just make the code needlessly complicated and not contribute anything useful to this discussion.</p><p>In any case, as you can see, we now only have two bits of randomization. That sounds barely worth the effort. If we want to regain all six bits of stack address randomization, we need to instead do something like:</p><pre>new_stack = old_stack - (rand() &amp; 0x3F0);</pre><p>And that's the 1kB (aka 0x400 bytes) of potentially wasted space. And again, we have lost 0x10 bytes because of my sloppy math a little earlier. 
Please forgive me, but it makes things easier to read.</p> Mon, 30 Mar 2020 06:03:42 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816228/ https://lwn.net/Articles/816228/ Cyberax <div class="FormattedComment"> No, randomization can happen only at 16-byte intervals. So you're wasting 64*16 bytes.<br> </div> Mon, 30 Mar 2020 05:45:35 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816225/ https://lwn.net/Articles/816225/ geuder <div class="FormattedComment"> <font class="QuotedText">&gt; there are still numerous kernel messages that will expose addresses of data structures, including the stack, in the kernel log.</font><br> <p> The kernel log? Where is the kernel log readable to non-root in any current system that tries to be somewhat security-aware? Once the attacker is root there are probably worse problems.<br> <p> Even without buying that argument, I'm not saying that the approach is useless. Smart crackers will find ways nobody has thought of (or at least not prepared for), so defense in depth should not hurt.<br> <p> </div> Mon, 30 Mar 2020 05:41:47 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816226/ https://lwn.net/Articles/816226/ jorgegv <div class="FormattedComment"> Sorry, I don't understand your reasoning. If up to 64 bytes are wasted by randomization and up to 16 bytes are wasted due to alignment, that makes a maximum of 80 bytes wasted, not 1kB.<br> <p> I assume the randomization is done once per syscall invocation, right?<br> </div> Mon, 30 Mar 2020 05:35:08 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816222/ https://lwn.net/Articles/816222/ gutschke <div class="FormattedComment"> The point of this patch is that instead of having a single base address for the system call stack, there should be 32 or 64 distinct addresses (depending on architecture). 
This makes it impossible for an attacker to reliably guess addresses on the stack. And hopefully, that will cause (some) attacks to fail with some sort of kernel crash. Of course, if there is no crash, an attacker can just keep guessing and eventually they'll get lucky.<br> <p> Randomization happens by virtue of random amounts of data being allocated on the stack. This happens right at the point of the transition from user space to kernel space.<br> <p> But alloca() knows about the x86 ABI. And the ABI requires that stack frames are aligned in 16-byte increments. That's needed because some CPU instructions want aligned data (I believe this mostly affects SSE instructions). The compiler assumes that stacks are always aligned when the program starts (or in this case, when the system call starts executing in the kernel) and then makes sure the necessary padding is added whenever a function call is made.<br> <p> There really isn't any unused memory that is readily available for other purposes.<br> </div> Mon, 30 Mar 2020 03:05:03 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816219/ https://lwn.net/Articles/816219/ Paf <div class="FormattedComment"> I'm a bit confused by this - If this is indeed the case, why not use the rest of the space rather than leave it unused? Can you explain the calculation that gets you to 1K in more detail?<br> </div> Mon, 30 Mar 2020 02:25:04 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816203/ https://lwn.net/Articles/816203/ gutschke <div class="FormattedComment"> In the discussion of how many bits of randomness get introduced, there is a statement that randomness comes at the cost of increased memory usage. At first sight, this seems puzzling, as even 6 bits would only yield 64 bytes of added stack usage. But what the article failed to mention is the fact that the x86 ABIs require a 16-byte stack alignment. So, those 6 bits turn into 1kB of extra memory usage. 
That's quite significant with notoriously small kernel stacks (8kB or 16kB depending on architecture).<br> </div> Sun, 29 Mar 2020 19:10:38 +0000 Per-system-call kernel-stack offset randomization https://lwn.net/Articles/816184/ https://lwn.net/Articles/816184/ marcH <div class="FormattedComment"> Is there any chance this may help reveal "Programming by coincidence" bugs? Crashes due to memory corruption come and go depending on which way the wind blows. This looks like a bit more wind so I'm wondering.<br> </div> Sat, 28 Mar 2020 23:53:15 +0000