LWN: Comments on "The current state of kernel page-table isolation"
https://lwn.net/Articles/741878/
Comments posted to the LWN article "The current state of kernel page-table isolation".


Per-task PTI
https://lwn.net/Articles/744282/
Posted by nix on Sun, 14 Jan 2018 12:32:38 +0000

This would presumably also mean we could do the inverse, and turn it off for everything other than network-facing programs and things like web browsers, much like many non-hardened distros do for things like sendmail and chromium now.


Per-task PTI
https://lwn.net/Articles/744259/
Posted by corbet on Sat, 13 Jan 2018 23:41:22 +0000

Per-process (or perhaps per-thread) granularity for PTI is in the works and will show up eventually. I'll write an update early next week that will include this work.


The current state of kernel page-table isolation
https://lwn.net/Articles/744246/
Posted by oversimplistic on Sat, 13 Jan 2018 21:24:38 +0000

For systems and workloads that suffer excessively from PTI, there is the nopti option that can be used in a trusted environment. Would it make any sense, and be practical, to support turning it off for a specific process and its children, perhaps with an interface similar to sudo?


The current state of kernel page-table isolation
https://lwn.net/Articles/742789/
Posted by samiam95124 on Thu, 04 Jan 2018 01:49:40 +0000

So forgive the ignorance of a newcomer, but the 5% appears to come from the need to flush TLBs when crossing page-table sets. Intel (as well as AMD) has had advanced virtualization support for a while now (mainly aimed at multi-VM setups) that allows the TLB to hold "address space identifiers", so that the TLB can, in fact, hold different working sets at the same time, even though only one is active. The idea of that feature was that different VMs could cross from one to the other without the typical TLB flush penalty. However, that feature seems ideal any time two disjoint working sets need to be in use with rapid switching between them. This sounds tailor-made for KPTI?

The need to hold disjoint working sets for kernel and user is not a new thing. 360/VM did this, and most virtually paged processors outside the 80x86 series would simply swap the register that holds the root page table on traps, so that you could implement any level of isolation you wanted (by mapping some, most, or all of the pages jointly between page sets).
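
For illustration, the address-space-identifier (PCID) mechanism described in the comment above maps onto a KPTI-style page-table switch roughly as follows: the user and kernel page tables get distinct PCIDs, and CR3 is rewritten with the "no flush" bit set so that the TLB keeps both working sets across the transition (the PTI code does take advantage of PCIDs where the hardware has them). The sketch below is a simplified illustration of that idea, not the kernel's actual entry code; the constants and helper names are invented for the example.

#include <stdint.h>

/*
 * Conceptual sketch of a PCID-tagged CR3 switch.  Bits 0-11 of CR3
 * select the PCID; setting bit 63 on a CR3 write tells the CPU not to
 * flush the TLB entries tagged with the new PCID (requires
 * CR4.PCIDE=1).  Constants and helpers are simplified for illustration.
 */
#define CR3_NOFLUSH   (1ULL << 63)
#define KERNEL_PCID   0ULL   /* example PCID for the kernel page tables */
#define USER_PCID     1ULL   /* example PCID for the user page tables  */

static inline void write_cr3(uint64_t val)
{
    asm volatile("mov %0, %%cr3" : : "r" (val) : "memory");
}

/* Entering the kernel: switch to the full kernel PGD, keep the TLB warm. */
static inline void enter_kernel(uint64_t kernel_pgd_pa)
{
    write_cr3(kernel_pgd_pa | KERNEL_PCID | CR3_NOFLUSH);
}

/* Returning to user space: switch to the sparse user PGD, again without
 * flushing either address space's cached translations. */
static inline void return_to_user(uint64_t user_pgd_pa)
{
    write_cr3(user_pgd_pa | USER_PCID | CR3_NOFLUSH);
}

The trade-off is that, once implicit flushing is suppressed, stale entries for the other PCID must be invalidated explicitly (for example with INVPCID) whenever a mapping actually changes.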
<a href="https://www.usenix.org/legacy/events/osdi10/tech/full_papers/Soares.pdf">https://www.usenix.org/legacy/events/osdi10/tech/full_pap...</a><br> </div> Tue, 02 Jan 2018 20:29:57 +0000 The current state of kernel page-table isolation https://lwn.net/Articles/742327/ https://lwn.net/Articles/742327/ zlynx <div class="FormattedComment"> It's still better than container runtimes last updated in 2010.<br> </div> Wed, 27 Dec 2017 16:47:29 +0000 The current state of kernel page-table isolation https://lwn.net/Articles/742301/ https://lwn.net/Articles/742301/ joib <div class="FormattedComment"> <font class="QuotedText">&gt; And what about risc-v? Have they managed to avoid this or are they also vulnerable? I see that the privileged isa is still in draft status, so maybe they can still fix it? </font><br> <p> Replying to myself, seems they haven't. Here's a proposal by Christoph Hellwig to fix it: <a href="https://groups.google.com/a/groups.riscv.org/d/msg/isa-dev/JU0M_vug4R0/YELX92fIBwAJ">https://groups.google.com/a/groups.riscv.org/d/msg/isa-de...</a><br> </div> Tue, 26 Dec 2017 19:53:46 +0000 The current state of kernel page-table isolation https://lwn.net/Articles/742300/ https://lwn.net/Articles/742300/ farnz <p>SPARC and S390 support "Address Space Identifiers" (ASIs). In this world, the hardware extends the address supplied to the MMU with an ASI to tell the MMU what sort of access is being made - and the only way to set your ASI is to be in privileged code. <p>In SPARC v8, for example, there are privileged instructions to use an arbitrary 8-bit ASI; user mode instruction fetch is ASI 0x08, supervisor mode instruction fetch is ASI 0x09, user mode normal data access is ASI 0x0A, and supervisor mode normal data access is ASI 0x0B. The MMU sees the combination of virtual address and ASI, and does translation accordingly; the normal setup to get a KAISER-like situation is to have ASIs 0x08 and 0x0A only able to access user mode memory (kernel memory is simply not mapped in ASIs 0x08 and 0x0A), while ASIs 0x09 and 0x0B can see kernel memory, too. <p>In more complex setups, user memory is not mapped for the supervisor ASIs, and you use the advanced instructions to override ASI when you're doing to <tt>copy_(to|from)_user</tt> functions, so that an erroneous access to user memory just fails. Tue, 26 Dec 2017 19:04:35 +0000 The current state of kernel page-table isolation https://lwn.net/Articles/742278/ https://lwn.net/Articles/742278/ roc <div class="FormattedComment"> And "redzone? what redzone?"<br> </div> Mon, 25 Dec 2017 20:20:54 +0000 The current state of kernel page-table isolation https://lwn.net/Articles/742245/ https://lwn.net/Articles/742245/ joib <div class="FormattedComment"> The comments in the linked KAISER article mentioned that SPARC and s390 already use separate address spaces for user and kernel. Do those have some hw feature that allows them to do it with less overhead (if so, what?) , or is it just a convention?<br> <p> And what about risc-v? Have they managed to avoid this or are they also vulnerable? I see that the privileged isa is still in draft status, so maybe they can still fix it? <br> </div> Mon, 25 Dec 2017 16:09:35 +0000 The current state of kernel page-table isolation https://lwn.net/Articles/742234/ https://lwn.net/Articles/742234/ joib <div class="FormattedComment"> Ah, it's mentioned in the linked KAISER article. Doh. 

The current state of kernel page-table isolation
https://lwn.net/Articles/742278/
Posted by roc on Mon, 25 Dec 2017 20:20:54 +0000

And "redzone? what redzone?"


The current state of kernel page-table isolation
https://lwn.net/Articles/742245/
Posted by joib on Mon, 25 Dec 2017 16:09:35 +0000

The comments in the linked KAISER article mentioned that SPARC and s390 already use separate address spaces for user and kernel. Do those have some hardware feature that allows them to do it with less overhead (if so, what?), or is it just a convention?

And what about RISC-V? Have they managed to avoid this or are they also vulnerable? I see that the privileged ISA is still in draft status, so maybe they can still fix it?


The current state of kernel page-table isolation
https://lwn.net/Articles/742234/
Posted by joib on Mon, 25 Dec 2017 07:44:57 +0000

Ah, it's mentioned in the linked KAISER article. Doh.


The current state of kernel page-table isolation
https://lwn.net/Articles/742233/
Posted by joib on Mon, 25 Dec 2017 07:21:17 +0000

Is this similar to the 4G/4G patches of yore?


The current state of kernel page-table isolation
https://lwn.net/Articles/742148/
Posted by luto on Fri, 22 Dec 2017 17:57:55 +0000

I think so. It's actually very little code, and the PGD entry isn't allocated until someone actually calls modify_ldt().


The current state of kernel page-table isolation
https://lwn.net/Articles/742051/
Posted by luto on Thu, 21 Dec 2017 15:25:52 +0000

Indeed: https://github.com/golang/go/issues/14795

The Go runtime is, in my experience, really quite crappy. This isn't the first time it's been caught using a wildly outdated kernel feature.


The current state of kernel page-table isolation
https://lwn.net/Articles/742037/
Posted by dvrabel on Thu, 21 Dec 2017 14:35:52 +0000

The Go runtime uses the LDT (or has used it; I've not checked whether the latest version of Go still does), so there are probably fewer systems out there that don't need LDT support than you think.


The current state of kernel page-table isolation
https://lwn.net/Articles/741967/
Posted by josh on Wed, 20 Dec 2017 20:06:40 +0000

Does the additional LDT handling mentioned in the article (having an extra PGD for LDTs) get skipped and compiled out in that case?


The current state of kernel page-table isolation
https://lwn.net/Articles/741966/
Posted by luto on Wed, 20 Dec 2017 19:59:34 +0000

It's already there :) CONFIG_SYSCALL_MODIFY_LDT or something like that.


The current state of kernel page-table isolation
https://lwn.net/Articles/741956/
Posted by josh on Wed, 20 Dec 2017 18:43:50 +0000

Would it be reasonable to add a CONFIG option to completely disable LDT support, for the *large* number of systems that don't need it?


The current state of kernel page-table isolation
https://lwn.net/Articles/741921/
Posted by luto on Wed, 20 Dec 2017 16:06:11 +0000

> That allows space for a large number of LDTs, needed on systems with many CPUs

Not quite. The reserved space is per-process and contains at most two LDTs. I reserved all that space because the page-table management for that space is more like user memory than kernel memory, and mixing the two styles in the same PGD entry could lead to nasty synchronization issues.

The reason that there are two LDTs per process is to keep atomic LDT switches simple. The old and new LDTs are both mapped, and then all affected CPUs are notified of the change.
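
A toy model of the two-slot scheme described in that last comment might look like the sketch below. All of the names and types are invented for the example (the real code lives in arch/x86/kernel/ldt.c and deals with locking, mapping into the user PGD, and cross-CPU notification); the point is only that the old and the new LDT remain mapped while the switch happens.

/*
 * Toy illustration of keeping two LDT slots per process so that an LDT
 * switch can be made atomic: the new LDT is mapped alongside the old
 * one, the "current" pointer is flipped, and only after every affected
 * CPU has been told to reload does the old slot get torn down.
 * Invented names and types; not the kernel's actual data structures.
 */
struct ldt_desc {
    unsigned long long raw;          /* placeholder for one descriptor */
};

struct ldt_struct {
    struct ldt_desc *entries;
    unsigned int nr_entries;
};

struct mm_ldt {
    struct ldt_struct *slot[2];      /* both slots stay mapped */
    int current_slot;
};

static void install_ldt(struct mm_ldt *ldt, struct ldt_struct *new_ldt)
{
    int next = 1 - ldt->current_slot;

    ldt->slot[next] = new_ldt;       /* map the new LDT next to the old one */
    ldt->current_slot = next;        /* flip which slot is live */

    /*
     * Here the real implementation notifies every CPU running this mm
     * so it reloads the LDT register; only afterwards is it safe to
     * unmap and free whatever was left in the other slot.
     */
}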