LWN: Comments on "A full task-isolation mode for the kernel" https://lwn.net/Articles/816298/ This is a special feed containing comments posted to the individual LWN article titled "A full task-isolation mode for the kernel". en-us Thu, 02 Oct 2025 16:38:40 +0000 Thu, 02 Oct 2025 16:38:40 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net A full task-isolation mode for the kernel https://lwn.net/Articles/828493/ https://lwn.net/Articles/828493/ rezete22 <div class="FormattedComment"> Hello <br> <p> Any update on this patchset? Is it pushed upstream? Which version of Linux works with this patchset?<br> <p> R/<br> </div> Tue, 11 Aug 2020 00:46:54 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/817600/ https://lwn.net/Articles/817600/ jithu83 <div class="FormattedComment"> <font class="QuotedText">&gt; Some applications require more than just a CPU core's resources to itself; memory contention (L3 or beyond), common IO paths, etc. can produce slow or even starved applications. Another app on another core can hog and consume L3 and DRAM bandwidth to the detriment of others, for example.</font><br> <p> <p> The X86_CPU_RESCTRL kernel config option does provide some fine-grained control over memory bandwidth, L2/L3 cache partitioning/locking, etc. This is available only on certain newer processors, and requires additional effort to correctly provision these resources to the appropriate processes.<br> </div> Wed, 15 Apr 2020 18:43:25 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/817334/ https://lwn.net/Articles/817334/ dave4444 <div class="FormattedComment"> <p> Looks like some good progress here, but what about events that may (or may not) be out of the control of the kernel? 
Such as:<br> <p> SMM/SMI/NMI on that CPU: this may not be preventable, but could it be detected?<br> <p> ECC errors can cause very unpredictable slowdowns (especially correctable ones).<br> <p> Some applications require more than just a CPU core's resources to itself; memory contention (L3 or beyond), common IO paths, etc. can produce slow or even starved applications. Another app on another core can hog and consume L3 and DRAM bandwidth to the detriment of others, for example.<br> <p> <p> </div> Sun, 12 Apr 2020 16:57:48 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/817087/ https://lwn.net/Articles/817087/ erkki <div class="FormattedComment"> Right, that makes sense. At least the kernel could allow for writeback without guaranteeing per-page atomicity. In that case the writeback CPU and the isolated CPU can operate on the page concurrently. <br> </div> Thu, 09 Apr 2020 00:30:35 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/817025/ https://lwn.net/Articles/817025/ luto <div class="FormattedComment"> You could have a little daemon that periodically copies the file from tmpfs to the real backing store and have the isolated code write to tmpfs.<br> <p> Years ago, I worked a little bit on reducing the stalls from writing to a recently written-back mmapped file.  I made some progress but never upstreamed it.  Some day I'll finish the job.<br> <p> </div> Wed, 08 Apr 2020 02:01:17 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/817024/ https://lwn.net/Articles/817024/ ncm <div class="FormattedComment"> Ok, thank you. It sounds like the multi-millisecond stall is not necessary, but flushing the TLB entry for the page is.<br> <p> In practice, though, a mapped stats file will *always* be dirty, no matter how frequently it is flushed. Maybe what we need is an mmap flag to tell the kernel that it should always assume a given mapping is dirty, and should skip the dance of checking. 
At least mmap flags are discoverable, unlike (e.g.) ioctls.<br> <p> Mapping from an unbacked fs and copying from there works, but is a deployment problem. The user wants the stats file where they want it, and it is not easy to discover whether the place they want it is backed.<br> <p> <p> </div> Tue, 07 Apr 2020 23:45:33 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/817023/ https://lwn.net/Articles/817023/ luto <div class="FormattedComment"> In your model, if an isolcpus task maps a file writably, the kernel will have to write back every mapped page every 30 seconds or so regardless of whether it’s written. Or, at least, every page that has ever been written. I don’t think this is wise.<br> </div> Tue, 07 Apr 2020 22:35:29 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/817009/ https://lwn.net/Articles/817009/ josh <div class="FormattedComment"> What would it take to eliminate the periodic wakeups for vmstats entirely, not just on isolated CPUs but in general? That seems like something the kernel could account for incrementally as it performs operations, and then just sum up when something wants the statistics.<br> </div> Tue, 07 Apr 2020 17:03:06 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/817004/ https://lwn.net/Articles/817004/ liralon <div class="FormattedComment"> It seems to me that raising a fatal signal to the isolated task (which will kill it) when the task invokes a system call is an issue.<br> <p> It should be quite common for a fast path running as an isolated task to use a system call to wake up some slow path. 
For example, GCP Andromeda (<a href="https://www.usenix.org/system/files/conference/nsdi18/nsdi18-dalton.pdf">https://www.usenix.org/system/files/conference/nsdi18/nsd...</a>) uses system calls to wake up the coprocessor thread or set an irqfd (to raise a virtual interrupt to the guest).<br> <p> I would have expected instead that the fatal signal would be raised to the isolated task when the kernel reaches code that is about to block the task. E.g., waiting for completion of some I/O request, or sleeping until some eventfd is set.<br> </div> Tue, 07 Apr 2020 14:52:35 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/816949/ https://lwn.net/Articles/816949/ caliloo <div class="FormattedComment"> I wonder how this interacts with the new system calls that can be placed on a buffer ring; sounds to me like you’d be able to place system calls without leaving isolation, which is kind of nice.<br> </div> Tue, 07 Apr 2020 13:07:36 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/816948/ https://lwn.net/Articles/816948/ ncm <div class="FormattedComment"> There is no reasonable expectation of coherence anyway, because there is no synchronization available. It will get dirty again, and get copied out again, later.<br> <p> If I wanted it clean, I would do something else and take the hit. The whole point of isolcpus is not to stall. If it takes a stall to get magickal feature X, it means I don't want magickal X. Just give me whatever approximation to X can be done without stalling.<br> <p> Isolcpus: not want stalls. What is not clear?<br> </div> Tue, 07 Apr 2020 10:07:35 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/816944/ https://lwn.net/Articles/816944/ luto <div class="FormattedComment"> I’m afraid this can’t really be done. Suppose CPU A runs an isolated task and writes to a mapped page. Now CPU A has a writable dirty TLB entry for the page.<br> <p> Then CPU B starts writeback. 
Subsequently, CPU A writes to the page again.<br> <p> Without a TLB flush, the kernel has no way to know that the page has been dirtied again.<br> </div> Tue, 07 Apr 2020 03:18:22 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/816943/ https://lwn.net/Articles/816943/ ncm <div class="FormattedComment"> Also, why use SIGKILL, and then provide a back-door way to change it to some other signal, instead of using one of the numerous other defaults-to-terminate, or even defaults-to-ignore, signals? That seems like complexity for the sake of complexity. Even inventing a new signal number for the purpose would be simpler.<br> <p> Using a defaults-to-ignore signal would be more compatible with an eventual goal of automatic task isolation for programs that spin. If you want to drop core if your program violates isolation, a handler is the right way to make it happen. We don't need another.<br> </div> Tue, 07 Apr 2020 02:18:49 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/816941/ https://lwn.net/Articles/816941/ ncm <div class="FormattedComment"> TLB shootdowns are the least of the problem. I see multi-millisecond pauses caused by the behavior in question. 
Such files have to live in a tmpfs, shmfs, or the like, and snapshots of the stats normally kept in them have to be produced by copying to another file in the same fs, and then out, to prevent pathological stalls.<br> <p> The core that is busy writing out the page necessarily generates contention on the isolated core's cache bus, as dirty cache lines are copied out of it, but no TLB shenanigans ought to be needed.<br> </div> Tue, 07 Apr 2020 01:44:24 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/816940/ https://lwn.net/Articles/816940/ ncm <div class="FormattedComment"> Claiming a core for isolation would reasonably require a capability normally provided only to root.<br> <p> Systems that configure isolated cores are rarely shared between organizations, although that will probably change as it becomes increasingly impractical to run without.<br> </div> Tue, 07 Apr 2020 01:34:52 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/816939/ https://lwn.net/Articles/816939/ flussence <div class="FormattedComment"> “Soft” isolation is still useful for HPC work, where you'd like to have something stronger than SCHED_BATCH, but nobody's going to end up taken away in an ambulance if it only gets 99.99% of the CPU.<br> </div> Tue, 07 Apr 2020 01:13:17 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/816938/ https://lwn.net/Articles/816938/ gus3 <div class="FormattedComment"> This could be an exploit waiting to happen.<br> <p> Scenario: a multi-core system with N cores. What's to stop a process from forking N times, then each process isolating itself? 
Perhaps the Nth isolation attempt will fail, but now you have N-1 isolated processes, and the last core is saturated as if running on a single-core system.<br> <p> Does this now provide a new opportunity to use Meltdown/Spectre-style exploits against N-1 non-isolated processes?<br> </div> Tue, 07 Apr 2020 01:09:20 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/816936/ https://lwn.net/Articles/816936/ erkki <div class="FormattedComment"> Preventing TLB shootdowns due to writeback of file-backed mmaps would be really interesting. Maybe through a new mmap flag?<br> </div> Tue, 07 Apr 2020 00:07:10 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/816933/ https://lwn.net/Articles/816933/ ncm <div class="FormattedComment"> OK, but sensible defaults for isolcpus is no more work than crazy defaults.<br> <p> </div> Mon, 06 Apr 2020 23:18:47 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/816932/ https://lwn.net/Articles/816932/ f18m <div class="FormattedComment"> I fully agree with ncm: what's the sense of isolating a CPU core other than the need to use it without any sort of interrupt, to emulate a real-time OS?<br> <p> However, I'm unsure about having the kernel decide that a taskset should be "fully isolated" after a few milliseconds of zero system calls... <br> <p> Anyway, this patch set would be greatly useful to e.g. DPDK applications, which are most often using the isolcpus and nohz options already.<br> Looking forward to it!<br> <p> I'd also love to have this RTOS-like mode as a post-boot option somewhere (maybe a sysctl setting?) rather than being forced to create scripts that must interact with the bootloader (GRUBv1, GRUBv2, etc.) to deploy a new Linux boot option... 
moreover the reboot required to apply this change may not be acceptable in some contexts...<br> <p> </div> Mon, 06 Apr 2020 23:15:33 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/816929/ https://lwn.net/Articles/816929/ martin.langhoff <div class="FormattedComment"> Gotta walk before you run, but long term I completely agree.<br> <p> Ideally this feature evolves towards essentially being as automagic as possible... <br> </div> Mon, 06 Apr 2020 23:04:35 +0000 A full task-isolation mode for the kernel https://lwn.net/Articles/816923/ https://lwn.net/Articles/816923/ ncm <div class="FormattedComment"> About time.<br> <p> But instead of the isolated process needing to call prctl(), it should happen automatically, for any process running on (and, by inference, bound to) an isolated CPU, after no system calls / page faults have occurred for a few milliseconds. The prctl() call should be needed only if that wouldn't be soon enough.<br> <p> Also, "nohz" and "domain" should be _the_default_ on any isolcpu. I am not going to isolate a core, yet still want a load of crap interruptions sent to it. I isolated it for reasons. If I want interruptions I can ask for them.<br> <p> Files that are mapped to an isolated process image (and the file descriptor closed) should never, ever cause the process to be blocked, even if the kernel decides it is time to copy changes in mapped memory to the spinny disk image. If tearing would be a problem, it is the process's problem to solve.<br> <p> Finally, taskset should be able to designate, _all_by itself_, that the core the process is bound to is to be fully isolated. This business of needing to reserve isolcpus at boot time is nonsense.<br> </div> Mon, 06 Apr 2020 21:43:58 +0000