LWN: Comments on "Short waits with umwait" https://lwn.net/Articles/790920/ This is a special feed containing comments posted to the individual LWN article titled "Short waits with umwait". Mon, 20 Oct 2025 18:53:43 +0000 lwn@lwn.net Short waits with umwait https://lwn.net/Articles/791252/ intgr <div class="FormattedComment"> The main topic: Um, wait...<br> <p> </div> Mon, 17 Jun 2019 14:28:39 +0000 Short waits with umwait https://lwn.net/Articles/791242/ jcm <div class="FormattedComment"> I believe recent enough parts allow a small delta in the VMEXIT to be controlled by the Hypervisor. No idea if this is respected for the userspace mwait instructions, but it could be. In any case, you're coming out of the VM and when you go back in you're going to have to restart the pause. Similar for kernel/userspace. Interrupts will preempt anything like this. Otherwise you'd have a DoS opportunity.<br> </div> Mon, 17 Jun 2019 04:08:55 +0000 Short waits with umwait https://lwn.net/Articles/791241/ ncm <div class="FormattedComment"> The kernel would need to resume the thread after the wait, as the deadline would certainly have passed. It could resume at the instruction, but that would break immediately so there would be no point -- unless sleeps longer than a scheduling interval were permitted, which seems unlikely.<br> <p> But probably you would run this on an isolcpu, with nohz, and hope never to get scheduled out.<br> <p> It appears I have not yet discovered a formula that guarantees no schedule breaks, ever. 
I would welcome enlightenment.<br> </div> Mon, 17 Jun 2019 03:01:13 +0000 Short waits with umwait https://lwn.net/Articles/791240/ ncm <div class="FormattedComment"> It occurs to me that there is really no need for this instruction -- micro-op fusion ought to recognize a busy-wait loop, and translate it internally. <br> <p> Since Haswell, Intel already does fusion of two ALU instructions and two branches to one cycle -- on a good day, anyway; when I last checked, Clang was very bad at putting instructions in the right order to get this.<br> </div> Mon, 17 Jun 2019 02:48:19 +0000 Short waits with umwait https://lwn.net/Articles/791239/ xywang <div class="FormattedComment"> What happens if the sleeping task needs to be interrupted for a reschedule? Do these instructions temporarily ignore or delay the timer interrupt?<br> </div> Mon, 17 Jun 2019 00:54:37 +0000 Short waits with umwait https://lwn.net/Articles/791219/ caliloo <div class="FormattedComment"> Was thinking the same thing. Would be nice to have more details on the limitations of the memory range that can be specified (hopefully one cannot provide a range, virtual I suppose, that starts at 0x0 with length 2^64...) and on how that instruction performs with DMA.<br> </div> Sun, 16 Jun 2019 12:01:01 +0000 Short waits with umwait https://lwn.net/Articles/791200/ cpitrat Let's <i>return</i> to the <i>main</i> topic. 
Sat, 15 Jun 2019 08:13:01 +0000 Short waits with umwait https://lwn.net/Articles/791199/ cpitrat <div class="FormattedComment"> <font class="QuotedText">&gt; with the downside being that microcode is inaccessible and cannot be extended.</font><br> <p> We need eBPF for micro-code!<br> </div> Sat, 15 Jun 2019 08:09:56 +0000 Short waits with umwait https://lwn.net/Articles/791198/ ncm <div class="FormattedComment"> I will totally use this instruction to watch for a mapped ring buffer's head indicator to be updated after a batch of packets has been DMA'd in.<br> <p> I assume that, instead of hammering the bus as one would in a spin loop, the relevant cache line is just primed to watch for an invalidation notification from the bus, and let the hyperthread proceed. So, the sleep is very like a cache miss stall, and the wake very like delivery of the missed line.<br> <p> It looks to me as if the main desirable result of using this instruction (vs a spin) is that other threads that have a productive use for the memory bus will not be driven off of it?<br> </div> Sat, 15 Jun 2019 04:22:05 +0000 Short waits with umwait https://lwn.net/Articles/791183/ dskoll I'm intrigued... <code>continue</code> please... Fri, 14 Jun 2019 15:49:37 +0000 Short waits with umwait https://lwn.net/Articles/791137/ nilsmeyer <div class="FormattedComment"> <font class="QuotedText">&gt; Actually it is strange it took so long to add a TPAUSE ("tea pause"? :-) instruction.</font><br> <p> They couldn't use coffee break since the term break is ambiguous ;)<br> </div> Fri, 14 Jun 2019 12:55:32 +0000 Short waits with umwait https://lwn.net/Articles/791126/ Tov <div class="FormattedComment"> Actually it is strange it took so long to add a TPAUSE ("tea pause"? :-) instruction. 
There are many places where a hardware driver needs a slight (microsecond) delay, where scheduler ticks are much too coarse and NOP loops are too unreliable (being clock speed dependent).<br> <p> I still remember in horror how a number of ISA bus reads were used for small delays, as they were guaranteed to be some amount of slow. :-/ ... Heh! I even found a stackoverflow answer describing that practice:<br> <a href="https://stackoverflow.com/questions/6793899/what-does-the-0x80-port-address-connect-to">https://stackoverflow.com/questions/6793899/what-does-the...</a><br> </div> Fri, 14 Jun 2019 08:59:35 +0000 Short waits with umwait https://lwn.net/Articles/791116/ evanp <div class="FormattedComment"> The kernel-only versions (monitor/mwait, without the 'u') have been around since SSE3, though tpause is new....<br> </div> Fri, 14 Jun 2019 02:55:48 +0000 Short waits with umwait https://lwn.net/Articles/791113/ Fowl <div class="FormattedComment"> Interesting that userspace seems to get this before the kernel itself. Surely spinwaits are used more in the kernel?<br> </div> Fri, 14 Jun 2019 01:29:04 +0000 Short waits with umwait https://lwn.net/Articles/791099/ wahern <div class="FormattedComment"> Does it shy away from it? I think it's just been abstracted away.<br> <p> Conceptually the way an OS is supposed to work is that an ethernet packet arrives, interrupts the CPU which jumps to the scheduler which jumps into the process sleeping on a socket read which reads the data which writes to a pipe which transfers control to a second process sleeping on a read from the pipe... It doesn't get more event oriented than that.<br> <p> What makes this type of instruction different is that it's waiting on memory addresses. But doing this in a generic way--being able to detect changes on any [virtual] memory address--is actually quite expensive to do in the hardware. 
It's why we don't have fully general LL/SC for proper software transactional memory. You'd have to add an extra bit, at least, to every byte- or cacheline-sized block of memory in the system to track reads/writes. So instead you get interfaces that look general but really have some clever hackery in the microcode which suspiciously looks like the kind of solution you'd usually implement in the kernel, with the downside being that microcode (the new lowest-level software layer) is inaccessible and cannot be extended.<br> <p> Ultimately what this is all about is being able to transfer logical control to different threads of execution. Blocking IPC was the OS-level abstraction that made this simple and intuitive. But things got more complicated and it wasn't as convenient and performant as it used to be, or at least was perceived that way. Some of the alternatives did a better job at abstracting control transfer than others.<br> <p> </div> Thu, 13 Jun 2019 20:07:30 +0000 Short waits with umwait https://lwn.net/Articles/791096/ flussence <div class="FormattedComment"> It's somewhat weird to me that hardware shies away from this kind of event-driven design; after all, they've been on the “performance per watt” drive for over a decade now. But I guess there's some level of paranoia about security implications of timing attacks now, especially within Intel.<br> </div> Thu, 13 Jun 2019 19:04:53 +0000