LWN: Comments on "An Ubuntu kernel bug causes container crashes" https://lwn.net/Articles/899420/ This is a special feed containing comments posted to the individual LWN article titled "An Ubuntu kernel bug causes container crashes". en-us Tue, 30 Sep 2025 09:29:49 +0000 Tue, 30 Sep 2025 09:29:49 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/902711/ https://lwn.net/Articles/902711/ daenzer <div class="FormattedComment"> Any non-trivial CI requires constant development &amp; maintenance effort. I&#x27;d argue it&#x27;s effort well spent though.<br> </div> Thu, 28 Jul 2022 07:45:32 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900805/ https://lwn.net/Articles/900805/ wtarreau <div class="FormattedComment"> Thanks for this, it&#x27;s very important to have as many -rc users as possible, precisely for the reasons you explained well.<br> <p> </div> Tue, 12 Jul 2022 04:30:08 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900743/ https://lwn.net/Articles/900743/ farnz <p>The trouble is that stable kernels <em>do</em> contain bugs all over the shop, some of which are exploitable. So the question becomes not "are there bugs in my EOL kernel?", to which the answer is definitely "yes", but "are the bugs in my EOL kernel of concern to me, given that I do not know the scope and impact of the bugs in my kernel?", which is a much harder question to answer. <p>And it's made exponentially harder by regressions in newer kernels which means that there's no good answer - do you take a newer kernel that fails to boot one time in 10 because your PCIe GPU is left in a bad state by firmware, or stick to the older kernel that has a remotely exploitable bug that you don't know about that lets an intruder escalate privileges on your machine. 
<p>Ideally, there would simply not be regressions in the kernel, so updating would always be the right thing to do. But that's not the world we live in; my experience is that I'm better off taking Linus's recent release, finding regressions and reporting them ASAP (so that the bug reports go to people who've been working in the right bits of the kernel recently, and bisect is often possible in reasonable time) than putting off updates for as long as possible and then reporting a huge number of regressions in one go, but other people will have had other experiences. Mon, 11 Jul 2022 18:16:55 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900733/ https://lwn.net/Articles/900733/ wtarreau <div class="FormattedComment"> <font class="QuotedText">&gt; EOLing a kernel does not magically cause bugs to appear.</font><br> <p> No, but one thing is certain, it will not magically fix all those that are discovered daily and that affect it.<br> For sure the best way not to know about bugs is to use an EOL version that doesn&#x27;t receive fixes.<br> <p> <font class="QuotedText">&gt; In particular, the current stable kernel needs to contain 2000 bugs so that when it will be EOLed, it will miss 2000 fixes.</font><br> <p> Maybe more, maybe less, who knows.<br> <p> <font class="QuotedText">&gt; &gt; In particular, &quot;some will corrupt data, cause random hangs, disconnect your WiFi during an audio conf, make your screen disappear after resume, leave phantom USB devices after some errors, let an intruder escalate privileges on your machine, etc.&quot;</font><br> <font class="QuotedText">&gt; This is not reassuring.</font><br> <p> But that&#x27;s why there are LTS kernels for those who want to stick as long as possible to what works best for them. Some people only deploy a kernel on sensitive systems after one year, so that most of the recent regressions are out of the way. 
I personally deploy new LTS kernels on my laptop so that I can spot changes or bugs early, and have time to get them fixed before these kernels need to reach servers. That&#x27;s reasonable.<br> <p> </div> Mon, 11 Jul 2022 16:54:04 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900732/ https://lwn.net/Articles/900732/ wtarreau <div class="FormattedComment"> Oh, I&#x27;ve known such people and even had to combat them because they were refusing to apply the mandatory fixes for a bug that was causing their firewall to leak memory like crazy and crash every two weeks or so. Instead they wanted to take advantage of the downtime caused by the crash to install more RAM and postpone the next crash! When I insisted on applying the fixes, to the question &quot;but if it bugs?&quot; I had to respond &quot;in the worst case it will continue not to work&quot;.<br> <p> But what you seem to be ignoring here is that the older the kernel, the harder it is to backport fixes, and the more likely they are to be wrong, particularly when taken out of the context of all other fixes that were surrounding the original patch. When I was a stable maintainer, I used to receive many messages like &quot;do not take this patch without this one&quot; or &quot;I&#x27;ll provide you a different one for this version as it&#x27;s not sufficient&quot; etc. The risk of getting a fix wrong when applying it yourself to a tree without the author&#x27;s approval is quite high. 
Thus in addition to missing tons of fixes, the few you get (the so called &quot;security fixes&quot; that make vendors sell) are often bogus and are the ones that will take your system down.<br> <p> Really, do not use EOL kernels.<br> </div> Mon, 11 Jul 2022 16:47:21 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900637/ https://lwn.net/Articles/900637/ ballombe <div class="FormattedComment"> EOLing a kernel does not magically cause bugs to appear.<br> In particular, the current stable kernel needs to contain 2000 bugs so that when it will be EOLed, it will miss 2000 fixes. In particular, &quot;some will corrupt data, cause random hangs, disconnect your WiFi during an audio conf, make your screen disappear after resume, leave phantom USB devices after some errors, let an intruder escalate privileges on your machine, etc.&quot;<br> This is not reassuring.<br> </div> Mon, 11 Jul 2022 09:38:35 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900624/ https://lwn.net/Articles/900624/ jordan <div class="FormattedComment"> Apologies; I came to those numbers by comparing release dates and EOL dates, and I&#x27;m already bad enough at math without getting calendars involved.<br> </div> Sun, 10 Jul 2022 19:33:50 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900619/ https://lwn.net/Articles/900619/ mgedmin <div class="FormattedComment"> <font class="QuotedText">&gt; In contrast to the LTS releases, these releases are only supported for about six months, and are declared end-of-life (EOL) a month after a newer release is made available.</font><br> <p> Ubuntu&#x27;s non-LTS releases are supported for 9 months, and are declared EOL three months after a newer release is made available.<br> <p> (This doesn&#x27;t affect the rest of the article in any way, it&#x27;s just that my inner pedant cannot leave inaccurate information alone.)<br> </div> Sun, 10 Jul 2022 14:02:24 +0000 An Ubuntu kernel 
bug causes container crashes https://lwn.net/Articles/900611/ https://lwn.net/Articles/900611/ smurf <div class="FormattedComment"> <font class="QuotedText">&gt; so much work to ensure it won&#x27;t crap on something we absolutely need to work... </font><br> <p> So set up a reasonable CI system. Surprise: you probably need that anyway.<br> <p> Yes, that&#x27;s somewhat more effort … but you only need to spend it once, not with every release.<br> </div> Sun, 10 Jul 2022 09:55:01 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900571/ https://lwn.net/Articles/900571/ jafd <div class="FormattedComment"> Look at this from another side.<br> <p> <font class="QuotedText">&gt; a non-LTS EOL kernel probably misses 2000 fixes, for as many bugs that are fixed in all maintained versions around it, but not that one.</font><br> <p> What if on the systems running that kernel, none of the fixes touched modules actually used in them?<br> <p> <font class="QuotedText">&gt; Some will corrupt data, cause random hangs</font><br> <p> Not experienced once in a year, let&#x27;s say<br> <p> <font class="QuotedText">&gt; disconnect your WiFi during an audio conf, make your screen disappear after resume, leave phantom USB devices after some errors</font><br> <p> Has not happened once in the drivers actually used and on that specific hardware.<br> <p> But what&#x27;s more likely to happen is that a newer version, while bringing a minor fix to a module or a subsystem you need, will also bring a mighty regression in a driver or a subsystem your workflow absolutely depends upon. A couple articles ago someone commented about precisely this situation here on LWN [0].<br> <p> That&#x27;s why there exist users (think companies) which find a kernel that doesn&#x27;t crap on their hardware 99.999% of the time, and pin it, and swear to never upgrade it ever. 
Have you thought they may have had enough of the Russian roulette?<br> <p> Jumping from LTS to LTS can also be akin to jumping centuries in a time travel vehicle. So many changes, so many surprises, so much work to ensure it won&#x27;t crap on something we absolutely need to work... <br> <p> [0] <a href="https://lwn.net/Articles/889787/">https://lwn.net/Articles/889787/</a>, you were in that thread too.<br> </div> Sat, 09 Jul 2022 19:32:54 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900363/ https://lwn.net/Articles/900363/ roc <div class="FormattedComment"> Changing the names of things when their meaning changes helps reduce these issues. But you&#x27;re right, there&#x27;s no way to fully avoid them.<br> </div> Thu, 07 Jul 2022 22:48:58 +0000 Red Hat stable kernels https://lwn.net/Articles/900238/ https://lwn.net/Articles/900238/ hkario <div class="FormattedComment"> The package version indicates the oldest piece of code as-shipped-upstream in a particular package, it doesn&#x27;t tell you what the newest piece is. And in particular, the kernel commonly pulls in new versions of whole subsystems, not just individual patches.<br> </div> Thu, 07 Jul 2022 12:41:52 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900232/ https://lwn.net/Articles/900232/ bartoc <div class="FormattedComment"> Heh, I play a game (X4: Foundations, they have a pretty good Linux version) that uses an extremely cursed XML based scripting language (&lt;if condition=&quot;...&quot;&gt;&lt;/if&gt;&lt;else&gt;&lt;/else&gt; etc) with the explicit motivation that it lets them have mods automatically merge with each other using xpath/xinclude or apply xslt stylesheets to each other.<br> <p> This ends up going a _little_ better than you would expect, but not much.<br> <p> <p> You can do cocci style semantic diff and merge, where the merge is based on the AST, not the text. 
But this means you need to parse everything perfectly and it&#x27;s not that much better than a good text-based diff/merge algorithm. I suppose in a way this is what __attribute__((flatten)) or always_inline do.<br> <p> Actually it would be kinda neat to have a compiler plugin or extension that let you inject code &quot;somewhere else&quot;. You could have a compiler plugin that let you write functions to be injected at specific locations. This would be merely &quot;neat&quot; for these kinds of functionality patches but I think it could actually be a useful feature for injecting static tracepoints without cluttering up the code too much (or forgetting to add one someplace).<br> <p> Come to think of it nim-lang has an experimental feature for this called &quot;term rewriting macros&quot;, you can write something like (from the manual):<br> template optMul{`*`(a, 2)}(a: int): int = a+a<br> <p> once this is brought into scope the compiler will rewrite any further expressions like some_ident * 2 as some_ident + some_ident. You can see how this is basically a megaton yield footgun. The matching language is kinda neat, but it&#x27;s very sensitive to the exact structure of the AST, so it can miss stuff (nim has macros too, but these operate on a kind of alternate &quot;stable&quot; ast that converts to and from the &quot;real&quot; AST). That&#x27;s harder for TR macros because any change in the &quot;real&quot; ast is usually something TR macros might be interested in, or might be able to break if they don&#x27;t understand.<br> <p> Because of these issues, and because TR macros are not widely used, and because they have some performance problems they may be removed.<br> <p> Oh, Mathematica&#x27;s &quot;Wolfram Language&quot; is famously based on this concept, and there is a lot of literature out there about how amazing it is. I remain unconvinced, but it is unconventional and interesting for sure. 
<br> <p> Oh, you can imagine that once you have such a facility and multiple users pop up trying to use it at once things get quite interesting quite fast.<br> </div> Thu, 07 Jul 2022 08:52:29 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900219/ https://lwn.net/Articles/900219/ bartoc <div class="FormattedComment"> well, there are cocci checks for a ton of ownership issues in the kernel, if the internal change had come with such a cocci check and the Ubuntu folks had run it then maybe this could have been caught, but why should it have been? It was an internal refactoring after all. <br> <p> I have my doubts about Rust’s ability to prevent this kind of thing, but stuff like that does help (that&#x27;s why the kernel has lifetime related cocci checks)<br> </div> Wed, 06 Jul 2022 22:05:36 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900212/ https://lwn.net/Articles/900212/ atnot <div class="FormattedComment"> I think the only real way of solving this is by reducing the number of similar programs that also compile but have undesirable behavior. Proving a patch can&#x27;t introduce say a use-after-free is probably about as hard as proving it doesn&#x27;t have a use-after-free in the first place. C is somewhat of a worst case there because it has very few mechanisms for constraining the callers or surrounding code.<br> </div> Wed, 06 Jul 2022 21:40:30 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900216/ https://lwn.net/Articles/900216/ wtarreau <div class="FormattedComment"> Quite frankly, bugs happen, and one can always contest how shameful it is that a given bug was not caught earlier, but bugs are just this, something that evades humans&#x27; logic. And even if it was one of their own in-house patches that caused the breakage, I won&#x27;t necessarily blame their developers for that. 
All of us caused a mess at least once with a shameful patch.<br> <p> What&#x27;s irritating me however (and has for a while) is their insistence on using EOL kernels. They made good progress on their LTS branches but honestly, &quot;upgrading&quot; from a maintained kernel to an unmaintained one to get new drivers is really not acceptable. For having produced stable kernels myself, I can say it, **EOL kernels are completely bogus**, because the flow of patches that need to be backported is steady, but what happens when the kernel reaches EOL? They&#x27;re not merged anymore. After one year a non-LTS EOL kernel probably misses 2000 fixes, for as many bugs that are fixed in all maintained versions around it, but not that one. Some will corrupt data, cause random hangs, disconnect your WiFi during an audio conf, make your screen disappear after resume, leave phantom USB devices after some errors, let an intruder escalate privileges on your machine, etc. There are now huge efforts from the kernel community to provide a wide choice of high-quality LTS kernels, and there is absolutely zero (**ZERO**) excuse nowadays for any distro to ship a kernel that reaches EOL before the end of support of the distro (and even worse before the release here). This bad practice is irresponsible and must stop!<br> <p> </div> Wed, 06 Jul 2022 21:32:15 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900202/ https://lwn.net/Articles/900202/ NYKevin <div class="FormattedComment"> You can&#x27;t do this without either using an incredibly strict language or an incredibly broad definition of &quot;breakage.&quot; Any time code interacts with state, the state is subject to some set of invariants (e.g. &quot;when foo() returns -1, errno can take on these values...&quot;). Most languages explicitly encode only a small subset of those invariants in their type systems, and leave everything else for comments and documentation. 
Even languages like SQL, where most invariants are encoded as constraints, still have business logic invariants (e.g. &quot;the value in the price column is the real-world price of the item, excluding tax&quot;), which usually cannot be expressed or checked by a type checker (sure, you can check for negative or zero price, but you can&#x27;t check for &quot;the price is too expensive for this item, because somebody changed the price column to include tax&quot;). So you have to use a broad definition of &quot;breakage&quot; - X &quot;breaks&quot; Y if X touches any state which Y later reads. But that means that everything breaks everything, in practice.<br> <p> The other problem is that your graph is probably not acyclic, because most nontrivial programs contain loops. So not only does X break Y, Y also probably breaks X, meaning that you can&#x27;t automatically order X and Y with respect to one another.<br> </div> Wed, 06 Jul 2022 17:39:45 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900187/ https://lwn.net/Articles/900187/ iabervon <div class="FormattedComment"> I think you might be able to get useful results if you&#x27;re not looking for whether the patch is correct (since people already try that), but instead whether the patch works differently in the new kernel as compared to the one it was written for.<br> <p> I could see there being relatively few patches flagged for &quot;between 5.x.y and 5.x.y+1, lines your patch affects are part of data flow analysis that doesn&#x27;t match&quot;. It seems like it would give a definite warning for the reference count of an object the patch passes to fput() necessarily being different in 5.13 as compared to 5.8. 
It wouldn&#x27;t be able to tell which one is correct, or whether the code had two references and is just releasing them in the opposite order now, but it would be clear that there&#x27;s some sort of conflict resolution needed.<br> </div> Wed, 06 Jul 2022 15:16:41 +0000 Red Hat stable kernels https://lwn.net/Articles/900191/ https://lwn.net/Articles/900191/ pbonzini <div class="FormattedComment"> Yes that&#x27;s pretty accurate. :)<br> </div> Wed, 06 Jul 2022 15:00:34 +0000 Red Hat stable kernels https://lwn.net/Articles/900137/ https://lwn.net/Articles/900137/ nim-nim <div class="FormattedComment"> Well it’s more like doing the work before others, what you see in LTS kernels will have often been exercised as a rh patch in their own products.<br> </div> Wed, 06 Jul 2022 10:14:43 +0000 Red Hat stable kernels https://lwn.net/Articles/900135/ https://lwn.net/Articles/900135/ taladar <div class="FormattedComment"> I think of the Red Hat way of doing things less as &quot;shipping older kernels&quot; and more as &quot;lying about version numbers&quot;.<br> </div> Wed, 06 Jul 2022 07:43:56 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900134/ https://lwn.net/Articles/900134/ smurf <div class="FormattedComment"> Well, there are some call flow analysis tools out there that try to discover ownership issues like the one that caused this problem.<br> <p> However, C isn&#x27;t a particularly nice language to do that with, particularly when you start using these tools after the fact, as you get a heap of false positives.<br> <p> The real solution is to switch to a language with built-in object lifecycle guarantees. 
In this case, Rust.<br> </div> Wed, 06 Jul 2022 07:41:16 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900132/ https://lwn.net/Articles/900132/ developer122 <div class="FormattedComment"> I suppose that you might be able to generate metadata for a piece of software in the form of a directed acyclic graph representing a kind of causality or dependency. Something that touches one branch isn&#x27;t blocked by something on another branch, but something upstream blocks downstream patches due to changed downstream-visible behavior. With language fine-tuning you could manage what does and does not cause a downstream-visible change of behavior, and what behavior precisely a patch depends on. Dunno whether most software lends itself to a good directed acyclic graph of behavior causality though.<br> </div> Wed, 06 Jul 2022 05:53:33 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900129/ https://lwn.net/Articles/900129/ derobert <div class="FormattedComment"> I think you&#x27;re right about it being in general impossible: after all, your patch could add a line after some function call f(). If f() doesn&#x27;t return (which is the halting problem), then your patch is fine (it is never executed). So in general, it&#x27;s not doable.<br> <p> You can surely do a bunch of control &amp; data flow analysis, but I suspect you&#x27;ll get far too many false positives. After all, if it tells you to investigate half the patches, it&#x27;s not that useful.<br> <p> If this is really any &quot;docker run&quot;, then that&#x27;s an embarrassing testing failure. (Especially if it hits other container engines, like k8s). 
<br> <p> Cloud images probably shouldn&#x27;t be getting HWE kernels, at least not until a few months after desktop.<br> </div> Wed, 06 Jul 2022 04:10:01 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900126/ https://lwn.net/Articles/900126/ DSMan195276 <div class="FormattedComment"> I think the issue there is that in this case the patch applied correctly to the right locations, but the internals had changed in some way causing it to be wrong. It&#x27;s entirely possible the relevant changes were in other source files, so unless you&#x27;re hashing an entire directory tree or the entire source you still might not catch it. And at that point you&#x27;re really just saying you should review every patch you&#x27;re applying every time you apply it. Which might very well be true :D but I&#x27;m not sure you need the hashing at that point.<br> <p> I think the more straightforward solution here is just testing after the patch is applied to verify the functionality still works. That&#x27;s what we really care about anyway, staring at the code only gets you so far if you never actually try it. And also, don&#x27;t maintain your own patches if you can avoid it :D<br> </div> Wed, 06 Jul 2022 02:38:37 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900124/ https://lwn.net/Articles/900124/ NightMonkey <div class="FormattedComment"> Naive sysadmin here. Couldn&#x27;t a hash of the section to patch be included in the metadata of the patch? Then, if the hash doesn&#x27;t match, the patch cleanly fails if the section has changed? Or is this already done and there is another angle I&#x27;m missing? 
Cheers.<br> </div> Wed, 06 Jul 2022 01:38:29 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900118/ https://lwn.net/Articles/900118/ developer122 <div class="FormattedComment"> Kind of a meta observation: Nobody seems to have really solved the theoretical problem of ensuring that a patch &quot;actually&quot; applies cleanly, ie. that what it touches hasn&#x27;t changed in some way. May well be impossible.<br> <p> Best case scenario your tooling is tracing subsystem interactions and noting dependencies in case there&#x27;s any change at all, worst case it&#x27;s trying to parse exactly what some code is doing/trying to do which is halting-problem territory.<br> </div> Tue, 05 Jul 2022 23:41:06 +0000 Red Hat stable kernels https://lwn.net/Articles/900114/ https://lwn.net/Articles/900114/ pbonzini <div class="FormattedComment"> Depends on the directory. Large parts of the RHEL9 kernel are probably closer to 5.16 even.<br> </div> Tue, 05 Jul 2022 22:21:52 +0000 Red Hat stable kernels https://lwn.net/Articles/900110/ https://lwn.net/Articles/900110/ jake <div class="FormattedComment"> <font class="QuotedText">&gt; RHEL-8 is 4.18 based and RHEL-9 is 5.14 based, neither of which are upstream LTS.</font><br> <p> ah, yes, thanks for the correction ... i have adjusted the article text accordingly ...<br> <p> jake<br> </div> Tue, 05 Jul 2022 21:33:32 +0000 Red Hat stable kernels https://lwn.net/Articles/900109/ https://lwn.net/Articles/900109/ willy <div class="FormattedComment"> If you diff the RHEL-9 kernel against 5.14 and 5.15, which diff is smaller?<br> </div> Tue, 05 Jul 2022 21:32:32 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900108/ https://lwn.net/Articles/900108/ snajpa <div class="FormattedComment"> uh, now I feel sorry for ever doubting you guys, I thought that initiative of shipping complete fscks at times was driven by the actual dev culture at Canonical, not the management... 
hasn&#x27;t even occurred to me you guys might have been pressured to ship that sh*t, I just thought the management doesn&#x27;t care...<br> <p> so, please accept my apologies, I&#x27;m gonna crawl into that corner over there and think long and hard about this :D<br> </div> Tue, 05 Jul 2022 21:11:29 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900107/ https://lwn.net/Articles/900107/ Paf <div class="FormattedComment"> This seems a decently understandable oversight, what I’m a little surprised by is that it wasn’t caught by some form of CI…<br> </div> Tue, 05 Jul 2022 21:07:50 +0000 Red Hat stable kernels https://lwn.net/Articles/900106/ https://lwn.net/Articles/900106/ ribbo <div class="FormattedComment"> Red Hat doesn&#x27;t choose LTS kernels for their LTS distro (RHEL) releases, as stated in the article, RHEL-8 is 4.18 based and RHEL-9 is 5.14 based, neither of which are upstream LTS.<br> </div> Tue, 05 Jul 2022 21:02:21 +0000 An Ubuntu kernel bug causes container crashes https://lwn.net/Articles/900104/ https://lwn.net/Articles/900104/ brauner <div class="FormattedComment"> It should be noted that when I was still working there we were forced to update and carry this fscking patchset. I hate it with a passion. Its nickname has always been sh*tfs and we refused any upstreaming. Thankfully we were able to solve this upstream and not as a separate fs.<br> </div> Tue, 05 Jul 2022 20:50:39 +0000