LWN: Comments on "The state of the e1000e bug" https://lwn.net/Articles/301251/ This is a special feed containing comments posted to the individual LWN article titled "The state of the e1000e bug". Thu, 23 Oct 2025 09:09:41 +0000 The state of the e1000e bug https://lwn.net/Articles/303755/ Duncan <div class="FormattedComment"> While you've almost certainly stopped watching for replies, perhaps this comment will help someone else who comes across it later, perhaps via a Google search.<br> <p> Fortunately, I don't have an e1000e NIC, so this particular bug hasn't been a problem here (and by the time I write this, it has been traced to the ftrace framework and fixed properly). But I did have a bug with this kernel and used git bisect on it, so the question does pertain, and it would have been of immediate interest had I owned the hardware.<br> <p> In general, it's quite possible to revert any specific commit or set of commits while bisecting or otherwise testing with git. However, there are a couple of problems with doing it that way.<br> <p> One, it's quite possible that additional commits will have been built on top of the problem commit, so one could well end up reverting a sizable swath of commits, to the point that one couldn't really be said to be testing the kernel at that particular point anyway, potentially invalidating any conclusions the testing reaches. Perhaps not (indeed, probably not), but it's certainly a complicating consideration.<br> <p> Two, as was the case with this bug, looking back now that it has been traced, the bug can be in an area entirely unrelated (except via the bug) to where it actually shows up.
(parenthetical <br> example: Sort of like a leaky roof; the hole in the roof may be several <br> meters away from where the water drips thru the ceiling!) In this case, <br> it was a bug in the new ftrace functionality, coupled with removing <br> modules, that was eventually found to cause the problem. ftrace has been <br> disabled in 2.6.27.1, but the point is that until the problem is fully <br> traced, there's no guarantee that one would pick the correct commits to <br> revert while bisecting in any case. I've no idea how long ago the last <br> e1000e commits were, but supposing they happened in this kernel, the <br> instinctive thing to do would be to revert all of them while doing the <br> bisect, but that wouldn't have helped in this case, and there was no way <br> of knowing until later what /would/ help, since the ftrace stuff was <br> otherwise entirely unrelated.<br> <p> Thus a bisect with the supposedly offending changes might both lead to <br> the wrong conclusions, and not remove the danger of bricking the hardware <br> in any case.<br> <p> Unfortunately, the fact remains that testing unreleased kernels is risky. <br> Indeed, conservative folks will likely want to stay a full kernel release <br> back, not installing 2.6.26 until 2.6.27 at least, and only then <br> installing whatever happens to be the latest 2.6.26.x stable release. <br> Even distribution kernels were bit by this, altho obviously it was just <br> the most bleeding edge ones, the ones shipped as -rc testing, for those <br> willing to risk their machines and try it, and this -rc series DID point <br> out the very literal meaning of the "risk their machines" bit. 
It's certainly not for everyone, but as someone who runs -rc kernels myself (though only from -rc3 or so), I find it can be rewarding too. There's nothing quite like the feeling of being able to point to a particular -rc bug and say "but for me, that might have made it into the release; I played my part in making this kernel a good one," especially for folks (like me) who might write sysadmin-level bash scripts, but little more.<br> <p> Duncan<br> </div> Sun, 19 Oct 2008 01:35:38 +0000 The state of the e1000e bug https://lwn.net/Articles/301853/ jzbiciak <P>I'm not a git user, nor am I a kernel developer, but this caught my eye:</P><BLOCKQUOTE><I>[E]ven if this bug is fixed tomorrow, it will be present in most of the 2.6.27 history. Anybody bisecting the kernel in an attempt to track down an unrelated bug risks being bitten by a zombie version of the e1000e bug. There may be no way to deal with that threat other than the posting of some big warnings.</I></BLOCKQUOTE> <P>It seems like it would be useful to have a git bisect mode that allowed you to pin some changesets while otherwise warping you to the next kernel in a bisection sequence. In other words, you want to get "version XXX <I>plus these N changesets</I>." That seems like it might be a generically useful facility. It would also let people bisect to find other bugs while holding certain things, such as the e1000e driver, at their present state.</P> <P>Does git provide for such a thing?</P> Mon, 06 Oct 2008 07:31:53 +0000 The state of the e1000e bug ... 2.6.27 fix available now https://lwn.net/Articles/301538/ iabervon <div class="FormattedComment"> It seems pretty likely that the actual bug isn't in the kernel, though, and therefore holding up 2.6.27 might not be appropriate now that the latest kernel will prevent userspace that misbehaves in a particular way from persistently messing up the hardware.
I think the current evidence doesn't exclude this scenario: some X driver, while probing the system for its hardware, maps a frame buffer that is too large and writes into it, spilling into whatever lies after it. That region is generally either empty or unwritable, which in turn leads the probe to conclude, correctly, that the driver isn't appropriate. So some arbitrary device would get some arbitrary invalid I/O at a point where things are mostly idle, and it wouldn't get any particular attention unless it happened to do serious damage (i.e., something that would be noticed later). If the kernel gets things back to a state where nothing terrible happens due to the bug, and maybe even some logging occurs, that's enough for 2.6.27.<br> <p> </div> Thu, 02 Oct 2008 19:28:29 +0000 The state of the e1000e bug ... 2.6.27 fix available now https://lwn.net/Articles/301519/ s0f4r <div class="FormattedComment"> Unlikely, as the bug has been hit by several testers in isolated testing labs.<br> </div> Thu, 02 Oct 2008 17:17:34 +0000 The state of the e1000e bug ... 2.6.27 fix available now https://lwn.net/Articles/301408/ smoogen <div class="FormattedComment"> /me waits to find out that this was caused by the 'TCP' security bug that wipes out all stacks.<br> <p> <a href="http://www.theregister.co.uk/2008/10/01/fundamental_net_vuln/">http://www.theregister.co.uk/2008/10/01/fundamental_net_v...</a><br> <p> ...and that the bug is caused by incoming packets from various 'testers' on the Internet.<br> </div> Thu, 02 Oct 2008 05:00:57 +0000 The state of the e1000e bug ... 2.6.27 fix available now https://lwn.net/Articles/301394/ arjan <div class="FormattedComment"> Yeah... it only fixes the e1000e part of the story.<br> It doesn't fix whatever is causing the corruption in the first place.<br> </div> Thu, 02 Oct 2008 02:05:43 +0000 The state of the e1000e bug ...
2.6.27 fix available now https://lwn.net/Articles/301393/ corbet The patch is good stuff and will allow things to move ahead, but calling it a "fix" seems like wishful thinking. The patch interposes a barrier between the bug and its effects. That is very much a good thing to do, but it only mitigates the symptoms of the bug; it does not "fix" the bug. I sure hope that a real fix will be forthcoming before 2.6.27 comes out. Thu, 02 Oct 2008 01:45:20 +0000 The state of the e1000e bug ... 2.6.27 fix available now https://lwn.net/Articles/301392/ arjan <div class="FormattedComment"> The thread at <a href="http://lkml.org/lkml/2008/10/1/368">http://lkml.org/lkml/2008/10/1/368</a> has a patch that prevents NVM corruption (and has done so in our extensive testing).<br> Linus has already merged this patch.<br> <p> Now, there's something else going on where "something" is overwriting memory... but now that the NVM no longer gets corrupted, that is likely to be found very quickly<br> (and is very likely unrelated to e1000e itself).<br> </div> Thu, 02 Oct 2008 01:39:57 +0000
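On jzbiciak's question of bisecting to "version XXX plus these N changesets": one possible workaround, offered as a sketch rather than as a built-in git feature, is to drive the bisection with `git bisect run` and a small helper script. At every step the script stages the pinned commits with `git cherry-pick --no-commit` (or backs out unwanted ones with `git revert --no-commit`, as Duncan describes), runs a test, and then discards those changes with `git reset --hard` so bisect can move on. The script name `pinned_bisect.py`, the `PINNED`/`REVERTED` lists and the test command are hypothetical placeholders. Exit status 125 is what `git bisect run` treats as "skip this commit", used here when a pinned change does not apply at that point in history. Duncan's caveats still hold: this only helps if you already know which changesets matter, and it does nothing to protect hardware from a bug hiding elsewhere.

```python
#!/usr/bin/env python3
"""Hypothetical helper for `git bisect run`: test each bisection step with
certain changesets pinned (cherry-picked) or backed out (reverted)."""
import subprocess
import sys

# Hypothetical placeholders: replace with real commit IDs.
PINNED = []    # commits to apply on top of every step (e.g. a known fix)
REVERTED = []  # commits to back out at every step

SKIP = 125   # `git bisect run`: "this commit cannot be tested, skip it"
ABORT = 128  # any status above 127 makes `git bisect run` stop


def patch_commands(pinned, reverted):
    """Build the git commands that stage the pinned and reverted changes
    in the working tree without creating commits."""
    cmds = [["git", "cherry-pick", "--no-commit", c] for c in pinned]
    cmds += [["git", "revert", "--no-commit", c] for c in reverted]
    return cmds


def classify(test_status):
    """Map the test's exit status to a bisect verdict: 0 = good, 1 = bad.
    Collapsing every failure to 1 avoids accidentally returning 125 (skip)
    or a value above 127 (abort)."""
    return 0 if test_status == 0 else 1


def main(test_cmd):
    try:
        for cmd in patch_commands(PINNED, REVERTED):
            # If a pinned change does not apply at this point in history,
            # this step is not testable as intended, so ask bisect to skip it.
            if subprocess.run(cmd).returncode != 0:
                return SKIP
        return classify(subprocess.run(test_cmd).returncode)
    except OSError:
        return ABORT  # e.g. the test command could not be started
    finally:
        # Throw away the temporary changes so bisect can check out the next step.
        subprocess.run(["git", "reset", "--hard", "--quiet", "HEAD"])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is the build/test command to run.
    sys.exit(main(sys.argv[1:]))
```

Hypothetical usage: after `git bisect start <bad> <good>`, run `git bisect run python3 pinned_bisect.py make test`. Because the script ends every step with `git reset --hard`, any uncommitted work in the tree is lost, so run it from a clean checkout.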