The thread in http://lkml.org/lkml/2008/10/1/368 has a patch that will prevent NVM corruption (and has done so in our extensive testing).
Linus has already merged this patch.
Now, something else is still going on: "something" is overwriting memory. But now that the NVM no longer gets corrupted, that is likely to be found very quickly (and is very likely unrelated to e1000e itself).
The state of the e1000e bug ... 2.6.27 fix available now
Posted Oct 2, 2008 1:45 UTC (Thu) by corbet (editor, #1)
The patch is good stuff and will allow things to move ahead, but calling it a "fix" seems like wishful thinking. The patch interposes a barrier between the bug and its effects. That is very much a good thing to do, but it has only mitigated the symptoms of the bug, not "fixed" the bug. I sure hope that a real fix will be forthcoming before 2.6.27 comes out.
The state of the e1000e bug ... 2.6.27 fix available now
Posted Oct 2, 2008 2:05 UTC (Thu) by arjan (subscriber, #36785)
Yeah, it only fixes the e1000e part of the story.
It doesn't fix whatever is causing the corruption in the first place.
The state of the e1000e bug ... 2.6.27 fix available now
Posted Oct 2, 2008 5:00 UTC (Thu) by smoogen (subscriber, #97)
/me waits to find out that this was caused by the 'TCP' security bug that wipes out all stacks.
And it turns out that the bug is caused by the incoming packets from various 'testers' on the internet.
The state of the e1000e bug ... 2.6.27 fix available now
Posted Oct 2, 2008 17:17 UTC (Thu) by s0f4r (guest, #52284)
Unlikely, as the bug has been hit by several testers in isolated testing labs.
The state of the e1000e bug ... 2.6.27 fix available now
Posted Oct 2, 2008 19:28 UTC (Thu) by iabervon (subscriber, #722)
It seems pretty likely that the actual bug isn't in the kernel, though, so holding up 2.6.27 might not be appropriate now that the latest kernel prevents a particular kind of userspace misbehavior from persistently messing up hardware. I think the current evidence doesn't exclude the following scenario: some X driver, while probing the system for its hardware, maps the frame buffer too large and writes into it, spilling into whatever comes after it. That is generally either nothing or unwritable, and the probe then correctly determines that the driver isn't appropriate. So some arbitrary device would get some arbitrary invalid I/O at a point where things are mostly idle, and it wouldn't get any particular attention unless it happened to do serious damage (i.e., something that would be noticed later). If the kernel gets things back to a state where nothing terrible happens due to the bug, and maybe even some logging occurs, that's enough for 2.6.27.
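To make the scenario above concrete, here is a toy model of it. This is only a sketch under loud assumptions: a flat bytearray stands in for a slice of the address space, and the region layout, the `AddressSpace` class, and `probe_framebuffer` are all hypothetical names invented for illustration. It is not the e1000e patch and not any real X driver. It shows an over-sized framebuffer fill spilling into a neighbouring region, and a write-protect guard that blocks and logs the stray writes without removing the buggy writer.

```python
# Toy model only: a flat bytearray stands in for a slice of the address
# space. Region names and sizes are hypothetical.
FB_START, FB_SIZE = 0, 16      # the framebuffer the driver really owns
NVM_START, NVM_SIZE = 16, 8    # an unrelated device region right after it


class AddressSpace:
    def __init__(self, protect_nvm):
        self.mem = bytearray(FB_SIZE + NVM_SIZE)
        self.protect_nvm = protect_nvm
        self.log = []  # addresses of blocked writes

    def write(self, addr, value):
        in_nvm = NVM_START <= addr < NVM_START + NVM_SIZE
        if in_nvm and self.protect_nvm:
            # The mitigation: drop the write and record it. The buggy
            # writer still exists; only the damage is prevented.
            self.log.append(addr)
            return
        self.mem[addr] = value


def probe_framebuffer(space, mapped_len):
    # The hypothesized bug: the driver trusts mapped_len, which may be
    # larger than the real framebuffer, and fills the whole mapping.
    for offset in range(mapped_len):
        space.write(FB_START + offset, 0xFF)


def nvm_contents(space):
    return bytes(space.mem[NVM_START:NVM_START + NVM_SIZE])


unprotected = AddressSpace(protect_nvm=False)
probe_framebuffer(unprotected, mapped_len=20)  # 4 bytes too many

protected = AddressSpace(protect_nvm=True)
probe_framebuffer(protected, mapped_len=20)
```

In this model the unprotected run leaves the first four bytes of the neighbouring region overwritten, while the protected run leaves it untouched and records the four stray addresses in `protected.log`. That log is the "maybe even some logging occurs" part: the symptom is contained and the culprit becomes easier to track down, but the over-sized write itself is still there.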