Mandriva has sent out an
advisory on the e1000e corruption bug which, by virtue of being the
best compilation of information on this problem so far, is of interest far
beyond the Mandriva user community. If you have an e1000e adapter and run
2.6.27-rc kernels, you probably want to take a look.
Information on the e1000e corruption bug
Posted Sep 24, 2008 16:49 UTC (Wed) by charris (subscriber, #13263)
I lost an e1000 to eeprom corruption about 6 mos ago. Whether it was due to Linux I don't know, but it was suspicious. Fortunately, network cards are cheap...
Information on the e1000e corruption bug
Posted Sep 24, 2008 17:13 UTC (Wed) by Cato (subscriber, #7643)
I haven't had this bug, but an onboard gigabit ethernet card (Realtek RTL8111/8168B) has tended to lock up. Older ethernet cards still seem to be more reliable; I haven't had any problems with an Intel eepro100 card.
Information on the e1000e corruption bug
Posted Sep 24, 2008 19:23 UTC (Wed) by hmh (subscriber, #3838)
A bug on the e1000 driver locking that could cause EEPROM corruption *was* fixed a while ago, but that doesn't mean it was what caused your issue.
Really, saving a dump of the card's EEPROM is a damn good idea... even if it is not an Intel NIC, one should still try to do it. Who knows, the next NVRAM-killing problem might hit devices from other manufacturers...
Information on the e1000e corruption bug
Posted Sep 25, 2008 1:03 UTC (Thu) by BenHutchings (subscriber, #37955)
There was a supposed bug fix in locking in e1000, but all the code paths to EEPROM access already held the rtnetlink lock, so I'm sceptical that the supposed fix made any difference.
Information on the e1000e corruption bug
Posted Sep 24, 2008 23:22 UTC (Wed) by mheily (guest, #27123)
> If you do have the affected hardware, we (and Intel) recommend you
> immediately back up your EEPROM data, using the following command as root:
>
> ethtool -e ethX > savemyeep.txt
>
> Where ethX is the affected interface. Usually it will be eth0. This will
> save the EEPROM data to the savemyeep.txt file. Keep this file safe. If
> you were subsequently to have the problem occur, we can then assist you in
> restoring the data from this backup file.
I call BS on this one. If the card is truly bricked, you can't restore from a backup. If Mandriva has a guaranteed way to restore the EEPROM after it has been corrupted, why don't they document it clearly in the announcement and provide a link?
If there is a chance of permanent bricking, why give people false hope that they can restore from a backup?
"Once this corruption has occurred, recovery may be possible via a BIOS update, but may well require replacement of the hardware."
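For anyone who saved the text dump as suggested above, a quick sanity check is to parse it back into bytes and verify the NVM checksum. This is a minimal sketch under stated assumptions: it assumes the `ethtool -e` text output uses ethtool's generic hex layout (an "Offset/Values" header, then rows like `0x0000:  00 1b 21 ...`), and it assumes the e1000/e1000e convention, as we understand the drivers, that the little-endian 16-bit words 0x00 through 0x3F (the last one being the checksum word) sum to 0xBABA. Both assumptions are worth double-checking against your ethtool version and the driver source; the function names are hypothetical.

```python
import re

# Assumed row format of `ethtool -e ethX` text output, e.g. "0x0000:\t\t00 1b 21 ".
# Header, separator and blank lines do not match and are skipped.
_ROW = re.compile(r"^\s*0x([0-9a-fA-F]+):?\s+((?:[0-9a-fA-F]{2}\s*)+)$")

# Assumed e1000/e1000e NVM checksum convention (verify against the driver).
NVM_CHECKSUM_WORD = 0x3F  # word offset of the checksum word
NVM_SUM = 0xBABA          # expected 16-bit sum of words 0x00..0x3F


def parse_ethtool_dump(text):
    """Return the EEPROM contents from an `ethtool -e` text dump as bytes."""
    data = bytearray()
    for line in text.splitlines():
        m = _ROW.match(line)
        if not m:
            continue
        offset = int(m.group(1), 16)
        # Rows must be contiguous; a gap means the file was truncated or edited.
        if offset != len(data):
            raise ValueError(
                f"unexpected offset 0x{offset:04x}, expected 0x{len(data):04x}")
        data.extend(bytes.fromhex(m.group(2)))
    return bytes(data)


def checksum_ok(data):
    """True if words 0x00..0x3F (little-endian) sum to NVM_SUM modulo 2**16."""
    if len(data) < 2 * (NVM_CHECKSUM_WORD + 1):
        raise ValueError("dump too short to contain the checksum word")
    total = 0
    for i in range(NVM_CHECKSUM_WORD + 1):
        word = data[2 * i] | (data[2 * i + 1] << 8)
        total = (total + word) & 0xFFFF
    return total == NVM_SUM
```

For example, `checksum_ok(parse_ethtool_dump(open("savemyeep.txt").read()))` should be `True` for a healthy backup. A failing checksum does not prove the card is bricked, but it is a hint that the backup itself may not be worth much.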
Information on the e1000e corruption bug
Posted Sep 25, 2008 0:15 UTC (Thu) by nix (subscriber, #2304)
My understanding is that they *don't* have a guaranteed way: the comments
on the kernel bug suggest that having an image of the EEPROM will give the
Intel folks some chance of getting it back, but that without one it's hard
to tell what the EEPROM image should be. (Apparently there are a lot of
possible images.)
Information on the e1000e corruption bug
Posted Sep 25, 2008 1:03 UTC (Thu) by corbet (editor, #1)
A tool to restore bricked adapters is being developed, stay tuned.
Information on the e1000e corruption bug
Posted Sep 25, 2008 11:05 UTC (Thu) by buchanmilne (guest, #42315)
> Where ethX is the affected interface. Usually it will be eth0. This will
> save the EEPROM data to the savemyeep.txt file. Keep this file safe. If
> you were subsequently to have the problem occur, we can then assist you in
> restoring the data from this backup file.
> I call BS on this one. If the card is truly bricked, you can't restore from a backup. If Mandriva has a guaranteed way to restore the EEPROM after it has been corrupted, why don't they document it clearly in the announcement and provide a link?
Either way, according to the comments from Intel developers, it seems your chances of being able to restore the EEPROM are better if you have a backup (i.e., effectively zero without a backup, and non-zero with one).
/me makes a backup
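Once a backup exists, it can also be used to check whether the EEPROM has changed since. The sketch below compares two binary dumps and reports the byte ranges that differ. It assumes both dumps were taken the same way, for example with `ethtool -e ethX raw on > backup.bin` if your ethtool supports the `raw on` option; the file names and function names here are hypothetical.

```python
def diff_ranges(backup, current):
    """Return (start, end) byte ranges, end exclusive, where two dumps differ."""
    if len(backup) != len(current):
        raise ValueError("dumps differ in length; were they taken the same way?")
    ranges = []
    start = None
    for i, (a, b) in enumerate(zip(backup, current)):
        if a != b and start is None:
            start = i  # a differing run begins
        elif a == b and start is not None:
            ranges.append((start, i))  # the differing run has ended
            start = None
    if start is not None:
        ranges.append((start, len(backup)))  # run extends to the end of the dump
    return ranges


def compare_files(backup_path, current_path):
    """Read two raw EEPROM dumps from disk and return their differing ranges."""
    with open(backup_path, "rb") as f:
        backup = f.read()
    with open(current_path, "rb") as f:
        current = f.read()
    return diff_ranges(backup, current)
```

An empty list from `compare_files("backup.bin", "now.bin")` means the EEPROM contents are unchanged; any ranges returned tell you exactly which offsets to mention when asking for help with a restore.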
Information on the e1000e corruption bug
Posted Sep 25, 2008 14:28 UTC (Thu) by mheily (guest, #27123)
Mandriva suggests an action plan like this:
1. Backup your EEPROM
2. Download an older kernel package
3. Install the older kernel package
4. Reboot
That seems dangerous when you have a kernel bug inside your machine that could be triggered at any time. I would prefer something like this:
1. Close your work
2. Run 'sync' as root.
3. Pull the power plug and remove the laptop battery
4. Boot from a LiveCD
5. Mount your root partition under /mnt and run 'rm -f /mnt/lib/modules/2.6.27*/kernel/drivers/net/e1000e/e1000e.ko.gz'
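Before running an `rm` with a wildcard from a LiveCD, it can help to list what the pattern actually matches. This is a minimal dry-run sketch of step 5, assuming the root partition is mounted under `/mnt`; unlike the command above it matches `e1000e.ko*` so it also finds an uncompressed `e1000e.ko`, and the function names are hypothetical.

```python
import glob
import os


def find_e1000e_modules(root="/mnt", kernel_glob="2.6.27*"):
    """List e1000e module files under a mounted root for matching kernel versions."""
    pattern = os.path.join(root, "lib", "modules", kernel_glob,
                           "kernel", "drivers", "net", "e1000e", "e1000e.ko*")
    return sorted(glob.glob(pattern))


def remove_modules(paths, dry_run=True):
    """Print what would be removed; only delete when dry_run is False."""
    for path in paths:
        if dry_run:
            print("would remove", path)
        else:
            os.remove(path)
    return paths
```

Run `remove_modules(find_e1000e_modules())` first, check the printed paths, and only then repeat it with `dry_run=False`.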
Information on the e1000e corruption bug
Posted Sep 26, 2008 8:20 UTC (Fri) by job (guest, #670)
I don't think you have to be root to call sync...
Information on the e1000e corruption bug
Posted Sep 26, 2008 21:39 UTC (Fri) by spuk (guest, #54294)
The latest Mandriva 2.6.27 kernel package, included in the RC2 ISOs, has the
e1000e driver removed.
Your plan sounds a bit "hard", and misses what is possibly the most
important step: backing up the EEPROM.
Information on the e1000e corruption bug
Posted Sep 25, 2008 9:59 UTC (Thu) by jmvaz (guest, #49567)
Can the mmiotrace patch, developed by the nouveau guys, help pinpoint which piece of code is causing the corruption of the e1000 EEPROM?
Just a thought...
Information on the e1000e corruption bug
Posted Sep 25, 2008 16:01 UTC (Thu) by danielthaler (guest, #24764)
I think mmiotrace is in mainline now as part of ftrace.
The problem here is that a trace would become absolutely huge. Even just tracing a single driver for a short time could produce too much output.
Now imagine you have megabytes upon megabytes of ftrace output; how do you find the interesting bit? It's not like it will contain a flashing red line marked "EEPROM corrupted here".
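One way to cut such a trace down is to stream it and keep only the writes that land in an address window you suspect, such as the NIC's register BAR. This sketch assumes the mmiotrace text format as we recall it from the kernel's mmiotrace documentation, where write records look like `W <width> <timestamp> <map id> <phys addr> <value> <pc> <pid>`; check that against your kernel's documentation. The window bounds are hypothetical: you would take them from the trace's MAP lines or from `lspci -v`.

```python
def writes_in_window(lines, start, end):
    """Yield (timestamp, address, value, width) for MMIO writes in [start, end)."""
    for line in lines:
        fields = line.split()
        # Only "W" records are kept; MAP, R, MARK etc. are skipped.
        if len(fields) < 6 or fields[0] != "W":
            continue
        addr = int(fields[4], 16)  # int() accepts the "0x" prefix with base 16
        if start <= addr < end:
            yield float(fields[2]), addr, int(fields[5], 16), int(fields[1])


def scan_file(path, start, end):
    """Filter a trace file line by line, so it never has to fit in memory."""
    with open(path) as f:
        return list(writes_in_window(f, start, end))
```

This does not find the corruption for you, but it turns megabytes of output into a list of candidate writes that can be read by hand.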