Brown: A Nasty md/raid bug

Posted Jun 18, 2012 22:21 UTC (Mon) by xorbe (subscriber, #3165)
Parent article: Brown: A Nasty md/raid bug

The best solution sounds like: upgrade kernel, sync, hit reset button!


Brown: A Nasty md/raid bug

Posted Jun 18, 2012 22:48 UTC (Mon) by alvieboy (subscriber, #51617)

I guess we can scan for the relevant information, even if most metadata is lost.

Even if it takes some automated trial and error, a few attempts should eventually yield all the needed values, such as stripe size and offsets.

Neil, however, put a lot of effort into explaining what was going on, who could be affected, and how to overcome the problem. I wish most bug fixes came with this sort of "tutorial" and explanation of the issue. Kudos to him.

Alvie

Brown: A Nasty md/raid bug

Posted Jun 19, 2012 6:36 UTC (Tue) by Alterego (subscriber, #55989)

@xorbe: You should read the article; that would prevent you from posting a misleading comment.

The article says:
If you decide to upgrade your kernel, you should do so *carefully*. Remember that the bug triggers on shutdown/reboot so you aren't safe until the new kernel is running.

Brown: A Nasty md/raid bug

Posted Jun 19, 2012 7:15 UTC (Tue) by AngryChris (subscriber, #74783)

How is his comment misleading? The bug is triggered on reboot/shutdown. If you upgrade the kernel, sync your filesystems, and then hit the reset button, the machine never "shuts down" from the operating system's perspective. The nasty bug that is hit when rebooting is bypassed.

Brown: A Nasty md/raid bug

Posted Jun 19, 2012 10:47 UTC (Tue) by epa (subscriber, #39769)

Unfortunately lots of machines these days have a soft reset button that triggers /sbin/shutdown. Even the so-called power switch does not cut power to the machine. You have to reach around the back of the computer and yank the power cord, or remove the battery.

Brown: A Nasty md/raid bug

Posted Jun 19, 2012 11:14 UTC (Tue) by hummassa (subscriber, #307)

One can press and hold the power button; usually it has the intended effect of just powering off the device... I don't know of any hardware where it does not work. A "sync; poweroff -f" should work too.

Magic SysRq Keys

Posted Jun 19, 2012 11:55 UTC (Tue) by k3ninho (subscriber, #50375)

Only do the following on your machine if you expect it to sync the disks (wait for that to complete) and then reboot immediately:
alt-sysrq-s; alt-sysrq-b.

Source: http://en.wikipedia.org/wiki/Magic_SysRq_key

K3n.

Magic SysRq Keys

Posted Jun 19, 2012 12:27 UTC (Tue) by hummassa (subscriber, #307)

I usually do ctrl-alt-f1, sysrq-6 (to see the messages), sysrq-s (wait for the sync-complete message to pop up), sysrq-u (so no processes try to write to the disks again after the last sync), another sysrq-s (the message should pop up quickly this time), then sysrq-o. After that I wait a minute or so and turn the machine on again. The last part is kind of superstitious, but I feel all warm and fuzzy inside knowing that, if the power fails, I have seen the machine boot from zero last time, so it "ought" to work.
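When no keyboard is attached, the same key sequence can be sent by writing single command characters to /proc/sysrq-trigger as root. What follows is a minimal Python sketch of that idea, not a tested recovery procedure. The names `SEQUENCE`, `write_trigger` and `send_sequence` are made up for illustration, the console switch (ctrl-alt-f1) is left out, and the fixed pause is only a stand-in for watching the console until the sync message appears. By default it performs a dry run and only prints what it would send.

```python
import time
from typing import Callable, Iterable, List

# Keys from the comment above: console loglevel 6, sync,
# remount read-only, sync again, power off.
SEQUENCE = ["6", "s", "u", "s", "o"]


def write_trigger(key: str, path: str = "/proc/sysrq-trigger") -> None:
    """Send one SysRq command character to the kernel (needs root)."""
    with open(path, "w") as trigger:
        trigger.write(key)


def send_sequence(
    keys: Iterable[str],
    send: Callable[[str], None] = lambda key: print("would send", key),
    pause_seconds: float = 0.0,
) -> List[str]:
    """Send each key in order via `send`, pausing after each one.

    Returns the keys that were sent. The default `send` is a dry run.
    """
    sent = []
    for key in keys:
        send(key)
        sent.append(key)
        time.sleep(pause_seconds)  # assumption: a fixed wait instead of reading the console
    return sent


if __name__ == "__main__":
    # Dry run only. Passing send=write_trigger and a pause of a few
    # seconds would really sync, remount read-only and power off.
    send_sequence(SEQUENCE)
```

Keeping the real write in a separate function means the sequence can be checked safely before it is pointed at a live machine.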

Magic SysRq Keys

Posted Jun 20, 2012 7:31 UTC (Wed) by jezuch (subscriber, #52988)

> I usually do ctrl-alt-f1, sysrq-6 (to see the messages), sysrq-s (wait for the all sync message to pop up) (...)

I saw a mnemonic for this sequence somewhere:
Raising Skinny Elephants Is [So] Utterly Boring

(I have seen the sysrq-s step recommended both after sysrq-r and after sysrq-i so, just for safety, I do it in both places ;) )

Brown: A Nasty md/raid bug

Posted Jun 19, 2012 11:26 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)

You can use the "Raising Elephants Is So Utterly Boring" SysRq combination without the E, I and U letters.
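To make the shortened sequence concrete, here is a small Python sketch that derives the keys from the mnemonic and drops the ones to be skipped. The helper name `sysrq_keys` is hypothetical. The key meanings in the comment follow the usual SysRq assignments.

```python
from typing import List

MNEMONIC = "Raising Elephants Is So Utterly Boring"

# Usual SysRq meanings: r = take keyboard out of raw mode, e = SIGTERM to
# all processes, i = SIGKILL to all processes, s = sync, u = remount
# filesystems read-only, b = reboot immediately.


def sysrq_keys(mnemonic: str, skip: str = "") -> List[str]:
    """Return the first letter of each word, lowercased, minus skipped keys."""
    keys = [word[0].lower() for word in mnemonic.split()]
    return [key for key in keys if key not in skip.lower()]


print(sysrq_keys(MNEMONIC))              # ['r', 'e', 'i', 's', 'u', 'b']
print(sysrq_keys(MNEMONIC, skip="EIU"))  # ['r', 's', 'b']
```

Without E, I and U, what remains is R, S, B: no processes are signalled and nothing is remounted, the disks are synced and the machine reboots without running the normal shutdown path.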

Soft reset buttons

Posted Jun 22, 2012 17:07 UTC (Fri) by giraffedata (subscriber, #1954)

> Unfortunately lots of machines these days have a soft reset button that triggers /sbin/shutdown.

Are you sure? Have you seen this? I know that a soft power-down from the power switch is the norm, but I've never seen the reset button do this. Its main purpose is to not involve the operating system: unlike with power control, if you wanted to restart via the OS, you would probably have used the keyboard instead of a paperclip reset button.

Soft reset buttons

Posted Jun 28, 2012 10:54 UTC (Thu) by epa (subscriber, #39769)

I might indeed be getting confused between soft-power and reset.

Brown: A Nasty md/raid bug

Posted Jun 20, 2012 22:46 UTC (Wed) by ttonino (subscriber, #4073)

Well... http://neil.brown.name/blog/20120615073245 tells me that the bug only happens if the array is NOT running on shutdown, as the array state gets overwritten from memory (which in that case has the non-running state).

So, the chance of this happening all by itself is pretty low. That said, Brown gives some good advice about saving the metadata yourself.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds