War story

Posted Sep 11, 2007 6:07 UTC (Tue) by cstanhop (subscriber, #4740)
Parent article: The many faces of fsck

I just want to say that this is a great article. I've enjoyed the others along this line, and I look forward to more articles like this one.

My fsck war story is one where the fsck program was of absolutely no help, and I got to play human fsck for a day.

This was back in '96 on a Caldera Linux distribution with ext2. I had just finished installing and configuring a Linux box that acted as the file, email, and web server, DNS, dial-up router, and proxy for the three-person company I was part of. It was a big deal for us, and it had taken me the better part of a week to get everything running as I had planned. Everything was complete except (feel free to shudder knowingly) we hadn't received the tape drive system we intended to use as the backup for the system. However, I was primarily working as an EE at the time, and as a third of the work force, I needed to get back to my other work.

The system was running fine for a couple of days, but then one morning I came in and my boss told me it refused to boot. My heart sank as I went to the terminal and saw some error about the file system not being recognized. A week's work gone! I had a suspicion somebody had neglected to shut it down properly because they had forgotten the root password, but under my baleful gaze oaths were sworn that this was not the case. Anyway, pointing fingers wasn't going to get the thing to boot, and I should have figured out a way to back it up even without the tape drive.

So, I tried booting from a floppy and running fsck, but fsck wouldn't have anything to do with the drive. I briefly considered the possibility of rebuilding everything, but I was determined not to lose all that hard work. So, I read up on all the information I could find about ext2, and I fired up ext2ed. Working with ext2ed, it appeared that a drive sector in the root inode had become corrupted. Reading that sector gave errors, but I could poke around in other sectors. So, based on what I had been reading about ext2 and knowing how I had laid out the drives, I had ext2ed rewrite the root inode. Once ext2ed wrote the root inode, the drive began to read the sector without errors. Yay me!

So, now I ran fsck. Success again! Absolutely no errors! But we were still spooked, so my boss had run out and bought equivalent-geometry drives from some other manufacturer. I "mirrored" the disks with dd, then I booted from the originals just to see if it worked. It did, so we swapped in the new drive and tried it, and it worked too. As far as I could tell, nothing had been lost and it was only that sector in the root inode that caused problems. We continued running with the new drives and kept another new drive as a mirrored backup for a while until we could get the tape backup working.
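For anyone curious what that dd-style "mirroring" amounts to, here is a rough Python sketch of the idea, not what I actually ran back then. It copies a source to a destination block by block and returns a checksum of what was read, so the copy can be compared afterwards. The function names mirror and checksum and the 1 MiB block size are made up for illustration, and the comparison only makes sense when the destination is the same size as the source (a plain file, or a drive of equal geometry).

```python
import hashlib


def mirror(src_path, dst_path, block_size=1024 * 1024):
    """Copy src to dst one block at a time; return SHA-256 of the bytes read."""
    digest = hashlib.sha256()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # end of source reached
                break
            dst.write(block)
            digest.update(block)
    return digest.hexdigest()


def checksum(path, block_size=1024 * 1024):
    """Return SHA-256 of the whole file or device at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Comparing the value returned by mirror against checksum of the destination is a cheap way to confirm the copy matches before trusting it enough to swap drives.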

All I can say is "phew!" Luckily it only took me the better part of a day to recover from that potential data disaster. With a couple of earlier exceptions in the days of DOS, that's the closest I've had to get to my file system. I completely appreciate all the work people put into them so I can get on with the things I want to do. Even the slow fscks are appreciated (when they work). At least I learned the value of backups. Lots and lots of backups...


Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds