fsck
Posted Mar 20, 2007 6:03 UTC (Tue) by Ross (guest, #4065)
In reply to: never understood why Linux community continues to ignore ZFS by qu1j0t3
Parent article: The 2007 Linux Storage and File Systems Workshop
It's nice to have fsck even if you don't use it automatically because not all corruption happens in ways that can be repaired by journal replay or unrolling a transaction. Checksums are a great idea, and so is using RAID, but the ability to scan the entire disk for metadata inconsistencies remains valuable despite them.
fsck
Posted Mar 20, 2007 10:46 UTC (Tue)
by etienne_lorrain@yahoo.fr (guest, #38022)
[Link] (9 responses)
I ran into a problem a few days ago and was not impressed by journal replay on ext3.
I think I remember the FS would not unmount at the end of the shutdown, but there was no other sign of illness during the few-hour Linux session.
No funny setup: no LVM, no RAID, a simple partition table, no SMP, ia32... the drive's SMART status was good even after the crash, and there had been no hardware change for a long time.
The result was the loss of the root directory and the main ext3 descriptor, and roughly 95% of directories (with all the files inside them) lost after fsck; most directory inodes had lost their names - certainly no way to get at /var/log/messages or anything in /boot.
It was a test system running a 2.6.21-rc kernel with no really important files on this partition, but sometimes I wonder whether I should just use plain ext2 - the crashes I have had before (every 3-4 years) were never this extensive - maybe there is less chance of ending up with a bad journal?
Just my £0.02,
Etienne.
fsck
Posted Mar 20, 2007 11:45 UTC (Tue)
by drag (guest, #31333)
[Link] (8 responses)
Ouch. I had a similar thing happen with a /home directory on XFS after a power outage (well, my cat falling down and yanking the wires out from behind my system).
Lost my ~/ directory. About 30 gigs' worth of thousands of files and directories, all turned into numbers. No way to find out which file was which unless I tried opening them up individually. Not a fun experience.
fsck
Posted Mar 20, 2007 12:13 UTC (Tue)
by nix (subscriber, #2304)
[Link] (4 responses)
So it was a cat-astrophe?
(sorry)
More seriously, if fsck fails or acts in unuseful ways like that, and you can get an image of the disk pre-fsck, tytso might be interested in it...
fsck
Posted Mar 20, 2007 12:53 UTC (Tue)
by etienne_lorrain@yahoo.fr (guest, #38022)
[Link] (3 responses)
An image of the partition, taken before the ext3 journal was applied, would certainly be useful - quite a few full DVDs' worth, though. But you always think: well, I've only lost two or three files, just begin the recovery, it won't be too bad... well, it seems to be a bit more than that, give fsck the "always answer yes" option... well, this is beginning to feel bad... well, there was nothing that important on the filesystem anyway... and by then it is too late to do it right.
It is at times like these that you appreciate having a separate partition for valuable files, with a simple partition scheme and no fancy, optimised filesystem (the choice being between FAT and ext2).
What I should have done was run e2fsck after the few crashes I had with the floppy driver on this PC (a bug, since fixed, that left interrupts disabled, so the machine had to be powered off while in X), but that was 6 to 10 sessions (i.e. a few hours of work each, followed by shutdown/power-off) earlier, so it should have been handled by the ext3 journal recovery.
I still should have forced a check from another distribution to be sure - or from the FC6 live CD... too late now.
Etienne.
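(A minimal sketch of that kind of precaution, assuming the damaged ext3 filesystem is on a placeholder device /dev/sdb1 and is unmounted; the noload mount option just skips journal replay so the filesystem can be inspected read-only first:)

    # Keep a compressed raw copy so any later fsck experiment can be undone:
    dd if=/dev/sdb1 bs=1M | gzip -c > /backup/sdb1-before-fsck.img.gz
    # Have a look around without replaying the journal or writing anything:
    mkdir -p /mnt/inspect
    mount -o ro,noload /dev/sdb1 /mnt/inspect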
fsck
Posted Mar 20, 2007 14:55 UTC (Tue)
by nix (subscriber, #2304)
[Link] (2 responses)
If you give e2fsck the -y option, IMHO you deserve everything you get. There's a *reason* it's not the default. (And, yes, when I'm lucky enough to have enough accessible storage and a major filesystem goes south without recent backups, I *do* gzip up the filesystem image and back it up before doing a fsck. Perhaps I could use e2image, but I've never dared risk it.)
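(For reference, a rough sketch of both approaches with a placeholder device /dev/sdb1; a plain e2image saves only the metadata, while -r writes a full raw image as a sparse file. Neither writes to the source device.)

    # Metadata-only image - small, and enough to reproduce most e2fsck problems:
    e2image /dev/sdb1 /backup/sdb1.e2i
    # Full raw image, written sparse (only blocks in use take space), then compressed:
    e2image -r /dev/sdb1 /backup/sdb1.raw
    gzip /backup/sdb1.raw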
fsck
Posted Mar 20, 2007 16:02 UTC (Tue)
by bronson (subscriber, #4806)
[Link] (1 responses)
Unless you're an ext3 filesystem engineer, how are you supposed to know which fixes to make? If fsck asks my mom, "1377 unreferenced nodes, delete? (Y)" (whatever a typical error looks like; it's been a while), what is she supposed to do?
There are only two modes the average Linux user can run fsck in:
- All Y, which you say is a bad idea.
- All N, in which case there's no point.
Maybe fsck could offer an "all trivial" setting, where it would automatically make fixes that it thinks are unlikely to cause data loss. If a bigger problem is found, fsck could bail out saying, "Serious errors found, back up the partition before repairing!"
This is the same problem as Windows users splatting "Yes" every time their OS asks, "Do you want to allow a connection to sdlkh.phishing.org?" - except that the fsck questions are even less understandable!
fsck
Posted Mar 20, 2007 17:02 UTC (Tue)
by nix (subscriber, #2304)
[Link]
Your points have merit: it's hard to work out which changes are safe. That's why I tend to run e2fsck first with -n, and review the list to see whether there are a lot of changes or they look intuitively frightening. If they do, it's image-first time, so I can retry with more "n" answers if fsck goes wrong.
(The `all trivial' option already exists: it's what you get if you run e2fsck with -a. If e2fsck says you must run fsck manually, that means it has nontrivial fixes to make. In my experience this is rare indeed, even in the presence of repeated power failures or qemu kills while doing rm -r.)
This is yet one more place where a CoW device-mapper layer would be useful: instead of doing huge copies, you could just do the e2fsck in a CoWed temporary image (maybe mounted on a loopback filesystem somewhere :) ) and see if that worked...
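(A rough sketch of that CoW trial run using device-mapper's snapshot target; /dev/sdb1 is a placeholder for the unmounted, damaged partition, and the sizes are arbitrary. All of e2fsck's writes land in the throwaway COW file, so the real partition is untouched:)

    dd if=/dev/zero of=/tmp/cow bs=1M count=1 seek=4095       # ~4GB sparse file to hold copied-on-write blocks
    COWDEV=$(losetup -f)                                      # first free loop device
    losetup $COWDEV /tmp/cow                                  # expose the COW file as a block device
    SECTORS=$(blockdev --getsz /dev/sdb1)                     # origin size in 512-byte sectors
    dmsetup create fscktrial \
        --table "0 $SECTORS snapshot /dev/sdb1 $COWDEV N 8"   # non-persistent snapshot, 4KB chunks
    e2fsck -f /dev/mapper/fscktrial                           # trial repair; the origin is never written
    dmsetup remove fscktrial                                  # discard the experiment (or redo it on /dev/sdb1 for real)
    losetup -d $COWDEV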
fsck / xfs
Posted Mar 20, 2007 13:24 UTC (Tue)
by rfunk (subscriber, #4054)
[Link] (2 responses)
When SGI first introduced XFS for Irix, they talked about "no need for fsck, ever", just as the ZFS proponent above does. Later they were forced to admit that there are times when filesystems need to be checked or repaired, and they introduced tools for doing so. Unfortunately, those tools probably aren't as useful when things go wrong as they would be if the need for them had been anticipated from the start. Like you, I've discovered the hard way that XFS is very bad for situations where the disk might accidentally lose power or get disconnected unexpectedly. SGI apparently designed it for a stable server-room situation.
I've never had a problem with ext3, and am slowly migrating my XFS filesystems to ext3.
fsck / xfs
Posted Mar 22, 2007 11:47 UTC (Thu)
by wookey (guest, #5501)
[Link] (1 responses)
I too have found out the hard way that yanking the power on XFS (or just hitting reset at a bad time) is a very bad idea. All the files that had pending writes just end up as the correct length of zeros. When this includes your package database, perl binaries and a load of other libraries, it is quite bad.
The xfs_repair tool did do a pretty good repair job (once I fixed it so it ran! http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=414079), but it took about 5 hours on a pair of 200GB mirrored drives. Then I got to reinstall everything to fix the damage.
About 3 days of faff in total. Fair dues, though - there was no user-data loss and the system was recoverable, but I've never had this trouble with reiser3 on my laptop or ext3 on other boxes. So, yes, XFS is a really nice filesystem (live resizing, nice and fast), but I'd avoid it unless there is a UPS around.
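(For what it's worth, xfs_repair does have a no-modify mode that can be used to see how bad the damage is before committing to a repair; /dev/md0 below is a placeholder for the unmounted filesystem:)

    xfs_repair -n /dev/md0    # report what would be fixed, without writing anything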
fsck / xfs - versus ZFS
Posted Apr 17, 2007 14:06 UTC (Tue)
by qu1j0t3 (guest, #25786)
[Link]
> not all corruption happens in ways that can be repaired by journal replay or unrolling a transaction
It would be wrong to assume ZFS has the same failure modes as XFS.
See, for instance: Bill Moore's blog.