Runtime filesystem consistency checking
Posted Apr 12, 2012 12:47 UTC (Thu) by nye (subscriber, #51576)
In reply to: Runtime filesystem consistency checking by nix
Parent article: Runtime filesystem consistency checking
My ZFS experience on a ~5TB pool consisting of six commodity HDs under fairly light load (i.e. it's a home file server) is that every couple of months a scrub detects checksum errors in a block or a small handful of blocks, without the device reporting any corresponding read/write errors.
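To make concrete what "checksum errors without device errors" means, here is a minimal sketch of the scrub idea: keep a checksum for every block at write time, then periodically re-read everything and compare. This is only an illustration, not how ZFS lays things out on disk; `BLOCK_SIZE`, `write_blocks`, `scrub` and `demo` are names invented for this example, and SHA-256 is just a convenient stand-in checksum.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, not a ZFS parameter


def checksum(block: bytes) -> str:
    """Checksum of one block (SHA-256 chosen only for illustration)."""
    return hashlib.sha256(block).hexdigest()


def write_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into blocks and record a checksum for each one."""
    blocks = [bytearray(data[i:i + block_size])
              for i in range(0, len(data), block_size)]
    sums = [checksum(bytes(b)) for b in blocks]
    return blocks, sums


def scrub(blocks, sums):
    """Return indices of blocks whose contents no longer match their checksum."""
    return [i for i, (b, s) in enumerate(zip(blocks, sums))
            if checksum(bytes(b)) != s]


def demo():
    """Simulate silent corruption: one flipped bit, no I/O error reported."""
    blocks, sums = write_blocks(bytes(range(256)) * 64)  # 16384 bytes, 4 blocks
    blocks[2][10] ^= 0x01  # the "device" returns this data without complaint
    return scrub(blocks, sums)
```

The point of the sketch is that the read itself succeeds; only the comparison against a checksum stored separately from the data reveals that the block has changed.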
Not sure if that's the situation you're talking about.
(Also, the same experience has taught me that Western Digital should be avoided like ebola. I actually wonder if their green series might be drives that have failed QC and been re-badged for the unsuspecting consumer.)
Posted Apr 13, 2012 10:33 UTC (Fri) by etienne (guest, #25256)
I am not sure I am interpreting exactly what is happening on my own PC, but I suspect something like:
I do not know why the sector is not rewritten by the Linux driver. I do know that I solved the same problem on another PC by touching a file in a directory, which forced the sector containing the directory entry to be rewritten.
Posted Apr 13, 2012 12:34 UTC (Fri) by james (subscriber, #1325)
And it can do all of that without having to worry about which operating system is running, or whether it's a database using raw access, or a light layer using BIOS calls but no filesystem. It can preserve this information across reformats.
In your case, by causing the sector containing the directory entry to be rewritten, the disk probably decided that this was a great time to remap in a spare sector, so it actually went to a different part of the disk. (Unless the filesystem you were using put the new directory entry somewhere else anyway.)
And ECC correction doesn't take seconds; re-reading the same sector repeatedly in the hope of getting one last good read does.
¹ If you've got command queueing turned on, several requests outstanding, and there's a delay, which sector caused the problem?
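One rough way to look for such multi-second reads from user space is to read a file or device one block at a time and time each read; issuing a single request at a time also sidesteps the question of which queued request caused the delay. The sketch below is an assumption-laden illustration: `find_slow_blocks` and its parameters are made up for this example, and reads served from the page cache will look fast, so a real test would need to bypass or drop the cache first.

```python
import time


def find_slow_blocks(path, block_size=4096, threshold_s=1.0,
                     clock=time.monotonic):
    """Read path block by block; return (offset, seconds) for slow reads.

    clock is injectable so the logic can be exercised without a failing disk.
    """
    slow = []
    with open(path, "rb", buffering=0) as f:  # unbuffered: one read per block
        offset = 0
        while True:
            start = clock()
            chunk = f.read(block_size)
            elapsed = clock() - start
            if not chunk:  # end of file
                break
            if elapsed >= threshold_s:
                slow.append((offset, elapsed))
            offset += len(chunk)
    return slow
```

A block that consistently takes seconds to return, while its neighbours take milliseconds, is a candidate for the retry behaviour described above.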
Posted Apr 13, 2012 14:51 UTC (Fri) by jzbiciak (guest, #5246)
It will also let you fire up background health checks (these can take quite a long time to complete -- as long as a day, as I recall) that may help turn up other problems.
- one block of sectors develops a bit fault in the magnetic data
- the ECC corrects it each time the PC reads the sector, resulting in a *very long* delay of a few seconds
- the Linux driver does not notice that there was a long ECC correction and does not decide to rewrite the sector identically to get the magnetic data corrected
- in the long term, a second error will appear in the magnetic data and the ECC will no longer be sufficient.
I never noticed the problem when the "old" ATA/IDE driver was used, but I am not sure I am interpreting correctly what has been happening on my PC over the last few days...
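The "rewrite the sector identically" step can be sketched from user space as reading a block and writing the same bytes back at the same offset. This is only a sketch under assumptions: `rewrite_block_in_place` is a name invented here, it relies on the Unix-only `os.pread`/`os.pwrite` calls, and on a regular file there is no guarantee that the filesystem or the drive puts the data back in the same physical sector.

```python
import os


def rewrite_block_in_place(path, offset, block_size=4096):
    """Read one block and write the identical bytes back at the same offset.

    Returns the number of bytes rewritten (0 if offset is past the end).
    """
    fd = os.open(path, os.O_RDWR)
    try:
        data = os.pread(fd, block_size, offset)
        if data:
            os.pwrite(fd, data, offset)
            os.fsync(fd)  # ask the kernel to push the write to the device
        return len(data)
    finally:
        os.close(fd)
```

The file contents are unchanged afterwards; the only intended effect is that the device receives a fresh write for that block.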