Quote of the week
      Posted Jul 16, 2020 7:16 UTC (Thu)
                               by ncm (guest, #165)
                              [Link] (15 responses)
       
Most of those bit errors will be in your data, but some will be in the metadata. 
Better file systems can be configured to keep their own checksums on metadata and data, but they can't always guarantee that the contents of a block they find to have a bad checksum can be recovered. With a bit of RAID, there might be another copy. I don't know if they would necessarily have another copy of the metadata. Anyway, not all file systems can keep checksums, and not all of those that can are configured to do so. 
So there is always a hint of chaos lurking in any disk drive even when the software is right. 
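A minimal sketch of that detect-versus-recover distinction, in Python, with hypothetical mirror objects and a digest assumed to be recorded in the filesystem's own metadata:

    import hashlib

    def read_with_repair(block_id, mirrors, expected_digest):
        # Try each mirror copy; return the first whose checksum matches the
        # digest recorded in the metadata.
        for mirror in mirrors:
            data = mirror.read(block_id)          # hypothetical device API
            if hashlib.sha256(data).digest() == expected_digest:
                return data
        # Detection worked, recovery did not: every copy fails its checksum.
        raise IOError("block %d: all copies fail their checksum" % block_id)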
     
    
      Posted Jul 16, 2020 8:32 UTC (Thu)
                               by anton (subscriber, #25547)
                              [Link] (9 responses)
This comment is an example of the fear-based ZFS sales pitch that cautioned me to steer clear of ZFS, and is therefore a perfect example of the original quote.  It cautioned me because the sales pitch does not agree with my knowledge about how HDDs work (HDDs use error-correcting codes, with some error detection distance between correcting to the right side and producing an undetected bit error; they also do not write blindly to some track the stepper motor has moved to, because they have not used stepper motors for a long time, and instead use servo motors with feedback from the content of the disk to determine where they are), nor with my experience of file system and HDD failures.
       Nevertheless, we have recently used ZFS on a server, because it has features we want (snapshots), and because it's better at handling disk failure for RAID-1 setups than btrfs.  Maybe a sales pitch based on features would be better for ZFS.
      
           
     
    
      Posted Jul 16, 2020 11:36 UTC (Thu)
                               by ianmcc (subscriber, #88379)
                              [Link] 
       
All of my disks now are in mirrored pairs.  Upgrading them to larger space is just a matter of replacing one of the pair with a larger disk, resilvering, and replacing the other disk.  (Or something fancier, such as striping).  I haven't used snapshots much yet, but that looks very good for having some off-site backup as well. 
As far as I understand it, most HDD failures are physical failures in mechanical parts that are not repairable by in-built error correction.  Recovering data from a dead HDD is expensive (probably impractical for a home user?  Not sure).  After a HDD failure a few years ago where I lost a lot of photos that didn't have a backup, I've now got all my critical stuff in 'the cloud', but it is also nice to know that my desktop itself has some resilience to hardware failures. 
     
      Posted Jul 16, 2020 13:44 UTC (Thu)
                               by mchouque (subscriber, #62087)
                              [Link] (2 responses)
       
Because the real and true value is not so much about checking whether your media is lying to you about the data it's reading back to you, but really about knowing that what you are reading and got back from the whole chain (e.g. media -> firmware -> transport -> controller -> memory) is exactly what you wanted to write in the first place. 
With other, non-checksumming filesystems, when an IO goes from the kernel to the disk controller to end up on a disk, you have no idea whether it was corrupted in flight or not. 
Maybe you're just writing random crap that was corrupted in memory, during transfer, by your controller, by your disk array, by your FC infrastructure and so on. In that case when you read it back, your hard drive, array or whatever will happily say it's all good since it's what it got in the first place. Or equally, maybe what it read was correct but it was corrupted on its way to you: you just don't know. 
And that's what a checksumming filesystem is good at. 
You're thinking filesystem checksums are about catching media corruption while you should be looking at the big picture: it's an end-to-end checksum. 
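A minimal Python sketch of that end-to-end idea (not how any real filesystem lays out its checksums): the digest is computed before the data enters the storage stack and verified after it comes back out, so corruption anywhere in between is caught.

    import hashlib

    def write_block(f, data):
        # Checksum computed in host memory, before the data enters the
        # firmware/transport/controller chain.
        digest = hashlib.sha256(data).digest()
        f.write(data)
        return digest

    def read_block(f, offset, length, digest):
        f.seek(offset)
        data = f.read(length)
        # Verified only after the data has come back through the whole chain,
        # so a mismatch means corruption happened somewhere, not just on the media.
        if hashlib.sha256(data).digest() != digest:
            raise IOError("end-to-end checksum mismatch")
        return data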
     
    
      Posted Jul 16, 2020 21:39 UTC (Thu)
                               by barryascott (subscriber, #80640)
                              [Link] (1 responses)
       
HDDs use 512 bits of ECC for each 512-byte block and can recover 10-bit error runs, as I recall. 
Of course you are correct that if the data is corrupted on the bus that counts for nothing. 
Barry 
 
 
 
 
     
    
      Posted Jul 17, 2020 5:01 UTC (Fri)
                               by ncm (guest, #165)
                              [Link] 
       
So a block that gets past that will most often have two smaller clusters of bad bits, or one bigger cluster. Before delivering that bad block, it will read that sector over and over, maybe a hundred times before it gives up. If it gets one good read, it will write that to a new sector, and maybe mark the old spot as bad, and not write anything more there. 
After a while, you will have a growing collection of those copied-off blocks, with a higher than typical likelihood, in each, of having undetected errors in it. 
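A rough model of that retry-and-remap behaviour, sketched in Python; raw_read() and remap_to_spare() are invented stand-ins for firmware internals, and the retry count and remapping policy are guesses, not vendor documentation:

    def read_sector(sector, max_retries=100):
        for attempt in range(max_retries):
            data, ecc_ok = raw_read(sector)       # hypothetical low-level read
            if ecc_ok:
                if attempt > 0:
                    # The sector needed retries, so it is marginal: copy the
                    # recovered data to a spare sector and retire this one.
                    remap_to_spare(sector, data)
                return data
        return None   # give up and report an unrecoverable read error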
     
      Posted Jul 16, 2020 17:37 UTC (Thu)
                               by zlynx (guest, #2285)
                              [Link] (2 responses)
       
Because it is a *real life problem* that block addresses get corrupted during a write operation. Which means the data block your system wrote went to the *wrong block*. Now the new data is not where it is expected *and* it just wrote over some unknown block of data. 
ZFS or btrfs can help with that because, by keeping multiple copies of the data, when it goes to read or scrub it, it will notice that the data is either missing or overwritten and can recover from another copy. 
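A sketch of why keeping the checksum next to the pointer in the parent metadata (roughly what ZFS and btrfs do) catches a misdirected write, while a checksum stored inside the block itself would not; the layout and names here are invented for illustration:

    import hashlib

    def verify_child(read_block, pointer):
        # "pointer" lives in a parent metadata block and records both where the
        # child block should be and what it should hash to.
        data = read_block(pointer["address"])     # hypothetical read routine
        if hashlib.sha256(data).digest() != pointer["sha256"]:
            # Either our write went somewhere else (this block holds stale data),
            # or a misdirected write landed on top of us.  Either way it is
            # detected, and another mirror or parity copy can be tried.
            raise IOError("block %d fails its parent's checksum" % pointer["address"])
        return data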
     
    
      Posted Jul 16, 2020 18:04 UTC (Thu)
                               by ncm (guest, #165)
                              [Link] (1 responses)
       
Furthermore, the block that was supposed to have been overwritten (instead of the one that was) still has a good checksum, even though its contents are stale, and wrong. To detect this you need a generation counter, the number of writes that have occurred in the filesystem, folded into the checksum, and have each block's metadata also include the generation count. Again, I don't know which filesystems do this, if any. 
Filesystem integrity maintenance is a lot like security: you need a threat model, a list of error types you hope to detect, and a list of which of those you hope to be able to correct automatically. The more of each you have, the more complex it all gets, and the less confidence you can have in the code that implements it all. 
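A toy version of the generation-counter idea (not a description of any real filesystem's on-disk format):

    import hashlib, struct

    def block_checksum(data, generation):
        # Fold the filesystem-wide write generation into the checksum, so a
        # stale-but-intact block no longer verifies against the generation
        # recorded in its parent's metadata.
        return hashlib.sha256(struct.pack("<Q", generation) + data).digest()

    old = block_checksum(b"same bytes", generation=41)
    new = block_checksum(b"same bytes", generation=42)
    assert old != new   # identical contents, different generation: detected as stale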
     
    
      Posted Jul 16, 2020 18:09 UTC (Thu)
                               by zlynx (guest, #2285)
                              [Link] 
       
     
      Posted Aug 3, 2020 13:48 UTC (Mon)
                               by Darkstar (guest, #28767)
                              [Link] (1 responses)
       
That paper, even though it is 12 years old by now, gives very accurate models for those errors, and even if you tweak the percentages down to compensate for "new developments" in HDD engineering, you still get non-negligible chances for visible errors during the lifetime of the hard disk. 
Yes, there are error checks and corrections, and yes, these might drop the BER by another order of magnitude or two. But there are also software bugs, race conditions, etc. in the drive firmware that counter these improvements. 
And these issues actually happen in practice. We have thousands of harddisks deployed in the field inside storage systems, and we see the blocks that just so happen to not make it to disk (or back) correctly. 
So while the actual numbers might be considered "fear-mongering", and you might argue that they should be one or two orders of magnitude higher or lower, if you have hundreds or thousands of disks deployed it will have an impact. 
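A back-of-the-envelope illustration of why fleet size matters; both numbers below are made-up round figures chosen only to show the scaling, not measurements:

    per_bit_event_rate = 1e-17                      # assumed post-ECC "silent" events per bit read
    bits_read_per_disk_per_year = 10e12 * 8 * 12    # e.g. 10 TB read per disk per month

    for disks in (1, 100, 10_000):
        expected = per_bit_event_rate * bits_read_per_disk_per_year * disks
        print("%6d disks: ~%.2f events per year" % (disks, expected))

Under those assumed rates, one disk sees roughly one such event per century; ten thousand of them see around a hundred per year.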
     
    
      Posted Aug 4, 2020 8:28 UTC (Tue)
                               by Wol (subscriber, #4433)
                              [Link] 
       
A single very large modern hard drive read end-to-end is also large enough to average one error. 
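The arithmetic behind that, using the commonly quoted consumer-drive spec of one unrecoverable read error per 10**14 bits and an example drive size:

    drive_bits = 14e12 * 8         # a 14 TB drive, as an example
    ure_spec = 1e-14               # typical datasheet figure: 1 error per 10**14 bits
    print(drive_bits * ure_spec)   # ~1.1 expected unrecoverable errors per full read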
Cheers, 
Wol
     
      Posted Jul 17, 2020 0:42 UTC (Fri)
                               by Fowl (subscriber, #65667)
                              [Link] 
       
You *are* using encryption on all your disks, aren't you? ;) 
     
      Posted Jul 17, 2020 1:11 UTC (Fri)
                               by Wol (subscriber, #4433)
                              [Link] (2 responses)
       
And those errors are unlikely to be in the data on the hard disk. Reset the bus, re-read, and the correct data will be returned. 
So chances are the manufacturers CAN'T improve on those error stats, because they're the background level of errors caused by things like cosmic rays cascading through the electronics and flipping the odd random bit here and there ... 
Okay, I edit the raid wiki, but to some extent for those who want filesystems that protect their data I'd recommend dm-integrity/raid/lvm/ext. That uses the KISS principle with each layer "doing its thing". The trouble with btrfs/zfs et al is they try to be the swiss army knife - the filesystem that does everything. Totally at odds with the Unix principle of "do one thing and do it well". 
But it's like the monolithic/microkernel holy wars: which approach is actually best for you depends on what you want or need. 
Cheers, 
Wol
     
    
      Posted Jul 17, 2020 5:25 UTC (Fri)
                               by ncm (guest, #165)
                              [Link] (1 responses)
       
So, no, there really are errors coming off the platter, that the controller can, most of the time, tease back to the correct bits--when it can tell they were wrong. But some errors produce the right checksum, and the controller is none the wiser. The designer has a target error rate, and producing fewer errors than the budget allows means the bits are not packed in as tightly as they could have been, and the disk advertises less capacity, for less money. If you care about errors, you will have made arrangements to tolerate some. 
Those are in addition to any errors resulting from noise on the bus, which is, ultimately, another analog affair, albeit with better margins than on the platter. The bus doesn't get to re-try, and anyway can't tell if it should have. 
     
    
      Posted Jul 22, 2020 12:15 UTC (Wed)
                               by ras (subscriber, #33059)
                              [Link] 
       
The same is true for any error-detection system - including ZFS's.  But add enough bits and the problem becomes negligible.  The 512 bits someone here said a disk controller uses are more than enough to ensure you are unlikely to see an error before the universe goes dark. 
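For a feel of the numbers: if a corrupted block were equally likely to produce any checksum value, it would slip past k independent check bits with probability about 2**-k. That is a crude model (real ECC codes are structured), but the order of magnitude is the point:

    from math import log10

    for k in (16, 32, 256, 512):
        print("%3d check bits: ~1 in 10**%d" % (k, int(k * log10(2))))
    # 512 check bits: ~1 in 10**154, i.e. effectively never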
What ZFS and btrfs do provide is end-to-end checking.  The drive controller can only correct errors that occur on the disk.  Bus errors, SATA controller errors, DRAM errors, cosmic rays hitting CPU caches all happen after that check is done and so go through unnoticed.  Doing the checking in the CPU catches them, which is where ZFS and btrfs win. 
 
 
     
      Posted Jul 30, 2020 14:15 UTC (Thu)
                               by azz (subscriber, #371)
                              [Link] 
       
When I built a new RAID array in 2018, I copied over 27.2 TB of data from my existing arrays, and b2summed everything before and afterwards. I found and fixed 16 single-bit errors - which works out to about one error per 1.4 x 10**13 bits - so pretty much spot on for that number! 
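For reference, the arithmetic behind that figure:

    bits_copied = 27.2e12 * 8          # 27.2 TB
    errors_found = 16
    print(bits_copied / errors_found)  # ~1.4e13 bits per single-bit error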
     