Monitor disks with the S.M.A.R.T. monitoring tools

The S.M.A.R.T. Monitoring Tools (Smartmontools) is a cross-platform set of utilities that are able to monitor operating data from hard drives:
The smartmontools package contains two utility programs (smartctl and smartd) to control and monitor storage systems using the Self-Monitoring, Analysis and Reporting Technology System (SMART) built into most modern ATA and SCSI hard disks.
In many cases, these utilities will provide advanced warning of disk degradation and failure. It should run on any modern Darwin (Mac OSX), Linux, FreeBSD, NetBSD, OpenBSD, Solaris, OS/2, eComStation, QNX, or Windows system.
Wikipedia defines SMART as the Self-Monitoring, Analysis, and Reporting Technology: "Mechanical failures, which are usually predictable failures, account for 60 percent of drive failure. The purpose of S.M.A.R.T. is to warn a user or system administrator of impending drive failure while time remains to take preventative action — such as copying the data to a replacement device. Approximately 30% of failures can be predicted by S.M.A.R.T." Version 5.38 of Smartmontools was recently announced. Improvements include:
Building Smartmontools was straightforward. The code was downloaded and unpacked. The usual configure, make and make install steps were performed on an Ubuntu 7.04 system without trouble. The operating instructions from the README file were followed, and the software was able to read data from the one hard drive on the test system. This example output shows the wide variety of drive information that Smartmontools can display. The drive appears to be healthy. If you are a systems administrator who needs to keep track of hard drive reliability data, Smartmontools should be able to provide some useful drive information. With the addition of a small amount of glue-logic scripting, it should not be too difficult to set up an automated drive monitoring system.
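As a rough illustration of the glue-logic scripting mentioned above, the following Python sketch parses text shaped like the attribute table printed by `smartctl -A` and lists the attributes whose normalized value has reached the vendor threshold. The sample text is made up for the example (it is not the output from the article's test drive), and the function names `parse_attributes` and `at_or_below_threshold` are hypothetical helpers, not part of Smartmontools.

```python
import re

# Illustrative sample shaped like the attribute table from `smartctl -A`.
# The numbers are invented for this sketch.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       169074208
  5 Reallocated_Sector_Ct   0x0033   036   036   036    Pre-fail  Always       -       1204
194 Temperature_Celsius     0x0022   034   045   000    Old_age   Always       -       34
"""

# One table row: ID, name, hex flag, VALUE, WORST, THRESH, three text
# columns (TYPE, UPDATED, WHEN_FAILED), then the raw value.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]+\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+(?P<raw>.+)$"
)


def parse_attributes(text):
    """Return one dict per attribute row; non-matching lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "id": int(m["id"]),
                "name": m["name"],
                "value": int(m["value"]),
                "worst": int(m["worst"]),
                "thresh": int(m["thresh"]),
                "raw": m["raw"].strip(),
            })
    return rows


def at_or_below_threshold(rows):
    """Names of attributes whose normalized value has reached the threshold.

    A threshold of 0 is skipped here, on the assumption that it marks an
    informational attribute that can never trip.
    """
    return [r["name"] for r in rows
            if r["thresh"] > 0 and r["value"] <= r["thresh"]]


if __name__ == "__main__":
    for name in at_or_below_threshold(parse_attributes(SAMPLE)):
        print("attention:", name)
```

On a real system the text would come from running `smartctl -A` against a device (for example through `subprocess.run`), which normally requires root; the device path is site-specific.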
Monitor disks with the S.M.A.R.T. monitoring tools Posted Mar 13, 2008 13:04 UTC (Thu) by nix (subscriber, #2304) [Link] You mean an automated drive monitoring system like, say, smartd(8) in the smartmontools? :)
SMART & Failures... Posted Mar 13, 2008 17:31 UTC (Thu) by leoc (subscriber, #39773) [Link] Google put out an interesting paper about this very topic.
Monitor disks with the S.M.A.R.T. monitoring tools Posted Mar 13, 2008 21:24 UTC (Thu) by malex (subscriber, #15692) [Link] Unfortunately, smartmontools still can't provide the SMART test information from external USB drives - one has to use the manufacturer's MS Windows(TM)-based software to do that. It's a pity. If that capability were present I wouldn't be sitting without current backups while I'm waiting for a replacement drive to arrive.
Monitor disks with the S.M.A.R.T. monitoring tools Posted Mar 18, 2008 22:58 UTC (Tue) by jimparis (subscriber, #38647) [Link] This is not necessarily a problem with smartmontools. USB ATA passthrough is still new -- even Mark Lord, author of hdparm and a big contributor to ATA/IDE code in Linux, expressed surprise at finding an enclosure that actually supports it. Some vendors (Cypress) have invented their own custom protocol for getting SMART data this way, and there has been some recent discussion about including support for it in smartmontools...
Monitor disks with the S.M.A.R.T. monitoring tools Posted Mar 19, 2008 17:58 UTC (Wed) by malex (subscriber, #15692) [Link] I've realized that in time and switched to using a combination of linux-supported eSATA card and an eSATA enclosure. Now, SMART works, my backups are fast and I just don't care anymore for USB2. smartmontools work great with my current setup.
Monitor disks with the S.M.A.R.T. monitoring tools Posted Mar 13, 2008 22:18 UTC (Thu) by xav (subscriber, #18536) [Link] Mmmh .. so apparently it can predict 30% of 60% of the failures, which is less than 20% of the failures. Doesn't seem too useful.
Monitor disks with the S.M.A.R.T. monitoring tools Posted Mar 13, 2008 22:46 UTC (Thu) by hmh (subscriber, #3838) [Link] SMART failure prediction is crap, really. Most vendors set the thresholds too low. When you get one beyond the safety limit, it is usually too late for anything. But the SMART attributes, error logging, and the self tests are really useful. And so are smartd's mails to root when anything weird happens :-) I do self tests often, and long tests at least once a week. These find marginal and bad sectors in the RAID 1 components well before it becomes an issue. mdadm "array checks" also should be able to do it, but I've found that the SMART long test in my current set of disks is a lot more sensitive than simply telling the disk to read every sector. Your mileage will vary, of course :-)
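The routine described in the comment above (frequent short self tests, a long test once a week) can be sketched in Python as a small helper that picks the test type by weekday and builds the matching `smartctl -t` command line. The function name `selftest_command`, the choice of Saturday for the long test, and the device path are assumptions made for this example; smartd can also schedule self tests itself through the `-s` directive in smartd.conf.

```python
import datetime


def selftest_command(device, today, long_test_weekday=5):
    """Return the smartctl argument list for today's self test.

    long_test_weekday follows datetime's convention (Monday is 0), so the
    default of 5 means Saturday. That default is an arbitrary choice.
    """
    kind = "long" if today.weekday() == long_test_weekday else "short"
    return ["smartctl", "-t", kind, device]


if __name__ == "__main__":
    # Print the command rather than running it; starting a self test
    # normally requires root.
    print(" ".join(selftest_command("/dev/sda", datetime.date.today())))
```

A cron job could pass the returned list to `subprocess.run`; the results of earlier tests can be read back with `smartctl -l selftest`.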
Monitor disks with the S.M.A.R.T. monitoring tools Posted Mar 14, 2008 12:24 UTC (Fri) by NRArnot (subscriber, #3033) [Link] Manufacturers don't want to RMA disks that still "work", just because they are no longer working as well as they did when shipped. That's why they set stupid SMART thresholds. However, if you monitor the SMART counters yourself, you can get advance warning that a disk is starting to deteriorate, and swap it at that time. Unless you then put the removed disk into a test rig or unimportant system and exercise it for months or even years, you will never know if you caught a failing disk before failure or just replaced a good disk. However, the value of the data is usually much greater than the cost of the disk, so it's quite an easy decision. Google published some statistics on SMART's predictive value and on disk reliability in general. (One surprise: keeping disks cooled under 30C *reduces* life expectancy!) http://labs.google.com/papers/disk_failures.html
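Monitoring the counters yourself, as suggested in the comment above, amounts to comparing raw values between two points in time instead of waiting for a vendor threshold to trip. A minimal Python sketch follows, assuming each snapshot is a dictionary mapping attribute names to raw integer counts. The function name `counters_that_grew` and the particular set of watched attributes are choices made for this example, not a recommendation from the article or the Smartmontools documentation.

```python
# Attribute names as smartctl prints them; which ones to watch is an
# assumption made for this sketch.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def counters_that_grew(previous, current, watched=WATCHED):
    """Map each watched counter that increased to its (before, after) pair.

    Counters missing from either snapshot are ignored, since not every
    drive reports every attribute.
    """
    grew = {}
    for name in watched:
        before = previous.get(name)
        after = current.get(name)
        if before is not None and after is not None and after > before:
            grew[name] = (before, after)
    return grew


if __name__ == "__main__":
    # Invented snapshots for illustration.
    last_week = {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 0}
    today = {"Reallocated_Sector_Ct": 8, "Current_Pending_Sector": 0}
    for name, (before, after) in counters_that_grew(last_week, today).items():
        print(f"{name}: {before} -> {after}")
```

Whether a growing counter justifies swapping the disk is the judgment call the surrounding comments debate; the sketch only reports the change.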
Monitor disks with the S.M.A.R.T. monitoring tools Posted Mar 14, 2008 21:45 UTC (Fri) by giraffedata (subscriber, #1954) [Link] "However, the value of the data is usually much greater than the cost of the disk, so it's quite an easy decision." I don't think that's true. Often, the data is relatively unimportant, like a Google web page cache or a small part of a stream of undifferentiated experimental data. The rest of the time, the data is easily reconstructable, e.g. by copying from a mirror disk or backup tape. People set up storage systems so that the value of preserving the data is commensurate with the cost of preserving it. If you perturb that system by replacing drives more often based on SMART data, I think you'll have a net loss. On the other hand, if you could exploit SMART data so as to get the same reliability with fewer redundant copies, that would be a win. Either the Google paper or another that came out around the same time concluded that the best policy was to wait for a drive to fail, then replace it.
"One surprise: keeping disks cooled under 30C *reduces* life expectancy" Only if you want to jump to conclusions; the study didn't actually isolate the cooling policy. It merely showed that drives that failed tended to be the ones that were cooler. That's a long way from saying that if you speed up the fans, the disks will fail more. Just as likely is that the cool drives were of models where the engineers traded durability for low power consumption. Remember that the one great, consistent, fully controlled correlation these studies show is between failure rate and model.
Monitor disks with the S.M.A.R.T. monitoring tools Posted Mar 20, 2008 5:01 UTC (Thu) by roelofs (subscriber, #2599) [Link] "Either the Google paper or another that came out around the same time concluded that the best policy was to wait for a drive to fail, then replace it." ...for some definition of "fail." Keep in mind that performance drops, sometimes significantly, before unrecoverable data loss occurs. Greg
Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds