How to spoil your day
Sunday evening, I was in the data center. Hagrid had been running with a failed sda in its RAID array since just after Christmas (when I was on vacation), and I was going to replace it. Thanks to RAID1, it had kept humming along. It's almost impossible to find 120GB disks any longer, so I thought it was time to upgrade to 2*500GB.
Monday, this shone on me (smartctl -d ata -a /dev/sda):
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.
Again, sda was at it. A brand new, honking, 500GB SATA2 disk, failing. Power supply? Fubared motherboard? Now my thoughts of buying a MacBook, or whatever new-fangled device Apple launches on January 15 at Macworld, have clearly gone down the drain. I’m guessing a new server is in order. Well, at least it will be 64-bit, and every bit capable of running Xen.
In case anyone’s looking for a good reference to S.M.A.R.T. error messages, the Wikipedia entry on S.M.A.R.T. is pretty good.
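For anyone who would rather have cron notice a verdict like the one above than stumble on it by hand, here is a minimal sketch in Python. It assumes smartctl is installed and that the health line has the same wording as in the output shown earlier. The names parse_health and check_device are made up for illustration, and -H is smartctl's health-only option (the post itself used -a).

```python
import re
import subprocess

# Matches the verdict line smartctl prints in its "READ SMART DATA" section.
HEALTH_RE = re.compile(
    r"^SMART overall-health self-assessment test result:\s*(\S+)",
    re.MULTILINE,
)


def parse_health(output):
    """Return the verdict (e.g. 'PASSED' or 'FAILED!'), or None if no health line is present."""
    match = HEALTH_RE.search(output)
    return match.group(1) if match else None


def check_device(device):
    """Run smartctl against one device and return its parsed verdict (hypothetical helper)."""
    # smartctl's exit status is a bitmask, so a non-zero code is not treated as an error here.
    result = subprocess.run(
        ["smartctl", "-d", "ata", "-H", device],
        capture_output=True,
        text=True,
    )
    return parse_health(result.stdout)
```

Calling check_device("/dev/sda") from a scheduled job and mailing yourself whenever the result is anything other than "PASSED" would be one way to use it. A result of None is worth an alert too, since it means smartctl could not read a health line at all.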
Bah. I’ve got an old IDE Fujitsu hard drive that SMART has been warning me about for almost 3 years, thru installations of CentOS, Fedora and two versions of Ubuntu. It's running on an Intel mobo with some goofy flash anti-virus, which I think is the cause. Ignore it, do backups, have faith, soldier on.
Too late Don. Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)
And dmesg has filled up with the relevant I/O errors, to the point of being useless.
No soldiering on this time (being a server and all, with limited access). Sigh