Computers and disk arrays

Post by **crfriend** » Sat Jun 09, 2012 8:12 pm

How many disk failures can a RAID-5 disk array with one hot spare withstand before going belly-up? The answer turns out to be, "It depends".

Last week, one of the disk drives therein went into "predictive failure". Now, for those in the know, "predictive failure" is the equivalent of the little red "oil pressure" light on one's car: it's already too late, and you're hosed (as, likely, is your motor). Duly, the OS picked up on this and tried to perform the swap to the hot-spare. This worked for a while until the machine ran across a "miscorrected data error", gave up the rebuild, and faulted the array. This is not supposed to happen with professional-level kit, which this is, albeit rather "aged" (those who know me know my fondness for old iron) at about ten years of age. After the second failure, the system flagged one filesystem unsafe, errored it, and crashed the database that was running on the host. Collateral damage included the monitoring system that keeps me informed of the overall health of my home computing environment, and the spam-filters.

So, here I am with an array that the system is convinced is unfixable, and every time I try to rebuild it, it balks. OK, I think, I just need to swap the *other* bad disk and rebuild the array from scratch. Where's the backup? You got it; I didn't have one -- I don't have any sort of media that can back up 167 GB in any sane length of time, much less restore it. That's why I was conservative and allocated a hot-spare instead of grabbing the extra 18 gigs of space for storage. {Insert unprintable commentary here.}

"I'm not dead yet!" Sure enough, after a reboot of the system, the status page listed a dead drive (the wrong one, of course) but the one that "miscorrected" an error showed up as OK. The logs all showed the error to be in the same blasted block, so I went digging. I hammered on the same block time after time and tried to get it to correct the error successfully (a symptom of insanity is performing the same behaviour and expecting a different outcome; I learnt this from Windows, but with Windows it usually works). Clearly it was time to take a different tack, so I teased the filesystem -- and the RAID setup -- apart and found that the error block was in an unused extent of the filesystem. Great! All I need to do is run the array in degraded mode, copy everything off, and then rebuild it from scratch. How much free space do I have on all the other systems in the house to back up this 167 gig monster? 122 gigs total. {Insert unprintable commentary here.}

Now what? How much stuff can I afford to lose from the array (i.e. how much of it do I have elsewhere)? Well, it turns out I could lose most of my music collection because I back-packed that to work on a laptop quite some time ago, and I'd been keeping the two collections in sync. That got me below the free-space number in the house and I spent the past several days copying everything off the failed array I could -- and then double-checked to make sure I'd gotten everything.

"# metaclear -f d127". Those who have issued such a command understand the need to have "all your ducks in a row" before pulling the trigger. The array -- and all the data thereon -- vanished without so much as a whimper.

Since I was dealing with several potentially-failing disks, I figured I'd just replace the lot of 'em and be done with it. Try finding 18-gig disks these days. One can't, so I replaced them with 72-giggers (which actually hold about 67 gig -- one has to love marketing types) and decided to go with a modern RAID system.
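The "72 gig that holds about 67" gap is most likely the usual decimal-versus-binary units difference. A minimal sketch of the arithmetic, assuming the label means 72 x 10^9 bytes and that the system reports in units of 2^30 bytes (the function name is made up for illustration):

```python
# Decimal (marketing) gigabytes versus binary gigabytes (GiB).
# Assumption: the drive label means size * 10**9 bytes, and the OS
# reports capacity in units of 2**30 bytes.
GB = 10**9
GIB = 2**30


def marketed_to_gib(marketed_gb: float) -> float:
    """Convert a vendor-quoted capacity in GB to GiB."""
    return marketed_gb * GB / GIB


if __name__ == "__main__":
    for size in (18, 72):
        print(f"{size} GB marketed is about {marketed_to_gib(size):.1f} GiB")
```

For a 72 GB label this comes out at roughly 67 GiB, which matches the figure quoted above.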

Step 1 -- grab the array and give it a thorough cleaning (unscrewing the attachment points for the cable instead of the cable in the process).
Step 2 -- replace the cable-attachment points {insert assorted cursing here}
Step 3 -- install the new drives (and a hot-spare 18-gigger for the system's internal drives) and see if they all spin up. Miracle of miracles, they all do.
Step 4 -- take it all back downstairs again and hook it up
Step 5 -- create the main RAID entity -- double-parity this time, and a hot-spare; I do not want to have to do this again for several years.
Step 6 -- create all the filesystems and get them mounted in the right places.
Step 7 -- restore all the data. This is still in progress, and I'm hoping we don't take a power-cut. I figure this'll take about 2 days over the 10Mb/s network I have in the house.
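The two-day guess in Step 7 can be sanity-checked with a back-of-the-envelope calculation. This is only a sketch: it assumes the full 167 GB has to cross the wire, and the 0.8 efficiency factor for protocol overhead is a made-up illustrative number, not a measurement.

```python
def transfer_days(size_gb: float, link_mbit_per_s: float, efficiency: float = 1.0) -> float:
    """Estimate days needed to move size_gb (decimal GB) over a network link.

    efficiency is the fraction of the raw link rate actually achieved
    (hypothetical; real values depend on protocol and hardware).
    """
    bits = size_gb * 1e9 * 8
    seconds = bits / (link_mbit_per_s * 1e6 * efficiency)
    return seconds / 86400


if __name__ == "__main__":
    print(f"ideal link:     {transfer_days(167, 10):.2f} days")
    print(f"80% efficiency: {transfer_days(167, 10, 0.8):.2f} days")
```

At the ideal line rate this is a little over a day and a half, so "about 2 days" is a fair estimate once overhead is allowed for.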

Film at 23:00, perhaps in three days' time. I love computing!

Post by **Uncle Al** » Sat Jun 09, 2012 10:44 pm

Carl - I feel for ya'

I have a small Seagate FreeAgent Go which is powered via any available
USB port. It holds 320GB and can back up my 2 80GB drives in about
10 minutes. Now this is files only--no programs. The unit was about
$80.00 + tax. You may want to investigate this as an 'option' for you.

Uncle Al

Post by **crfriend** » Sat Jun 09, 2012 11:03 pm

Uncle Al wrote: Carl - I feel for ya'

Thanks, but at the moment I have more pressing details on my mind than computers. I just put a whole load of what I thought was potting soil into Sapphire's "pallet garden" (it said "Garden Soil" on the label). Sapphire informs me that it was manure. I bite my fingernails. Use your imagination.

On the computer front, the database engine is back on-line and is performing well. The lightweight virtual-machines that I use to do software development and SkirtCafe prototyping are in the process of restoring and should be hale in another couple of hours. My vast collection of archived GOES images is restoring, as are the animations therefrom; I have animated satellite imagery from back before Hurricane Katrina. Also in process is my collection of weather observations going back to about 2002 with one-hour granularity. Music comes next, followed by the various "backup" spaces and my own home directory.

At least I'm not going to be at risk of being swept overboard with this as I've got my backside firmly planted in a chair instead of hanging onto a mast in 35-knot winds trying to furl a sail, which is what happened yesterday.

A chat with another good pal of mine points up that USB drives in the Terabyte range are available for reasonably short money. Since the "new and improved" array is slightly larger than half a terabyte, two (or even four) of those, rotated weekly and stored off-site, sound like a good idea.

Post by **Brandy** » Sun Jun 10, 2012 3:40 am

crfriend;

Not sure what system you are using but " # metaclear -f d127 " I recognize as a Solaris Volume Manager command.

Sounds like you fell in the "RAID-5 write hole". Have you looked into the Solaris 11 ZFS RAID-Z system? Here is a quick overview: https://blogs.oracle.com/bonwick/entry/raid_z. Of course there is a gotcha: the problem is doing the whole pool backup and restore. Solaris 11 is free to download for development, testing, etc., but not free in a production environment. It also needs newer hardware to run; your 5-20 year old beater PC will not work.

--Brandy

Post by **crfriend** » Sun Jun 10, 2012 12:48 pm

Brandy wrote: crfriend;

Not sure what system you are using but " # metaclear -f d127 " I recognize as a Solaris Volume Manager command.

It's Solaris 10 Update 8 using SVM. Good call.

Brandy wrote: Sounds like you fell in the " raid5 write hole".

It could be. There was a potentially contributing event just preceding the entire fiasco, where one of our cats managed to knock the cable leading from the array to the DVD-ROM drive asunder, which momentarily hosed the termination. No, we did not take the cat to the local Chinese restaurant for dinner.

However, I could provoke the error by poking at a single drive in the array, and this indicates, to me, a drive failure.

Brandy wrote: Have you looked into Solaris 11 ZFS RAID-Z system?

The new setup is ZFS raidz2. ZFS (Zettabyte File System, for the uninitiated) is new to me, so when I first built the array I used what was familiar to me with the intent of learning ZFS later. "Later" has arrived.

Brandy wrote: It also needs newer hardware to run, your 5-20 year old beater pc will not work.

The iron in question is a Netra T1 - 105 with a half-gig of mainstore. The mainstore will be getting upped to a full gig in the coming week; it turns out that ZFS is a bit of a memory-pig, and with it active I cannot boot either of the zones that contain (1) the prototype I use for SkirtCafe upgrades and (2) my Icinga development environment.

As far as Solaris 11 goes, the newer kit is (1) too expensive for my budget, (2) too restricted in what one can do with it, and (3) so loud that I would not want it running in the room with me.

Oracle buying Sun was just a tragedy, and seems to be further fuelling folks' departure from Solaris-atop-SPARC in favour of Linux-atop-Intel. That's certainly the case where I work, even though the Solaris systems are virtually bullet-proof. I suspect that the purchase was part of Larry Ellison's fantasy of out-doing IBM: Bauxite and sand in one end and full-featured "appliances" out the other. Unfortunately, that means that everybody else who had an interest in the environment is going to suffer and, ultimately, go elsewhere.

Post by **crfriend** » Sun Jun 10, 2012 2:06 pm

Here's how the saga unfolded, based on log excerpts from my system.

The first hint:

Jun  3 07:38:18 t1 scsi: [ID 107833 kern.warning] WARNING: /pci@1f,0/pci@1,1/scsi@2/sd@8,0 (sd7):
Jun  3 07:38:18 t1      Error for Command: read(10)                Error Level: Informational
Jun  3 07:38:18 t1 scsi: [ID 107833 kern.notice]        Requested Block: 27425122                  Error Block: 27425122
Jun  3 07:38:18 t1 scsi: [ID 107833 kern.notice]        Vendor: IBM-PSG                            Serial Number: 01440040UCH5
Jun  3 07:38:18 t1 scsi: [ID 107833 kern.notice]        Sense Key: Soft Error
Jun  3 07:38:18 t1 scsi: [ID 107833 kern.notice]        ASC: 0x5d (LUN failure prediction threshold exceeded), ASCQ: 0x2, FRU: 0x0

After that, the logs are silent for some time, after which, this is emitted:

Jun  3 09:19:47 t1 scsi: [ID 365881 kern.info] /pci@1f,0/pci@1,1/scsi@2 (glm0):
Jun  3 09:19:47 t1      Cmd (0x30008e44d60) dump for Target 8 Lun 0:
Jun  3 09:19:47 t1 scsi: [ID 365881 kern.info] /pci@1f,0/pci@1,1/scsi@2 (glm0):
Jun  3 09:19:47 t1              cdb=[ 0x28 0x0 0x1 0x2a 0x22 0x62 0x0 0x0 0x20 0x0 ]
Jun  3 09:19:47 t1 scsi: [ID 365881 kern.info] /pci@1f,0/pci@1,1/scsi@2 (glm0):
Jun  3 09:19:47 t1      pkt_flags=0x4000 pkt_statistics=0x61 pkt_state=0x7
Jun  3 09:19:47 t1 scsi: [ID 365881 kern.info] /pci@1f,0/pci@1,1/scsi@2 (glm0):
Jun  3 09:19:47 t1      pkt_scbp=0x0 cmd_flags=0x8e1

Clearly this system is now in some distress.

Jun  3 09:19:50 t1      Error for Command: read(10)                Error Level: Retryable
Jun  3 09:19:50 t1 scsi: [ID 107833 kern.notice]        Requested Block: 19538530                  Error Block: 19538530
Jun  3 09:19:50 t1 scsi: [ID 107833 kern.notice]        Vendor: IBM-PSG                            Serial Number: 01440040UCH5
Jun  3 09:19:50 t1 scsi: [ID 107833 kern.notice]        Sense Key: Unit Attention
Jun  3 09:19:50 t1 scsi: [ID 107833 kern.notice]        ASC: 0x29 (bus device reset message occurred), ASCQ: 0x3, FRU: 0x0
Jun  3 09:20:36 t1 scsi: [ID 107833 kern.warning] WARNING: /pci@1f,0/pci@1,1/scsi@2 (glm0):
Jun  3 09:20:36 t1      Resetting scsi bus, <null string> from (8,0)

Ouch! This spells the beginning of the end.
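As an aside, the "cdb=" line in the glm0 dump above can be decoded by hand. Opcode 0x28 is the SCSI READ(10) command, whose bytes 2-5 hold the big-endian logical block address and bytes 7-8 the transfer length in blocks. A small sketch (the helper name is made up for illustration) shows that the dumped command is the same read of block 19538530 that the later "Requested Block" line reports:

```python
def decode_read10(cdb):
    """Decode a 10-byte SCSI READ(10) command descriptor block.

    Returns (logical_block_address, transfer_length_in_blocks).
    """
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(bytes(cdb[2:6]), "big")      # bytes 2..5
    length = int.from_bytes(bytes(cdb[7:9]), "big")   # bytes 7..8
    return lba, length


# Bytes copied from the glm0 dump in the log excerpt above.
cdb = [0x28, 0x0, 0x1, 0x2A, 0x22, 0x62, 0x0, 0x0, 0x20, 0x0]
lba, blocks = decode_read10(cdb)
print(f"READ(10) of {blocks} blocks starting at block {lba}")
```

So the command that was in flight when the bus was reset was a 32-block read starting at block 19538530 on target 8.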

Jun  3 09:23:07 t1 md_raid: [ID 104909 kern.warning] WARNING: md: d127: /dev/dsk/c0t8d0s0 needs maintenance
   .
   .
   .
Jun  3 09:23:09 t1 md_raid: [ID 241980 kern.notice] NOTICE: md: d127: hotspared device /dev/dsk/c0t8d0s0 with /dev/dsk/c0t2d0s0

OK, there's the spare. The next step is to rebuild the metadevice by recreating all the data and parity information from the survivors. I've seen this happen many times and it's better than 99% successful.
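For anyone wondering what "recreating the data from the survivors" amounts to, here is a toy sketch of single-parity reconstruction using XOR. It is an illustration of the general RAID-5 idea only; it does not model SVM's actual stripe layout, interlace size, or parity rotation, and the block contents are invented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across three hypothetical data disks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Disk 1 fails: its block is the XOR of every surviving block in the stripe.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

The catch is that the rebuild has to read every surviving block in every stripe. If one of those reads fails, that stripe has two unknowns and single parity cannot solve for both.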

Some time goes by, and then this pops up:

Jun  3 11:03:28 t1 scsi: [ID 107833 kern.warning] WARNING: /pci@1f,0/pci@1,1/scsi@2/sd@b,0 (sd10):
Jun  3 11:03:28 t1      Error for Command: read(10)                Error Level: Retryable
Jun  3 11:03:28 t1 scsi: [ID 107833 kern.notice]        Requested Block: 14413090                  Error Block: 14413144
Jun  3 11:03:28 t1 scsi: [ID 107833 kern.notice]        Vendor: COMPAQ                             Serial Number: B0193370
Jun  3 11:03:28 t1 scsi: [ID 107833 kern.notice]        Sense Key: Media Error
Jun  3 11:03:28 t1 scsi: [ID 107833 kern.notice]        ASC: 0x11 (miscorrected error), ASCQ: 0xa, FRU: 0x0

WTF is this? No matter what it is, it stopped the rebuild in its tracks, left drive 8 in a "needs maintenance" state and put drive 11 into a "Last Erred" state. This means that the array is very badly degraded and likely cannot be rebuilt. In other words, it's time for "Plan B" -- backups.
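The interesting fields in that last excerpt can be pulled out mechanically. The sketch below parses three of the lines copied from the log above; the regular expressions and the function name are my own and assume the message layout shown here. The unit address after "sd@" is read as a hexadecimal SCSI target, which is how "sd@b,0" becomes drive 11.

```python
import re

# Three lines copied from the log excerpt above.
LOG = """\
Jun  3 11:03:28 t1 scsi: [ID 107833 kern.warning] WARNING: /pci@1f,0/pci@1,1/scsi@2/sd@b,0 (sd10):
Jun  3 11:03:28 t1 scsi: [ID 107833 kern.notice]        Requested Block: 14413090                  Error Block: 14413144
Jun  3 11:03:28 t1 scsi: [ID 107833 kern.notice]        ASC: 0x11 (miscorrected error), ASCQ: 0xa, FRU: 0x0
"""


def summarize(log):
    """Extract target, block numbers and ASC from an sd error report."""
    target = re.search(r"/sd@([0-9a-f]+),(\d+)", log)
    blocks = re.search(r"Requested Block:\s*(\d+)\s+Error Block:\s*(\d+)", log)
    asc = re.search(r"ASC:\s*(0x[0-9a-f]+)\s*\(([^)]*)\)", log)
    requested, error = int(blocks.group(1)), int(blocks.group(2))
    return {
        "scsi_target": int(target.group(1), 16),  # unit address is hex
        "lun": int(target.group(2)),
        "requested_block": requested,
        "error_block": error,
        "offset_into_request": error - requested,
        "asc": int(asc.group(1), 16),
        "asc_text": asc.group(2),
    }


if __name__ == "__main__":
    for key, value in summarize(LOG).items():
        print(f"{key}: {value}")
```

The bad block sits 54 blocks into the request, which is why the "Requested Block" and "Error Block" numbers differ here but matched in the earlier excerpts.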

Backups? We ain't got no backups. We don't need to show you any steenking backups!

{Insert unprintable commentary here}

A reboot of the system cleared the "Last Erred" state and at least allowed me to read data from it, and -- fortunately the luck of the Irish was with me -- the error block on disk 11 was in the middle of unallocated space. Sysadmin wipes brow in light of this.

Once I got most of the important data off the array, I decided to poke at the problem area a little bit. I used "dd" (It's supposed to mean "convert and copy", but a better mnemonic is "diddle and duplicate") to zero out the block that was reporting the error (it was all zeroes to begin with) to see if I could fix the "miscorrected error" problem. No joy. Confusing the matter further, I could make the error come and go by varying the size and length of an access. The net result was one very unamused sysadmin.

Realising that I was fighting a losing battle on this, I slurped everything else off the array, stashed it in assorted dark corners on disks on all the other systems in the house, and "forklifted" (from "forklift upgrade" -- swapping one entire device for a newer one) it.

There was one blessing in this, and that was that I got to clean the inside of the array during the disk-replacement process -- and it was filthy! We have cats, so some fur is to be expected; there are also chickens in the room with the array. Now, chickens are superb at producing an insanely fine dust that gets onto -- and into -- everything in the vicinity, and if there is any air motion it "goes with the flow". The net result was an almost completely choked-off plenum in and around the disks and on the grille that leads rearward to the power supply.

I didn't get a picture of the innards of the array, but I just took one of a disk that was extracted from it, which I'll attach to this missive later.

So, the new array is in place with double-disk parity and a hot-spare, and I left the restores running over the course of the night. I figured all the activity would keep my laptop awake, but at 03:00 the laptop felt lonely and figured it'd go to sleep:

tiger_13.png

The restores, needless to say, once their controlling session went away, terminated. I restarted them a bit after 08:00 after I got up and saw what happened. I suspect they'll run for the rest of the day; I just need to tickle the laptop a bit periodically to keep it awake -- or do the smart thing and put the sessions on a non-sleeping device.

Post by **sapphire** » Sun Jun 10, 2012 2:31 pm

Egads. So happy I left IT. This sounds like what I used to do on IBM mainframes and DG Novas and Eclipses. Had a dream the other night involving Nova support and my happiness that I didn't have to do it.

Regarding the "potting soil" incident. Yes, I thought it was garden soil as well, but it sure smelled like manure and handled like manure. As for your "potty mouth", you get no sympathy from me. You've spent tons of time around horses and chickens and can't recognize the smell? Anyway, didn't your Mom teach you to wash your hands after playing in the dirt?

Post by **Brandy** » Mon Jun 11, 2012 4:38 am

crfriend;

Thanks for the details. I'm sure it either bored or lost most people, but I enjoyed seeing them. OK, Solaris 10 5/08; I have some systems at work running on that version. ZFS raidz-2 for your data array should be pretty good. I'll leave the backup scheme up to you but, as mentioned, USB drives are pretty cheap these days.

To actually back up the data pool means taking a snapshot and sending the stream to a storage device. Then, for a restore, recreate the storage pool and receive the stream from the storage device. Or just copy the data off to another device. There is a lot of information at OTN (Oracle Technology Network).
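The snapshot, send and receive sequence described above might look roughly like the following. This is a dry-run sketch that only builds the command lines and prints them; it runs nothing. The pool name comes from this thread, while the snapshot name, the stream path and the helper names are hypothetical. The recursive "-r" and replication "-R" options are assumptions about what the installed ZFS release supports, so check the zfs man page on the actual system first.

```python
import shlex


def backup_commands(pool, snapshot, stream_path):
    """Build commands to snapshot a pool and dump it to a stream file."""
    snap = f"{pool}@{snapshot}"
    return [
        # Recursive snapshot of the pool and all descendant filesystems.
        ["zfs", "snapshot", "-r", snap],
        # Replication stream written to a file on the backup device.
        ["sh", "-c", f"zfs send -R {shlex.quote(snap)} > {shlex.quote(stream_path)}"],
    ]


def restore_commands(pool, stream_path):
    """Build the command to receive a stream into an already recreated pool."""
    return [
        ["sh", "-c", f"zfs receive -F -d {shlex.quote(pool)} < {shlex.quote(stream_path)}"],
    ]


if __name__ == "__main__":
    # "weekly" and the USB mount point are made-up example values.
    for argv in backup_commands("pool0", "weekly", "/mnt/usb/pool0.zfs"):
        print(" ".join(shlex.quote(arg) for arg in argv))
    for argv in restore_commands("pool0", "/mnt/usb/pool0.zfs"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping it as a command builder means the exact lines can be reviewed before anything touches the pool, which seems prudent given how this thread started.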

Yes, from a user's point of view Oracle buying Sun is a disaster. However, a former Sun, now Oracle, employee mentioned that he was happy to see the buyout, as Sun was hemorrhaging money and would shortly have been out of business.

Have you had a look at Oracle VirtualBox (https://www.virtualbox.org/)? I use it and like it, and it is free. It will run Solaris 10 or 11 as a guest.

--Brandy

Post by **crfriend** » Sun Jun 17, 2012 1:13 pm

Given how complex modern disk drives are, and what amazing pieces of precision they are, it's rather surprising how much abuse -- environmental and otherwise -- they can take. This one, covered in cat-hair and chicken-dust, is such an example:

disk-dust.jpg

This was the topmost disk from the front of the array as the air flows; there were some that were worse, but I managed to knock much of the crud from them as I pulled all of them out to clean the innards of the enclosure.

So, it looks like most of this saga is behind me. The boxful of 18gig drives has been replaced by a boxful of 72gig drives with double-parity and a hot spare.

t1:carl >. /usr/sbin/zpool status
  pool: pool0
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        pool0        ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c0t2d0   ONLINE       0     0     0
            c0t3d0   ONLINE       0     0     0
            c0t4d0   ONLINE       0     0     0
            c0t5d0   ONLINE       0     0     0
            c0t8d0   ONLINE       0     0     0
            c0t9d0   ONLINE       0     0     0
            c0t10d0  ONLINE       0     0     0
            c0t11d0  ONLINE       0     0     0
            c0t12d0  ONLINE       0     0     0
            c0t13d0  ONLINE       0     0     0
        spares
          c0t14d0    AVAIL

errors: No known data errors
t1:carl >. /usr/sbin/zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
pool0                90.9G   439G  50.2K  /pool0
pool0/Music          13.2G   439G  13.2G  /export/Music
pool0/bonnie         5.88G   439G  5.88G  /backup/bonnie
pool0/cache          50.2K   439G  50.2K  /local/squid/cache
pool0/mancini        8.25G  9.75G  8.25G  /backup/mancini
pool0/mysql-5.0.51b   640M   439G   640M  /usr/local/mysql-5.0.51b
pool0/orator         1.95G   439G  1.95G  /backup/orator
pool0/raid           6.23G   439G  6.23G  /export/raid
pool0/skirtcafe      1.75G  3.25G  1.75G  /zones/skirtcafe
pool0/syzygy         21.4G   439G  21.4G  /backup/syzygy
pool0/t1a            3.28G   439G  3.28G  /zones/t1a
pool0/www            28.4G   439G  28.4G  /var/www
t1:carl >.

The restores took three and a half days. I love the statement above: "No known data errors".
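The numbers in the listing above hang together. A rough capacity check, assuming each of the ten raidz2 members is 72 x 10^9 bytes and that two disks' worth of space goes to parity (this ignores ZFS metadata, padding and reservations, so it is an upper bound rather than an exact figure):

```python
GIB = 2**30


def raidz_usable_gib(disks: int, parity: int, disk_gb: float) -> float:
    """Approximate usable space of a raidz vdev in GiB, ignoring overhead."""
    return (disks - parity) * disk_gb * 10**9 / GIB


estimate = raidz_usable_gib(disks=10, parity=2, disk_gb=72)

# USED + AVAIL for pool0 from the "zfs list" output above.
reported = 90.9 + 439

if __name__ == "__main__":
    print(f"estimated usable: {estimate:.1f} GiB")
    print(f"reported by zfs:  {reported:.1f} G")
```

The estimate lands within a few gigabytes of what the pool reports, with the remainder plausibly accounted for by filesystem overhead.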

One 18gig drive remained as the hot spare for the disks that are internal to the processor itself. This will become a 73 at some point, as will the two internal disks.

Post by **floatingmetal** » Sat Jun 30, 2012 10:06 am

I think it's the month for it. I'm still working on trying to recover the data from a drive where the RAID controller decided to start re-building a mirror on to a disk that wasn't part of the set and with data on already... The company which was supposed to be providing the online backup service are oscillating between "that was from before we were taken over (despite the fact we've still been taking your money for the service every month)" and "you've not been paying us enough for this service and *you* should have noticed (despite the fact we claim to look after your IT so you don't have to)". They're also trying to point the finger of blame at me for deleting some critical backup files, which I wouldn't have been so daft as to do but some of the people there do have a history of doing such a thing (I know this as I worked there before the take over).

If the backups had been working as they were supposed to, there would have been no problem, of course.

The joys of IT...

Post by **STEVIE** » Sat Jun 30, 2012 8:47 pm

Hi Sapphire,
All the computer stuff is a foreign language to me. The "garden soil", however, we call "sharn" or "dung", depending on its colour and smell.
Believe it or not, there is an accepted Scottish tweed colour, "Sharnie Green", very descriptive and much beloved of the "landed gentry".
Steve.

skirtcafe.org
