Recovery Lessons Learned from Storage Failure

Recently, we experienced a fairly catastrophic SAN failure: we lost two drives of a RAID-5 array. Needless to say, recovery was time-consuming, but it also pointed out some general issues with many disaster recovery, business continuity, and general architectures involved with virtual environments. Luckily, we were able to restart one of the drives, let the hot spare take over for the second failed drive, and recover the vast majority of our data. Yes, there was corruption, and that is where our backups came in, along with the dependencies that restoration ultimately relies on. How do you recover from a catastrophic failure? Do you fail over automatically to a hot site or cloud environment? Even if you fail over, how do you recover from the catastrophic failure itself?

Here are some of the problems we faced during recovery:

NOTE: This column was originally published in Newsletter.

