Showing posts from September, 2011

Omitting the key information is bad

A forum member goes to great lengths describing a hard drive problem, which sounds mechanical. Then follows a precise account of what they tried to fix the problem, including the fact that they replaced the PCB with one from a drive of the same model, and so on. The story runs for about a full page, yet the actual model of the drive is never mentioned. In this case, the model number is probably the most important piece of data, because we can use it to look up the typical failure modes.

Tricks to determine the RAID type

If there is a set of disks but the RAID type is not known, how do we determine what type of RAID it is? Most RAID recovery programs, including ours at , require the RAID type to be provided by the operator. In the simplest case, where all disks are available, one can get an idea of the RAID type by just connecting all the disks and looking at the Disk Management data. The following cases are most typical:

1. One or multiple partitions on exactly one of the disks. This is a RAID 0 or a RAID 5, more likely RAID 0.
2. One or multiple partitions, with two identical sets of partitions on two disks. With three disks, this is a RAID 5. With four or more disks, this is either a RAID 5 or a RAID 10.

The above does not account for RAID 6 or exotic levels like RAID 3, and it assumes MBR-style partitioning on the array, but it nevertheless makes for a good start when working with an array of unknown type.
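The first-sector check behind this heuristic can be sketched in Python. This is a minimal sketch, not how any particular recovery tool works: it assumes the first 512-byte sector of each disk has already been read (for example from a clone image), treats a sector as showing partitions if it ends with the 0x55AA MBR signature and has at least one non-empty entry in the partition table at offset 446, and maps the pattern onto the two cases above. The function names are hypothetical.

```python
from typing import List, Optional

SECTOR_SIZE = 512
MBR_SIGNATURE = b"\x55\xaa"   # bytes 510-511 of a valid MBR
TABLE_OFFSET = 446            # four 16-byte partition entries start here
ENTRY_SIZE = 16


def partition_table(sector: bytes) -> Optional[bytes]:
    """Return the 64-byte MBR partition table, or None if no partitions are visible."""
    if len(sector) < SECTOR_SIZE or sector[510:512] != MBR_SIGNATURE:
        return None
    table = sector[TABLE_OFFSET:TABLE_OFFSET + 4 * ENTRY_SIZE]
    # Byte 4 of each entry is the partition type; 0 marks an unused slot.
    if all(table[i * ENTRY_SIZE + 4] == 0 for i in range(4)):
        return None
    return table


def guess_raid_type(first_sectors: List[bytes]) -> List[str]:
    """Map the 'which disks show partitions' pattern onto candidate RAID levels."""
    tables = [partition_table(s) for s in first_sectors]
    found = [t for t in tables if t is not None]
    num_disks = len(first_sectors)

    # Case 1: partitions visible on exactly one disk.
    if len(found) == 1:
        return ["RAID 0 (more likely)", "RAID 5"]

    # Case 2: two disks carry identical partition tables.
    if len(found) == 2 and found[0] == found[1]:
        if num_disks == 3:
            return ["RAID 5"]
        if num_disks >= 4:
            return ["RAID 5", "RAID 10"]

    # Anything else (RAID 6, RAID 3, GPT, two-disk sets, ...) is outside the heuristic.
    return ["unknown"]
```

The RAID 5 part of case 2 follows from XOR parity: the parity copy of the first sector equals the MBR whenever the corresponding first sectors of the other data blocks in that row are all zeros.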

Problem isolation in RAID recovery

A full, start-to-end RAID recovery is generally a three-part process:

1. Determine the status of the member disks and make clones where required.
2. Detect the RAID parameters and perform destriping.
3. If the destriped volume is not readily mountable, perform filesystem recovery on it to pump out the data.

Now, if the above three steps fail to produce correct data, how do we tell whether it was the RAID recovery part or the filesystem analysis part that failed? We tell if the RAID recovery is OK by looking at the sizes of the recovered files. If there are multiple good files recovered which are larger than twice the full row size (i.e. larger than 2 * block_size * num_disks), then the RAID recovery is almost certainly OK, because each such file is assembled from blocks on different member disks and would not come out intact if the block size or disk order were wrong. However, if all good files are small, the RAID parameters should be investigated. This also applies to the files found by raw scan; keep in mind, however, that file sizes produced by raw scan are not reliable.
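The file size rule of thumb reduces to simple arithmetic, sketched here in Python. This is an illustrative sketch only: the function names are hypothetical, `min_files=2` is one assumed reading of "multiple good files", and the input is expected to be sizes of files already verified as good (for example, files that open correctly). As the rule itself states, the full row size is taken as block_size * num_disks.

```python
from typing import Iterable


def full_row_size(block_size: int, num_disks: int) -> int:
    """Full row size as used in the rule of thumb: block_size * num_disks."""
    return block_size * num_disks


def raid_params_look_ok(good_file_sizes: Iterable[int], block_size: int,
                        num_disks: int, min_files: int = 2) -> bool:
    """True if enough verified-good files exceed twice the full row size."""
    threshold = 2 * full_row_size(block_size, num_disks)
    # Count good files that are too large to survive a wrong block size or disk order.
    large_files = sum(1 for size in good_file_sizes if size > threshold)
    return large_files >= min_files
```

For example, with 64 KB blocks on four disks the threshold is 2 * 65536 * 4 = 524288 bytes (512 KB), so `raid_params_look_ok([600_000, 1_200_000, 4_096], 64 * 1024, 4)` returns `True`, while a result containing only small good files returns `False` and points back at the RAID parameters.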

The most common problem with RAID5 is...

... that one does a rebuild with the disks in the wrong order. This is by far the most common scenario we at have for an unrecoverable RAID 5. Something bad happens and the configuration is lost. The operator then assembles the array in a way that looks correct and runs a rebuild on it. A configuration that looks correct is just not good enough; you need a configuration that actually is correct. Doing a rebuild on a RAID 5 with the wrong block size or disk order effectively destroys the data on the array. Theoretically, the data can still be restored, but in practice the complexity of dealing with two sets of parameters (each with unknown block size, disk order, and so on) precludes any recovery.
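A toy Python model shows why a configuration that looks correct proves little. It assumes a hypothetical three-disk RAID 5 in the left-asymmetric layout (parity starts on the last disk and moves left each row; real controllers use various layouts) with tiny 4-byte blocks. Swapping two members still returns the very first block intact, so the partition table and boot sector would look fine, while later blocks come back as parity or out of order. It does not model the rebuild itself; all names are illustrative.

```python
from functools import reduce
from typing import List

Block = bytes


def xor_blocks(blocks: List[Block]) -> Block:
    """Byte-wise XOR of equally sized blocks (RAID 5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(row: int, n: int) -> int:
    # Assumed left-asymmetric layout: parity on the last disk in row 0, moving left.
    return (n - 1) - (row % n)


def build_raid5(data: List[Block], n: int) -> List[List[Block]]:
    """Stripe data blocks over n disks and add XOR parity (hypothetical helper)."""
    per_row = n - 1
    assert len(data) % per_row == 0, "pad data to a whole number of rows"
    disks: List[List[Block]] = [[] for _ in range(n)]
    for row in range(len(data) // per_row):
        chunk = data[row * per_row:(row + 1) * per_row]
        p = parity_disk(row, n)
        # Data blocks go to the non-parity disks in ascending order.
        for d, block in zip([d for d in range(n) if d != p], chunk):
            disks[d].append(block)
        disks[p].append(xor_blocks(chunk))
    return disks


def destripe(disks: List[List[Block]], order: List[int]) -> List[Block]:
    """Read the data back, treating disks[order[i]] as member i of the array."""
    n = len(order)
    members = [disks[i] for i in order]
    out: List[Block] = []
    for row in range(len(members[0])):
        p = parity_disk(row, n)
        out.extend(members[d][row] for d in range(n) if d != p)
    return out
```

With `data = [bytes([i]) * 4 for i in range(6)]` and `disks = build_raid5(data, 3)`, `destripe(disks, [0, 1, 2])` returns `data`, whereas `destripe(disks, [0, 2, 1])` returns `data[0]` unchanged but puts the parity of the first row in position 1 and returns `data[5]` before `data[4]`.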