Sunday, 27 February 2011

Hotspare vs. Hotswap

What is the difference between hotspare and hotswap?

Hotswap is the capability to replace a failed part with a spare without having to shut down the system. To resolve a problem, somebody still has to bring the spare part and do the actual work of replacing it.

Hot spare is a spare part that is placed into the system at build time, in anticipation that something will eventually fail. The hot spare ties up a part that could otherwise be used; instead, it sits idle waiting for something to fail. On the bright side, when something does fail, there is no need for human intervention, because the spare part is already there and ready.

Thursday, 24 February 2011

On models in data recovery

Data recovery software (be it a filesystem parser or a RAID recovery tool) does not work from the actual data alone. An equally important ingredient is the model of the correct state of the device being recovered.

Take RAID0, for example. The model of a RAID0 would include stripe size, disk order, and first disk. There are often some less-than-obvious requirements, like "a block size must be a power of two". This works just fine until someone decides to implement a software RAID with 3 sectors per block. The recovery software then fails because its internal model of a "correct" RAID no longer matches reality.
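A minimal RAID0 model of this kind fits in a few lines. The function and parameter names below are mine, for illustration only; note that the mapping deliberately does not assume a power-of-two stripe size, so it survives the 3-sectors-per-block case:

```python
def raid0_map(block, stripe_size, disks, first_disk=0):
    """Map a logical block number to (member disk, block offset on that
    disk) for a plain RAID0 model: stripe size, disk order, first disk.
    block and stripe_size are in sectors; disks is the member count.
    Nothing here requires stripe_size to be a power of two."""
    stripe = block // stripe_size                 # which stripe the block is in
    within = block % stripe_size                  # offset inside that stripe
    disk = (first_disk + stripe % disks) % disks  # round-robin over members
    offset = (stripe // disks) * stripe_size + within
    return disk, offset

# With 128-sector stripes on 3 disks, logical block 300 falls into
# stripe 2, which lives on disk 2, at offset 44 on that disk:
print(raid0_map(300, 128, 3))  # (2, 44)
```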

Similarly with RAID5, the minimum practically useful model includes the notion of a possibly missing disk, to be reconstructed from the parity data. If you throw in a blank hot spare, the recovery fails because you just went outside of the design envelope: the model does not account for the possibility of a blank drive being included in the disk set for recovery.
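The reconstruction itself is plain XOR across the surviving members, which is also why a blank hot spare breaks it: the XOR happily mixes the blank drive's contents into the result. A minimal sketch (function name is mine):

```python
def xor_reconstruct(present_blocks):
    """Rebuild the missing member's block in a RAID5 stripe by XORing
    the corresponding blocks of all surviving members (data and parity
    alike). If a blank hot spare sneaks into present_blocks, its bytes
    are XORed in too, silently corrupting the result."""
    missing = bytearray(len(present_blocks[0]))
    for blk in present_blocks:
        for i, b in enumerate(blk):
            missing[i] ^= b
    return bytes(missing)

# Three data blocks and their parity; drop one block and recover it:
data = [b'\x01\x02', b'\x03\x04', b'\x05\x06']
parity = bytes(a ^ b ^ c for a, b, c in zip(*data))
assert xor_reconstruct([data[0], data[2], parity]) == data[1]
```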

Monday, 21 February 2011

Seagate and Raw Read Error Rate

Seagate drives are known to report alarming-looking raw values for the Raw Read Error Rate S.M.A.R.T. attribute. This is well known, normal, and should just be ignored.
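A common community interpretation, seen in smartmontools discussions but not in any Seagate datasheet, is that the raw value is a packed field: an actual error count in the upper bits and a total operation count in the lower 32 bits, which is why the number looks huge while meaning nothing bad. A sketch under that assumption:

```python
def split_seagate_raw(raw48):
    """Split a Seagate-style 48-bit raw attribute value under the common
    community assumption: error count in the bits above 32, operation
    count in the low 32 bits. This is folklore, not vendor documentation;
    treat the interpretation as an assumption."""
    return raw48 >> 32, raw48 & 0xFFFFFFFF

# A raw value that looks huge but encodes zero actual errors:
errors, ops = split_seagate_raw(123_456_789)
print(errors, ops)  # 0 123456789
```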

Thursday, 17 February 2011

Images of disks in RAID recovery

In RAID recovery, if there is a controller failure, or a known software failure, there is no need to create images of the RAID member disks. In single-disk operations, it is often considered good practice to always make an image of the disk. With RAID, this may not be so easy, considering the sheer size of modern arrays.

Actually, if there is no reason to suspect physical damage to the RAID member disks, the imaging may be skipped altogether, or put off until a decision is made to modify the data on the disks (possibly to revive the controller metadata).

Monday, 14 February 2011

Ratings instead of numbers

Circa 1998, AMD used a "Performance Rating" or "Pentium Rating" (PR) to indicate their CPUs' performance by comparing it to the then-current Intel Pentium. That was mostly because AMD could not deliver a CPU operating at frequencies matching those of Intel's, so they opted to move frequencies out of sight. Comparison shopping then became a little messy. And, by the way, it did not help AMD much.

Given this thread on AnandTech, it looks like we might get a similar issue with SSD benchmarks. Not that I particularly care about SSD benchmarks.

Friday, 11 February 2011

Modern RAID recovery capabilities

Speaking of automatic RAID recovery software, there is still much to do.

In ReclaiMe Free RAID Recovery, we have the basics and classics pretty well covered: RAID0, RAID5, and RAID 0+1/1+0 by reduction to RAID0. There are a couple of other vendors out there who provide similar capabilities, so it is a done deal.

There are other RAID levels for which neither we nor the other vendors offer anything automatic. We could probably do something about the E-series layouts (RAID 1E or 5E/EE), but we don't see real demand for it. Automatic RAID 6 recovery looks more interesting, and maybe we'll even give it a shot someday.

Also, all the current automatic RAID recovery tools rely on the RAID members having the same offset across all the physical disks. This works fine for hardware RAIDs, but can be a hindrance if you need a software RAID recovery. This requirement is not likely to go away in the near future, because the computational power required to find a separate offset for each array member exceeds the available capabilities, especially for something practical like six 2TB hard drives in an array.
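A back-of-envelope count shows why per-member offset search blows up. The granularity below is a hypothetical choice of mine (one candidate offset per megabyte); real tools would face an even larger space at sector granularity:

```python
# Back-of-envelope: searching an independent start offset per member
# grows combinatorially. Assume (hypothetically) candidate offsets are
# tried at 1 MB granularity on 2 TB disks, for a 6-disk array.
disk_size = 2 * 10**12           # 2 TB in bytes
granularity = 2**20              # one candidate offset per megabyte
candidates_per_disk = disk_size // granularity
combinations = candidates_per_disk ** 6
print(f"{candidates_per_disk:,} candidate offsets per disk, "
      f"{combinations:.2e} combinations for 6 disks")
```

Even at this coarse granularity, the combination count is far beyond brute force; at sector granularity it is worse by many more orders of magnitude.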

Tuesday, 8 February 2011


Reading the article on NTCompatible where they test data recovery software and fail to recover data from a TRIM-enabled SSD (which is pretty much the expected behavior), I see they are a little bit puzzled because some data still remains even on a TRIM-enabled SSD.

Interestingly, some traces to specific data remained, and that's one oddity I don't quite understand.

The answer is actually pretty simple: these were NTFS resident files. On NTFS, when a file is deleted, its MFT record is marked "free, available for reuse", but never actually relinquished back to free space, because NTFS uses MFT entry numbers internally to encode parent-child relationships. Removing one entry would require the entire volume to be renumbered, which is cost-prohibitive.

So, the data outside the MFT is zero-filled as soon as the TRIM command is issued. The MFT entries, however, remain unchanged. This explains why they were able to get file names and correct file sizes, while the data was all zeros.

Now, there is one special case called a resident file. If a file is small enough that the file name, attributes, and data all fit into the 1024-byte MFT record, the data is stored within the MFT record itself. This saves a little bit of disk space and, more importantly, saves one additional seek to get the data on a rotational hard drive.

Since the MFT entries are not relinquished to free space, and for a resident file the data is stored within its MFT entry, it is possible to recover a resident file even on a TRIM-enabled SSD. However, this is of little practical use, because only files smaller than approximately 800 bytes can become resident.
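To illustrate the size budget, here is a rough sketch of the residency check. The overhead figures are hypothetical ballpark numbers of mine, not taken from any NTFS specification; real overheads vary per volume and per file:

```python
def could_be_resident(name, data_size, record_size=1024, overhead=224):
    """Rough estimate of whether a file's data fits inside its own MFT
    record. record_size is the standard 1024-byte MFT record; overhead
    is a hypothetical allowance for the record header and the standard
    attributes. All overhead figures here are illustrative guesses."""
    # The file name is stored in UTF-16 (two bytes per character), plus
    # an assumed ~90 bytes for the $FILE_NAME attribute header.
    filename_attr = 90 + 2 * len(name)
    return overhead + filename_attr + data_size <= record_size

print(could_be_resident("notes.txt", 600))  # True - stays in the MFT record
print(could_be_resident("notes.txt", 900))  # False - data moves outside MFT
```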

Thursday, 3 February 2011

Intel's new chipset

Intel reports there is a flaw in the SATA controller on the Sandy Bridge chipset, causing functionality to degrade over time. I suppose that means functionality to store data, actually. They say
  • the flaw only affects the 3 Gbps ports (SATA II), while the 6 Gbps (SATA III) ports are OK, but I'd wait for further confirmation;
  • the revised chipset will hit the market in April 2011.
So we have quite a number of data-loss time bombs out there, and the number is still growing.

Wednesday, 2 February 2011

Even in 2011...

some people are still concerned about DoubleSpace and Stacker.

Do you still remember using MS DOS in production?

Bonus item: MFM hard drive interface in the same screenshot.