Monday, 31 May 2010

Types of data recovery

There are two distinct types of data recovery, namely “in-place” and “read-only” recovery.

In-place recovery is an attempt to fix the errors and bring the filesystem back to a consistent state. This is done by changing the damaged filesystem itself.
Read-only recovery, as the name implies, does not change the damaged filesystem. Instead, the data is extracted and copied to separate, dedicated storage.
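
To illustrate the read-only idea, here is a minimal Python sketch that copies a byte range out of a disk image into a new file on other storage. The function name extract_range and its parameters are made up for this example. It assumes you already know where the file content sits in the image, which is the hard part that real recovery software solves.

```python
import os


def extract_range(source_path, offset, length, dest_path, chunk_size=1024 * 1024):
    """Copy `length` bytes starting at `offset` from the source image to a new file.

    Hypothetical helper: the source is only ever opened for reading.
    Returns the number of bytes actually copied.
    """
    if os.path.abspath(source_path) == os.path.abspath(dest_path):
        raise ValueError("destination must not be the damaged source itself")
    copied = 0
    # "rb" opens the source read-only; "xb" refuses to overwrite an existing file.
    with open(source_path, "rb") as src, open(dest_path, "xb") as dst:
        src.seek(offset)
        while copied < length:
            chunk = src.read(min(chunk_size, length - copied))
            if not chunk:  # reached the end of the source early
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

The source is never opened for writing, so whatever goes wrong during extraction, the damaged filesystem stays exactly as it was.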

The prevalence of each type of data recovery has been changing over time.

In the days of DOS, Windows 3.11, and then Windows 95, in-place repair prevailed. Actually, it was the only option available before Ontrack released their “Tiramisu Data Recovery” circa 1999. So, you had Norton Disk Doctor (which was quite good at fixing errors), Norton Unerase, and Norton Unformat, and that was about it. Norton Utilities worked with the FAT filesystem under DOS or Windows 95. The prevalence of in-place repair is understandable if you consider the simplicity of the filesystem and the cost of storage in those days. The most widespread filesystem was FAT, which is rather simple and well-documented. On the other hand, the cost of a spare hard drive was prohibitively high.

The release of Windows 2000 changed things significantly. NTFS quickly became the standard filesystem, but it was badly documented (in fact, it is still not fully understood by developers outside Microsoft). As far as in-place repair is concerned, you were left with Microsoft's CHKDSK. Since NTFS is not documented, a third-party tool cannot reliably fix it in place, because its developers do not know exactly what the consistent state of the filesystem should be. Even a minor deviation from the standard, unknown to those developers, causes the NTFS driver to reject the volume or behave in otherwise bizarre ways. Meanwhile, the cost of storage dropped, making it easier to find a high-capacity spare drive. So the simplest route, extracting what is really needed (the file content) and no longer worrying about filesystem consistency, became the most cost-effective approach. Nowadays, read-only data recovery software dominates the do-it-yourself data recovery market, and in-place repair is left to technicians in the recovery labs, who use it only in some specific cases where high-capacity disk arrays are involved.

Saturday, 22 May 2010

Data recovery vs. TRIM

TRIM wins.

The TRIM command available on modern SSDs reduces the chances of successfully undeleting a file. TRIM violates the most significant principle of data recovery, that “the data is not overwritten until the disk space is actually required to store another piece of data”.

Writing to an SSD is slower when the target block is not blank, because the block has to be erased before anything can be written to it, and the erase operation is relatively slow. This is why SSD performance degrades as the device fills up: eventually there are no blank blocks left, and every write has to wait for an erase.

To compensate for this performance degradation, the TRIM hardware command was introduced to erase the specified blocks in advance. Most modern high-capacity SSDs support TRIM. The command is issued by the operating system (Windows supports it starting with Windows 7). When Windows 7 is idle, it issues TRIM to erase the blocks that are no longer in use.

Because TRIM breaks the assumption that “data is not overwritten until the disk space is actually required to store another piece of data”, it is no longer enough "not to write anything to the disk". If Windows just sits idle long enough, it wipes out the content of the deleted files in the background. When you then try to undelete a file, the content is all zeros.
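
A quick way to see whether a recovered file or a region of an image has been wiped like this is to count how many of its clusters contain nothing but zero bytes. The Python sketch below does that. The function name and the 4096-byte cluster size are assumptions made for this example. A high fraction of zeroed clusters is only a hint: legitimate files can contain long runs of zeros too.

```python
import io


def zero_filled_fraction(stream, cluster_size=4096):
    """Return the fraction of clusters in `stream` that contain only zero bytes.

    Hypothetical helper; `stream` is any binary file-like object.
    """
    total = 0
    zeroed = 0
    while True:
        cluster = stream.read(cluster_size)
        if not cluster:
            break
        total += 1
        # The cluster is "wiped" if every one of its bytes is 0x00.
        if cluster.count(0) == len(cluster):
            zeroed += 1
    return zeroed / total if total else 0.0


# Example: two zeroed clusters followed by one that still holds some data.
sample = io.BytesIO(b"\x00" * 8192 + b"data" + b"\x00" * 4092)
fraction = zero_filled_fraction(sample)
```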

If you delete a file, it is likely that the TRIM command will be issued soon and destroy the data completely. In case of catastrophic damage, when the entire disk is unreadable, the volume shows up as a raw filesystem, or Windows fails to start, TRIM has no effect, because the operating system is either not running or does not issue TRIM for a drive with a raw filesystem.

Thursday, 20 May 2010

What's the best cluster size

On the Tom’s Hardware forum, mdk4ever asks what the best cluster size is. He's got a new 2 TB Western Digital hard drive and wants to set the optimal cluster size when formatting it.

The rule of thumb is “Always use the default cluster size”. The performance gain from changing the cluster size, if any, is not noticeable. One can speculate that the GUI option to change the cluster size is in itself a legacy from the days of floppy drives.
If you change the cluster size, you may encounter unexpected side effects (did you know that NTFS compression is not available for cluster sizes greater than 4 KB?).
Another consideration is that when you need to recover data, it is much more convenient to deal with the default cluster size. Data recovery software can either calculate the default cluster size (for HFS) or look it up in a table of standard sizes (for FAT, NTFS). A non-default cluster size has to be determined by more complex techniques.
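
The table lookup for NTFS can be sketched in a few lines of Python. The thresholds below are an assumption based on Microsoft's published NTFS defaults for Windows 7 and should be checked against that documentation before you rely on them. The names are made up for this example, and the handling of exact boundary values is a guess.

```python
TB = 1024 ** 4

# (upper volume size limit in bytes, default cluster size in bytes)
# Assumed values, believed to match the Windows 7 NTFS defaults.
NTFS_DEFAULT_CLUSTER_SIZES = [
    (16 * TB, 4 * 1024),
    (32 * TB, 8 * 1024),
    (64 * TB, 16 * 1024),
    (128 * TB, 32 * 1024),
    (256 * TB, 64 * 1024),
]


def default_ntfs_cluster_size(volume_bytes):
    """Look up the assumed default NTFS cluster size for a volume of this size."""
    for limit, cluster_size in NTFS_DEFAULT_CLUSTER_SIZES:
        if volume_bytes <= limit:
            return cluster_size
    raise ValueError("volume is larger than the table covers")


# The 2 TB drive from the question falls into the first row.
cluster_size_for_2tb = default_ntfs_cluster_size(2 * TB)
```

With a table like this, the recovery software has one likely candidate to verify instead of having to search for the cluster size.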

So, we recommend that you always stick to the default cluster size to be able to recover data easily should the need arise.

Monday, 17 May 2010

RAID system that just works

On Tom's Hardware, ratbat asks for a RAID that just works (no matter what).

His requirements are fairly simple:
  1. Minimum maintenance,
  2. Minimum downtime in case the drive fails,
  3. Simplest possible recovery,
  4. The setup has to boot from the RAID.

The only match for these criteria is RAID1 (mirror).

As you know, the RAID levels are (exotics excluded) RAID0, 1, 5, and 0+1. Each RAID level has its own strengths and weaknesses, but these are too complex to fit into this post. For further reading, see the RAID levels reference.

RAID0 is eliminated from the contest because it is not fault tolerant.

RAID5 fails the "Simplest possible recovery" requirement. To have a bootable RAID5, a hardware controller is required. If the controller dies, the recovery can get complicated, requiring RAID recovery software, which may be fairly tricky to operate properly.

So we are left with RAID1 (mirror) and RAID0+1.

RAID1 wins the contest because of its simplicity. If one of the drives dies, the mirror just continues to operate. If the controller dies, you can take either of the two identical drives, plug it into any compatible PC, and have the data immediately available. You can even plug the drive into the mainboard (bypassing the dead RAID controller) and boot from it. The only thing you lose by doing so is redundancy.
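
If you have image files of both mirror members, you can spot-check that they really are identical with a Python sketch like the one below. The function name is made up for this example. It also assumes the controller keeps no metadata of its own on the drives; many controllers do, so a difference in a small reserved area does not necessarily mean the mirror is broken.

```python
def first_difference(path_a, path_b, chunk_size=1024 * 1024):
    """Return the byte offset of the first difference between two files.

    Hypothetical helper. Returns None if the files are identical. If one file
    is a prefix of the other, the offset where the shorter one ends is returned.
    """
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                # Narrow the mismatch down to a single byte within the chunk.
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # No differing byte found: one chunk is shorter than the other.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:  # both files ended at the same point
                return None
            offset += len(chunk_a)
```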

Now we have to choose between hardware and software implementations of RAID1.
On Windows, software RAID1 is only available in Server versions, so there's really not much choice. Either get a hardware controller, or use a controller integrated into the motherboard (if available).

Make sure to test your RAID once you have created it and placed the OS on it.
1. Unplug each drive in turn (make sure to power off before doing it) and check how the controller responds.
2. Unplug each drive in turn and attach it to a non-RAID mainboard port. Check if the system boots from that drive.
3. After each of these steps, the array needs to be resynchronized, so allow some time for that.

Once you are done with that, install the software as usual.