Wednesday, 30 June 2010

Fake flash drive

This is one of the best-known fake flash drive images. Looking at it once again, I suddenly noticed that it is not actually a pen drive at all. It is, or rather it is supposed to be, a DLink DWL-G122 USB wireless network adapter.

Following is the more realistic, and more widespread, variant of the fake pen drive.
This one works, but is mislabelled. Typically, the seller takes a 2GB pen drive, sticks a 32GB label onto it, and resells it at a higher price. The following image shows an older, lower-capacity pen drive forced into the smaller case of a newer model:

Note that the PCB is cut off a little at the top to make it fit, and then glued to the case with a blob of silicone sealant.

A flash drive relabelled from 2GB to 32GB still has only 2GB of actual capacity, so if more than 2GB of data is written to it, the excess has to go somewhere. There are two possibilities: either the excess data is simply discarded and never stored at all, or it wraps around and is written again from the beginning of the device, overwriting what was stored there previously. Either way, data recovery software cannot retrieve the data in excess of the real 2GB capacity.
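Both failure modes can be illustrated with a toy model. The Python sketch below is not how any particular counterfeit controller works; it assumes a drive that advertises `advertised_blocks` but stores only `real_blocks`, and either wraps excess writes around to the start or discards them. The check writes a block-specific pattern to every advertised block and then reads everything back, which is the general idea behind capacity-testing tools. `FakeDrive`, `pattern`, and `count_good_blocks` are hypothetical names.

```python
import struct

BLOCK = 512  # bytes per block in this toy model


class FakeDrive:
    """Toy relabelled drive: advertises more blocks than it really has.

    wrap=True models the 'overwrite from the beginning' variant,
    wrap=False models the 'silently discard' variant.
    """

    def __init__(self, advertised_blocks: int, real_blocks: int, wrap: bool = True):
        self.advertised_blocks = advertised_blocks
        self.real_blocks = real_blocks
        self.wrap = wrap
        self.storage = [bytes(BLOCK)] * real_blocks

    def write(self, lba: int, data: bytes) -> None:
        if lba >= self.real_blocks and not self.wrap:
            return  # excess data is thrown away
        self.storage[lba % self.real_blocks] = data  # wraps around past the real size

    def read(self, lba: int) -> bytes:
        if lba >= self.real_blocks and not self.wrap:
            return bytes(BLOCK)  # nothing was ever stored there
        return self.storage[lba % self.real_blocks]


def pattern(lba: int) -> bytes:
    # Each block carries its own address (offset by one so no pattern is all zeros).
    return struct.pack("<Q", lba + 1) * (BLOCK // 8)


def count_good_blocks(drive: FakeDrive) -> int:
    """Fill every advertised block with its pattern, then count blocks that read back intact."""
    for lba in range(drive.advertised_blocks):
        drive.write(lba, pattern(lba))
    return sum(drive.read(lba) == pattern(lba) for lba in range(drive.advertised_blocks))
```

With `FakeDrive(32, 2)` both variants report only 2 good blocks out of 32. In the wrap-around variant, the two blocks that verify are the last ones written, so it is the earliest data that gets destroyed.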

Monday, 28 June 2010


If the drive has turned RAW (shows the RAW file system), first check the drive size to see if it is exactly 128 or 137 GB. The RAW filesystem issue may be caused by clipped hard drive capacity.

If you see 137 or 128 GB, check whether BigLBA still works before you rush to recover data. This is especially true for drives (including external USB drives) that were connected to an old computer.

BigLBA is a registry parameter that determines whether 48-bit block addressing is used. If it is disabled, the maximum accessible disk size is 128 or 137 GB (depending on which units are used for the drive size, binary or decimal gigabytes).
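Both numbers come from the same limit. Without 48-bit addressing, sectors are addressed with 28-bit numbers, so at most 2^28 sectors of 512 bytes are reachable. A small Python sketch of the arithmetic, plus a rough check you could apply to a size reported by your OS; the 0.5% tolerance is an arbitrary assumption (a partition on a clipped drive is usually slightly smaller than the whole device), and `looks_clipped` is a hypothetical helper.

```python
SECTOR = 512
LBA28_LIMIT_BYTES = (2 ** 28) * SECTOR  # 137,438,953,472 bytes


def looks_clipped(reported_bytes: int, tolerance: float = 0.005) -> bool:
    """True if the reported size is (nearly) the 28-bit LBA ceiling."""
    return abs(reported_bytes - LBA28_LIMIT_BYTES) <= tolerance * LBA28_LIMIT_BYTES


print(round(LBA28_LIMIT_BYTES / 10**9, 1))  # decimal gigabytes: 137.4
print(LBA28_LIMIT_BYTES / 2**30)            # binary gigabytes: 128.0
```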

Sometimes it turns out that BigLBA is off even though, in theory, it should be enabled in any modern installation (starting with Windows XP SP2).

Refer to the instructions for troubleshooting disk capacity issues for more details on what to check and how to enable the BigLBA parameter. If BigLBA was indeed the issue, fixing it typically restores the RAW drive to proper functioning.

Thursday, 24 June 2010

Secure erase a.k.a. data wiping

If you need to delete a file irreversibly, it is not enough to just delete it and then empty the Recycle Bin. Data recovery software is quite capable of restoring data that was deleted in such a way.

In earlier days, when the FAT filesystem was widely used, it was sufficient to overwrite the file with some garbage data. To overwrite the file data completely, the garbage data had to be no smaller than the original file. This worked because FAT is a rather simple filesystem.

As filesystems grew more complex, the number of filesystem features that must be taken into account grew as well. Nowadays, it is no longer enough just to write other content to the file to delete it irreversibly.

For example, if a file is stored on an NTFS filesystem in compressed form, then, depending on how compressible the original file data and the new content are, a new set of clusters would most likely be allocated for the new data. Therefore, the original file data would not be overwritten at all.

Writing zeros is particularly useless: if NTFS compression is turned on, zeros are not written at all (the so-called sparse file), and therefore the original data is not overwritten.
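The gap between zeros and incompressible garbage is easy to see with any compressor. The sketch below uses Python's `zlib` purely as a stand-in: NTFS uses its own LZNT1 algorithm, so the exact sizes differ, but the trend is the same. Zeros collapse to almost nothing, while random data does not shrink, so its compressed length, and with it the cluster allocation, differs from that of the original file. The 64 KB unit assumes 4 KB clusters (an NTFS compression unit is 16 clusters).

```python
import os
import zlib

UNIT = 64 * 1024  # one NTFS compression unit with 4 KB clusters (16 clusters)

zeros = bytes(UNIT)
garbage = os.urandom(UNIT)

zeros_packed = len(zlib.compress(zeros))
garbage_packed = len(zlib.compress(garbage))

# Zeros shrink to a tiny fraction of the unit; random garbage does not shrink at all.
print(f"zeros:   {UNIT} -> {zeros_packed} bytes")
print(f"garbage: {UNIT} -> {garbage_packed} bytes")
```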

The next obvious step is to delete the file and write some incompressible garbage to all the free space. Sounds good, but unfortunately this does not work either, because a small file may be “resident”, that is, stored inside its MFT record rather than in separate clusters, and so its content would not be overwritten. Thus, you should overwrite not only the free space but also all the free MFT records.

In short, secure erase is complicated and difficult to do properly. If you ever need it, use SDelete. SDelete is free, created by Mark Russinovich, and has been tested to work many times. Additionally, there is a good explanation of how it works and what was taken into consideration.

Tuesday, 22 June 2010

Should I make a disk image file?

A disk image file, an exact copy of the entire disk content, is the first thing a data recovery lab makes. When recovering data at home, however, creating a disk image file is often not practical.

Having a disk image file stored aside makes recovery safer. If a fix goes wrong, the image file provides a backup to try again. If the disk is physically damaged, a disk image file allows you to perform the recovery regardless of the drive's mechanical condition. Almost any data recovery software can create a disk image file and, in most cases, load a disk image file created by another tool.
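The basic idea of imaging can be sketched in Python: copy the source in large chunks, and when a chunk fails, retry it sector by sector and fill unreadable sectors with zeros so that every later byte stays at its original offset. This is a simplified sketch, not a replacement for a dedicated imaging tool: it makes a single pass with no retries, and it assumes the source size can be obtained with `seek` (true for regular files and for Linux block devices such as `/dev/sdb`, which usually require root; Windows raw devices need a different size query). `make_image` and `_read_at` are hypothetical names.

```python
import os

CHUNK = 1024 * 1024  # copy 1 MB at a time
SECTOR = 512         # fall back to sector-sized reads around errors


def _read_at(src, offset: int, length: int):
    """Return the bytes at offset, or None if the read fails."""
    try:
        src.seek(offset)
        return src.read(length)
    except OSError:
        return None


def make_image(source_path: str, image_path: str) -> list:
    """Copy source_path to image_path, zero-filling unreadable sectors.

    Returns the byte offsets of the sectors that could not be read.
    """
    bad = []
    with open(source_path, "rb", buffering=0) as src, open(image_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)
        for offset in range(0, size, CHUNK):
            length = min(CHUNK, size - offset)
            data = _read_at(src, offset, length)
            if data is not None and len(data) == length:
                dst.write(data)
                continue
            # The big read failed: retry sector by sector to salvage what we can.
            for sector in range(offset, offset + length, SECTOR):
                n = min(SECTOR, offset + length - sector)
                piece = _read_at(src, sector, n)
                if piece is None or len(piece) != n:
                    bad.append(sector)
                    piece = bytes(n)  # placeholder zeros keep later offsets aligned
                dst.write(piece)
    return bad
```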

The significant disadvantage is that creating a disk image file takes a long time and requires a lot of free disk space.

When read-only data recovery software is used and the drive is physically OK, the risk of further data damage is negligible. Read-only recovery itself requires free space at least equal to the size of the data being recovered; if a disk image file is used, you need free space for the image as well, and may even need to buy a new large hard drive.
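The space requirement is simple arithmetic. A minimal sketch, assuming an uncompressed image exactly as large as the source disk; `free_space_needed` and `enough_space` are hypothetical helpers, and `shutil.disk_usage` reports the free space on the destination drive (which must never be the drive being recovered).

```python
import shutil


def free_space_needed(recovered_bytes: int, disk_bytes: int, use_image: bool) -> int:
    """Free space needed on the destination drive."""
    needed = recovered_bytes  # the recovered files themselves
    if use_image:
        needed += disk_bytes  # a full sector-by-sector image of the source disk
    return needed


def enough_space(dest_path: str, recovered_bytes: int, disk_bytes: int, use_image: bool) -> bool:
    free = shutil.disk_usage(dest_path).free
    return free >= free_space_needed(recovered_bytes, disk_bytes, use_image)
```

For example, recovering 200 GB of files from a 1 TB disk needs about 200 GB of free space without an image and about 1.2 TB with one.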

Thus, it might be reasonable to attempt a recovery without creating a disk image first.

Thursday, 17 June 2010

Data recovery and different USB protocols

When recovering data from an external USB hard drive, keep an eye on the data read speed. If the speed is less than 2 MB (megabytes) per second, it is better to abort the recovery and figure out which mode the devices are working in. The speed is of less concern with smaller devices; if you need to recover a pen drive, you can just sit, watch, and wait it out.
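A rough way to measure the read speed yourself is to time a sequential read. A Python sketch, with the 2 MB/s threshold taken from the text and the 64 MB sample size chosen arbitrarily; `measure_read_speed` and `too_slow` are hypothetical helpers.

```python
import time

CHUNK = 4 * 1024 * 1024  # read 4 MB at a time
THRESHOLD_MB_S = 2.0     # the 2 MB/s rule of thumb from the text


def measure_read_speed(path: str, sample_bytes: int = 64 * 1024 * 1024) -> float:
    """Read up to sample_bytes from path and return the read speed in MB/s."""
    done = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while done < sample_bytes:
            data = f.read(min(CHUNK, sample_bytes - done))
            if not data:  # reached the end of the file or device
                break
            done += len(data)
    elapsed = time.perf_counter() - start
    return done / 1_000_000 / elapsed if elapsed > 0 else float("inf")


def too_slow(speed_mb_s: float) -> bool:
    """True if the recovery should be paused to check the USB mode."""
    return speed_mb_s < THRESHOLD_MB_S
```

On Linux this could be called as `too_slow(measure_read_speed("/dev/sdb"))`, where the device name is hypothetical and reading it usually requires root. A file or device read recently may be served from the OS cache, which inflates the figure.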

There are two different versions of the USB protocol, USB 1.1 and USB 2.0. From the user's point of view, these protocols differ only in data transfer speed. USB 1.1 transfers at most ~1.5 MB/sec, while USB 2.0 can achieve ~50 MB/sec. If several USB devices involved in a data transfer use different USB protocols, the transfer runs at the speed of the slowest one.

Although USB 2.0 was developed in 2000, USB 1.1-only hard drive enclosures and card readers are still produced. If you are going to buy a USB enclosure for external hard drive recovery, check that it supports USB 2.0. Data recovery from a drive connected via USB 1.1 is too slow to be practical, because it can take a couple of days or sometimes even weeks.
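To see why, here is the arithmetic for a single full read of a drive at the peak speeds quoted above. The 500 GB drive size is a hypothetical example, and real recovery is slower still, since actual USB throughput is below the peak and recovery software may read some areas more than once.

```python
def full_read_hours(drive_gb: float, speed_mb_s: float) -> float:
    """Hours needed to read the whole drive once at a sustained speed."""
    seconds = drive_gb * 1000 / speed_mb_s  # decimal units: 1 GB = 1000 MB
    return seconds / 3600


for label, speed in (("USB 1.1", 1.5), ("USB 2.0", 50.0)):
    hours = full_read_hours(500, speed)
    print(f"{label}: {hours:.1f} hours ({hours / 24:.1f} days)")
# USB 1.1: 92.6 hours (3.9 days)
# USB 2.0: 2.8 hours (0.1 days)
```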

If you use a USB hub, check it as well. Generally, whenever possible, try not to use any intermediate devices. Some hubs switch to USB 1.1 mode when many devices are connected through them.

Some motherboards have a mix of USB 1.1 and USB 2.0 ports. If possible, check the motherboard manual to find out which ports are USB 2.0 and connect to them. If you do not have the manual, simply try a few different ports. Most often, the ports on the front of the case differ from those on the rear.

Tuesday, 15 June 2010

Folder tree structure vs. file data

Is it possible for data recovery software to get the file and folder structure right but the file content wrong, or vice versa? Why does it happen?

The answer depends on the filesystem type being recovered.

On FAT, the location of the parent folder is determined by the same formulae that are used for finding data. If the parameters in these formulae are invalid, neither the data nor the folder structure can be restored. Hence, if the folder tree is recovered properly, or close to that, the files should typically be good as well.
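The formula in question is the cluster-to-sector calculation from Microsoft's FAT specification. Both a file's first cluster and a subfolder's first cluster (including the ".." entry that points back to the parent) go through it, so a wrong parameter misplaces data and folders alike. A Python sketch; the volume parameters are hypothetical.

```python
def root_dir_sectors(root_entries: int, bytes_per_sector: int) -> int:
    """Sectors taken by the fixed FAT12/FAT16 root folder (0 on FAT32)."""
    return (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector


def cluster_to_sector(cluster: int, reserved: int, num_fats: int, fat_size: int,
                      root_sectors: int, sectors_per_cluster: int) -> int:
    """First sector of a cluster, counted from the start of the volume."""
    first_data_sector = reserved + num_fats * fat_size + root_sectors
    return first_data_sector + (cluster - 2) * sectors_per_cluster


# Hypothetical FAT32 volume: 32 reserved sectors, 2 FATs of 7680 sectors, 8 sectors/cluster.
good = cluster_to_sector(100, 32, 2, 7680, 0, 8)
# The same cluster with a wrong sectors-per-cluster guess lands somewhere else entirely.
wrong = cluster_to_sector(100, 32, 2, 7680, 0, 16)
```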

On NTFS, there are two independent sets of parameters, one controlling data location and the other covering parent-child relationships in the folder tree. So on NTFS it is theoretically possible (and sometimes happens) to have one set of parameters right and the other wrong. If you unformat an NTFS drive, a good folder tree full of damaged files is perfectly possible.
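One way to picture the two independent parameter sets is the Python sketch below. It assumes the documented NTFS file reference layout (a 64-bit value whose low 48 bits are the MFT record number and whose high 16 bits are a sequence number) and that the root folder is MFT record 5, which is its own parent. Building the tree needs only record numbers; locating data needs the cluster size and the partition start. The record contents and helper names are hypothetical, and actual MFT record parsing is omitted.

```python
def parent_record_number(file_reference: int) -> int:
    """Low 48 bits of an NTFS file reference = MFT record number."""
    return file_reference & 0xFFFF_FFFF_FFFF


def data_byte_offset(lcn: int, cluster_size: int, partition_start: int) -> int:
    """Absolute disk offset of a cluster: depends on cluster size and partition start."""
    return partition_start + lcn * cluster_size


def build_tree(records: dict) -> dict:
    """Map each folder's record number to its children's names.

    records: record number -> (name, 64-bit parent file reference)
    """
    tree = {}
    for number, (name, parent_ref) in records.items():
        parent = parent_record_number(parent_ref)
        if parent != number:  # the root folder (record 5) is its own parent
            tree.setdefault(parent, []).append(name)
    return tree
```

Getting `cluster_size` or `partition_start` wrong moves every data offset while leaving `build_tree` untouched, which is exactly the good-tree-bad-files case.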

On HFS and HFS+, the parent-child relationship is described by designated records in the catalog file. So it is possible to recover a folder tree even if both the child and parent folder records are damaged. HFS uses three different datasets to store information about file data, file names, and the content of large files. Any of these three may be damaged separately, so all sorts of combinations are possible.

Rinse, repeat

On Tom's Hardware, there is a question: "My 120 GB portable drive has some corrupted files that I can't delete ... How do I wipe the drive or delete the corrupted data?"

The obvious answer is to copy all the good data you need (if any) and format the drive. This definitely resolves any software corruption that may be present.

A less obvious option would be to run CHKDSK /F, then reset permissions on the folders, then delete the folders and files. Formatting is simpler, faster, and generally more definite.

However, another poster chimes in, suggesting to "use XP setup CD and delete [the partition] completely,then create a drive this 2-3 times and ur hdd is all clean". This is
  1. not precisely true, because deleting the partition and then re-creating it does not wipe out the data (so the drive does not become all clean, it is still subject to unformat), and
  2. not needed, because only the first time matters: on the subsequent delete-create cycles the system does not delete any more data than was already deleted during the first cycle.

Sunday, 13 June 2010

Thou shalt not overclock

Computer components have specifications they are designed to meet. The specifications are there for a reason, most particularly stability. If the CPU is rated for, say, 2.0 GHz, this means it should run flawlessly at 2.0 GHz. If you find a way to force it to 3.0 GHz, all bets are off.

The art of running components faster than they are rated for is called "overclocking". Some pretty amazing results have been achieved, especially when one throws in nonstandard technology along the lines of liquid nitrogen cooling. Unfortunately, there is one thing all these achievements lack: stability.

An overclocked system tends to bite its owner one day. Even if it runs fine for a while, an overclocked system tends to degrade faster, and may soon degrade to the point where it fails to work.

Take this long story for example. It involves a long list of suspected components: PSU, RAM, dying CPU, you name it. Lo and behold, a simple revert to the rated speeds fixed the problem. The owner is lucky that the filesystem did not crash during the troubleshooting. If you boot up with the CPU or memory not functioning properly, filesystem crashes (either partial or leading to the RAW file system state) are more than likely. In this particular case, it looks like CHKDSK took proper care of the filesystem. However, it does not look like the end of the story just yet: "i'm going to give it a few days and attempt a small overclock again." Yep, just a small one.

Wednesday, 9 June 2010

Partitioning for speed

"I want to partition the RAID 0 array in order to create a dedicated space for Virtual Memory and Scratch disks for Adobe Photoshop and Premiere. The array is 4x 300GB WD VelociRaptor RAID0."

This is not going to work as intended. For better overall performance, he would be better off splitting the array into two 2-drive arrays, or maybe even into standalone drives. Scratch and swap files perform better when placed on separate hard drives in such a way that no "spindle" services more than one data stream. With a single 4-disk array, four "spindles" would be serving three or four data streams (source data, swap, scratch, and output data), which is far from ideal because the number of seeks would be too high. Adding partitions to the mix would only guarantee a certain minimum distance for the disk heads to travel across the partition boundaries, which would actually decrease performance.
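A deliberately crude model makes the partitioning point concrete. The Python sketch below assumes head movement cost is proportional to track distance, ignores striping and rotational latency, and uses hypothetical sizes: two sequential streams (say, swap and scratch) live in two partitions on the same disk, compared with each stream on its own disk.

```python
def head_travel(requests: list) -> int:
    """Total distance (in tracks) a single head moves to serve requests in order."""
    pos = requests[0]
    total = 0
    for r in requests[1:]:
        total += abs(r - pos)
        pos = r
    return total


TRACKS = 100_000  # hypothetical disk size in tracks
N = 1000          # requests per stream

swap = list(range(0, N))                              # partition 1 starts at track 0
scratch = list(range(TRACKS // 2, TRACKS // 2 + N))   # partition 2 starts halfway

# One disk, two partitions: the head alternates between the two streams.
interleaved = [t for pair in zip(swap, scratch) for t in pair]
one_disk = head_travel(interleaved)

# Two disks: each head only follows its own sequential stream.
two_disks = head_travel(swap) + head_travel(scratch)
```

In this model the single partitioned disk moves its head 99,949,001 tracks in total versus 1,998 for two disks, because every alternation between the streams has to cross the partition boundary.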

In RAID planning, speed estimates, such as those provided by the RAID calculator, may be handy but apply only to the simplest case of a single data stream. Also, keep in mind that a RAID setup does not improve access time (command-to-start-read).

Tuesday, 8 June 2010

When calling in, or posting on a forum to get help with a RAID recovery, have the following info readily available:

  1. What is the array type: RAID0, RAID0+1, RAID5, whatever.
  2. How many drives were in the array originally. It might seem surprising, but every once in a while it is difficult to establish the number of drives with an appropriate degree of certainty.
  3. How many drives are available now.
  4. Are any drives known to have mechanical damage? If yes, how many drives are affected?
  5. What device does the array come from? Is it a NAS (and what model), a brand-name server (what brand, model, and configuration options), or maybe a homebuilt machine (controller model or RAID software)?

Although these questions may appear very simple, it still takes time to gather that information. When you have a RAID incident, collect this info as soon as practical.
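Questions 1 to 4 together decide whether the data can be reconstructed at all without repairing any drives. A rough Python sketch, assuming the textbook fault tolerance of each RAID level and counting mechanically damaged drives as unusable until they are repaired or imaged; RAID0+1 is left out because it depends on which drives survive, not just how many. The function names are hypothetical.

```python
def min_drives_needed(raid_type: str, original: int) -> int:
    """Minimum number of working member drives needed to rebuild the data."""
    rules = {
        "RAID0": original,      # no redundancy: every drive is required
        "RAID1": 1,             # any single mirror holds the full data
        "RAID5": original - 1,  # one drive's worth of parity
        "RAID6": original - 2,  # two drives' worth of parity
    }
    return rules[raid_type]  # unknown types raise KeyError on purpose


def can_rebuild(raid_type: str, original: int, available: int, damaged: int = 0) -> bool:
    """available: drives you have; damaged: how many of those have mechanical damage."""
    return available - damaged >= min_drives_needed(raid_type, original)
```

For example, a 4-drive RAID5 with 3 drives available can be rebuilt, but not if one of those 3 is mechanically damaged; a 4-drive RAID0 with one drive missing cannot be rebuilt at all.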

Monday, 7 June 2010

Redundancy in various filesystems

This is a quick summary of redundant elements deliberately maintained in filesystems.

FAT16 and FAT32 filesystems typically have two copies of the file allocation table (FAT). This is possible because the table is relatively small and the resulting overhead is not significant. Despite this, some devices (e.g. the Sony Ericsson W580i mobile phone) do not update the second copy of the table.
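Because some devices skip the second copy, a mismatch between FAT copies is not by itself proof of damage, but it is worth checking. A minimal Python sketch, using the boot sector field offsets from Microsoft's FAT specification (bytes per sector at offset 11, reserved sector count at 14, number of FATs at 16, the 16-bit FAT size at 22, and the FAT32 32-bit FAT size at 36); it assumes a volume image that starts at the boot sector, and the function names are hypothetical.

```python
import struct


def fat_copies(image: bytes) -> list:
    """Return every copy of the FAT stored in a FAT16/FAT32 volume image."""
    bytes_per_sector, = struct.unpack_from("<H", image, 11)
    reserved, = struct.unpack_from("<H", image, 14)
    num_fats = image[16]
    fat_size, = struct.unpack_from("<H", image, 22)
    if fat_size == 0:  # FAT32 keeps the FAT size in a 32-bit field instead
        fat_size, = struct.unpack_from("<I", image, 36)
    start = reserved * bytes_per_sector
    length = fat_size * bytes_per_sector
    return [image[start + i * length:start + (i + 1) * length] for i in range(num_fats)]


def copies_match(image: bytes) -> bool:
    """True if all FAT copies are byte-for-byte identical."""
    copies = fat_copies(image)
    return all(copy == copies[0] for copy in copies[1:])
```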

As for NTFS, a full copy of the Master File Table (MFT) does not exist, because it would be too large and too expensive to update. However, NTFS stores a copy of the beginning of the MFT. The size of this copy varies depending on the cluster size. Only the records describing the system files are copied; there is no copy of the user file records.

The exFAT filesystem, which one might come across during a pen drive or SD card recovery, stores only a single copy of the file allocation table, most likely for performance reasons.

HFS and HFS+ do not keep a copy of the Catalog File, although it would theoretically be possible, because the copy would not be too large. However, the designers opted not to do it.

RAID is not a substitute for a proper backup.

RAID reliability is provided by redundancy. In theory, the probability of a simultaneous two-disk failure is the square of the probability of a single disk failure, but that formula only holds if the failures are independent.

Actually, drive failures are not independent, because there are many factors common to all drives in the array.

These factors include:

  • Temperature. If a drive is damaged as a result of overheating, most likely the rest of the drives overheated as well.
  • Power. If a power supply burns out (or lightning strikes the power line), all the drives may fail at once.
  • Logical connection. If you have RAID 1 and you accidentally delete some files, both copies are deleted simultaneously.
  • Controller. If a RAID controller burns out, the disk array goes offline completely. If you are lucky, you can attach the drives to a similar controller and it will recognize the array, but it is not always that smooth. Sometimes RAID recovery software might be needed.
  • Cables. If several drives are connected to the same cable (as was common with IDE and SCSI) and the cable fails, all the drives connected to this cable are gone.

There are multiple reasons why a redundant array may fail all at once, so a proper backup is still required for secure data storage.
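To put rough numbers on the independence argument, here is a Python sketch for a two-drive mirror. It uses an assumed, simplified common-cause model: with probability `p_common` a shared event (power, overheating, controller, accidental deletion) takes out both drives, and otherwise each drive fails independently with probability `p_single`. The probabilities are hypothetical, chosen only to show the effect.

```python
def p_both_fail(p_single: float, p_common: float = 0.0) -> float:
    """Probability that both drives of a two-drive mirror fail in the same period.

    p_single: chance that one drive fails on its own, independently of the other
    p_common: chance of a shared cause that takes out both drives at once
    """
    independent = p_single ** 2  # the textbook 'square of the probability'
    return p_common + (1 - p_common) * independent


print(f"{p_both_fail(0.03):.6f}")        # independent failures only
print(f"{p_both_fail(0.03, 0.01):.6f}")  # with a 1% shared-cause risk
```

With these numbers, a 1% shared-cause risk makes a double failure about 12 times more likely than the independent estimate, so the shared factors, not the individual drives, dominate.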

Friday, 4 June 2010

If you start formatting a hard drive and then find out that it is the wrong drive, press "Cancel" immediately, and then go for the reset button (if your computer has one). If you do not have a reset button, keep in mind that the power button has a five-second delay before the shutdown occurs. Pulling the plug from the wall socket may be a better option.

There is a significant technical difference between Quick and Complete format, but let's discuss only the timings for now.

If you are doing a quick format, pressing "Cancel" and reset will most likely be of no use because there is no time for it, but you should try anyway.

In case of a complete format (with Windows Vista or Windows 7, which actually overwrite the data during the format):

  • on the FAT filesystem the file allocation table is lost very quickly and then folders are progressively lost. Loss of the allocation table makes subsequent unformat attempts difficult and causes the loss of all the fragmented files. Further loss of folder records makes the recovery next to impossible even though the file content may still be there.

  • on the NTFS filesystem, the MFT (Master File Table) typically starts at a 3GB offset and takes up about 100MB. Typical disk write speed is about 30-60 MB/sec, so the first 3GB are filled in roughly one to one and a half minutes, after which the MFT is lost, making recovery next to impossible. Modern SSDs with write speeds of about 300MB/sec cut the available time to about 10 seconds.
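The timing estimates above are simple division. A minimal Python sketch, assuming the complete format writes sequentially from the start of the partition and that the MFT starts 3 GB (decimal) into it, as stated above; the function name is hypothetical.

```python
MB = 1000 ** 2
GB = 1000 ** 3
MFT_OFFSET = 3 * GB  # typical MFT start, per the text


def seconds_until_mft_overwritten(write_speed_mb_s: float, mft_offset: int = MFT_OFFSET) -> float:
    """Time a sequential complete format needs to reach the MFT."""
    return mft_offset / (write_speed_mb_s * MB)


for speed in (30, 60, 300):  # MB/s: slower HDD, faster HDD, SSD
    print(f"{speed} MB/s: {seconds_until_mft_overwritten(speed):.0f} s")
# 30 MB/s: 100 s
# 60 MB/s: 50 s
# 300 MB/s: 10 s
```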

All in all, the conclusion is that you had better double-check which drive you are going to format.