Monday, 24 September 2012

Helium hard drives


Hitachi (HGST) is going to manufacture hard disks filled with helium. In theory, helium improves thermal conduction and also allows fitting more platters into the same form factor. Obviously, such a disk will be more difficult to repair, at least because the distance between platters decreases. Less obviously, the breather hole (which equalizes pressure inside a typical air-filled disk as the temperature changes) must be sealed.
Perhaps the resulting pressure changes will be compensated by the lower viscosity of helium; however, stricter temperature limits might be introduced. Additionally, it is known that hermetically sealing a volume of helium is more difficult than sealing air, because helium permeates gaskets more readily. Thus, we cannot yet make any conjecture about the lifetime of helium disks.

Monday, 17 September 2012

Seek errors in RAID recovery


In theory, data recovery tools are read-only, which means that using them cannot cause any damage. In practice, however, when recovering data you may observe effects that look as if the hard disks were being damaged mechanically.

For example, we took four disks from a NAS, connected them to a PC, and launched the RAID recovery tool. Immediately, S.M.A.R.T. monitoring software (Cropel) raised an alarm because too many seek errors occurred. These seek errors were caused by disk vibration, which in turn was provoked by the RAID recovery tool commanding all the disks to move their heads simultaneously. Admittedly, a NAS device does the same when reading data from an array; however, a regular NAS device is equipped with more vibration-resistant drive mounts and fastenings.

So, when transferring disks from the device managing the RAID array to a regular PC, you may get alerts from S.M.A.R.T. monitoring software telling you that the values of the Seek Error Rate attribute have changed significantly. Don't worry about these alerts; after a while the values of this attribute will settle at a new level and the alarm will stop.

Monday, 10 September 2012


In this article it is said that NetGear sold 108,876 NAS devices in 2011 and held a 16.4% share of the market.
Assuming that a NAS device contains three disks on average and that the disk attrition rate is 4% per year (which corresponds to the Google data in the article about disk reliability), and scaling NetGear's sales by its 16.4% market share to estimate the whole market (roughly 660,000 devices), we get that about 200 disks fail per day. Note that this is only among NAS devices sold in 2011.
Sales of hard drives overall are at least 600 million per year. Applying the same attrition rate, we get
600,000,000 * 0.04 / 365 / 24 ≈ 2,700 disk failures per hour
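The arithmetic above can be checked with a short script; the inputs (three disks per NAS, 4% annual attrition, the 16.4% market share) are the assumptions stated in the text.

```python
# Back-of-envelope check of the failure-rate estimates above.
# Assumptions from the text: 3 disks per NAS, 4% annual attrition.
netgear_units_2011 = 108_876
market_share = 0.164
disks_per_nas = 3
annual_attrition = 0.04

# Scale NetGear's 2011 sales to the whole NAS market by its market share.
market_units = netgear_units_2011 / market_share
nas_disk_failures_per_day = market_units * disks_per_nas * annual_attrition / 365
print(round(nas_disk_failures_per_day))   # about 218, i.e. "about 200 per day"

# The same attrition rate applied to total hard drive sales.
drives_per_year = 600_000_000
failures_per_hour = drives_per_year * annual_attrition / 365 / 24
print(round(failures_per_hour))           # about 2740, i.e. "~2,700 per hour"
```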

Saturday, 8 September 2012

TRIM and Storage Spaces


It is known that the filesystems you will probably use on Storage Spaces, namely NTFS and ReFS, support TRIM. We have no information about FAT, but it is unlikely that anyone will use FAT in conjunction with Storage Spaces.
So, if you delete a file on a volume located on a virtual Storage Spaces disk, the filesystem driver sends a TRIM command to the Storage Spaces driver. The latter uses this notification to free slabs, i.e. to return slabs marked as unused by the filesystem to the pool of free slabs. A slab is returned to the pool if:
  • a file being deleted fully occupies one or several slabs;
  • a file being deleted is the only file in the slab, meaning that after its deletion the slab would contain no data at all.
Once a slab is withdrawn back to the pool of free slabs, it is impossible to easily identify it - what volume it belonged to and what location it had in that volume. Therefore, regular data recovery from the filesystem side will not work, because some slabs are simply absent. In this case you need to do a full-scale Storage Spaces recovery, aiming to match virtual and physical slabs.
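The reclamation rule above can be sketched as a toy model; the class, names, and mapping below are hypothetical illustrations, not the real Storage Spaces driver, though the 256 MB slab size matches the text.

```python
# Toy model of slab reclamation on a thin-provisioned Storage Spaces disk.
SLAB_SIZE = 256 * 1024 * 1024   # 256 MB slabs, as described in the text

class ThinDisk:
    def __init__(self):
        self.slab_files = {}    # slab index -> set of file names using it

    def write(self, name, start, length):
        """Record which slabs a file's extent touches."""
        first = start // SLAB_SIZE
        last = (start + length - 1) // SLAB_SIZE
        for slab in range(first, last + 1):
            self.slab_files.setdefault(slab, set()).add(name)

    def delete(self, name):
        """Simulate TRIM: a slab returns to the pool only when no file uses it."""
        freed = []
        for slab, files in list(self.slab_files.items()):
            files.discard(name)
            if not files:                  # slab now contains no data at all
                del self.slab_files[slab]
                freed.append(slab)
        return freed

disk = ThinDisk()
disk.write("big.iso", 0, 2 * SLAB_SIZE)       # fully occupies slabs 0 and 1
disk.write("small.txt", 2 * SLAB_SIZE, 4096)  # alone in slab 2
print(disk.delete("big.iso"))    # [0, 1] - both slabs returned to the pool
```

A file that merely shares a slab with another file would free nothing, which is exactly why partially recovered slabs complicate later data recovery.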

Note that all the above applies only to virtual disks which use thin provisioning, not to fixed ones. When creating a fixed virtual disk, the Storage Spaces driver assigns physical slabs to the virtual disk and never takes them away.

Wednesday, 5 September 2012

Data recovery time in different filesystems


In the FAT filesystem, the structures describing directories are spread over the data area and are therefore mixed with the contents of files. If a directory is deleted, easily accessible information about its location is no longer available. In this case it is necessary to do a full scan of the data area to be sure that all the directories are found. Thus, data recovery time on FAT is proportional to the size of the disk and is mainly determined by the time needed to read the entire disk.

NTFS stores metadata densely at a more or less known location; when recovering data from an NTFS volume, data recovery software can just look at this small area rather than scan the entire disk. Data recovery time on NTFS is mainly limited by the computing resources required to recreate the file table. The total time doesn't depend on the disk capacity, but it does depend on the number of files actually stored on the disk.

ReFS again spreads its metadata over the disk, mixing it with the content of files to reduce disk head movement during read/write operations. From the recovery point of view, this means you need to scan the entire disk even if you just want to find a single deleted file.
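The difference in scaling can be made concrete with rough estimates; the read speed, MFT record size, and disk sizes below are assumed example values, not measurements.

```python
# Rough scan-time estimates implied by the paragraphs above.
# Assumed values: 100 MB/s sequential read, 1 KB per MFT record.

def full_disk_scan_hours(disk_bytes, read_mb_per_s=100):
    """FAT and ReFS: metadata is mixed with data, so the whole disk is read."""
    return disk_bytes / (read_mb_per_s * 1024**2) / 3600

def ntfs_scan_seconds(file_count, mft_record_bytes=1024, read_mb_per_s=100):
    """NTFS: only the localized MFT area needs to be read."""
    return file_count * mft_record_bytes / (read_mb_per_s * 1024**2)

print(full_disk_scan_hours(2 * 1024**4))   # a 2 TB disk: ~5.8 hours
print(ntfs_scan_seconds(1_000_000))        # a million files: ~10 seconds
```

The gap between hours and seconds is why the recovery time for NTFS is dominated by processing, while for FAT and ReFS it is dominated by raw disk reading.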

From the stability point of view, a filesystem that spreads its metadata wins over one that stores metadata in a single location, because localized damage is less likely to destroy all the metadata at once. However, this doesn't apply to the FAT filesystem, since its file allocation tables are stored compactly at the beginning of the volume.

Tuesday, 4 September 2012

Windows filesystems and TRIM


On NTFS, the process of file deletion is not limited to the work of the filesystem driver, such as zeroing pointers in the MFT. The physical or virtual store takes part in this process as well: the filesystem driver sends a TRIM command to the store driver, informing the data storage device that the blocks containing the file data are no longer in use and therefore can be erased.
Depending on the type of underlying device, TRIM can lead to different results:
  • for a volume located on a regular hard drive, TRIM has no effect;
  • for a volume created in Storage Spaces, TRIM leads to unexpected consequences depending on how the files are located in relation to the 256 MB slabs of Storage Spaces;
  • for an SSD, for which the TRIM command was originally introduced, the blocks that are no longer in use are erased immediately.
However, it should be noted that NTFS never frees blocks containing metadata, and so the NTFS filesystem driver never sends the TRIM command for these blocks. This peculiarity has one interesting consequence: since NTFS stores the content of small files (resident files) along with metadata, these small files can be recovered even if the TRIM command is used.
As for the ReFS filesystem, it doesn't use resident files, and therefore nothing can be recovered if the TRIM command was applied on a ReFS volume.
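The resident-file consequence can be illustrated with a toy model; the class, the residency threshold, and the method names below are hypothetical, chosen only to mirror the behavior the text describes.

```python
# Toy illustration: why small resident NTFS files survive TRIM.
RESIDENT_LIMIT = 700   # assumed: roughly what fits inside a 1 KB MFT record

class ToyNtfs:
    def __init__(self):
        self.mft = {}        # file name -> data stored inside the MFT record
        self.clusters = {}   # file name -> data stored in ordinary clusters

    def create(self, name, data):
        if len(data) <= RESIDENT_LIMIT:
            self.mft[name] = data        # resident: kept with metadata
        else:
            self.clusters[name] = data   # non-resident: kept in data clusters

    def delete(self, name):
        # TRIM is sent only for data clusters; MFT blocks are never trimmed.
        self.clusters.pop(name, None)    # the SSD erases these blocks
        # The MFT record is merely marked unused; its content survives.

    def carve(self, name):
        """What recovery software can still read after deletion."""
        return self.mft.get(name)

fs = ToyNtfs()
fs.create("note.txt", b"tiny")
fs.create("video.mp4", b"x" * 10_000)
fs.delete("note.txt")
fs.delete("video.mp4")
print(fs.carve("note.txt"))    # b'tiny' - resident content is recoverable
print(fs.carve("video.mp4"))   # None - its clusters were trimmed
```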

Monday, 3 September 2012

Symmetrical vs. asymmetrical disk arrays


There are symmetrical (for example, RAID 5) and asymmetrical (like RAID 4) RAID arrays.
RAID 5 (rotating parity)          RAID 4 (dedicated parity disk)
Disk 1  Disk 2  Disk 3            Disk 1  Disk 2  Disk 3
  1       2       p                 1       2       p
  3       p       4                 3       4       p
  p       5       6                 5       6       p
(p = parity block)
As load increases, the performance of an asymmetrical array becomes limited at one particular point. For example, in RAID 4, during write operations, the parity disk is saturated first. In a symmetrical RAID 5 array, all the member disks are loaded equally; therefore, there is no specific disk that limits performance.

From this, two consequences follow:

1.   In a symmetrical RAID 5 array, write performance can be increased by adding disks. Write performance of an asymmetrical RAID 4 array doesn't change as the number of drives increases, because parity data is still written to a single disk.

2.   If you add one fast rotational disk or an SSD to a RAID 5 array, you will not get a noticeable speed-up. In the case of RAID 4, replacing the parity disk with an SSD increases performance significantly, because the bottleneck related to parity updates is removed.
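The parity bottleneck can be demonstrated with a small simulation; the function and workload below are an illustrative sketch (random small writes, each touching one data block plus its parity block), not a benchmark of real arrays.

```python
import random

def small_write_load(n_disks, n_writes, rotating_parity, seed=1):
    """Count block updates per disk for random small writes.

    Each small write updates one data block and the parity block of
    its stripe. RAID 5 rotates parity across disks; RAID 4 keeps it
    on the last disk.
    """
    rng = random.Random(seed)
    load = [0] * n_disks
    for _ in range(n_writes):
        stripe = rng.randrange(10_000)
        parity = stripe % n_disks if rotating_parity else n_disks - 1
        data = rng.choice([d for d in range(n_disks) if d != parity])
        load[data] += 1
        load[parity] += 1
    return load

raid4 = small_write_load(4, 12_000, rotating_parity=False)
raid5 = small_write_load(4, 12_000, rotating_parity=True)
print(raid4)   # last disk takes a hit for every single write
print(raid5)   # load is spread roughly evenly across all disks
```

In the RAID 4 run the dedicated parity disk receives exactly one update per write, so it saturates first; in the RAID 5 run no disk stands out, which matches both consequences above.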

All of this applies in a similar manner to symmetrical RAID 6 and RAID 1E, and to asymmetrical RAID 3 and RAID-DP.