Showing posts from November, 2011

XFS recovery

We've been working for a couple of weeks now to implement an XFS recovery capability for our ReclaiMe data recovery software. The single strongest impression so far is that XFS is unnecessarily and exceedingly complex. Having (how many would that be, five?) types of directories is actually OK, as long as these types share the same basic structures and design. Taking design commonalities into account, the number of distinct directory types is reduced to just two. Data storage comes in three forms (NTFS and ext4 both get by with just two). The most interesting discovery to date is that each allocation group has two different sizes.

Data loss on JBODs

After a disastrous data loss caused by the RAID system, I have changed everything to JBOD, assuming that a 1/8 data loss is more acceptable than a full disaster.

Wrong. If you have eight drives in JBOD, and one of them dies, the outcome depends on the filesystem:
On an NTFS filesystem, if the first drive dies, the entire array is lost. If any other drive dies, 7/8th of the data is lost.
On ext-whatever, you can salvage an unspecified amount of data because the superblocks are distributed more or less evenly across the volume. However, everything in disk groups that span two disks is likely lost. So you can theoretically approach "1/8th of the data lost", but that involves using some data recovery software and is far from easy.

If you want to be sure that the loss is limited to 1/8th of data in eight-disk configuration, forget JBOD and create eight separate volumes instead.
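To make the spanning problem concrete, here is a minimal Python sketch of a simplified model. It assumes the JBOD is a plain concatenation of member disks and that the filesystem is cut into equal fixed-size groups laid end to end from the start of the volume. The function names `disk_for_offset` and `groups_touching_disk` are hypothetical, made up for this illustration. The model counts only groups that physically overlap the failed disk and ignores metadata dependencies, so it gives a lower bound on the damage.

```python
from bisect import bisect_right
from itertools import accumulate


def disk_for_offset(disk_sizes, offset):
    """Return the index of the member disk holding `offset` in a
    concatenated (JBOD/span) volume. Sizes and offset share one unit."""
    ends = list(accumulate(disk_sizes))  # end offset of each member disk
    if not 0 <= offset < ends[-1]:
        raise ValueError("offset is outside the volume")
    return bisect_right(ends, offset)


def groups_touching_disk(disk_sizes, group_size, failed):
    """Return (touched, total): how many fixed-size groups have at least
    one unit on the failed disk, and how many groups the volume has.
    Assumes equal-size groups starting at offset 0 (a simplification)."""
    ends = list(accumulate(disk_sizes))
    start = ends[failed] - disk_sizes[failed]  # first unit on failed disk
    end = ends[failed]                         # one past its last unit
    total = -(-ends[-1] // group_size)         # ceiling division
    first_group = start // group_size
    last_group = (end - 1) // group_size
    return last_group - first_group + 1, total


if __name__ == "__main__":
    disks = [1000] * 8  # eight equal disks, arbitrary units
    touched, total = groups_touching_disk(disks, group_size=128, failed=1)
    print(f"{touched} of {total} groups touch the failed disk")
```

With eight disks of 1000 units and groups of 128 units, losing the second disk touches 9 of 63 groups, about 1/7 rather than 1/8, because the groups straddling each boundary of the failed disk are damaged as well. Real groups are far smaller relative to the disks, so the excess over 1/8 is smaller, but it never reaches zero unless the group boundaries happen to line up with the disk boundaries.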

Human vs. computer in RAID recovery.

Human vs. computer battle.

As far as I am aware, there is no way to rebuild an HFS+ RAID from file-system analysis. NTFS is simple because it has good system counters to use, same as EXT. But HFS+ requires good knowledge of RAID distributions.

As you see, people rely on some property of the filesystem being recovered to produce a strong signal that can be used to determine the correct RAID configuration. NTFS provides plenty of those. FAT provides even more. With EXT, hddguy probably knows more than I do, because I'm not aware of any strong signal in EXT. By and large, humans prefer to find a small bit of data with a high signal-to-noise ratio, and use it. That is understandable because it limits the amount of effort involved.

The RAID recovery software, on the other hand, mostly works with weaker signals. Weaker signals have far worse SNR, but they are plentiful. For example, you can calculate entropy values for any data. The obvious computer strength is to quickly process large arrays of data, which the …