Wednesday, 28 July 2010

Just kidding

Came across the term "USB drive housing" today. So here goes a little attempt at drawing :)

Tuesday, 27 July 2010

Uncommanded HPA activation

Yesterday, a request came in:

We've got a Samsung 40GB hard drive which started to show a 4GB capacity. The drive was partitioned 4GB+35GB; now the second partition is just gone, and the first is displayed as a RAW file system.

So naturally, I pulled up the hard drive capacity troubleshooting manual and started going through it.

  1. Old mainboard? No way, it was working just fine the day before with the very same mainboard.
  2. BigLBA? Not relevant, because the limit is 128GB (binary) / 137GB (decimal), whilst the drive is only 40GB.
  3. Jumper settings? Checked, removed/reset, no fix.
  4. Host Protected Area (HPA)? That was the only option remaining.
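For reference, the 128/137GB figure is the 28-bit LBA limit: 2^28 addressable sectors of 512 bytes each. A quick check, assuming 512-byte sectors as on drives of this era; `lba_limit_bytes` and `needs_48bit_lba` are hypothetical helper names used only for this illustration:

```python
SECTOR_SIZE = 512  # bytes per sector (assumption: classic 512-byte sectors)


def lba_limit_bytes(address_bits: int, sector_size: int = SECTOR_SIZE) -> int:
    """Largest capacity reachable with the given number of LBA address bits."""
    return (2 ** address_bits) * sector_size


def needs_48bit_lba(capacity_bytes: int) -> bool:
    """True if a drive of this size exceeds the 28-bit LBA limit."""
    return capacity_bytes > lba_limit_bytes(28)


limit = lba_limit_bytes(28)
print(limit)                          # 137438953472
print(limit // 2**30, "GB binary")    # 128 GB binary
print(needs_48bit_lba(40 * 10**9))    # False: a 40GB drive is far below it
```

So a 40GB drive cannot be affected by the BigLBA limit, which is why that item could be ruled out straight away.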
Atola's HDD capacity restore does not work on x64 Vista (and presumably Windows 7 as well) because of driver issues. It took a while to move to a 32-bit XP installation, but it was well worth the hassle: resetting the HPA returned the drive to its normal condition immediately.
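What an HPA does to the visible capacity comes down to simple arithmetic: the drive keeps its native size but reports a smaller maximum address to the host. On an ATA drive, the native size would come from the READ NATIVE MAX ADDRESS command and the reported size from IDENTIFY DEVICE; this minimal sketch just takes both as sector counts. The numbers below are illustrative round values, not readings from the Samsung drive above, and `hpa_hidden_bytes` is a hypothetical helper:

```python
SECTOR_SIZE = 512  # bytes per sector (assumption)


def hpa_hidden_bytes(native_max_sectors: int, reported_sectors: int,
                     sector_size: int = SECTOR_SIZE) -> int:
    """Bytes hidden by an HPA: the gap between the drive's native size
    and the size it currently reports to the host."""
    if reported_sectors > native_max_sectors:
        raise ValueError("reported size cannot exceed the native size")
    return (native_max_sectors - reported_sectors) * sector_size


# Illustrative round numbers, NOT readings from the drive in this post.
native = 40 * 10**9 // SECTOR_SIZE    # ~40GB native capacity
reported = 4 * 10**9 // SECTOR_SIZE   # ~4GB reported after the HPA appeared
hidden = hpa_hidden_bytes(native, reported)
print(hidden // 10**9, "GB hidden")   # 36 GB hidden
```

Resetting the HPA amounts to making the reported size equal to the native size again, so the hidden part (here, where the second partition lived) becomes visible once more.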

The question remains, however, what caused the HPA to activate in the first place. This is not an unusual occurrence: we saw it several times last year, and resetting the capacity always resolved the problem, but the root cause was never identified.

Monday, 19 July 2010

Bad sectors, part III - Zero-fill

Drive zero-filling (sometimes erroneously called low-level format) can in some cases fix bad sectors.

First, all sectors with incorrect checksums are overwritten with new data (zeros) and correct checksums, so these sectors can be used again. Sectors which can't be fixed by a simple overwrite are reallocated.
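The two outcomes of a zero-fill can be shown with a toy model. This is a hypothetical sketch, not how any real firmware is written: each sector is either fine, "soft bad" (wrong checksum, fixed by rewriting) or "hard bad" (physically damaged, so the LBA gets remapped to a spare). `ToyDrive` and the state names are invented for the illustration:

```python
from __future__ import annotations

from dataclasses import dataclass, field

OK, SOFT_BAD, HARD_BAD = "ok", "soft_bad", "hard_bad"


@dataclass
class ToyDrive:
    """Hypothetical toy drive: one state per LBA plus a pool of spare sectors."""
    sectors: list[str]
    spares: int
    remap: dict[int, int] = field(default_factory=dict)  # LBA -> spare index

    def zero_fill(self) -> None:
        next_spare = len(self.remap)
        for lba, state in enumerate(self.sectors):
            if state == SOFT_BAD:
                # Writing zeros with a fresh, correct checksum repairs the sector.
                self.sectors[lba] = OK
            elif state == HARD_BAD and lba not in self.remap:
                # The surface is damaged, so the LBA is redirected to a spare.
                if next_spare >= self.spares:
                    raise RuntimeError("out of spare sectors")
                self.remap[lba] = next_spare
                next_spare += 1


drive = ToyDrive([OK, SOFT_BAD, HARD_BAD, SOFT_BAD], spares=2)
drive.zero_fill()
print(drive.sectors)  # ['ok', 'ok', 'hard_bad', 'ok']
print(drive.remap)    # {2: 0}
```

Note that the model fixes the symptoms only; whatever produced the soft bad sectors in the first place is untouched.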

It is important to understand that zero-filling doesn't eliminate the reason why the bad sectors appear. For example, if there is a problem with the power to the drive (e.g. a loose contact), the drive will periodically lose power in the middle of writes, and as a result new soft bad sectors will keep appearing.

Some software vendors, such as those behind HDD Regenerator and SpinRite, claim that their software can repair the drive surface. In fact, there is no general technique to view or change the list of defective or reallocated sectors, or to perform a low-level format, on a modern hard drive. These techniques differ from model to model and usually require hardware-assisted solutions such as PC3000. The best DIY choice is a diagnostic utility from the drive's vendor. Some of them can zero-fill the drive, but one should understand that zero-filling destroys all data irreversibly. No data recovery is possible after zero-filling.

Monday, 12 July 2010

Bad sectors, part II - reallocation

Since it is known in advance that it is impossible to create a perfect magnetic surface, a number of spare sectors are reserved on the drive.

When a surface defect appears, the sector with the defect is replaced with a good one from the pool of reserved sectors. Obviously, there is no surface repair involved. Instead, a special record is made in the address table, like "if a read/write request arrives for sector 123, use sector 456 instead". This results in a certain loss of performance, because the head now has to move to the reserved zone and back again instead of just reading a contiguous chunk of data. On top of that, the data which was stored in the bad sector is lost. Nevertheless, in theory you can keep using the drive as if there were no bad sectors at all.
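The "use sector 456 instead of 123" record can be pictured as a lookup table consulted on every request. A minimal sketch under an assumed layout where the spare sectors sit right after the user area; real drives use model-specific layouts, and `RemapTable` is a hypothetical name:

```python
from __future__ import annotations


class RemapTable:
    """Toy LBA -> physical sector translation with a spare-sector pool.
    Assumed layout: user sectors 0..user_sectors-1, spares directly after."""

    def __init__(self, user_sectors: int, spare_sectors: int) -> None:
        self.user_sectors = user_sectors
        self.spare_sectors = spare_sectors
        self.remapped: dict[int, int] = {}

    def reallocate(self, lba: int) -> int:
        """Assign the next free spare to a defective LBA and return it."""
        if lba in self.remapped:
            return self.remapped[lba]
        if len(self.remapped) >= self.spare_sectors:
            raise RuntimeError("spare pool exhausted")
        physical = self.user_sectors + len(self.remapped)
        self.remapped[lba] = physical
        return physical

    def physical(self, lba: int) -> int:
        """Where a request for this LBA actually goes."""
        return self.remapped.get(lba, lba)

    @property
    def reallocated_count(self) -> int:
        # Plays the role of the "Reallocated Sectors Count" attribute.
        return len(self.remapped)


table = RemapTable(user_sectors=1000, spare_sectors=10)
table.reallocate(123)
print([table.physical(lba) for lba in (122, 123, 124)])  # [122, 1000, 124]
print(table.reallocated_count)                           # 1
```

The middle entry of what used to be a contiguous run now points into the spare zone, which is exactly the extra head movement that costs performance.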

This process is called “reallocation”. The S.M.A.R.T. attribute named “Reallocated Sectors Count” shows the number of reallocated (replaced) sectors.

If the drive idles long enough, it can start a self-test, reading random sectors to make sure that they are not corrupted. Sectors found to be defective are queued and then reallocated if needed. Another S.M.A.R.T. attribute, “Current Pending Sector Count”, is designated for monitoring the state of this queue.
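The interplay between the two counters can be sketched as a toy model: an idle scan reads random sectors and queues the failing ones (the pending count), and resolving the queue either clears a sector (it was only soft bad) or reallocates it. This is a hypothetical illustration of the idea, not real firmware behaviour; `ToySmart` and its methods are invented names:

```python
from __future__ import annotations

import random


class ToySmart:
    """Hypothetical model of the pending and reallocated sector counters."""

    def __init__(self, bad: set[int], hard_bad: set[int]) -> None:
        self.bad = set(bad)            # sectors that currently fail to read
        self.hard_bad = set(hard_bad)  # subset that a rewrite cannot fix
        self.pending: set[int] = set()
        self.reallocated: set[int] = set()

    def idle_scan(self, total: int, samples: int, rng: random.Random) -> None:
        """Read `samples` random sectors and queue those that fail."""
        for lba in rng.sample(range(total), samples):
            if lba in self.bad:
                self.pending.add(lba)  # "Current Pending Sector Count" grows

    def resolve_pending(self) -> None:
        """Rewrite queued sectors; reallocate the ones a rewrite can't fix."""
        for lba in sorted(self.pending):
            if lba in self.hard_bad:
                self.reallocated.add(lba)  # "Reallocated Sectors Count" grows
            self.bad.discard(lba)          # either way, the LBA now reads fine
        self.pending.clear()


smart = ToySmart(bad={3, 7}, hard_bad={7})
smart.idle_scan(total=10, samples=10, rng=random.Random(0))  # scans every sector
print(sorted(smart.pending))                           # [3, 7]
smart.resolve_pending()
print(len(smart.pending), sorted(smart.reallocated))   # 0 [7]
```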

The first surface check is done during production of the drive, and a new drive fresh from the factory may already have several reallocated sectors. However, these "factory-certified" defects are not shown in the S.M.A.R.T. counters.

Sunday, 11 July 2010

Bad sectors, part I - soft bad sectors

There are two kinds of bad sectors: those that can be recovered by overwriting and those that can't.

It is almost impossible (and would be very expensive for practical use anyway) to create a perfect magnetic surface without a single defect. Instead of trying to create a perfect surface, additional redundant data is written along with the user data. Based on this redundant data, it is then possible to recover short sections of data which were read incorrectly. This is called Error Correction Code (ECC). However, the ability of ECC to correct errors is limited: it cannot recover either too many bad bits or too long a continuous bad section.
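The limit on what ECC can fix is easy to see with the simplest error-correcting code, Hamming(7,4), which adds 3 redundant bits to every 4 data bits and can correct exactly one flipped bit. Real drives use far stronger codes (such as Reed-Solomon), so this is only a stand-in for the principle, not what a drive actually does:

```python
from __future__ import annotations


def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c: list[int]) -> list[int]:
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of a single error
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
code = hamming74_encode(data)

one_error = list(code)
one_error[4] ^= 1
print(hamming74_decode(one_error) == data)   # True: one bad bit is corrected

two_errors = list(code)
two_errors[0] ^= 1
two_errors[1] ^= 1
print(hamming74_decode(two_errors) == data)  # False: too many bad bits
```

With two errors the decoder does not even notice that it failed: it "corrects" a third bit and returns wrong data, which is why real drives add stronger codes plus separate error detection.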

If a power failure occurs while you are writing data to the drive, the write may be interrupted, say, approximately halfway through a sector. Thus the first half of the sector contains the new data while the other half still has the old data. Error correction code is not capable of fixing such an error, and when an attempt is made to read the sector, it will be declared bad.

In fact, such a sector is not mechanically bad; it just contains data with a wrong checksum. To fix this, it is enough to write new data to the sector, and the sector will then function properly.
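The torn-write scenario can be simulated in a few lines. In this sketch a CRC-32 from Python's `zlib` stands in for the drive's real ECC, a sector is modelled as a (data, checksum) pair, and the helper names are hypothetical. The assumption is that the checksum, written after the data, is never reached when power fails:

```python
from __future__ import annotations

import zlib

SECTOR = 512  # bytes of user data per sector (assumption)

Sector = tuple[bytes, int]  # (data, checksum) as stored on the platter


def write_sector(data: bytes) -> Sector:
    """A complete write: data plus a checksum computed over it."""
    assert len(data) == SECTOR
    return data, zlib.crc32(data)


def read_sector(stored: Sector) -> bytes:
    data, checksum = stored
    if zlib.crc32(data) != checksum:
        raise OSError("bad sector: checksum mismatch")
    return data


def torn_write(old: Sector, new: bytes) -> Sector:
    """Power fails halfway: first half new, second half old, old checksum."""
    old_data, old_checksum = old
    half = SECTOR // 2
    return new[:half] + old_data[half:], old_checksum


sector = write_sector(b"\x00" * SECTOR)
sector = torn_write(sector, b"\xff" * SECTOR)
try:
    read_sector(sector)
except OSError as err:
    print(err)                          # bad sector: checksum mismatch

sector = write_sector(b"\xff" * SECTOR)  # simply rewrite the sector
print(read_sector(sector) == b"\xff" * SECTOR)  # True
```

Nothing about the "surface" changed between the failed read and the successful one; only the stored data and checksum became consistent again.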

Truly mechanical damage (destruction or wear of the magnetic surface) cannot be corrected in this way.

Monday, 5 July 2010

The missing ingredient

I've been reviewing the vendor-supplied hard drive diagnostic tools and found one thing that is missing in all of them. Before each test (S.M.A.R.T. self-test, surface scan, whatever), there should be a clear indication of whether the test is destructive or not.

Naturally, it is reasonable to assume that there would be a warning before the test if the test is destructive (like zero-filling the drive). However, gut instinct does not allow most of us to rely on that assumption. Murphy's law, reinforced by past experience, suggests that the developers may have forgotten to include the warning, and that testing failed to spot it.

So there is always some concern when starting a test. It would feel much better if there were a message clearly stating that the test is not destructive.
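One way a tool could guarantee this is to make "destructive or not" a mandatory property of every test, so that the label is printed in both cases and a destructive test refuses to run without explicit confirmation. A hypothetical sketch of that idea; `DiagnosticTest`, `banner` and `start` are invented names, not any vendor's API:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticTest:
    name: str
    destructive: bool  # no default: every test must state it explicitly


def banner(test: DiagnosticTest) -> str:
    """Always say which kind of test this is, not only when it is dangerous."""
    if test.destructive:
        return f"{test.name}: DESTRUCTIVE. All data on the drive will be lost."
    return f"{test.name}: non-destructive. User data will not be modified."


def start(test: DiagnosticTest, confirmed: bool = False) -> str:
    """Refuse to start a destructive test without explicit confirmation."""
    if test.destructive and not confirmed:
        raise PermissionError(f"{test.name} needs explicit confirmation")
    return banner(test)


print(start(DiagnosticTest("S.M.A.R.T. short self-test", destructive=False)))
print(start(DiagnosticTest("Zero-fill", destructive=True), confirmed=True))
```

Because the flag has no default, forgetting to classify a test becomes an error at the point where the test is defined rather than a silent gap in the warnings.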