Two different definitions of fragmentation
There are two distinct definitions of file fragmentation on a filesystem.
1. The file is fragmented if reading the entire file requires a head seek after the first data block has been read.
This definition mostly concerns read performance on rotational drives. A head seek is slow, so it is something you want to avoid.
If the file is sparse, i.e. it has a large area full of zeros in it, the filesystem will not store the zeros. Instead, only the non-zero start and end of the file are stored. From a performance standpoint, that's fine. The filesystem driver will read the first part of the data, then generate the required amount of zeros, and continue reading the last part of the data without ever making a head seek.
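To make the sparse case concrete, here is a minimal Python sketch. It assumes a POSIX system and a filesystem that supports sparse files; the file name and sizes are just for the demo:

```python
import os

# Create a ~1 GiB sparse file: write a header, seek far ahead, write a footer.
# Everything in between is a "hole" that the filesystem does not store.
path = "sparse_demo.bin"  # hypothetical demo file name
with open(path, "wb") as f:
    f.write(b"HEADER")
    f.seek(1024 * 1024 * 1024)  # move past 1 GiB without writing anything
    f.write(b"FOOTER")

st = os.stat(path)
print("apparent size:", st.st_size)            # about 1 GiB
print("allocated bytes:", st.st_blocks * 512)  # only a few KiB (st_blocks is POSIX-only)

# Reading inside the hole returns zeros generated by the filesystem driver,
# not data read from the platters -- no head seek happens for the hole itself.
with open(path, "rb") as f:
    f.seek(4096)
    print(f.read(16))  # b'\x00' * 16
```

On a filesystem without sparse-file support, the same code would allocate the full 1 GiB, so the "allocated bytes" line is the quick way to check what actually happened.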
If the file contains some metadata (e.g. a ReFS page table) embedded in its content, that is also fine from a performance standpoint. The ReFS driver will read the file data, and at some point the page table will be needed. It conveniently happens that the page table occupies the next cluster after the file data, so the driver will read the page table, analyze it, and continue reading the file data.
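The actual ReFS on-disk format is more involved, but the sequential-read idea can be sketched with a toy simulation. The cluster size, the run length, and the interleaving rule below are illustrative assumptions, not the real ReFS layout:

```python
CLUSTER = 4096  # hypothetical cluster size
RUN = 64        # hypothetical: one metadata page after every 64 data clusters

def read_file(disk: bytes, start: int, data_clusters: int) -> bytes:
    """Sequentially read a file whose on-disk stream interleaves metadata.

    The read position only ever moves forward, so on a rotational drive
    no head seek occurs: the driver reaches the metadata page, consumes
    it in place, and keeps reading file data.
    """
    content = bytearray()
    pos = start
    for i in range(data_clusters):
        if i > 0 and i % RUN == 0:
            metadata_page = disk[pos:pos + CLUSTER]
            # parse_page_table(metadata_page)  # driver-internal; not file content
            pos += CLUSTER
        content += disk[pos:pos + CLUSTER]
        pos += CLUSTER
    return bytes(content)
```

The point of the sketch is that the metadata pages are consumed by the driver and never appear in the returned content, even though they sit physically between the data clusters.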
2. The file is fragmented if an extent of data, starting at the first data block and of the same size as the file, does not match the file content when that extent is read directly from the disk.
Now this definition is about recoverability. From a data recovery standpoint, a file is not fragmented if it can be recovered without any filesystem metadata: find its header, then read an extent of the appropriate size from the disk, starting at the header.
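That recovery approach is plain file carving. A minimal sketch, assuming the header signature and the file size are known in advance (the JPEG marker and the 2 MiB size below are just illustrative):

```python
def carve(image: bytes, header: bytes, size: int) -> bytes | None:
    """Naive file carving: locate a known header in a raw disk image and
    take `size` contiguous bytes starting at it.

    This recovers the file correctly only if the file is not fragmented
    under definition 2: the on-disk extent must match the file content
    byte for byte.
    """
    start = image.find(header)
    if start == -1:
        return None
    return image[start:start + size]

# Hypothetical usage: carve a JPEG known to be 2 MiB from a disk image.
# with open("disk.img", "rb") as f:
#     data = carve(f.read(), b"\xff\xd8\xff", 2 * 1024 * 1024)
```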
If the file is sparse, you cannot recover it this way, by just taking the required amount of disk data starting from the file header; you must insert some unspecified amount of zeros between the start and the end of the file.
Also, if metadata is embedded within the file content, as is often the case with ReFS, the recovered file will contain ReFS page tables in it, rendering the file useless.