5.1.1.3 Effective mechanisms to detect bit corruption or loss
The repository shall have effective mechanisms to detect bit corruption or loss.
This is necessary in order to ensure that AIPs and metadata are uncorrupted or any data losses are detected and fall within the tolerances established by repository policy (see 3.3.5).
Documents that specify bit error detection and correction mechanisms used; risk analysis; error reports; threat analysis; periodic analysis of the integrity of repository holdings.
The objective is a comprehensive treatment of the sources of data loss and their real-world complexity. Any data or metadata that is (temporarily) lost should be recoverable from backups. Routine systematic failures must not be allowed to accumulate and cause data loss beyond the tolerances established by the repository policies. Mechanisms such as checksums (MD5 signatures) or digital signatures should be recognized for their effectiveness in detecting bit loss and incorporated into the overall approach of the repository for validating integrity.
The APTrust system retrieves files for fixity checks every 90 days to ensure that data remains accurate and complete. When a new bag is deposited for ingest, the system generates an MD5, SHA1, SHA256, and SHA512 hash of each individual file in the bag, compares the result with the bag manifest, and stores it in the metadata if it matches the manifest. This ensures that the bag was received correctly and that the files are in exactly the same state as they were before submission and transfer. If a calculated hash doesn’t match the one in the supplied bag manifest, the bag is not ingested and is marked as failed. Depositors then need to submit a corrected bag per the Bagging Specifications.
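The ingest-time comparison described above can be illustrated with a minimal sketch. This is not APTrust's implementation: the function names (compute_digests, parse_manifest, verify_bag) are hypothetical, the bag is assumed to be an unpacked directory on local disk, and the manifest is assumed to follow the BagIt convention of a manifest-<algorithm>.txt file containing one "<digest> <relative path>" entry per line.

```python
import hashlib
from pathlib import Path

# The four algorithms named in the text above.
ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def compute_digests(path, chunk_size=1024 * 1024):
    """Return {algorithm: hex digest} for one file, reading the file once."""
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def parse_manifest(manifest_path):
    """Parse a BagIt-style manifest into {relative path: expected digest}."""
    entries = {}
    text = Path(manifest_path).read_text(encoding="utf-8")
    for line in text.splitlines():
        if line.strip():
            digest, rel_path = line.split(None, 1)
            entries[rel_path.strip()] = digest.lower()
    return entries


def verify_bag(bag_dir, algorithm="sha256"):
    """Hypothetical ingest check: returns (ok, digests_by_file, mismatches).

    A bag is accepted only if every file listed in the manifest exists
    and its computed digest equals the digest the depositor supplied.
    """
    bag_dir = Path(bag_dir)
    manifest = parse_manifest(bag_dir / f"manifest-{algorithm}.txt")
    digests, mismatches = {}, []
    for rel_path, expected in manifest.items():
        file_path = bag_dir / rel_path
        if not file_path.is_file():
            mismatches.append(rel_path)  # a missing file counts as a failure
            continue
        digests[rel_path] = compute_digests(file_path)
        if digests[rel_path][algorithm] != expected:
            mismatches.append(rel_path)
    return (not mismatches, digests, mismatches)
```

In this sketch a caller would store the returned digests as file metadata only when ok is true, and would otherwise mark the bag as failed and report the mismatched paths to the depositor.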
Fixity checks use SHA256. If a file fails a fixity check, an alert is shown to depositors in the registry UI, and depositors are also emailed about the incident.
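As an illustration of a recurring SHA256 fixity check on a 90-day interval, the following sketch selects the files that are due, recomputes their digests, and returns the ones that should trigger an alert. It is a simplified model under stated assumptions, not the APTrust system: FileRecord, sha256_of, is_due, and run_fixity_checks are hypothetical names, files are assumed to be readable from a local path rather than retrieved from preservation storage, and alert delivery (registry UI, email) is left to the caller.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Interval taken from the text above; everything else is illustrative.
FIXITY_INTERVAL = timedelta(days=90)


@dataclass
class FileRecord:
    """Hypothetical metadata record for one preserved file."""
    path: str
    sha256: str            # digest stored at ingest
    last_checked: datetime  # time of the last fixity check


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA256 hex digest of a file in fixed-size chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def is_due(record, now):
    """A file is due when at least 90 days have passed since its last check."""
    return now - record.last_checked >= FIXITY_INTERVAL


def run_fixity_checks(records, now=None):
    """Check every due record and return the records that failed."""
    now = now or datetime.now(timezone.utc)
    failures = []
    for record in records:
        if not is_due(record, now):
            continue
        try:
            actual = sha256_of(record.path)
        except OSError:
            actual = None  # an unreadable or missing file is also a failure
        if actual != record.sha256:
            failures.append(record)
        record.last_checked = now
    return failures
```

Each record returned by run_fixity_checks represents a file whose current digest no longer matches the stored one, which is the condition under which a depositor would be alerted.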
Instructions for restoring files after an alert can be found in the APTrust User Guide under Restoration.