3.3.5 Information integrity measurements

The repository shall define, collect, track, and appropriately provide its information integrity measurements.

This is necessary in order to provide documentation that the repository has developed or adapted appropriate measures for ensuring the integrity of its holdings.

Written definition or specification of the repository’s integrity measures (for example, computed checksum or hash value); documentation of the procedures and mechanisms for monitoring integrity measurements and for responding to results of integrity measurements that indicate digital content is at risk; an audit process for collecting, tracking, and presenting integrity measurements; Preservation Policy and workflow documentation.

The mechanisms to measure integrity will evolve as technology evolves. The repository may provide documentation that it has developed or adapted appropriate measures for ensuring the integrity of its holdings. If protocols, rules and mechanisms are embedded in the repository software, there should be some way to demonstrate the implementation of integrity measures.

The Preservation Services Policy dictates that all preservation will be at the bit level. It states that “Fixity checks will be performed on every file” and sets the frequency at every 90 days. The frequency of fixity checks and other preservation measurements is a subject of ongoing discussion in the digital preservation field, and these frequencies will be updated according to current best practices.

Per the DART User Guide technical documentation, “APTrust will perform fixity checks on the S3 files every 90 days.” 
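As a minimal sketch of how a 90-day schedule could be evaluated, the Python below selects the files whose last recorded fixity check is at least 90 days old. The record layout (`identifier`, `last_fixity_check`) and the function names are hypothetical, chosen only for illustration; they are not taken from APTrust's software.

```python
from datetime import datetime, timedelta, timezone

# Interval taken from the policy text quoted above: every 90 days.
FIXITY_INTERVAL = timedelta(days=90)


def is_due(last_checked, now=None, interval=FIXITY_INTERVAL):
    """Return True if a file has never been checked or its last check
    is at least `interval` old."""
    if last_checked is None:
        return True
    now = now or datetime.now(timezone.utc)
    return now - last_checked >= interval


def files_due(records, now=None):
    """Return identifiers of records due for a fixity check.

    Each record is a dict with hypothetical keys 'identifier' and
    'last_fixity_check' (a timezone-aware datetime, or None).
    """
    return [
        r["identifier"]
        for r in records
        if is_due(r.get("last_fixity_check"), now)
    ]
```

A file that has never been checked is treated as due, which is an assumption of this sketch rather than a stated APTrust rule.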

The process for performing fixity checks and reporting their results is described in the APTrust Preservation & Storage documentation.

“Files will be regularly copied out of S3 by a locally implemented service to confirm fixity check using both MD5 and SHA256 values and the outcomes reported in the administrative interface.  Objects failing fixity tests will be retried up to 5 times to ensure it is a true fixity error and not a copy error.” 

The Partner (Member) generates the MD5 checksum as part of its bagging process, and APTrust generates the SHA256 checksum.
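The check described in the quoted passage can be sketched in Python as follows. This is an illustration under stated assumptions, not APTrust's implementation: `fetch_copy` is a hypothetical callable standing in for the service that copies an object out of S3 to a local path, the result dictionary is an invented shape, and "retried up to 5 times" is read here as one initial attempt plus up to five retries.

```python
import hashlib

MAX_RETRIES = 5  # from the quoted documentation: "retried up to 5 times"


def compute_digests(path, chunk_size=1024 * 1024):
    """Stream the file once and return its MD5 and SHA-256 hex digests."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}


def check_fixity(fetch_copy, expected, max_retries=MAX_RETRIES):
    """Compare freshly computed digests against the stored values.

    fetch_copy: hypothetical callable that makes a fresh local copy of
        the object and returns its path.
    expected: dict with the stored 'md5' and 'sha256' hex digests.
    A mismatch triggers a new copy and recomputation, so a transient
    copy error is not reported as a fixity failure.
    """
    expected = {k: v.lower() for k, v in expected.items()}
    actual = None
    attempts = 0
    for _ in range(1 + max_retries):
        attempts += 1
        actual = compute_digests(fetch_copy())
        if actual == expected:
            return {"ok": True, "attempts": attempts, "actual": actual}
    return {"ok": False, "attempts": attempts, "actual": actual}
```

Re-copying the object on every attempt, rather than re-hashing the same local copy, is what lets the retries distinguish a copy error from a true fixity error.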

Content is stored in two locations: AWS S3 and Glacier storage. As stated in the Preservation & Storage technical documentation, “For regular short term fixity checking we will rely on Glaciers internal SHA256 checksum reporting as a base level enhancement to the S3 fixity checks.  Additionally a manual service may be developed if needed to manually confirm fixity on a longer timescale (~24 months).  This longer manual fixity confirmation is throttled to use Glaciers slower free IO allotment to recover files form Glacier and confirm the fixity by performing manual MD5 and SHA256 checksums. and register the outcome with the Administrative Interface.”