4.2.7.2 Execute testing process of Content Information of AIPs

The repository shall execute the testing process for each class of Content Information of the AIPs.

This is necessary in order to ensure that one of the primary tests of preservation, namely that the digital holdings are understandable by their Designated Community, can be met. (See 4.3 for additional requirements for understandability beyond ingest.)

Test procedures to be run against the digital holdings to ensure their understandability to the defined Designated Community; records of such tests being performed and evaluated; evidence of gathering or identifying Representation Information to fill any intelligibility gaps which have been found; retention of individuals with the discipline expertise.

This requirement is concerned with the understandability of the AIP. If the ingested material is not understandable, the repository needs to ingest or make available additional information to make sure that the AIPs are understandable to the Designated Community(ies). For example, if documents are written in a dying language and the Designated Community is no longer able to understand the language the documents are written in, the repository would need to provide additional documentation that would allow the Designated Community to understand the documents (e.g., translations of the documents in a language the Designated Community could understand or dictionaries that would allow the Designated Communities to translate the documents into a language its members understand).

In February 2019, APTrust began automated monthly restoration spot tests. APTrust randomly restores one bag from each depositor institution and then emails administrators at that institution to tell them where to retrieve the restored bag. The depositor’s responsibility is to examine the bag to ensure that it is complete and that they can make sense of its payload. Through spring of 2019, depositors have been responding by email to confirm that their bags are complete. In the future, we may provide a way to capture their responses in our repository’s metadata.

Restoration Considerations

APTrust receives materials in BagIt format, then unpacks the bags and stores their contents (both payload files and tag files) as individual files. The files in a bag usually constitute a single intellectual object, though some larger objects may be split across multiple bags. A database registry, separate from preservation storage, keeps track of which files are logically grouped into which intellectual objects.

During the restoration process, APTrust reassembles all of the files that constitute an intellectual object into one or more BagIt bags and moves the bag(s) into an S3 restoration bucket from which the depositor can download. Although both the SIP that the depositor originally submitted and the DIP that we return to them use BagIt format, the DIP that we restore is virtually guaranteed not to be identical to the SIP the depositor submitted, for the following reasons:

  • Depositors may have deleted individual files from the preserved intellectual object.
  • Depositors may have uploaded newer versions of the object, containing new or altered files.
  • Depositors typically submit bags with either md5 or sha256 manifests, while APTrust restores bags with both manifests.
  • Depositors generally omit tag manifests in the SIP, but APTrust includes them in the DIP.
  • APTrust includes a JSON file in the restored bag describing all PREMIS events for the intellectual object and each of its constituent files.

Because an object’s files may have been added, deleted, or replaced with new versions after initial ingest, the PREMIS events JSON file is essential for the depositor to understand why the restored bag differs from the originally submitted bag. This file contains a record of all events affecting the object since its ingest, including deletion of files, addition of files, and re-ingest of files, with a full history of all ingest checksums and fixity checks.

When verifying the contents of a restored bag, depositors typically must validate the bag and ensure it contains everything they expect it to contain (including tag files that may hold information useful for re-importing content into a local system).
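As an illustration of the checksum part of that validation, here is a minimal Python sketch that compares a payload manifest against the files on disk. The function names `file_digest` and `check_manifest` are hypothetical and are not part of any APTrust tool. The sketch assumes the standard BagIt layout (a `data/` directory and `manifest-<algorithm>.txt` files) and does not handle percent-encoded paths, tag manifests, or `bagit.txt`, so it is not a complete BagIt validator.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm):
    """Return the hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(bag_dir, algorithm):
    """Compare manifest-<algorithm>.txt against the files under bag_dir.

    Returns a list of problem descriptions. An empty list means every
    listed file exists with a matching checksum and every payload file
    appears in the manifest.
    """
    bag_dir = Path(bag_dir)
    problems = []
    listed = set()
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each manifest line is "<checksum> <relative path>".
        expected, rel_path = line.split(maxsplit=1)
        listed.add(rel_path)
        target = bag_dir / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif file_digest(target, algorithm) != expected.lower():
            problems.append(f"checksum mismatch: {rel_path}")
    # Payload files that the manifest does not mention are also a problem.
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            rel_path = path.relative_to(bag_dir).as_posix()
            if rel_path not in listed:
                problems.append(f"not in manifest: {rel_path}")
    return problems
```

Because restored bags carry both md5 and sha256 manifests, a depositor could call `check_manifest(bag_dir, "md5")` and `check_manifest(bag_dir, "sha256")` and expect two empty lists. Checking that the bag contains everything the depositor expects remains a manual, content-level judgment.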

As of spring 2019, all depositors have confirmed the validity of all restored bags.

Restoration Spot Test Process

We designed the spot test process to mimic the normal depositor-initiated restoration process as closely as possible. Depositors restore objects by clicking the Restore button in the Registry web UI or by sending an API request to Registry. Either of those actions creates an entry in the restoration work queue with the object’s identifier.

The spot test selects one object belonging to each APTrust depositor and creates an entry in the restoration queue with the object’s identifier. The criteria for choosing objects are:

  • The object must not have been restored in the past 180 days. (This is to ensure we don’t restore the same bag each month.)
  • The object must be 50 GB or less in size. (This may change, but currently, we don’t want to burden depositors with large downloads.)
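The two criteria above can be sketched in Python as follows. This is an illustrative sketch only: the `ObjectRecord` type, its field names, and the function names are hypothetical, and the real spot test presumably queries Registry rather than filtering an in-memory list. Treating "50 GB" as decimal gigabytes is also an assumption.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MAX_SIZE_BYTES = 50 * 1000**3      # 50 GB; decimal gigabytes is an assumption
MIN_DAYS_SINCE_RESTORE = 180


@dataclass
class ObjectRecord:
    """Hypothetical stand-in for an intellectual object record."""
    identifier: str
    institution: str
    size_bytes: int
    last_restored: Optional[datetime]  # None if the object was never restored


def is_eligible(obj, now):
    """Apply the size limit and the 180-day rule to one object."""
    if obj.size_bytes > MAX_SIZE_BYTES:
        return False
    if obj.last_restored is None:
        return True
    return now - obj.last_restored >= timedelta(days=MIN_DAYS_SINCE_RESTORE)


def pick_spot_test_objects(objects, now, rng=None):
    """Pick one random eligible object identifier per institution.

    Institutions with no eligible objects are left out of the result.
    """
    rng = rng or random.Random()
    candidates = {}
    for obj in objects:
        if is_eligible(obj, now):
            candidates.setdefault(obj.institution, []).append(obj)
    return {
        institution: rng.choice(group).identifier
        for institution, group in candidates.items()
    }
```

Each identifier returned by `pick_spot_test_objects` would then be placed in the restoration queue, exactly as a depositor-initiated request would be.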

Once the object identifier is in the restoration queue, the process is the same regardless of how it got there.

Restoration Process

For both depositor-initiated and system-initiated restorations, APTrust does the following:

  • Gets a list of all files that constitute the object from Registry (the APTrust metadata registry). This list includes both payload and tag files.
  • Copies all files from preservation storage to a local staging area.
  • Ensures that all files are present and that all checksums match those recorded in Registry.
  • Bags all of the payload and tag files, creating md5 and sha256 manifests.
  • Adds to the bag a JSON file containing all PREMIS events related to the object and all of its files. (This is added as a tag file, not a payload file.)
  • Creates md5 and sha256 tag manifests.
  • Validates the bag.
  • Copies the bag to the depositor’s S3 restoration bucket.
  • Sends an email to the depositor saying the restored bag is available for download from the restoration bucket.
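The manifest-writing steps in the list above can be sketched in Python as shown below. This is not APTrust's implementation; the helper names are hypothetical, and the sketch assumes the payload already sits under `data/` and the tag files (including the PREMIS events JSON file) are already in place. It illustrates one ordering constraint: tag manifests list the payload manifests, so they have to be written last.

```python
import hashlib
from pathlib import Path

ALGORITHMS = ("md5", "sha256")


def digest_file(path, algorithm):
    """Return the hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(manifest_path, bag_dir, files, algorithm):
    """Write one '<checksum>  <relative path>' line per file."""
    lines = [
        f"{digest_file(path, algorithm)}  {path.relative_to(bag_dir).as_posix()}"
        for path in files
    ]
    manifest_path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def write_manifests(bag_dir):
    """Write payload manifests first, then tag manifests, for each algorithm."""
    bag_dir = Path(bag_dir)
    payload = sorted(p for p in (bag_dir / "data").rglob("*") if p.is_file())
    for algorithm in ALGORITHMS:
        write_manifest(bag_dir / f"manifest-{algorithm}.txt", bag_dir, payload, algorithm)

    # Tag files are everything outside data/, including the payload
    # manifests just written, but excluding the tag manifests themselves.
    tags = sorted(
        p for p in bag_dir.rglob("*")
        if p.is_file()
        and p.relative_to(bag_dir).parts[0] != "data"
        and not p.name.startswith("tagmanifest-")
    )
    for algorithm in ALGORITHMS:
        write_manifest(bag_dir / f"tagmanifest-{algorithm}.txt", bag_dir, tags, algorithm)
```

Validation, the step that follows in the list, would then recompute these checksums and compare them with the manifest contents before the bag is copied out.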

Post-Restoration

  • APTrust automatically deletes the restored bag from the depositor’s restoration bucket after 14 days to avoid incurring unnecessary costs. If the depositor does not retrieve it within the 14-day window, they must initiate a new restoration.
  • For spot restoration tests, APTrust asks depositors to respond via email to say whether they received what they expected and whether they were able to make sense of the restored content. APTrust may create a more formal process in the future so we can capture depositor responses in our registry.
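This page does not say how the 14-day deletion is implemented. One way such a window could be enforced on S3 is a bucket lifecycle expiration rule; the Python sketch below only builds the rule as a dictionary. The function name and rule ID are hypothetical, and the use of a lifecycle rule at all is an assumption rather than a description of APTrust's setup.

```python
RETENTION_DAYS = 14


def restoration_lifecycle(days=RETENTION_DAYS):
    """Build an S3 lifecycle configuration that expires every object in a
    bucket after the given number of days.

    The returned dictionary has the shape that boto3's
    put_bucket_lifecycle_configuration call accepts as its
    LifecycleConfiguration argument. Applying it to a restoration bucket
    is an assumption made for illustration.
    """
    return {
        "Rules": [
            {
                "ID": f"expire-restored-bags-after-{days}-days",
                "Filter": {"Prefix": ""},   # empty prefix: rule covers all objects
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }
```

A rule like this keeps the cleanup independent of the restoration workers, at the cost of removing bags whether or not the depositor has downloaded them, which matches the behavior described above.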

This page is linked from the Mandatory Responsibilities page.

This page is also linked from the DPC blog post: https://www.dpconline.org/blog/fire-drills