4.1.5 Verification of SIP

The repository shall have an ingest process which verifies each SIP for completeness and correctness.

This is necessary in order to detect and correct errors introduced when the SIP was created, as well as potential transmission errors between the depositor and the repository.

Appropriate Preservation Policy and Preservation Implementation Plan documents and system log files from system(s) performing ingest procedure(s); logs or registers of files received during the transfer and ingest process; documentation of standard operating procedures, detailed procedures, and/or workflows; format registries; definitions of completeness and correctness.

Information collected during the ingest process must be compared with information from some other source to verify the correctness of the data transfer and ingest process. Other sources will include technical and descriptive metadata obtained prior to ingest and may also include expectations set by the depositor, the object producer, a format registry, or the repository's own expectations. The extent to which a repository can determine correctness will depend on what it knows about the SIP and what tools are available for verifying correctness. It can mean simply checking that file formats are what they claim to be (TIFF files are valid TIFF format, for instance), or can imply checking the content. This might involve human checking in some cases, such as confirming that the description of a picture matches the image. This allows the repository to demonstrate that its preserved objects have completely and correctly copied what it intended to copy from the SIPs. It also allows the repository to document reasons for other SIP-related actions such as rejecting the transfer, suspending processing until the missing information is received, or simply reporting the errors. Similarly, the definition of ‘completeness’ should be appropriate to a repository’s activities. If an inventory of files was provided by a producer as part of pre-ingest negotiations, one would expect checks to be carried out against that inventory. Whatever checks are carried out must be consistent with the repository’s own documented definition and understanding of completeness and correctness. A repository might also want to check for network dropouts or other corruption during the transmission process.
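The inventory check described above can be illustrated with a minimal sketch. It assumes the producer's inventory is available as a plain list of relative file paths; that format and the function name `check_completeness` are hypothetical and are not part of any APTrust tool.

```python
from pathlib import Path


def check_completeness(inventory, received_dir):
    """Compare a producer's inventory with the files actually received.

    `inventory` is an iterable of relative paths (a hypothetical format).
    Returns (missing, unexpected) as sorted lists of POSIX-style paths.
    """
    root = Path(received_dir)
    expected = {Path(p).as_posix() for p in inventory}
    # Walk the received directory and record every regular file.
    received = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
    }
    missing = sorted(expected - received)      # promised but not delivered
    unexpected = sorted(received - expected)   # delivered but not promised
    return missing, unexpected
```

A repository could reject the transfer, suspend processing, or simply report the discrepancy when either returned list is non-empty, in line with its own documented definition of completeness.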

The ingest process includes validation of the incoming content and documenting verification procedures and results. APTrust makes sure all files are present and match the checksums in the manifests. (The BagIt specification allows the depositor to include some custom tag files without mentioning them in the tag manifests. These files are allowed, but their checksums will not be validated.)
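As an illustration of what a manifest check involves, the following sketch verifies the files listed in a BagIt payload manifest against their recorded checksums. It is not APTrust's implementation: the function names are hypothetical, and the sketch assumes a `manifest-<algorithm>.txt` file whose lines have the form `checksum filepath`. It ignores the percent-encoding that BagIt applies to unusual characters in paths.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(bag_dir, algorithm="sha256"):
    """Check every manifest entry; return a list of (path, problem) tuples."""
    bag = Path(bag_dir)
    manifest = bag / f"manifest-{algorithm}.txt"
    problems = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        expected, rel_path = line.split(maxsplit=1)
        target = bag / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif file_digest(target, algorithm) != expected.lower():
            problems.append((rel_path, "checksum mismatch"))
    return problems
```

An empty list means every listed file is present and matches its checksum. A fuller validator would also flag payload files that are absent from the manifest.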

The full APTrust BagIt specification is described in Bagging specifications. During the ingest process, APTrust validates that bags meet the requirements described in this specification. A bag must be valid and complete to be ingested. APTrust rejects invalid and incomplete bags by marking the bag’s Ingest Work Item as failed and noting in the Work Item the exact problem that caused the validation to fail. Depositors can view the Work Item in the Pharos Web UI, they can retrieve it programmatically through the member API, or they can use the apt_check_ingest Partner Tool.

Step 2 of the Ingest Timeline documentation describes when validation occurs in the ingest process. File format verification also occurs upon ingest using PRONOM.
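To give a sense of what signature-based format verification means, here is a deliberately simplified sketch that inspects the first bytes of a file. It does not use PRONOM and is not how APTrust performs identification; PRONOM-based tools apply a much richer signature registry. The table `SIGNATURES` and the function `identify_by_signature` are hypothetical names covering only three well-known formats.

```python
# Leading "magic" bytes for a few common formats (illustrative subset only).
SIGNATURES = {
    "TIFF": (b"II*\x00", b"MM\x00*"),   # little-endian and big-endian TIFF
    "PNG": (b"\x89PNG\r\n\x1a\n",),
    "PDF": (b"%PDF-",),
}


def identify_by_signature(path):
    """Return a format name if the file header matches, else None."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    for name, magics in SIGNATURES.items():
        # bytes.startswith accepts a tuple of candidate prefixes.
        if header.startswith(magics):
            return name
    return None
```

Comparing the result with the format a file claims to be (for example, a `.tif` extension that yields `None`) is one simple way to detect files that are not what they claim to be.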

The Item Resource endpoint of the member API includes a field containing a flag that indicates whether APTrust should try again to process this item. This flag will be set to false if processing failed because of a fatal error, such as a bag failing validation.

An administrative interface, Pharos, provides details about the descriptive metadata provided at submission, technical metadata related to preserved files, and data related to audited events involving digital files managed in the repository. Depositors can programmatically retrieve the status of work items using the Items endpoint of the member API.

Preservation Services Policy, approved by the Board in April 2018: https://doi.org/10.18130/V3N58CK61