4.1.3 Recognition and parsing of SIPs

The repository shall have adequate specifications enabling recognition and parsing of the SIPs.

This is necessary in order to be sure that the repository is able to extract information from the SIPs.

Packaging Information for the SIPs; Representation Information for the SIP Content Data, including documented file format specifications; published data standards; documentation of valid object construction.

The repository must be able to determine what the contents of a SIP are with regard to the technical construction of its components. For example, the repository needs to be able to recognize a TIFF file and confirm that it is not simply a file with a filename ending in ‘TIFF’. Another example would be a website, for which the repository would need to be able to recognize and test the validity of the variety of file types (e.g., HTML, images, audio, video, CSS) that are part of the website. This is necessary in order to confirm that: 1) the SIP is what the repository expected; 2) the Content Information is correctly identified; and 3) the properties of the Content Information to be preserved have been appropriately selected.
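The TIFF example can be illustrated with a minimal Python sketch that inspects a file's leading bytes instead of trusting its name. It is not the repository's actual ingest code: the function name `looks_like_tiff` and the constant `TIFF_SIGNATURES` are hypothetical, and the check covers only the classic TIFF header (a byte-order mark followed by the number 42). It identifies a likely format; it does not validate the file against the full format specification, which a production workflow would probably delegate to dedicated identification and validation tools.

```python
# Classic TIFF files begin with a two-byte byte-order mark
# ("II" = little-endian, "MM" = big-endian) followed by the
# number 42 encoded in that byte order.
TIFF_SIGNATURES = (
    b"II*\x00",  # little-endian: 0x49 0x49 0x2A 0x00
    b"MM\x00*",  # big-endian:    0x4D 0x4D 0x00 0x2A
)


def looks_like_tiff(path):
    """Return True if the file at `path` starts with a classic TIFF signature.

    Hypothetical helper for illustration only: it ignores the filename
    entirely and reads just the first four bytes of the file.
    """
    with open(path, "rb") as handle:
        header = handle.read(4)
    return header in TIFF_SIGNATURES
```

With this approach, a text file renamed to `scan.tiff` is rejected, while a genuine TIFF stored under an unrelated name is still recognized.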

The structure and content of SIPs are described in Definition of SIP. The process of file type recognition and validation is documented in Ingest Timeline and Technical Documentation. Definition of AIP outlines the process of transforming SIPs into AIPs.