Carving out digital objects is much like the work archivists have long done with physical collections—arranging items into folders and boxes so that they can be understood, managed, and retrieved. The difference is that in the digital realm—whether materials are digitized or born digital—those physical cues may be absent. Practitioners must decide at what level of granularity to create digital objects and how to model them for long-term stewardship.
One of the most common questions APTrust members ask is deceptively simple: What goes into a digital object? Should an entire collection be modeled as a single object, or should each series, folder, or item stand on its own?
These questions of size and scope are central to digital preservation practice. The choices you make will affect not only how materials are stored and managed today, but also how they will be understood and accessed decades into the future.
Start With Collections, But Don’t Stop There
Most collecting institutions already use hierarchies—collections, series, boxes, folders, items. For digitized materials, this often mirrors their physical arrangement. For born-digital materials, hierarchies might come from existing file systems, email folder structures, or project-level groupings.
Some options for defining preservation objects include:
- One object per collection. Simple and efficient for very small collections (e.g., two letters, a single dataset), but may be unwieldy for larger holdings.
- One object per series or grouping. A good compromise for both digitized and born-digital collections (e.g., one object per archival series, per set of research data, or per email account). Retrieval, however, requires knowing where an item sits within that grouping.
- One object per folder or box. Mirrors the physical arrangement of digitized material and can echo directory structures for born-digital. But be cautious: physical boxes/folders may be rearranged over time, and digital directory structures are not always stable. If identifiers are tied too tightly to box or folder numbers or to directory paths, confusion may follow when things change.
- One object per item. Useful for photographs, books, audiovisual recordings, or datasets that are best managed individually.
There is no single correct answer. Many organizations choose different levels of granularity for different collections, depending on context and use.
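To get a feel for how the options above play out, it can help to count how many objects each level of granularity would produce for a given collection. The following is a minimal sketch, not an APTrust tool: the file paths, the four-level collection/series/folder/item layout, and the `group_by_level` helper are all hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical file paths laid out as collection/series/folder/item-file.
FILES = [
    "smith-papers/correspondence/folder-01/letter-001.tif",
    "smith-papers/correspondence/folder-01/letter-002.tif",
    "smith-papers/correspondence/folder-02/letter-003.tif",
    "smith-papers/photographs/folder-01/photo-001.tif",
]

# Number of leading path components that define one object at each level
# (an assumption that matches the sample layout above).
LEVELS = {"collection": 1, "series": 2, "folder": 3, "item": 4}


def group_by_level(paths, level):
    """Group file paths into candidate objects at the given level."""
    depth = LEVELS[level]
    objects = defaultdict(list)
    for path in paths:
        parts = PurePosixPath(path).parts
        key = "/".join(parts[:depth])
        objects[key].append(path)
    return dict(objects)


# Print how many objects each granularity choice would yield.
for level in LEVELS:
    print(level, len(group_by_level(FILES, level)))
```

Running the same comparison against a real file listing gives a rough sense of whether a given level would yield a handful of large objects or thousands of small ones.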
What Belongs Inside a Digital Object?
Once you’ve set the outer boundaries, think about the contents and structure of each object. Digital objects may include:
- Preservation files: full-quality scans, master files, or untouched originals
- Access files: smaller, optimized versions for everyday use
- Supplemental files: OCR text, captions, transcriptions, reports, or codebooks for datasets
- Metadata: descriptive, administrative, or structural
Consistency matters. Standard file naming conventions and predictable folder structures help future users navigate both digitized and born-digital content.
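One way to keep folder structures predictable is to check each object against a written convention before it is deposited. The sketch below assumes a hypothetical convention in which every object has four top-level folders named after the categories above; the folder names and the `check_layout` helper are illustrative only, and your institution's convention may look quite different.

```python
from pathlib import PurePosixPath

# Assumed convention: each object has these top-level folders.
EXPECTED_FOLDERS = {"preservation", "access", "supplemental", "metadata"}


def check_layout(relative_paths):
    """Return (files outside the expected folders, expected folders with no files)."""
    seen = set()
    unexpected = []
    for path in relative_paths:
        parts = PurePosixPath(path).parts
        top = parts[0]
        # A file counts only if it sits inside one of the expected folders.
        if len(parts) > 1 and top in EXPECTED_FOLDERS:
            seen.add(top)
        else:
            unexpected.append(path)
    missing = sorted(EXPECTED_FOLDERS - seen)
    return unexpected, missing
```

For example, calling `check_layout` on a listing that contains a stray `notes.txt` at the top level and no supplemental files would report the stray file and the empty `supplemental` folder.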
Here’s where standards like the OAIS reference model (ISO 14721) and the audit criteria for Trustworthy Digital Repositories (ISO 16363) come in. They remind us that every digital object needs more than just its “content”:
- Preservation Description Information (PDI) ensures authenticity and trustworthiness over time by documenting fixity, provenance, context, and reference information.
- Representation Information ensures the object remains understandable by future users by including the technical and semantic details necessary to interpret it (e.g., file format specifications, software dependencies, or explanatory codebooks).
When deciding what belongs inside a digital object, don’t just ask “what files go here?” Ask instead: What PDI and Representation Information are necessary for this object to remain independently understandable to my Designated Community in 10, 50, or 100 years?
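As a concrete illustration of the fixity portion of PDI, the following sketch walks an object's directory, computes a SHA-256 checksum for each file, and stores the results alongside reference, provenance, and context notes. The record layout and the `build_pdi` helper are assumptions made for illustration; OAIS does not prescribe this structure, and SHA-256 is only one of several commonly used algorithms.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute a SHA-256 checksum, reading the file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_pdi(object_dir, identifier, provenance, context):
    """Assemble a simple, hypothetical PDI record for one digital object."""
    root = Path(object_dir)
    # Map each file's path (relative to the object root) to its checksum.
    fixity = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    return {
        "reference": identifier,
        "provenance": provenance,
        "context": context,
        "fixity": {"algorithm": "sha256", "files": fixity},
    }
```

A record like this could be serialized to JSON and stored inside the object itself, so that the checksums travel with the files they describe.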
Here’s what APTrust member North Carolina State University includes in its digital objects:
For digitized archival materials, NC State usually creates a single digital object consisting of all assets created from a single resource, which can range from a single photograph to all the pages from an archival folder. For born-digital materials, a single digital object usually consists of the content of a discretely described archival object (e.g., a disk image of a DVD, a tarball of a set of files) and the reports created during processing (e.g., virus and privacy scans, file format characterization, file manifests).
Future Use and Access
Preservation objects are not just storage; they’re “messages in a bottle” to the future. When modeling them, ask:
- Will someone in 50 years be able to make sense of this package?
- If disaster strikes, can you restore collections efficiently from these objects?
- Do you need system-neutral preservation packages or packages tailored for quick re-import into your access platform?
These questions apply equally to digitized and born-digital material. For example, a digitized newspaper might be packaged at the issue level, whereas a born-digital dataset might best be preserved at the project level, ensuring that its files remain contextually linked.
Guidance for Practitioners
When deciding how big or small to make a digital object, consider:
- Collection context – How was it organized physically or digitally?
- Scale – How many files or items are you dealing with?
- Resources – Do you have the capacity to manage many small objects, or would fewer larger ones be easier?
- Future users – How will researchers or staff need to retrieve, interpret, or reuse the objects?
- Identifier stability – Will your naming and organization choices still make sense if the physical collection or digital directory structure changes?
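On the identifier stability point, one common approach is to give each object an opaque identifier and record its box, folder, or directory location as ordinary metadata that can change. The sketch below assumes UUIDs as the identifier scheme and uses a hypothetical `PreservationObject` class; other schemes, such as ARKs or locally minted numbers, would serve the same purpose.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class PreservationObject:
    title: str
    # Current box/folder or directory path; expected to change over time.
    location: str
    # Opaque identifier, minted once and never derived from the location.
    identifier: str = field(default_factory=lambda: str(uuid.uuid4()))


def relocate(obj, new_location):
    """Update where the material lives without touching its identifier."""
    obj.location = new_location
    return obj
```

Because the identifier carries no box number or path, reboxing a physical collection or reorganizing a file share only requires updating the `location` value.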
The key is not to find one universally correct granularity, but to choose the level that balances sustainability, usability, and long-term meaning, whether the content was scanned from paper or created on a laptop last year.