Lessons from the Frontlines: Navigating Data Loss and Recovery in Digital Preservation
Data loss is a challenge no organization wants to face, but preparation, collaboration, and strong partnerships can make all the difference. At the 2022 Fall CNI Member Meeting, North Carolina State University Libraries (NC State Libraries) and APTrust shared how we tackled a real-world data loss incident. The story highlights critical lessons learned and practical steps every institution can take to strengthen its digital preservation efforts.
The Incident: A Wake-Up Call
In June 2021, NC State Libraries experienced an incident that many in the digital preservation community dread: an accidental staff action deleted 35TB of locally hosted special collections storage, including its only backup. As Jill Sexton described it, “That was a bad day.”
What caused this incident?
A post-mortem revealed a combination of contributing factors:
- Incomplete Documentation: Critical system dependencies and workflows were not thoroughly documented.
- Knowledge Silos: Staff expertise was concentrated among individuals, creating bottlenecks during the crisis.
- Inadequate Testing: Recovery processes were not practiced, leaving the team to rely on theoretical knowledge.
Fortunately, NC State had begun ingesting its digital preservation files into APTrust just a year earlier. As Sexton noted, “It was all there.” Of the 35TB lost, 19TB was recovered from local sources; the remaining 16TB had to be restored from APTrust.
The Recovery Process: A Collaborative Effort
The restoration from APTrust took six weeks, reflecting the challenges inherent in large-scale data recovery. APTrust’s existing microservices and restoration methodology played a critical role. Bradley Daigle explained, “We constantly interact with the data and pull things down for fixity, so we had microservices in place to understand what could be withdrawn without incurring costs.” Because that groundwork was already in place, the restoration could proceed efficiently while keeping costs under control.
Despite its length, the process unfolded smoothly thanks to strong communication and teamwork between NC State Libraries and APTrust. Here’s what went well:
- Data Integrity: All 16TB of missing data was restored successfully.
- Cost Management: APTrust’s transparent fee structure kept costs manageable, avoiding financial strain during a crisis.
- Teamwork: Collaboration across departments ensured alignment in goals and execution.
- Effective Communication: Clear updates and coordination minimized confusion and helped build trust throughout recovery.
- Preservation Management System: NC State had already developed an application for record-keeping and integration with APTrust infrastructure, which saved time in identifying what was lost.
Key Takeaways: Preparing for the Inevitable
This experience underscored that even well-prepared organizations are vulnerable to data loss. The lessons NC State Libraries learned have informed changes that make its digital preservation strategy more robust:
- Accurate Documentation: Maintain thorough and centralized records to clarify dependencies and workflows.
- Practice Recovery Scenarios: Sexton stressed, “It’s really important for you to test, practice, and define roles before you need to do it.”
- Strengthen Change Management: Mandatory change processes with predefined templates ensure oversight and reduce risk.
- Encourage Knowledge Sharing: Cross-training mitigates the risk of relying on single points of expertise.
- Leverage Shared Infrastructure: Sexton reflected on NC State’s decision to commit to APTrust fully, stating, “Shared infrastructure equals shared distribution of effort and a better product.”
- Evolve Infrastructure: Long-term assessment ensures systems remain sustainable and resilient to future challenges.
A Partner in Preservation: APTrust’s Role
The need for a scalable and sustainable digital preservation solution drove NC State Libraries’ decision to become a founding member of APTrust and to expand its usage in 2019. The consortium’s transparent approach, public documentation, and active member engagement provide a solid foundation for managing risks like data loss.
For APTrust, this incident underscored the value of proactive measures such as fire drills, which simulate restoration scenarios. These tests verify APTrust’s systems and assess the member organization’s ability to handle restored data. Daigle remarked, “Assurance is not just about trusting the service; it’s about ensuring your organization can make sense of what it receives back.”
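One concrete part of "making sense of what it receives back" is confirming that restored files are complete and unaltered. The sketch below is a minimal, hypothetical illustration of such a check, not APTrust's or NC State's actual tooling. It assumes the restored content comes with a BagIt-style manifest in which each line holds a SHA-256 checksum followed by a relative file path; the function names `sha256_of` and `verify_restore` are invented for this example.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(restore_dir: Path, manifest: Path) -> dict[str, list[str]]:
    """Hypothetical fire-drill check: compare restored files against a manifest.

    Assumes each non-blank manifest line is "<sha256>  <relative/path>" (BagIt style).
    Returns files that are missing, files whose checksum does not match,
    and files present on disk but absent from the manifest.
    """
    expected: dict[str, str] = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        checksum, rel_path = line.split(maxsplit=1)
        expected[rel_path.strip()] = checksum.lower()

    missing, mismatched = [], []
    for rel_path, checksum in expected.items():
        candidate = restore_dir / rel_path
        if not candidate.is_file():
            missing.append(rel_path)
        elif sha256_of(candidate) != checksum:
            mismatched.append(rel_path)

    # Skip the manifest itself in case it sits inside the restored directory.
    manifest_abs = manifest.resolve()
    on_disk = {
        p.relative_to(restore_dir).as_posix()
        for p in restore_dir.rglob("*")
        if p.is_file() and p.resolve() != manifest_abs
    }
    return {
        "missing": sorted(missing),
        "mismatched": sorted(mismatched),
        "unexpected": sorted(on_disk - set(expected)),
    }


if __name__ == "__main__":
    # Small demo: one intact file, one file the manifest expects but was not restored.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        restore = root / "restored"
        (restore / "data").mkdir(parents=True)
        (restore / "data" / "a.txt").write_bytes(b"hello")
        manifest = root / "manifest-sha256.txt"
        manifest.write_text(
            hashlib.sha256(b"hello").hexdigest() + "  data/a.txt\n"
            + "0" * 64 + "  data/b.txt\n"
        )
        print(verify_restore(restore, manifest))
```

Reporting missing, altered, and unexpected files separately mirrors the questions a drill is meant to answer: did everything come back, did it come back intact, and does the organization understand everything it received.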
The Value of Transparency
One of the most remarkable aspects of this story was NC State’s willingness to share it. Sexton reflected on the importance of open dialogue: “We very rarely talk about our mistakes… but it’s good to be prepared and to share so others can learn.” Daigle echoed this sentiment, noting that openness is critical to advancing the field: “If we don’t tell as many stories about where data loss happened, other organizations won’t learn until it’s their turn.”
Looking Ahead: Building a More Resilient Future
The incident catalyzed transformative changes at NC State Libraries, from enhanced change management to infrastructure updates, that have positioned the organization for long-term sustainability.
For APTrust, the incident reaffirmed the value of its consortial model, which is built on shared responsibility and transparency. Together, the two organizations exemplify how preparation, partnerships, and a commitment to improvement can turn crises into opportunities for growth.
Learn More
To explore this case study more fully, view the slides and recording of the original presentation from the 2022 Fall CNI Member Meeting.