At APTrust, our commitment to preserving your digital assets is paramount. For a decade, we have told our members that we keep six copies of every file in an object saved to our High Assurance (also known as Standard) storage class: three "hot" copies in AWS S3 and three "cold" copies in AWS S3 Glacier. We recently learned that AWS does not store data this way. Instead, AWS uses erasure coding, which provides even greater data durability. Wasabi also uses erasure coding. We want to share this update transparently and explain why erasure coding strengthens our preservation strategy.
What We Learned
Our previous understanding was that AWS stored three full copies of each file in S3 and another three in Glacier. This belief was based on an interpretation by previous staff, who may have misunderstood AWS's storage mechanisms. In reality, AWS uses erasure coding, a method that offers superior data protection compared to simple replication.
What is Erasure Coding?
Erasure coding is a data protection method that breaks data into smaller fragments, computes additional redundant fragments from them, and stores all of the fragments across different locations. The original data can still be reconstructed even if some fragments are lost or corrupted, as long as enough fragments survive.
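To make the idea concrete, here is a minimal sketch of the simplest possible erasure code: the data is split into k fragments and a single XOR parity fragment is added, so any one lost fragment can be rebuilt from the others. This is a toy illustration only. The function names (`encode`, `decode`) are hypothetical, and this is not the scheme AWS or Wasabi uses; production systems typically rely on stronger codes, such as the Reed-Solomon codes described in the Backblaze post listed under Recommended Resources, which can survive the loss of several fragments.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size fragments and append one parity fragment.

    Hypothetical helper for illustration; not any vendor's API.
    """
    size = -(-len(data) // k)              # ceiling division
    padded = data.ljust(size * k, b"\0")   # pad so every fragment has the same size
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)  # XOR of all data fragments
    return fragments + [parity]


def decode(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data; at most one fragment may be None (lost)."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can recover at most one lost fragment")
    repaired = list(fragments)
    if missing:
        survivors = [f for f in fragments if f is not None]
        # XOR of all surviving fragments equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    # Drop the parity fragment and strip the padding.
    return b"".join(repaired[:-1])[:length]


if __name__ == "__main__":
    message = b"preserve this file"
    stored = encode(message, k=4)   # 4 data fragments + 1 parity fragment
    stored[2] = None                # simulate losing one fragment
    assert decode(stored, len(message)) == message
```

With four data fragments and one parity fragment, this sketch stores 1.25 times the original size yet survives the loss of any single fragment. Codes with more redundant fragments tolerate more losses.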
Comparing General Features of Replication and Erasure Coding
- Data Protection and Durability
  - Replication: Stores multiple complete copies of the same data in different locations. This provides redundancy, but at a significant storage overhead.
  - Erasure Coding: Provides higher durability by storing data fragments and redundant fragments across many locations. Even if several fragments are lost, the original data can still be recovered from the fragments that remain, which reduces the risk of data loss compared to replication.
- Storage Efficiency
  - Replication: Requires substantial storage space because each copy is stored in its entirety. For example, storing three copies of a file takes three times the storage space, with a corresponding increase in cost.
  - Erasure Coding: Uses storage space more efficiently. It needs less additional storage to achieve the same or a higher level of data protection than replication.
- Performance
  - Replication: Generally offers fast read access because any complete copy can serve a request. Write performance can suffer, however, because every copy must be updated.
  - Erasure Coding: Encoding and decoding introduce some computational overhead, but modern implementations are optimized to minimize the performance impact. In addition, because fragments are distributed across many locations, reads can be served in parallel, which can improve read performance.
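The storage-efficiency comparison above comes down to simple arithmetic, sketched below. The parameters are assumptions chosen for illustration: a scheme with k = 10 data fragments and m = 4 redundant fragments is a common textbook example, not a figure published by AWS or Wasabi, and the function names are hypothetical.

```python
from fractions import Fraction


def replication(copies: int) -> dict:
    """Storage overhead and loss tolerance for keeping full copies."""
    return {
        "overhead": Fraction(copies),    # bytes stored per byte of data
        "losses_tolerated": copies - 1,  # copies that can fail without data loss
    }


def erasure(k: int, m: int) -> dict:
    """Storage overhead and loss tolerance for k data + m redundant fragments.

    Assumes a code in which any k of the k + m fragments can rebuild the data
    (true of Reed-Solomon codes, for example).
    """
    return {
        "overhead": Fraction(k + m, k),  # bytes stored per byte of data
        "losses_tolerated": m,           # fragments that can fail without data loss
    }


if __name__ == "__main__":
    for label, scheme in [("3 copies", replication(3)),
                          ("10 + 4 erasure code", erasure(10, 4))]:
        print(f"{label}: {float(scheme['overhead']):.2f}x storage, "
              f"tolerates {scheme['losses_tolerated']} losses")
```

Under these illustrative parameters, three-way replication stores 3 bytes for every byte of data and tolerates two lost copies, while the 10 + 4 code stores 1.4 bytes per byte of data and tolerates four lost fragments.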
Benefits of Erasure Coding Over Replication
- Higher Durability: Provides better protection against data loss because the original data can be rebuilt even when several fragments are lost.
- Efficient Storage: Uses storage space more efficiently, reducing the overhead required for high durability.
- Improved Performance: Spreading data across multiple locations can enhance read performance and improves fault tolerance.
- Scalability: Easily scales with growing storage needs without significant reconfiguration.
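The durability claim can be illustrated with a simplified probability model. The sketch below assumes that each stored copy or fragment fails independently with the same probability p during some repair window. That assumption is a deliberate simplification (real failures are often correlated, as the Rosenthal post under Recommended Resources discusses), and both p and the 10 + 4 parameters are hypothetical values, not figures from AWS or Wasabi.

```python
from math import comb


def replication_loss(copies: int, p: float) -> float:
    """Probability that every copy fails, so the data is lost."""
    return p ** copies


def erasure_loss(k: int, m: int, p: float) -> float:
    """Probability that more than m of the k + m fragments fail.

    Assumes any k surviving fragments are enough to rebuild the data.
    """
    n = k + m
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(m + 1, n + 1))


if __name__ == "__main__":
    p = 0.01  # hypothetical per-fragment failure probability
    print(f"3 copies:            {replication_loss(3, p):.3e}")
    print(f"10 + 4 erasure code: {erasure_loss(10, 4, p):.3e}")
```

In this model, with p = 0.01, the 10 + 4 code has a lower probability of data loss than three full copies while using less than half the storage. The comparison depends on the failure probability and the chosen parameters, so it should be read as an illustration of the principle rather than a measurement of any provider's durability.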
Addressing Concerns
This update might raise concerns about the integrity and reliability of our service. We assure you that our commitment to preserving your data remains unwavering. Erasure coding meets or exceeds the durability and reliability standards we aimed for with replication. APTrust has prepared this FAQ.
Moving Forward
- Updated Practices: Our documentation and website now accurately reflect the use of erasure coding.
- Continuous Improvement: We are committed to continuously improving our services and adopting the best practices in digital preservation.
- Open Communication: We value transparency and are here to answer any questions. Please join our upcoming webinar to discuss this in detail and address your concerns.
Join the Conversation
We invite you to join our upcoming webinar, Understanding Data Durability: Replication to Erasure Coding, on Monday, August 26 at 2pm Eastern to learn more about this update and ask any questions you may have. Our goal is to ensure you feel confident and informed about the methods we use to preserve your valuable digital assets. Thank you for your continued trust and partnership. Together, we are advancing the field of digital preservation.
Recommended Resources
Here are some excellent resources for more technical detail on why erasure coding provides more durability than replication.
AWS re:Invent 2022 - Deep Dive on Amazon S3 (STG203), 2022. https://youtu.be/v3HfUNQ0JOE?feature=shared&t=311.
- This video recording from AWS re:Invent 2022 explains what happens during a hard drive failure in different scenarios where replication and erasure coding are employed for data integrity.
Beach, Brian. “Backblaze Releases the Reed-Solomon Java Library for Free.” Backblaze Blog | Cloud Storage & Cloud Backup (blog), June 16, 2015. https://www.backblaze.com/blog/reed-solomon/.
- This is a blog post from Backblaze on their Reed-Solomon erasure coding algorithms. The post goes in-depth on how erasure coding works.
Rosenthal, David. “Correlated Failures.” DSHR’s Blog (blog), March 16, 2021. https://blog.dshr.org/2021/03/correlated-failures.html.
- This post discusses the advantages of erasure coding over simple replication in the context of correlated failures in data storage systems. It highlights that erasure coding achieves higher reliability and durability than replication, making it a more robust choice for long-term data preservation.
Rosenthal, David. “Natural Redundancy.” DSHR’s Blog (blog), April 10, 2018. https://blog.dshr.org/2018/04/natural-redundancy.html.
- Another insightful post on Rosenthal’s blog emphasizes the importance of erasure coding for protecting data against various threats beyond just media failures. It argues that erasure coding, combined with other redundancy methods, offers a comprehensive approach to data protection.
Wang, Meng, Jiajun Mao, Rajdeep Rana, John Bent, Serkay Olmez, Anjus George, Garrett Wilson Ransom, Jun Li, and Haryadi S. Gunawi. “Design Considerations and Analysis of Multi-Level Erasure Coding in Large-Scale Data Centers.” In SC23: International Conference for High Performance Computing, Networking, Storage and Analysis, 1–13, 2023. https://doi.org/10.1145/3581784.3607072.
- This paper comprehensively analyzes Multi-Level Erasure Coding (MLEC) in large-scale data centers, exploring various design considerations, chunk placement schemes, and repair methods. It demonstrates that MLEC offers significant advantages in terms of performance and durability compared to Single-Level Erasure Coding (SLEC) and Local Reconstruction Codes (LRC), particularly in managing independent and correlated failures while reducing repair network traffic.