Erasure Coding FAQ

What is erasure coding, and how does it differ from replication?
Erasure coding is a data protection method that splits data into fragments, computes additional redundant (parity) fragments from them, and stores all of the fragments across multiple locations. Replication, by contrast, stores multiple complete copies of the data. With erasure coding, the original data can be reconstructed from a subset of the fragments, so it survives the loss or corruption of some of them. This approach can provide durability equal to or higher than traditional replication while using less storage.
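To make the idea concrete, here is a minimal sketch of the simplest possible erasure code: the data is split into k fragments and a single parity fragment is computed with XOR. This is an illustration only. Production systems typically use Reed-Solomon style codes with several parity fragments, and the function names and parameters below are hypothetical, not the scheme used by AWS, Wasabi, or APTrust.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")  # pad so every fragment has the same length
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def reconstruct(fragments: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the data when at most one fragment (marked None) is missing."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("a single parity fragment can only recover one lost fragment")
    repaired = list(fragments)
    if missing:
        # XOR of all surviving fragments equals the missing one.
        survivors = [f for f in fragments if f is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    # Drop the parity fragment and the padding.
    return b"".join(repaired[:-1])[:original_len]


# Usage: store 4 data fragments + 1 parity fragment, lose one, and recover.
original = b"digital preservation"
stored = encode(original, k=4)
stored[2] = None  # simulate a lost fragment
assert reconstruct(stored, len(original)) == original
```

Note that no location holds a complete copy of the data, yet the loss of any one fragment is recoverable. Tolerating more simultaneous losses requires more parity fragments and a stronger code than XOR.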

Why didn’t we know about the use of erasure coding earlier?
Our previous understanding was based on an interpretation by former staff, who believed that AWS internally stored three copies of each file, invisible to the user. This belief may have stemmed from how data redundancy was explained at the time. Recently, we discovered that AWS and Wasabi use erasure coding, a more advanced and efficient method for data protection. We are committed to transparency and continuous improvement and are sharing this update with you now.

What benefits does erasure coding offer compared to the previous replication method?
Erasure coding offers several key benefits over replication:

  • Higher Durability: It provides better protection against data loss by distributing data fragments across multiple locations.
  • Efficient Storage: It uses storage space more efficiently, reducing the overhead required for data protection.
  • Improved Performance: Distributing fragments across multiple locations can improve data retrieval performance and fault tolerance.
  • Scalability: Erasure coding easily scales with growing storage needs without significant reconfiguration.
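The durability point can be illustrated with a rough calculation. The sketch below compares the chance of losing data under replication and under erasure coding, assuming that each stored copy or fragment fails independently with the same probability. That independence assumption, the failure probability, and the 10 data + 4 parity layout are all hypothetical values chosen for illustration, not figures published by any provider.

```python
from math import comb


def loss_probability_replication(p: float, copies: int) -> float:
    """Data is lost only if every copy fails."""
    return p ** copies


def loss_probability_erasure(p: float, k: int, m: int) -> float:
    """Data is lost if more than m of the k + m fragments fail.

    k = data fragments, m = parity fragments (hypothetical parameters).
    """
    n = k + m
    return sum(
        comb(n, j) * p ** j * (1 - p) ** (n - j)
        for j in range(m + 1, n + 1)
    )


# Illustrative comparison with an assumed 1% failure chance per copy or fragment.
p = 0.01
replication = loss_probability_replication(p, copies=3)
erasure = loss_probability_erasure(p, k=10, m=4)
print(f"3 copies:         {replication:.2e}")
print(f"10 + 4 fragments: {erasure:.2e}")
```

Under these assumptions the erasure-coded layout has a lower loss probability than three full copies, even though it stores far less redundant data. Real-world failures are not fully independent, so the numbers should be read as a comparison of the two approaches, not as a durability guarantee.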

Are there any risks associated with using erasure coding?
Erasure coding is highly reliable and efficient, but like any data protection method it carries some risk. For example, data cannot be reconstructed if more fragments are lost than the scheme is designed to tolerate. Modern implementations are designed to minimize these risks. Erasure coding is widely used in large-scale data centers and cloud storage services, demonstrating its effectiveness in protecting data. We continuously monitor and update our practices to ensure the highest digital preservation standards.

Will this change impact our existing data stored with APTrust?
No. Our updated understanding of how our storage providers protect data does not affect the integrity of your existing data stored with APTrust. The data remains protected with the same high level of durability and reliability. Preserving your digital assets remains our top priority.

Can you provide more technical details or resources on erasure coding?
We recommend resources such as David Rosenthal’s blog and the IEEE Digital Library for more technical details on erasure coding and its benefits.

How will this update affect APTrust’s storage costs?
Erasure coding is more storage-efficient than replication, meaning it uses less additional storage to achieve the same or higher level of data protection. This new knowledge does not impact past or current storage costs.
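The efficiency difference comes down to simple arithmetic, sketched below. The storage factor is the amount of raw storage consumed per unit of user data. The 10 data + 4 parity layout is a hypothetical example, since the actual parameters used by our storage providers are not stated here.

```python
def storage_factor_replication(copies: int) -> float:
    """Raw storage used per unit of data when keeping full copies."""
    return float(copies)


def storage_factor_erasure(data_fragments: int, parity_fragments: int) -> float:
    """Raw storage used per unit of data with an erasure code.

    Each fragment is 1/data_fragments of the data, and
    data_fragments + parity_fragments of them are stored.
    """
    return (data_fragments + parity_fragments) / data_fragments


# Three full copies: 3x storage, survives the loss of 2 copies.
three_copies = storage_factor_replication(3)

# Hypothetical 10 + 4 layout: survives the loss of any 4 fragments.
ten_plus_four = storage_factor_erasure(10, 4)

print(three_copies, ten_plus_four)
```

In this example, three copies consume 3 times the size of the data, while the 10 + 4 layout consumes 1.4 times the size of the data and still tolerates more simultaneous losses.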

Who can we contact if we have more questions or concerns?
If you have any further questions or concerns, don’t hesitate to contact us at help@aptrust.org. We are here to provide support and ensure you feel confident and informed about our digital preservation methods.