Cloud storage is not preservation

September 26, 2022

Here’s an interesting case of “preservation” gone wrong that appeared over the weekend. I put preservation in quotes, because I’m talking about cloud backup, which is what many consumers and some organizations consider “good enough” preservation. 

Users around the world started noticing that entire collections of photos they stored in Google Photos had become corrupt. There are dozens of open tickets on this, but the main one is at https://support.google.com/photos/thread/180787712/corrupted-photos?hl=en.

Digging through this thread and other discussions shows that Google was compressing original images so they could serve lower-resolution copies. A bug in the compression process corrupted the low-res versions, and these were the only versions they were serving back to users.

Apparently, this happened on a large scale, and Google didn’t notice. Their users did, with some panic.

Fortunately, Google kept the original hi-res images, and users were able to get them back through Google Takeout. Then, of course, they had to find space to store them locally and check them individually if they wanted to ensure integrity. For users whose collections go back more than a decade, this is quite a task. They also may now worry that their local copies are the only truly safe ones.
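
For anyone facing that task, the check itself doesn't have to be manual. Below is a minimal sketch of how a downloaded collection could be scanned for files that no longer decode, assuming Python with the Pillow imaging library; the folder name is just a placeholder, not the real Takeout layout.

```python
# Minimal sketch: scan a downloaded Takeout folder and flag images that
# fail to decode. Assumes the Pillow library (pip install Pillow); the
# folder path below is a placeholder.
from pathlib import Path

from PIL import Image

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".tif", ".tiff"}

def find_suspect_images(root: str) -> list[Path]:
    """Return paths of images that fail Pillow's structural check."""
    suspect = []
    for path in Path(root).rglob("*"):
        if path.suffix.lower() not in IMAGE_EXTS:
            continue
        try:
            with Image.open(path) as img:
                img.verify()  # checks file structure without a full decode
        except Exception:
            suspect.append(path)
    return suspect

if __name__ == "__main__":
    for path in find_suspect_images("Takeout/Google Photos"):
        print(f"possibly corrupt: {path}")
```

A structural check like this catches truncated or mangled files, not every form of damage; a photo can decode cleanly and still look wrong.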

These are some of the problems that dedicated, active preservation systems prevent: problems that users outside the preservation community don't see but will inevitably have to deal with when things go wrong.

This incident shows how much transparency matters in services entrusted with safeguarding digital materials. If users had known that Google was showing them only compressed, transcoded versions of their photos, they might not have panicked. Most probably had no idea Google was manipulating their originals in the first place.

It also shows how important it is to have someone minding the preservation ship. When a service provider transcodes millions or billions of photos, they should have internal checks to make sure that process is working. It should never be left to users to discover that runaway automation damaged materials on a large scale.
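
What might such a check look like? Here's a rough sketch of the principle, again assuming Python with Pillow. It's an illustration, not a claim about Google's actual pipeline: the derivative gets its own path, it's verified before anyone is served it, and the original is never touched.

```python
# Sketch of a transcode-then-verify step, assuming Pillow. Illustrative
# only; not Google's actual pipeline. The derivative is written to its
# own path, the original is never overwritten, and a broken derivative
# fails loudly instead of being served.
from PIL import Image

def make_verified_derivative(original: str, derivative: str,
                             max_size: tuple[int, int] = (1024, 1024)) -> None:
    with Image.open(original) as img:
        low_res = img.convert("RGB")   # JPEG output needs an RGB image
        low_res.thumbnail(max_size)    # resize in place, preserving aspect ratio
        low_res.save(derivative, "JPEG", quality=85)

    # Check the output before it goes anywhere near a user. If the
    # derivative is broken, raise an alarm rather than serving it.
    try:
        with Image.open(derivative) as check:
            check.verify()
    except Exception as err:
        raise RuntimeError(f"bad derivative {derivative}: {err}") from err
```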

To Google’s credit, they seem to have the situation in hand, and it looks like no permanent harm was done. But for organizations safeguarding materials they can’t afford to lose, this incident provides food for thought.

For a massive company like Google, whose true focus is on its advertising cash cow, this weekend's incident amounts to a glitch in one of their side projects. It's a distraction, a mess they have to clean up in a system that wasn't making them money in the first place.

Community-driven preservation repositories, on the other hand, were built from the ground up to prevent data loss and corruption. Anticipation of failure scenarios such as bit rot, programmatic corruption, hardware failure, media obsolescence, and format obsolescence is baked into the design of the system from its inception.
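
In practice, much of that design comes down to fixity: record a checksum for every file at ingest, then re-verify on a schedule so silent corruption is caught by the system rather than by depositors. Here's a minimal sketch of the idea; the JSON manifest format and function names are illustrative assumptions, not any particular repository's implementation.

```python
# Minimal fixity sketch: record SHA-256 checksums at ingest, then re-hash
# later to catch silent corruption such as bit rot. The manifest format
# and function names are illustrative, not a real repository's design.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB at a time
            digest.update(chunk)
    return digest.hexdigest()

def record_fixity(root: str, manifest: str) -> None:
    """Walk root and store a checksum for every file: the ingest baseline."""
    checksums = {str(p): sha256_of(p)
                 for p in Path(root).rglob("*") if p.is_file()}
    Path(manifest).write_text(json.dumps(checksums, indent=2))

def audit_fixity(manifest: str) -> list[str]:
    """Re-hash every file and report any that drifted from the baseline."""
    checksums = json.loads(Path(manifest).read_text())
    return [path for path, expected in checksums.items()
            if not Path(path).exists()
            or sha256_of(Path(path)) != expected]
```

Run the audit on a schedule and keep more than one copy, and a failed check becomes a repair event instead of a loss.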

All of these risks increase over time, which is why they must be addressed at the system and organization levels by anyone intending to do long-term preservation. Communities like APTrust pool the knowledge of disparate professionals across many fields to solve these complex problems.

We have no trade secrets to protect. We are open about what we do and how we do it, because transparency breeds trust, and trust is essential among partners working together to safeguard valuable materials. 

Finally, we work closely with depositors to recover materials when problems do occur, because this isn’t our side project. It’s the core of what we do, our mission, and our reason for being.

Andrew Diamond

Lead Developer, APTrust

Technical, Thoughts