-- Andrew Diamond, Lead Developer
When I started my career in software development in the late 1990s, it was nearly impossible to convince project managers and stakeholders of the importance of creating secure applications and systems. Everyone understood the basics, that user accounts should be password-protected, and that the servers running our applications should sit behind firewalls. But beyond that, what was the point of investing in security? It added substantial cost and delayed project timelines with no visible benefit to users and no financial benefit to stakeholders.
Software users naturally viewed the systems we created from a user's perspective: the software does what we want, so what exactly needs to be "fixed"? Investors and project managers saw it from a stakeholder's perspective: the sooner the system is up and running, the sooner we can start making money. We'll have a platform we can build on and expand.
System administrators and software developers took a broader perspective. We weren't just delivering systems, we had to keep them running. Managers understood the need for adequate hardware, backup power and system monitoring. We could tell them ahead of time what we needed along those lines and how much it would cost.
It was harder to convince them of the need to prepare for more vague future threats, such as hackers and malicious users. After all, we were building systems for users, not abusers. Why waste our time designing for users we didn't want, for contingencies that weren't profitable? Who would target us anyway? Surely hackers would focus on banks, where the money was. Let the banks worry about security.
System administrators knew better. Look at the logs of any publicly accessible server and you'll find hundreds, if not thousands, of hack attempts each day. Most of them are automated, simple and harmless--the cyber equivalent of someone checking every car on the street for unlocked doors.
Eventually, as public and commercial infrastructure moved online, organizations did begin to suffer breaches. In some cases, hackers were stealing data. In others, they simply made a mess that took important services offline and left IT departments scrambling to restore order.
Companies used to suppress information about security breaches because they were embarrassing and they undermined public confidence in their products and services. The few who suffered serious damage began to take security seriously. Others continued to neglect it until they suffered their own disasters. The problem was so widespread that all fifty US states eventually passed laws requiring companies to publicly disclose security breaches that compromised personal information.
The US Securities and Exchange Commission and the Federal Communications Commission also issued guidelines requiring disclosure of breaches in which personal information was leaked. Federal law now holds organizations liable for certain data breaches, and many have had to settle embarrassing and expensive class-action suits.
Companies began to take security seriously only after two critical conditions were met. First, breaches showed that the vague, theoretical threats that system administrators had been warning about for years were real. Second, disclosure and liability laws ensured that the reputation and finances of companies with poor security practices would suffer. The hackers and the regulators effectively made the argument that the system administrators had been losing for years, because the hackers and the regulators put it in terms the managers could understand: ignore this and you will pay.
Digital preservationists today face problems of advocacy and understanding similar to those faced by the security-minded computer professionals of twenty years ago. While preservationists are constantly anticipating threats that loom just over the horizon, budget-conscious managers are under pressure to solve today's problem today, with as few resources as possible. To them, preservation means having an extra copy of data somewhere outside the production systems that can be restored if things go wrong. If anything does go wrong, it will be up to their successors to dive into the backups and sort it out.
Preservationists understand the world doesn't work this way. In the years between you making the backup and you needing to restore it, the tapes and hard drives have gone bad, the software that can read the data you saved has gone extinct, and the people who understood how to restore the materials into running, accessible systems no longer work here. This is like the old definition of hopelessness: a blind man in a dark room looking for a black cat that isn't there.
The inherent challenge of digital preservation is that we're trying to preserve materials for the long term on physical devices and digital technologies that will become obsolete in a few years. We don't have to wait for the hackers to attack us, or even wonder if they will. The technical landscape will evolve around us inevitably. The state-of-the-art video we've backed up to cloud storage, while accessible today, will soon be as useless as those old Flash videos stored on Iomega Zip drives. Even if the storage medium hasn't rotted, good luck finding the hardware to read it. And if you can read the bits, good luck finding software that can interpret them. If you do get that far, you'll likely need contextual information to make sense of what you're seeing. Let's hope the archivists who felt this material was worth preserving had the time and budget to preserve the contextual metadata as well.
Many people don't even understand that digital preservation is a practice, requiring considerable forethought and ongoing intervention. In the past, societies preserved information on physical media that lasted centuries: marble, bronze, stone and clay tablets. Even humble paper endures for generations under the right conditions.
Digital information, on the other hand, is inherently unstable. Though the public doesn't see it, Google is constantly swapping out hard drives to keep your photos and slide decks available. They put immense work into making their documents, spreadsheets and slides compatible with Microsoft Office and PDF standards so that you can keep reading your files. After all, it's not just the bits you want, it's the human-readable information.
YouTube has invested tremendous resources to ensure creators can upload videos in numerous formats and viewers can watch them on virtually any device. Video formats have changed over time, but it's impossible for users to see how much behind-the-scenes work goes into keeping the viewing experience seamless. The fact that everything in the well-funded commercial world just seems to keep working on its own makes it hard for digital preservationists to make the case to managers that this work requires considerable planning and resources.
Software engineers learned decades ago that trying to add security to a product after launch was like setting sail on a leaky ship with the glib idea that we'd patch the holes as they appeared. We learned the hard way that if we wanted our systems to be viable in the long term, we had to build security into the initial design, and we had to address safety concerns in every function and feature that emerged as the product evolved. The job was never "done"; it was simply integrated as a fundamental and necessary component of the larger practice of software development.
The software security Cassandras whose dire warnings went unheeded by management were eventually proved right by their enemies. High-profile hacks and ransomware attacks were so costly and humiliating that organizations had not only to address them, but to anticipate them.
Digital preservationists don't have the advantage of waiting for a shock-and-awe event that will wake the public up to their cause. It's unlikely that we'll see a dramatic loss that the news sites will value in the billions of dollars, stirring public outrage and calls for change. Because loss of cultural knowledge can't be reduced to dollar amounts, preservation failures don't make for captivating news stories.
The materials we lose over time will go piecemeal, as someone forgets to copy the contents of a dying hard drive that was itself considered preservation, someone else stores research data in an undocumented proprietary format, and some knowledge organization locks all its information behind a paywall before finally going out of business. We'll wake up one day to find holes throughout the cultural record, and there will be no inventory of what was lost.
Unlike software developers, preservationists don't have to wonder if or when an enemy might show up to rot out the core of everything they worked so hard to build, because, unlike the hacker, time isn't an opportunistic foe waiting to stumble onto a worthy target. It's inevitable. It's here, and it's already outflanking the plans we're making to get us through to next year.
So how do we make the case to the public, to the decision makers, to the ones who hold the purse strings, that we need to address these problems now and continually going forward?
Ideas?
Anyone?