I often think about how to preserve data. This is mostly driven by my photography habit. My pictures are not fantastic, but they mean a lot to me, and I suspect, but am by no means certain, that they will mean something to my children and grandchildren. I certainly would love to know what the lives of my own grandparents were like, to see them in stages of life parallel to my own. But I don’t know how to make sure my kids and their kids will be able to see these photos.
This is a super difficult problem. The physical media that the images are stored on (hard drives, flash cards, etc.) degrade and will fail over time, and even if they don’t, the equipment to read that media will become scarce. Furthermore, the format of the data may become undecipherable over time as well. I have high confidence that it will be possible to read JPEGs in the year 2056, but when you get into some more esoteric formats, I dunno.
A commonly proffered solution is to upload your data to a cloud service for backup. I have strong reservations about this as a method for long-term preservation. Those cloud backups are only good as long as the businesses that run them have some reason to continue to do so. Subscriptions, user accounts, and advertising-driven revenue seem a poor match for permanent archival storage of anything. Who, long after I’m dead, is going to receive the email that says “your account will be closed if you do not update your credit card in 30 days”? Also, what good is a backup of data I can no longer view on my now-current quantum holographic AI companion?
All of this compares quite unfavorably with a common archival technique used for informal, family information: the shoe box. Photographs stored in a shoe box are susceptible to destruction by fire or flood, but they are fantastically resilient to general benign neglect over exceedingly long periods of time. Sure, the colors will fade if the box is left in a barn for 50 years, but, upon discovery, anyone can recognize the images using the Mark I human eyeball. (Furthermore, it’s really astounding how easy it is to use a computer to restore natural color to faded images.)
There is simply no analog to the shoe box full of negatives in today’s world. Sure, you can throw some flash memory cards into such a box, but you still have the readout problems mentioned above.
As people have migrated from their first digital camera to their last digital camera to iPhoneN to iPhoneN+1, lots of images have already been lost. Given how short the history of digital photography is, you can’t even blame that loss on technological change. It’s more about plain old poor stewardship. But just to amplify my point above: the shoe box is quite tolerant of poor stewardship.
* * *
Okay, so, this post was not even going to be about the archival problems of families. That is, in aggregate, a large potential loss, made up of hundreds of millions of comparatively smaller losses.
I decided to write today because I saw this blog post about this article, which describes how the online archives of a major metropolitan newspaper, going back more than 200 years, are at risk of disappearing from the digital universe.
Here we have a situation in which institutions that are committed to preserving history, with (shrinking) staffs of professional librarians and archivists, are failing to preserve history for future generations. In this case, the microfiche archives of the print version of the paper are safe, but the digitally accessible versions are not. The reason: you can’t just put them in a shoe box (or digital library). Someone must host them, and that someone needs to get paid. Forever.
Going forward, more and more of our history is going to happen only in the digital world: Facebook, Twitter, Hillary Clinton’s (or any other politician’s) email. There’s not going to be a microfilm version at the local university library. Who is going to store it? Who will access it, and how?
A few years ago, it looked like companies like Google were going to solve this problem for us, pro bono. They were ready, willing, and seemingly able to host all the data and make it available. But now things are getting in the way. Copyright is one. The demand from investors to monetize is another. It used to be thought that you could not monetize yesterday’s paper (today’s paper is tomorrow’s fish-wrap), but more wily content owners realize that if they don’t know the value of an asset, they can’t give it away for free. Even Google, which I think is still committed to this sort of project, albeit with its hands somewhat tied, probably cannot be trusted with the permanent storage of our collective history. Will they be around in 50 or 100 years? Will they migrate all their data forever? Will they get bought and sold a dozen times to owners who are not as committed to their original mission to “organize the world’s information and make it universally accessible and useful”? Will the actual owners of the information that Google is trying to index try to monetize it in perpetuity?
I think we know the answers. Right now, it all looks pretty grim to me.