- Archival Media Preservation - http://archivemediapartners.com/AMPed -

Long-term Website Preservation Uncertainties

Since the advent of the Internet, the floodgates have opened, with people creating all kinds of documents and putting them on the web. With both open source and proprietary software available, the proliferation of websites and blogs has been nearly overwhelming. But will all that material be around a year or five from now? What will exist in the future? How will it be archived? Internet content creators cannot be certain that their material will survive for years to come. Many people may be fine with that, but those who want their sites preserved for posterity should be proactive in saving their work.

Although “good-faith relationships” typically exist between users and their Internet service providers, the records actually belong to the latter, and most providers do not archive every website in perpetuity. In 1997, Tom Hyry and Rachel Onuf wrote in Archival Issues, “From multimedia projects to personal homepages and beyond, new expressive digital media proliferate. Since content in these formats, too, can be easily altered over time, the past forms, looks, and contents of these documents become replaced, and normally lost, with the developments of their replacements.”

Another consideration is what happens when the “sponsoring organizations” of Internet sites, such as political campaigns, cease to exist: their digital contents may disappear as well. This happened to Al Gore’s website after the 2000 election was finally called. There is, of course, the Internet Archive [2] and its Wayback Machine [3] for viewing past versions of websites, but it does not have everything, nor do all of its links work. Brewster Kahle created the Internet Archive in 1996. According to its website, it contains almost two petabytes of data and is currently growing at a rate of 20 terabytes per month. “This eclipses the amount of text contained in the world’s largest libraries, including the Library of Congress.” But Kahle acknowledges, in a 2007 American Archivist article, that “digital technologies erode very quickly. The current digital technologies only last about three years. In the last ten years, we’ve moved – transitioned – our materials three times.”

There continues to be concern about entire companies’ sites disappearing. For example, Driveway.com [4], a provider of free web-based digital storage, had approximately two million users. When the company announced its “demise,” it gave users two weeks’ notice to move their files. Customers who did not see the notice lost all their material.

Even the more prominent websites are at risk of disappearing. For instance, on January 20, 2001, Inauguration Day, the White House website changed completely with the incoming president. The previous contents of the Clinton administration’s site, and its searchable archive companion site, were “completely wiped clean.” Thousands of links on other websites were broken as a result, a phenomenon called “link rot.” This created problems not only for members of the general public who may have wanted to research material from and about Clinton’s tenure, but also for archivists and historians, who know such material is vital to analyzing a presidency. Fortunately, the National Archives and Records Administration (NARA) took action to preserve various “renditions” of the sites that were taken down. This is recounted in the article “Digital Preservation: Paradox & Promise” by R. Wiggins in Library Journal Net Connect from 2001.

Some computerized material and electronic records are difficult to preserve and access because they are “born digital.” As more documents are authored in digital form, some of that material cannot be reduced to print, at least not without “substantial loss of content or function,” according to Clifford Lynch, Executive Director of the Coalition for Networked Information. These documents also create other issues for archivists, such as ensuring that works like digital photographs accessioned into an archival database have metadata added so that correct cataloging information is kept.

The Internet Archive, amid all its electronic pages of information, warns that when it comes to preservation, “any medium or site used to store data is potentially vulnerable to accidents and natural disasters.” The news reported on October 12 that Danger, a division of Microsoft, had a server crash that left users of its Sidekick device without their photos and other personal information only reinforces the vulnerability of all the electronic material that deluges the Internet.
