In an old church, the Internet Archive stores our digital history

Jan 5, 2017

In San Francisco’s Richmond District, where Geary Boulevard meets Park Presidio, there stands a bright, white, defunct Christian Science church. There are big white columns out front, with pink steps leading up to iron double doors.

But what goes on inside this church is not quite what you’d expect.

The first thing I see when I open the doors is a black doormat with a white image of what looks like a Roman temple. Come to think of it, it actually looks a lot like the building I just stepped inside. That’s no coincidence, according to Brewster Kahle.

“We bought this building because it matched our logo,” Kahle says.

Kahle is leading me and about 15 other people on a tour of this old church, which now houses the gigantic internet project he founded 20 years ago, the Internet Archive.

“The idea was to try to build the library of Alexandria, version two,” explains Kahle.

The library of Alexandria, version one, was in Egypt. It was one of the biggest and most important libraries of the ancient world. It’s said to have housed every book – or scroll, I should say – ever written, up until the first century BC. That’s when it burned and everything inside was destroyed. Nowadays, that would be like losing all of Shakespeare, or never being able to watch The Wire again. The Internet Archive is trying to prevent this from ever happening.

“And it turns out,” Kahle says, “technologically, you can do it.”

Kahle and a team of more than 50 engineers, programmers, archivists and volunteers are doing that right in this building. He pays for all of this with money he earned early in his tech career, and with funding from foundations.

As Kahle leads us up to the second floor, I can hear the faint hum of hard drives. But the first thing I see is a full-on church.

We’re in a cavernous room lit by daylight streaming in through big, stained-glass windows. It actually does feel kind of holy. There are real pews in here, arranged in a wide half-circle facing an altar. The floor is sloped like an amphitheater.

“So we thought when we bought this place that we would flatten the floor and make it into a library, but exactly what a library looks like or is like in the future, we don't know,” says Kahle. “So we thought, ‘Why don't we just take it slow and we'll have the building adapt to us and we'll adapt to the building?’”

One of those adaptations is humming right behind me. Inset in a space that looks like it should hold a statue of a saint or something, instead stand three towers of compact, black computer servers, illuminated by blinking blue lights.

“Every time a light blinks, someone is either uploading something or downloading something from the Internet Archive. We get between two and three million users a day,” says Kahle. “So people actually want old stuff.”

Old stuff like a copy of Dracula from 1897. Or books from the year 1000 AD. Or recordings of a Grateful Dead concert from the 1980s (there are over 95 million downloads of those). All this stuff that might otherwise disappear – or, at least, remain inaccessible to most people.

“If it's not online, it's as if it doesn't exist,” says Kahle. “People aren't going and necessarily hunting things down in libraries in the way that they used to 50 years ago. So if we're going to be bringing up our kids with this as their whole experience of information, we better put the best we have to offer within reach of our children – and it's not there now.”

The Internet Archive is working to change that. With 30 scanning centers in eight countries, they’ve uploaded more than three million books in 184 languages. Those books are publicly available, for free.

Sometimes libraries pay the Internet Archive to digitize their entire collections. If that sounds like an incomprehensible, even impossible, amount of data to you, it doesn’t to Brewster Kahle.

“It's absolutely possible to go and have all the books, music and video online,” Kahle says.

Kahle says the 28 million books held in the largest library in the world, the Library of Congress, could fit onto about seven hard drives.

There are a lot more than seven hard drives in our next stop – a couple hundred more. All these servers also hold the data for another project of the Internet Archive, the Wayback Machine. That’s a searchable spot on the site that preserves past versions of around 145 billion web pages. Every two months, a program built by the archive crawls the entire Internet.

“The idea is to just visit all of them and record the pages and then start again,” Kahle says. “So this has been going on since 1996.”

You can go back in time to see what Google looked like in 1998. (It doesn’t look all that different.) Or, you can go see some of the first viral internet videos.

About six hundred thousand people use the Wayback Machine every day, for fun and for not-so-fun.

“We also see people use it to hold companies accountable for things that they used to say. And it's been used in court a lot to go and say, ‘But this is what you offered,’” explains Kahle. “It's been useful in politics in that same sort of way, of keeping people accountable for what it is they used to say.”

Things like keeping track of the original privacy terms of your Facebook account, for example, or what the President said in his State of the Union address.

“If we're just locked into the present, if we only had what it is people want us to know right now without any history, we would live in a very Orwellian world,” says Kahle.

But, if Brewster Kahle has his way, the only place you’ll find that very Orwellian world is in a book on the Internet Archive.

This story originally aired in January of 2015. Since then, the Internet Archive has made national news with its announcement that it plans to back up all its data in Canada, to protect itself from possible interference from President-elect Donald Trump. Today, the organization released the Trump Archive, a collection of over 520 hours' worth of recorded video of the President-elect, in an effort to catalogue and track his statements on public policy.