It seems like only yesterday that my friend John showed me, with pride, his brand-new AT&T 6300 Plus computer with a 10 megabyte drive. The manual that came with it said, "You may think that you'll never be able to fill up a hard drive that holds 10 mb, and you may be right! However, it's a good idea to delete files you no longer need, just to be safe. Here's how..." followed by a description of the DOS DEL command.
Earlier computers had used 5.25" floppy drives, which held about 360 kilobytes of data. John's hard disk held nearly thirty times that, so AT&T's pride was understandable. After all, these were the days of command-based, text-based applications. There were no sounds (other than the occasional "beep!"), no pictures, no icons. All you could store was words, with one character taking up one byte on the hard disk. Ten million letters and spaces is a lot of letters and spaces. Jane Austen's novel Pride and Prejudice, to give an example, is made up of roughly 650,000 letters and spaces. John's hard disk could have stored more than 15 complete novels the size of that one.
But, of course, in less than a year John and I had managed to fill his hard disk up, anyway, with the names and addresses he used in his bulk mail business. We had to purchase another hard disk, and another. Older disk drives failed, but two 10 mb drives could be replaced by one 40 mb that provided room for future growth...which too soon was not enough, either.
It wasn't just words anymore. First, graphics were introduced. Microsoft Windows not only included built-in icons, it allowed users to draw their own pictures. The first graphics were crude and didn't take up too much space, as they only allowed four different colors. But then new versions of Windows supported 16 colors, then 256, and finally the millions of distinct shades that make computer storage of photographs possible.
The human eye can perceive about 16,000,000 different shades, except for Martha Stewart, who can perceive over 24,000,000. To represent 24,000,000 distinct colors, each pixel must be four bytes wide. At that rate, a wallpaper photo of 640 by 480 pixels takes up 1,228,800 bytes...and that's for one picture.
That's right: A single photo of the Grand Canyon or your kid blowing out the candles on his birthday cake or of Janet Jackson's "wardrobe malfunction" takes up about twice as much space as an e-copy of Pride and Prejudice.
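If you'd like to check that arithmetic yourself, here is a minimal sketch. The 640-by-480 dimensions are my assumption, chosen because they are the common screen size that reproduces the 1,228,800-byte figure at four bytes per pixel; the novel size is the rough estimate used earlier.

```python
# Assumption: 640 x 480 pixels. This is the resolution that yields
# 1,228,800 bytes at 4 bytes per pixel.
WIDTH_PX = 640
HEIGHT_PX = 480
BYTES_PER_PIXEL = 4
NOVEL_BYTES = 650_000  # rough size of Pride and Prejudice as plain text


def uncompressed_image_bytes(width, height, bytes_per_pixel):
    """Size of a raw bitmap: one fixed-size value per pixel, no compression."""
    return width * height * bytes_per_pixel


photo_bytes = uncompressed_image_bytes(WIDTH_PX, HEIGHT_PX, BYTES_PER_PIXEL)
novels_per_photo = photo_bytes / NOVEL_BYTES

print(f"{photo_bytes:,} bytes, or about {novels_per_photo:.1f} novels")
```

Change the width and height to match your own screen and the number climbs quickly, since the size grows with the product of the two.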
Obviously, people's photo collections were expanding faster than the capacities of their hard disks. Compression techniques evolved that made it possible to store a wallpaper-size photo as a "jpg" (pronounced jay-peg) that might be somewhere around 100,000 bytes or even less, depending on how complex the photo was. But the photo problem was no sooner solved than computer audio was introduced. (The audio version of this essay takes up about 3,000,000 bytes.) And then there came video, which combines audio with a series of millions of photos, for which storage is in the range of billions of bytes.
Now, even though text by itself doesn't take up much space, relatively speaking, when you get into the concept of a "database" you can require a surprising amount. For example, John's name-and-address lists typically looked like this:
- Name, 32 characters
- Address 1, 32 characters
- Address 2, 32 characters
- City, 20 characters
- State, 2 characters
- ZIP, 9 characters
That's 127 bytes to store one person's name and address...but John's mailing lists often held as many as 20,000 addresses or more. Still, that "only" required 2,540,000 bytes, or about four times the size of Pride and Prejudice.
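For the curious, the record layout above can be sketched in a few lines. The field names, the padding with spaces, and the sample person are all hypothetical; I'm only assuming the fixed-width layout described in the list, with one byte per character.

```python
# Field widths from the list above; the field names are illustrative.
FIELD_WIDTHS = {
    "name": 32,
    "address_1": 32,
    "address_2": 32,
    "city": 20,
    "state": 2,
    "zip": 9,
}

RECORD_BYTES = sum(FIELD_WIDTHS.values())


def pack_record(values):
    """Build one fixed-width record: each field is truncated or
    space-padded to its width, then the fields are joined end to end."""
    text = "".join(
        values.get(field, "")[:width].ljust(width)
        for field, width in FIELD_WIDTHS.items()
    )
    return text.encode("ascii")


# A made-up entry, purely for illustration.
sample = pack_record({
    "name": "Jane Doe",
    "address_1": "123 Main St",
    "city": "Springfield",
    "state": "IL",
    "zip": "627010000",
})

list_bytes = RECORD_BYTES * 20_000

print(RECORD_BYTES, len(sample), f"{list_bytes:,}")
```

Notice that a short name costs just as much as a long one. Fixed-width records waste space on padding, but they make it trivial to find the Nth record on disk.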
The IRS is much more impressive when it comes to databases. Each year, it adds 1.7 terabytes to its massive collection of information on you and me and every job we've ever held, not to mention our list of deductions. (A terabyte is approximately one trillion bytes.) Considering that they must be storing at least seven years' back data, that's a heck of a lot of storage.
Now, as Dan Aykroyd said to Albert Brooks in Twilight Zone: The Movie, do you want to see something really scary? Then keep reading.
Yesterday's USA Today included an interesting story about the NSA wiretaps. These have been in the news recently, with President Bush assuring us that the NSA was only monitoring calls of US citizens who telephone members of al Qaeda in foreign countries--never domestic calls. Well, according to the story,
The National Security Agency has been secretly collecting the phone call records of tens of millions of Americans, using data provided by AT&T, Verizon and BellSouth, people with direct knowledge of the arrangement told USA TODAY.
Later on, the story reports,
"It's the largest database ever assembled in the world," said one person, who, like the others who agreed to talk about the NSA's activities, declined to be identified by name or affiliation. The agency's goal is "to create a database of every call ever made" within the nation's borders, this person added.
Appearing on live TV yesterday, President Bush, who in just a couple of paragraphs managed to work in the phrase "After September 11" and to reiterate that the NSA was "only listening in to international calls with known Al Qaeda agents" and "the government does not listen to domestic phone calls without court approval," ignored the allegation that all calls were being noted.
The USA Today article described this spying as involving the numbers of the telephone making the call, and the telephone receiving the call. Common sense says they'd also need to store the date and time the call began, and probably its duration. That's all text information that should take up far less than the name-and-address information in my friend's bulk mailing lists. I could do it in 16 bytes. So let's make some assumptions:
Size of record: 16 bytes
Number of phones: 300,000,000
Calls per phone per day: 1
Size of one day's data: 4,800,000,000 bytes
Size of one year's data: 1,752,000,000,000 bytes
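Here is a rough sketch of that estimate, along with one way a call record might be squeezed into 16 bytes. The byte layout is my own guess, not anything reported about the actual database, and the phone numbers and timestamp are made up. Note that the daily and yearly totals shown correspond to one call per phone per day; a busier assumption scales them up proportionally.

```python
RECORD_BYTES = 16
PHONES = 300_000_000
DAYS_PER_YEAR = 365


def pack_call_record(caller, callee, start_epoch_s, duration_s):
    """Hypothetical 16-byte layout: two 10-digit numbers in 5 bytes each,
    a 4-byte start time in seconds, and a 2-byte duration in seconds."""
    return (
        caller.to_bytes(5, "big")
        + callee.to_bytes(5, "big")
        + start_epoch_s.to_bytes(4, "big")
        + duration_s.to_bytes(2, "big")
    )


def storage_estimate(calls_per_phone_per_day):
    """Return (bytes per day, bytes per year) for the assumptions above."""
    daily = RECORD_BYTES * PHONES * calls_per_phone_per_day
    return daily, daily * DAYS_PER_YEAR


# Made-up numbers and timestamp, for illustration only.
record = pack_call_record(2125550100, 2025550199, 1147348800, 95)

daily_1, yearly_1 = storage_estimate(1)
daily_20, yearly_20 = storage_estimate(20)

print(len(record), "bytes per record")
print(f"1 call/phone/day:   {daily_1:,} per day, {yearly_1:,} per year")
print(f"20 calls/phone/day: {daily_20:,} per day, {yearly_20:,} per year")
```

Five bytes is enough for any 10-digit number, since five bytes can count past one trillion. The two-byte duration tops out at a little over 18 hours, which should cover most calls.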
That's a lot of data. But it matches the IRS's 1.7 terabytes stored each year. And the IRS's data storage needs are minor compared to some other databases, like Google's with their photos of every square mile of Earth, or Wal-Mart which tracks the purchase of every single item sold by them anywhere. So how can this be the "largest database ever assembled in the world"?