Something happened over the weekend that I’m at a complete loss to explain. AOL released a list of over 20 million searches by 658,000 users. The online giant apparently did this for “research” purposes, even though Google won a key battle a few months back to keep the government away from this kind of privacy-invading data.
Michael Arrington at TechCrunch has more details on this strange and upsetting incident and it seems that AOL has taken the data down, although a mirror site still provides access to this mother lode of information. He says “the utter stupidity of this is staggering.”
The utter stupidity of this is staggering. AOL has released very private data about its users without their permission. While the AOL username has been changed to a random ID number, the ability to analyze all searches by a single user will often lead people to easily determine who the user is, and what they are up to. The data includes personal names, addresses, social security numbers and everything else someone might type into a search box.
He also points out that there is “some really scary stuff in this data,” an obvious fact reinforced by Markus at The Paradigm Shift. Contained in the data are social security numbers, credit card numbers and other personal information.
Even more creepy, Markus points out that a lot of users in the data seem to be researching how to commit murder. One user, identified by number only, is clearly hunting for a way — even if only in fantasy — to murder his wife.
Check out the search history for user 17556639; the most recent search is at the bottom of the list. Does this look like the search history of a user wanting to do something bad?
17556639 how to kill your wife
17556639 how to kill your wife
17556639 wife killer
17556639 how to kill a wife
17556639 poop
17556639 dead people
17556639 pictures of dead people
17556639 killed people
17556639 dead pictures
17556639 dead pictures
17556639 dead pictures
17556639 murder photo
17556639 steak and cheese
17556639 photo of death
17556639 photo of death
17556639 death
17556639 dead people photos
17556639 photo of dead people
17556639 www.murderdpeople.com
17556639 decapatated photos
17556639 decapatated photos
17556639 car crashes3
17556639 car crashes3
17556639 car crash photo
Let’s hope that user 17556639 is a murder mystery writer, or perhaps a writer for one of the numerous true crime TV shows, or maybe even a Walter Mitty-esque disgruntled husband. But won’t the authorities be even slightly interested in what user 17556639 is up to? If not, don’t we all bear responsibility if user 17556639 actually does something?
This is the problem with privacy violations. If user 17556639 is indeed someone with a highly imaginative interior life, or merely a writer looking for verisimilitude, now everybody knows it, and trouble could be on the way for no good reason.
AOL ought to be utterly ashamed and issue an apology. Some people are calling for a boycott of the service, an ironic development given that the company just unveiled a risky and bold move to draw more people to AOL.
Update: According to John Battelle, AOL is gearing up to officially apologize and say, in essence, “we screwed up.” Here’s a section of a draft press release that John got from AOL’s PR folks.
This was a screw up, and we’re angry and upset about it. It was an innocent enough attempt to reach out to the academic community with new research tools, but it was obviously not appropriately vetted, and if it had been, it would have been stopped in an instant.
Although there was no personally-identifiable data linked to these accounts, we’re absolutely not defending this. It was a mistake, and we apologize. We’ve launched an internal investigation into what happened, and we are taking steps to ensure that this type of thing never happens again.
Cynthia Brumfield at 10:30 AM | Comments (1)
In all the current discussion about AOL's sharing of the query-log
data, there has been little emphasis on the importance of such data to
research on information retrieval. In addition to the real privacy
concerns, a key point that must be considered is the fact that if
useable data is not made available to the wider research community,
only the big search companies will be able to analyze that data. We
academic researchers are increasingly dependent upon industry for this
sort of data to do research; the sort of small-scale data that can be
gathered in a university-based setting is simply insufficient for
obtaining reliable experimental results.
Should companies be prevented from sharing data with the research
community (either by law or public outcry), research progress will be
greatly reduced, as it will be impossible to compare different studies
with one another, since each study's data will be proprietary, and
thus no one will be able to trust any research result from another
lab. All non-industrial research in this area will more-or-less dry
up, and search technology will tend more and more to be developed in
"closed-shop" efforts within the large firms; innovative startups and
open-source hacking will not exist, since the research projects that
serve as launching pads for such technological innovation will not
exist. This prospect should disturb us all, as search technology
(broadly construed) is more and more the vehicle that people use to
gain information about their society and the world.
All of this is not meant to ignore the real privacy issues that can be
involved in the preparation and release of such data. It appears to
me that there was little real privacy risk in the data released by
AOL, but it is clear that policies and practices need to be debated
and developed that accomplish two essential goals: (a) to protect the
privacy of individuals in any sharing of research data, and (b) to
ensure that as much useful data as possible can be shared by companies
with the greater research community. In this effort researchers and privacy
experts must collaborate to ensure that all sides of these important
issues are properly addressed.
Shlomo Argamon, Associate Professor
Department of Computer Science
Illinois Institute of Technology
Chicago, IL 60616
Posted by: Shlomo Argamon at August 10, 2006 1:30 PM