An article by technology writer Steve Lohr, published last week in The New York Times, caught my eye. Titled “How Privacy Vanishes Online,” it explores how conventional notions of “privacy” have become obsolete over the past several years as more people engage in online social interaction and e-commerce.
What’s happening is that seemingly innocuous bits of information are being collected, “read” and reassembled by computers to build a picture of a person’s identity, without anyone needing direct access to the private details themselves.
In effect, technology has provided the tools whereby massive amounts of information can be collected and crunched to establish patterns and discern all sorts of “private” information.
Pulling together disparate bits of information helps computers establish a “social signature” for an individual, which can then be used to determine any number of characteristics such as marital status, relationship status, names and ages of children, shopping habits, brand preferences, personal hobbies and other interests, favorite causes (controversial or not), charitable contributions, legal citations, and so on.
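To make the idea concrete, here is a minimal sketch of how scattered records might be stitched into one profile. Everything in it is an assumption for illustration: the three data sources, the field names, the made-up person, and the choice of an email address as the linking key. Real systems are far more elaborate, and this is not the method described in the article.

```python
from collections import defaultdict

# Entirely made-up records from three hypothetical sources.
loyalty_card = [{"email": "pat@example.com", "brand": "OatCo cereal"}]
social_profile = [
    {"email": "pat@example.com", "hobby": "fly fishing", "relationship": "married"}
]
donation_list = [{"email": "pat@example.com", "cause": "river conservation"}]


def build_signatures(*sources):
    """Merge records that share the same email into one profile per person."""
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            key = record["email"]  # assumed linking key
            for field, value in record.items():
                if field != "email":
                    profiles[key][field] = value
    return dict(profiles)


signatures = build_signatures(loyalty_card, social_profile, donation_list)
print(signatures["pat@example.com"])
```

No single source says much on its own; the combined profile is what starts to look like a “social signature.”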
One of the more controversial experiments, dubbed “Project Gaydar,” was conducted by MIT researchers last year. In a review of ~4,000 Facebook profiles, computers were able to use the information on the profiles to predict male sexual orientation with nearly 80% accuracy, even when the profiles made no explicit declaration of it.
Others, however, point to the benefits data mining can bring to consumers. For instance, grocery chains can use purchase data from store loyalty cards to give shoppers relevant, valuable coupon offers for future visits.
Last year, Netflix awarded a substantial prize to a team of computer specialists who developed software that analyzes the movie rental behavior of ~500,000 Netflix subscribers and significantly improves the predictive accuracy of the recommendations made to them.
To some, the Netflix program is hardly controversial. To others, it smacks of the “big brother” snooping that occurred in an earlier time during the Supreme Court confirmation hearings for Robert Bork and Clarence Thomas, when over-zealous Senate staffers got their hands on movie store rental records to determine what kind of fare was being watched by the nominees and their families.
Indeed, last week Netflix announced that it will not be moving forward with a subsequent similar initiative. (In all likelihood, this decision was influenced by pending private litigation more than any sort of altruism.)
Perhaps the most startling development on the privacy front comes courtesy of Carnegie Mellon University, where two researchers ran an experiment in which they correctly predicted the Social Security numbers of nearly 10% of everyone born between 1989 and 2003, or almost 5 million people.
How did they do it? They started by gathering publicly available information from various sources, including social networking sites, to collect two critical pieces of information: birthdate, plus city or state of birth. This enabled the researchers to determine the first three digits of each Social Security number, which then provided the baseline for running repeated cycles of statistical correlation and inference to “crack” the Social Security Administration’s proprietary number assignment system.
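A bit of back-of-the-envelope arithmetic shows why those first three digits matter so much. The sketch below is only an illustration, not the researchers’ method: the function name is hypothetical, and it treats every digit combination as possible, which makes the counts a rough upper bound.

```python
TOTAL_DIGITS = 9  # a Social Security number has nine digits


def remaining_candidates(known_digits: int) -> int:
    """Upper bound on the numbers still possible once some digits are known."""
    if not 0 <= known_digits <= TOTAL_DIGITS:
        raise ValueError("known_digits must be between 0 and 9")
    return 10 ** (TOTAL_DIGITS - known_digits)


blind_guess = remaining_candidates(0)   # nothing known
with_prefix = remaining_candidates(3)   # first three digits known

print(blind_guess, with_prefix, blind_guess // with_prefix)
# prints: 1000000000 1000000 1000
```

Knowing the first three digits alone shrinks the field a thousandfold, before any statistical inference is applied to the remaining six.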
So as it turns out, it’s not enough anymore merely to be concerned about what you might have revealed in cyberspace on a self-indulgent MySpace page or in an ill-advised newsgroup post.
Social Security numbers … passwords … account numbers … financial data. Today, they’re all fair game.