People seem to care a lot more about the government monitoring them online than about it collecting data on their phone calls. Last week’s revelation that Verizon (VZ) was sharing reams of so-called transactional data with the National Security Agency was quickly overshadowed by the discussion of Silicon Valley’s involvement in Prism, an electronic surveillance effort that collected e-mails, photos, and other user information from several of the largest tech companies. In a Pew Center poll published Monday, 56 percent of Americans said the NSA’s phone-tracking program was acceptable, while only 45 percent said the government should be able to “monitor everyone’s e-mail and other online activities if officials say this might prevent future terrorist attacks.”
This could well reflect the instinctive feeling that the content of online activity is more telling than data about who you called and when. But a recent analysis by Kieran Healy, an associate professor of sociology at Duke, should cast a bit of doubt on that. He took a data set showing whether some 260 people involved in the American Revolution belonged to one of seven groups. Taking on the role of an analyst for the Royal Security Agency (with the accompanying old-timey style of writing), he set to work. Through a series of relatively simple steps, he determined how people were connected to one another, how the memberships of various groups overlapped, and which people served as the most important connections within the revolutionary community. He quickly identified Paul Revere as the kind of person who might be best-positioned to warn his co-conspirators when the redcoats were coming.
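For readers who want to see the mechanics, the kind of analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Healy's code or data: the names and clubs are invented, the helper functions are hypothetical, and counting each person's direct connections is a deliberately simple stand-in for whatever measures of importance he actually computed. The only input is a list of who belongs to which group. Nothing about what anyone said is needed.

```python
# Hypothetical toy data: invented names and groups, NOT Healy's data set.
memberships = {
    "Avery": {"ClubA", "ClubB", "ClubC"},
    "Blake": {"ClubA"},
    "Casey": {"ClubB"},
    "Drew": {"ClubC"},
    "Ellis": {"ClubC"},
}


def co_membership(memberships):
    """For each pair of people, count the groups they both belong to."""
    people = sorted(memberships)
    return {
        p: {q: len(memberships[p] & memberships[q]) for q in people if q != p}
        for p in people
    }


def group_overlap(memberships):
    """For each pair of groups, count the members they have in common."""
    groups = sorted(set().union(*memberships.values()))
    members = {
        g: {p for p, joined in memberships.items() if g in joined}
        for g in groups
    }
    return {
        g: {h: len(members[g] & members[h]) for h in groups if h != g}
        for g in groups
    }


def degree_ranking(shared):
    """Rank people by how many others they share at least one group with.

    Assumption: this simple count stands in for more refined measures
    of who is an important connection.
    """
    degree = {
        p: sum(1 for count in row.values() if count > 0)
        for p, row in shared.items()
    }
    # Sort by degree (highest first), breaking ties alphabetically.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


shared = co_membership(memberships)
overlap = group_overlap(memberships)
ranking = degree_ranking(shared)

print(ranking[0])  # the best-connected person in the toy data: ('Avery', 4)
```

In this made-up example, Avery surfaces at the top simply by belonging to several otherwise separate clubs, which is the same structural signal that singled out Paul Revere.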
If you’re interested, it’s worth reading through the techniques. Suffice it to say that Healy did not have access to an enormous number-crunching machine in the Utah desert, and the data he was working with wasn’t very large. Here’s how he explains his approach:
For the simple methods I have described are quite generalizable in these ways, and their capability only becomes more apparent as the size and scope of the information they are given increases. We would not need to know what was being whispered between individuals, only that they were connected in various ways. The analytical engine would do the rest! I daresay the shape of the real structure of social relations would emerge from our calculations gradually, first in outline only, but eventually with ever-increasing clarity and, at last, in beautiful detail—like a great, silent ship coming out of the gray New England fog.
The NSA’s collection of transactional data may seem more abstract than spies reading your e-mails Stasi-style. But it is much more in tune with the current let-the-robots-do-the-work research methodology, according to Matt Blaze, an encryption expert who directs the Distributed Systems Lab at the University of Pennsylvania. When looking for data sets to analyze with computerized techniques, it’s nice to find those in which no one has to pay attention to what’s being said. “On a massive scale, when you look at everyone’s metadata, it becomes even more powerful, more revealing, perhaps, than content,” said Blaze in an e-mail. “Unlike with content, there’s no real limitation on how much of it can be effectively processed. Analyzing metadata is something ideally suited to computer analysis, and there are powerful algorithms that can discover far more about our behavior, interests, and roles in the community than a human analyst could possibly manage alone. And the more metadata there is, the more revealing it is.”
Various people within the federal government have been arguing pretty hard that the review process for targeting people protects against the misuse of the data. This is hard to judge when the review process is a closely held secret and public discussion crosses into potential treason. But not everyone believes collecting all this data is invasive. “In the standard law-enforcement search, the government establishes the relevance of its inquiry and is then allowed to collect the data. In the new collection-first model, the government collects the data and then must establish the relevance of each inquiry before it’s allowed to conduct a search,” wrote Stewart Baker, a former NSA and Department of Homeland Security official, on his blog. “If you trust the government to follow the rules, both models end up in much the same place.”
The stubborn abstractness of the consequences of privacy violations in the age of big data has been a stumbling block for advocates, and Healy’s example does lend a certain credence to the argument that those without something to hide shouldn’t worry about privacy. After all, Paul Revere’s name floated to the top while colonial subjects who weren’t central to the rebellion didn’t emerge as suspects. At the same time, Healy does note “the prospect of discovering suggestive but ultimately incorrect or misleading patterns.” Still, this scenario makes it seem very possible that we’d be saluting a different flag if Paul Revere really did have a cell phone.