It used to be that the National Security Agency and its ilk had to pay through the nose for the latest in spying technology. The supercomputer specialist Cray (CRAY), for example, would receive government funds and come out with a new multimillion-dollar machine specially tuned for “pattern matching” and then sell the system to three-letter agencies. The machines were anything but general purpose and came with a premium price tag. Beyond that, the NSA has been known to run its own chip manufacturing plant and to pay for custom software.
While that type of thing still goes on, the NSA has another, much cheaper avenue for great spy technology at its disposal: open-source software. The popularity of open-source software among the latest generation of big-time Web players, including Google (GOOG), Facebook (FB), and Yahoo! (YHOO), three of those on the Prism list, means that private companies disclose to the public, for free, much of the core technology behind their services. In fact, technologies like Hadoop and MapReduce that appear on the leaked NSA presentation slides as the keys to the government’s data-mining operation trace back to Google and Yahoo: Google described the MapReduce model in a published paper, and Hadoop, the open-source implementation developed largely at Yahoo, has been modified by thousands of people over the last few years.
How we got from there to here is quite fantastic. Oracle (ORCL) and IBM (IBM) used to own the title of world’s best database makers. IBM researchers pioneered many of the techniques on which the major databases rely, and Oracle did the best job taking this technology and packaging it for government and businesses. In fact, Oracle started in 1977 and took its name from a CIA effort that some of the founders had worked on. For much of the last 40 years, Oracle, IBM, Microsoft (MSFT), and a handful of other companies have done the groundbreaking database and file-system work and sold their proprietary products to three-letter agencies.
When Google arrived in 1998, things started to change. Google emerged as the first major technology company to push a new wave of database, storage, and searching techniques and then allow open-source access to quite a bit of the technology. The search-engine giant needed to collect and analyze so much data that it could not afford to buy Oracle software, or hardware from big-name companies like Hewlett-Packard (HPQ) and IBM. Instead, it bought cheap hardware and wrote new software that ran well across hundreds of thousands of computers. Google’s work helped give birth to the Big Data era and a host of new data-analysis products.
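For readers curious what that software looks like in spirit, here is a minimal sketch of the MapReduce idea, the best-known of these techniques. It is an illustration only, not Google's or Hadoop's actual code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample records are hypothetical, and everything runs in one process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def run_job(records):
    # On a real cluster, each record's map call could run on a different
    # cheap machine. Here it all happens in one process for illustration.
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, not real data.
    sample_records = ["cheap hardware", "cheap software", "new software"]
    print(run_job(sample_records))
```

The appeal of the design is that the map and reduce steps never need to see the whole data set at once, which is what lets the work spread across many inexpensive computers.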
Yahoo, Facebook, Twitter, and a couple of other massive consumer Web companies have been even more aggressive about open-sourcing their underlying infrastructure. In 2009 the NSA confirmed in public for the first time that it had taken some of this code and modified it to run data analytics operations. It even did the world the favor of setting up an open-source project site for some of its work!
The NSA had, in effect, gotten direct access to the inventions of thousands of the smartest computer-science minds on the planet for free. And, lucky for the NSA, these brilliant minds happened to spend their days creating technology that excels at collecting huge volumes of information and analyzing it for patterns. The Social Graph, which Facebook has popularized, is a spook’s fantasy, showing how people relate to each other and even finding non-obvious relationships between people. Just as Google and Facebook have drastically reduced the amount of money it takes to calculate someone’s taste in shoes, they have also drastically reduced the amount of money it takes to analyze someone’s chat logs.
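To make the "non-obvious relationships" point concrete, here is a small sketch, under stated assumptions, of one of the simplest graph queries: finding people who are not directly connected to someone but share contacts with them. The names, the `build_graph` and `indirect_links` functions, and the sample connections are all hypothetical. This is not Facebook's or the NSA's method, just the basic idea.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph: each person maps to the set of direct contacts."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def indirect_links(graph, person):
    """Map each person who is NOT a direct contact of `person`
    to the set of shared contacts that connect the two."""
    direct = graph.get(person, set())
    links = defaultdict(set)
    for contact in direct:
        for candidate in graph[contact]:
            if candidate != person and candidate not in direct:
                links[candidate].add(contact)
    return dict(links)


if __name__ == "__main__":
    # Hypothetical connections, invented for illustration.
    edges = [
        ("ann", "bob"),
        ("bob", "cat"),
        ("ann", "dan"),
        ("dan", "cat"),
        ("cat", "eve"),
    ]
    graph = build_graph(edges)
    # Ann has never contacted Cat directly, but two of Ann's contacts have.
    print(indirect_links(graph, "ann"))
```

Here Ann and Cat have no direct link, yet the query surfaces Cat because both Bob and Dan connect them. Run the same idea over billions of connections and the cheap, distributed software described above becomes the deciding factor.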
Open-source software has long been a preferred tool of the government. People tend to think that running, say, the Linux operating system instead of the proprietary Windows on government computers might put government data at risk, since nefarious folks can see the underlying source code and manipulate it. Not so, according to myriad security researchers. They tend to argue that open-source software results in more eyes spotting more bugs and more things getting fixed. There’s also the contention that people can just flat-out see what’s going on and that transparency leads to more secure systems.
There’s no question that the government still relies on specialized, pricey systems that can search through data in unique ways. The CIA’s venture capital arm, In-Q-Tel, funds these types of companies out in the open. But when you hear about the NSA sucking up petabytes of information every hour and performing analysis on these huge vats of data, you can be sure that technology developed by the consumer Web companies helps power these efforts. And yes, your likes, tweets, and status updates have been invaluable in this quest.