Consider the tweet. It’s short—140 characters and done—but hardly simple. If you open one up and look inside, you’ll see a remarkable clockwork, with 31 publicly documented data fields. Why do these tweets, typically born of a stray impulse, need to carry all this data with them?
While a tweet thrives in its timeline, among the other tweets, it’s also designed to stand on its own, forever. Any tweet might show up embedded inside a million different websites. It may be called up and re-displayed years after posting. For all their supposed ephemerality, tweets have real staying power.
Once born, they’re alone and must find their own way into the world, like a just-hatched sea turtle crawling to the surf. Luckily they carry all the information they need to make it: A tweet knows the identity of its creator, whether bot or human, as well as the location from which it originated (if the user has chosen to share it), the date and time it went out, and dozens of other little things, so that wherever it finds itself, the tweet can be reconstituted. Millennia from now an intelligence coming across a single tweet could, like an archaeologist pondering a chunk of ancient skull, deduce an entire culture.
Twitter’s (TWTR) Nov. 7 initial public offering marks the San Francisco-based company’s coming-out party, the moment when it graduates from its South of Market beginnings and takes its place as one of the Internet’s most valuable properties, despite never having turned a profit. What’s perhaps most remarkable about Twitter’s rise is how little the service has evolved from the original core concept of the 140-character tweet. Which is to say: not at all. It’s tempting to view tweeting as silly and trivial, and Twitter itself as overhyped and overvalued. But there’s some sophisticated, supple, and even revolutionary technology at work. Appreciating Twitter’s machinery is key to understanding how an idea so simple changed the way millions of people advertise their existences to the world.
How do you look inside a tweet? It’s easy; the structure of a tweet is a matter of public record. Twitter, as a modern Web company, reveals to the world some of the technology it uses, in the form of an application programming interface—an API—which allows external software developers to build tools on top of the service, making it more widely used and thus more valuable for everyone.
All tweets share the same anatomy. To examine the guts of a tweet, you request an “API key” from Twitter, which is a fast, automated procedure. You then visit special Web addresses that, instead of nicely formatted Web pages for humans to read, return raw data for computers to read. That data is expressed in a computer format, a smushed-up nest of brackets and characters called JSON, which stands for JavaScript Object Notation. JSON borrows the way JavaScript writes down objects and strips it to little more than text, numbers, true/false values, lists, and named fields. For Twitter, using the API essentially means speaking (and reading) JSON. The data comes as a bundle of name/value pairs, 31 of which make up a tweet. For example, if a tweet has been “favorited” 25 times, the corresponding name is “favorite_count” and “25” is the value.
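To make the name/value idea concrete, here is a minimal Python sketch using the standard library’s json module. The tweet is a hypothetical, heavily trimmed example, not real API output: it keeps only a handful of fields (favorite_count among them) and skips the API-key step of actually fetching data.

```python
import json

# A hypothetical, heavily trimmed tweet; the values are invented for
# illustration, and a real tweet carries many more fields.
raw = """
{
  "id_str": "123",
  "text": "Hello, world",
  "favorite_count": 25,
  "retweet_count": 3
}
"""

tweet = json.loads(raw)  # parse JSON text into a Python dict of name/value pairs

print(tweet["favorite_count"])  # 25
print(sorted(tweet))            # ['favorite_count', 'id_str', 'retweet_count', 'text']
```

In Python a JSON object becomes a dict, a JSON list becomes a list, and true/false become True/False, so nested values such as “user” can be read the same way.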
You know how the National Security Agency collects “metadata” about the phone calls Americans make? Well, that’s what these fields are, except instead of metadata about phone calls, this is metadata about tweets. In fact, those 140 characters are less than 10 percent of all the data you’ll find in a tweet object. Twitter’s metadata is publicly documented by the company, open for perusal by all and available to anyone who wants to sign up for an API key.
This metadata contains not just tidy numerals like “25” but also whole new sets of name/value pairs: big, weird trees of data. A good example is the “coordinates” part of the tweet. This value contains geographical information (a longitude and a latitude, in that order) in a format called GeoJSON, a dialect of JSON that’s used to describe places. This can seem complicated at first, but it’s actually awesome, because it means that simple-to-understand formats such as JSON can express some pretty complex ideas about the world. GeoJSON isn’t controlled by Twitter; it’s a published, open standard. Twitter has added another field, called “place.” Places are not just dots on a map but “specific, named locations.” They include multiple coordinates that actually define polygons over the surface of the earth. A tweet can thus contain a very rough outline of a given nation. A few tweets can, with some digital fiddling, serve as a primitive atlas. And through some slightly complex math, they can reveal how far one tweeter is from another. Tweets also have a “created_at” field, which indicates the exact time at which they were posted.
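The “slightly complex math” is usually a great-circle calculation. The sketch below swaps in one common choice, the haversine formula on a spherical Earth (mean radius 6,371 km, an approximation), to measure the distance between two tweets’ “coordinates” values. The two points are hypothetical, and the function name haversine_km is ours, not Twitter’s.

```python
import math

# Mean Earth radius in kilometers (a spherical approximation).
EARTH_RADIUS_KM = 6371.0

def haversine_km(point_a, point_b):
    """Great-circle distance between two GeoJSON Point objects, in km."""
    # GeoJSON orders a position as [longitude, latitude].
    lon1, lat1 = point_a["coordinates"]
    lon2, lat2 = point_b["coordinates"]
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

# Hypothetical "coordinates" values from two tweets
# (roughly San Francisco and New York City).
sf = {"type": "Point", "coordinates": [-122.42, 37.77]}
nyc = {"type": "Point", "coordinates": [-74.01, 40.71]}

print(round(haversine_km(sf, nyc)))  # roughly 4,100 km
```

Treating the Earth as a sphere is off by a fraction of a percent, which is rarely the weak point when the inputs are tweets.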
This is where things get interesting. With just the places and the times, you can do some database work and learn when people in every corner of the world are in the strange, receptive state of social media engagement. Could be valuable! This information might tell you the best time to update a blog post or communicate with the most human beings at once, or when to release an advertisement. Maybe we learn that certain people tweet most assiduously right before leaving for work, a fine time for an advertiser to pitch them some orange juice or a new car to ease their commute.
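The “database work” can start as simple counting. This sketch assumes that “created_at” uses the v1.1 text layout shown in the comment and that a “place” carries a “country_code”; it tallies tweets by country and UTC hour. The sample tweets are invented, and a real analysis would need far more data and would convert to local time zones.

```python
from collections import Counter
from datetime import datetime, timezone

# Assumed layout of a v1.1 "created_at" string, e.g. "Wed Aug 27 13:08:45 +0000 2008".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def tweets_by_country_and_hour(tweets):
    """Count tweets per (country code, UTC hour) pair."""
    counts = Counter()
    for tweet in tweets:
        place = tweet.get("place")
        if not place:  # many tweets carry no place at all; skip them
            continue
        posted = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        hour_utc = posted.astimezone(timezone.utc).hour
        counts[(place["country_code"], hour_utc)] += 1
    return counts

# Invented sample tweets, trimmed to the two fields this sketch reads.
sample = [
    {"created_at": "Wed Aug 27 13:08:45 +0000 2008", "place": {"country_code": "US"}},
    {"created_at": "Thu Aug 28 13:55:01 +0000 2008", "place": {"country_code": "US"}},
    {"created_at": "Thu Aug 28 07:12:30 +0000 2008", "place": {"country_code": "GB"}},
    {"created_at": "Thu Aug 28 07:40:00 +0000 2008", "place": None},
]

counts = tweets_by_country_and_hour(sample)
print(counts.most_common(1))  # [(('US', 13), 2)]
```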
This is the sort of combinatorial work that defines the modern Web: There’s so much data that there’s a very good chance that you will be the first of all humankind to find something interesting or unexpected. Whether you find something valuable is another question. But it’s surprisingly easy to become an expert in a very tiny niche as a developer—to become a world-leading expert in Android video or a specialist in Twitter geography—and charge accordingly for your services.
For all the possibilities of APIs, there are also limits. Another tweet field, “withheld_copyright,” if set to “true,” lets you know that a tweet is in trouble: Its content has raised flags and hackles over copyright. The text of the tweet, in that case, may be suppressed. The “withheld_in_countries” field can provide a list of the nations in which the tweet is banned. Another field has a telling name: “possibly_sensitive.” It’s set to either true or false. The field indicates whether a tweet links to potentially offensive things such as “nudity, violence, or medical procedures.” (If ever you wanted a snapshot of our world’s anxieties in three terms, there you have it.) As a user you can check a box on your profile so that the media you link to is automatically flagged this way. If you don’t, you risk having pictures of your medical procedure marked as objectionable by an offended reader and thus put “in review,” the Twitter version of limbo.
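Any program that reads these fields has to decide what to do with them. The following sketch is one hypothetical client-side policy, not Twitter’s own logic: it hides a tweet if it is withheld for copyright, withheld in the viewer’s country (assuming the list holds two-letter codes such as “DE”), or flagged “possibly_sensitive” while the viewer has asked not to see such media. The function name should_display is our own invention.

```python
def should_display(tweet, viewer_country, hide_sensitive=True):
    """Hypothetical client-side policy built on the withheld/sensitive fields.

    These fields are absent from many tweets, so a missing field is
    treated as "not withheld" / "not sensitive".
    """
    if tweet.get("withheld_copyright", False):
        return False
    if viewer_country in tweet.get("withheld_in_countries", []):
        return False
    if hide_sensitive and tweet.get("possibly_sensitive", False):
        return False
    return True

# Invented example tweets.
plain = {"text": "Lunch."}
blocked_in_de = {"text": "...", "withheld_in_countries": ["DE"]}
flagged = {"text": "Post-op photos", "possibly_sensitive": True}

print(should_display(plain, "US"))                          # True
print(should_display(blocked_in_de, "DE"))                  # False
print(should_display(blocked_in_de, "US"))                  # True
print(should_display(flagged, "US"))                        # False
print(should_display(flagged, "US", hide_sensitive=False))  # True
```

Putting the choice in a parameter mirrors the article’s point: the flag only informs, and the viewer decides.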
A field like this indicates the inherent difficulty of managing an enormous platform like Twitter. The only way the company survives is if it can safely ignore most of what’s said on Twitter. If it had to use employees to monitor tweets, it wouldn’t last a day. But in order to attract as many users as possible, it must find ways to avoid horrifying them.
There’s a great deal of hedging in both “possibly” and “sensitive.” The end result is that Twitter is putting the moral burden on the user. One person’s art is another person’s smut, and Twitter is not going to decide which is which, nor is it going to force you to look at the stuff. This position is both somewhat noble, in its acceptance of the range of human expression, and highly expedient: We told you the picture was “possibly sensitive,” so why did you look at it?
Much of the rest of the metadata contained in a tweet is familiar: the number of times people have starred, or “fav’d,” a tweet and the number of times it’s been retweeted. The value for “user” contains a whole huge bundle of stuff: the user’s name, a link to an avatar image, the user’s number of followers, the number of people the user follows, and whether the user is “verified” and deserves one of those blue checkmarks. It’s a pretty full portrait of an individual, and that portrait is attached to every tweet.
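Pulling a rough measure of reach out of one tweet is mostly a matter of reading nested fields. In this sketch the field names (user, screen_name, followers_count, friends_count, verified, favorite_count, retweet_count) follow Twitter’s v1.1 tweet object, where the accounts a user follows are counted as “friends.” The engagements-per-1,000-followers ratio is a hypothetical metric of our own, and the sample tweet is invented.

```python
def reach_summary(tweet):
    """Summarize an author's reach and one tweet's engagement."""
    user = tweet["user"]
    followers = user["followers_count"]
    engagements = tweet["favorite_count"] + tweet["retweet_count"]
    return {
        "name": user["screen_name"],
        "followers": followers,
        "following": user["friends_count"],  # accounts this user follows
        "verified": user["verified"],
        "engagements": engagements,
        # Hypothetical metric: engagements per 1,000 followers
        # (max() guards against division by zero for brand-new accounts).
        "per_1000_followers": 1000 * engagements / max(followers, 1),
    }

# An invented tweet, trimmed to the fields this sketch reads.
tweet = {
    "favorite_count": 25,
    "retweet_count": 5,
    "user": {
        "screen_name": "example_user",
        "followers_count": 2000,
        "friends_count": 150,
        "verified": False,
    },
}

print(reach_summary(tweet)["per_1000_followers"])  # 15.0
```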
From a single tweet and with no other information, you can extract a sense of social influence—how big a voice an individual has, the number of people they reach, the number of people who engaged with this particular tweet. Tweets themselves are just regular text (although text on a computer is anything but regular; there are dozens of abstractions that make it possible for an “a” to appear on a screen—but it’s safe to gloss over that). Here it is, 140 characters, a plain little beastie. You might be fooled into thinking there’s hardly anything there.
That’s the genius of Twitter. All of this scaffolding has emerged around a very basic human impulse. A tweet is the manifestation of the human desire to communicate with many other humans at once—to exercise some influence, to inform, amuse, or outrage. Of course, people have been informing, amusing, and outraging each other forever. It’s been said that Twitter is more of a discovery than an invention. What did it discover to make its insane growth possible?
First, Twitter discovered that blogging is hard. At the time of its birth in 2006, many people in traditional media mistakenly thought that blogging was too easy and would lead to a profligacy of voices and perhaps even the downfall of polite society. But creating and maintaining an old-fashioned blog took time, effort, and an audience. Twitter democratized blogging by redefining it. The term “microblogging service” is today as meaningless as “microcomputer,” but that’s what Twitter was. It gave millions of people voices they might not have known they possessed, and it is now in a position to sell a place among those voices to advertisers.
Another of Twitter’s discoveries was that mobile phones could work as a broadcast platform. This was something of a miracle of timing: A massive proportion of its traffic today comes from mobile devices. The short length of the tweet was perfect for celebrities in limousines to communicate with thousands, and later millions, of followers. The tiny payload of tweets could be easily jammed into narrow mobile phone data streams, giving people a real-time flow of information.
Twitter started with a very simple form—a single box on the Web with a limit that kept people from inserting too many characters—and through tens of billions of repetitions became a network unto itself. It’s embedded within the Web’s culture, but it’s also so large that it’s separate from the rest of the Web. The technologies that go into building Twitter today are not the same technologies one uses to build a typical website. The tweet is the social network’s building block in the way that Web pages built the Web in the mid-1990s.
Twitter’s founders recognized that encouraging people to use a very small number of very tightly controlled forms, billions of times over, creates huge, deeply interconnected, highly creative, and potentially profitable new spaces. It’s as if you could, with exactly the right kind of bricks, build a skyscraper that was infinitely tall. Twitter, like its half-sibling Facebook (FB), became so powerful that people now use it to log on to other websites; your Twitter identity is a major component of your Web identity. And today major news properties and blogs increasingly look like Twitter: infinite streams of data, tags, and voices.
The history of Twitter, as it’s been told so far, doesn’t offer a moment where various youngish men rise up from their desks and run naked through San Francisco yelling “Eureka!” It was created, like most things, in meetings. Somewhere in those meetings, Twitter uncovered a latent aspect of human life that had never before been so clearly articulated and turned it into a product that has altered, to various degrees, hundreds of millions of lives. That much of what is tweeted is trivial or silly is an obvious truth—but that’s not Twitter’s fault. That’s on us.
Ford is a programmer and the creator of SavePublishing.com.