Heavy Twitter users connect to the site like it’s an IV drip: a steady stream of information about what their friends, favorite celebrities and business contacts are saying and doing in real time. The result is an often overwhelming load of information, and not just for Twitter-addicted users; it’s also an unreasonable amount for Twitter itself to manage. But now the company is developing a new architecture to meet our always-on, real-time demands, and these modifications might just be something other consumer web companies should watch and learn from.
The Problem: Polling is Imperfect
Twitter currently handles clients like TweetDeck and Seesmic, which are constantly left on, by allowing them to ping its servers for new information about a user a fixed number of times per hour. That limit, 350 requests per hour, may seem like a lot, but in reality it has become insufficient for our always-on world.
On top of that, the current client-server polling process is wasteful. The client often polls the server when there is no new data to offer, and the data that does exist is delivered not in real time but in batches. Nor has it worked reliably: to date, Twitter has had to scale its infrastructure horizontally by guessing the number of queries per second, something it has struggled to do in the face of never-ending growth. All too often, the result is the cheery face of the fail whale (or, equally bad, no new data in your Twitter client).
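To make the waste concrete, here is a small Python simulation, a sketch under stated assumptions rather than anything Twitter actually runs. It assumes a client spends its 350-requests-per-hour budget evenly (one poll roughly every 10.3 seconds) and checks it against a hypothetical list of item arrival times; the function name and sample timestamps are invented for illustration.

```python
import bisect

# Figure from the article: clients may poll 350 times per hour.
POLLS_PER_HOUR = 350


def simulate_polling(event_times, polls_per_hour=POLLS_PER_HOUR):
    """Simulate one hour of evenly spaced polling.

    event_times: sorted arrival times, in seconds within (0, 3600], of new
    items on the server (hypothetical data). Returns a tuple
    (total_polls, empty_polls, largest_batch, average_delay_seconds).
    """
    empty_polls = 0
    largest_batch = 0
    delays = []
    previous_poll = 0.0
    for k in range(1, polls_per_hour + 1):
        # Polls are spread evenly: every 3600 / 350, or about 10.3, seconds.
        poll_time = k * 3600 / polls_per_hour
        # Everything that arrived since the previous poll comes back as one batch.
        start = bisect.bisect_right(event_times, previous_poll)
        end = bisect.bisect_right(event_times, poll_time)
        batch = event_times[start:end]
        if not batch:
            empty_polls += 1  # a request that found nothing new
        largest_batch = max(largest_batch, len(batch))
        # Each item waits until the next poll before the client sees it.
        delays.extend(poll_time - t for t in batch)
        previous_poll = poll_time
    average_delay = sum(delays) / len(delays) if delays else 0.0
    return polls_per_hour, empty_polls, largest_batch, average_delay


# A quiet hour: five new items for this user.
total, empty, largest, avg_delay = simulate_polling([100, 105, 1799, 1801, 1802])
print(total, empty, largest, round(avg_delay, 1))  # prints: 350 346 2 5.9
```

In this made-up quiet hour, 346 of the 350 requests return nothing, two items arrive together in one batch, and items wait about six seconds on average before the client sees them. Polling more often would shorten that wait only by multiplying the empty requests.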
The Solution: Rearchitecting Twitter
But Twitter is now rethinking the way its system communicates with desktop clients, and the new architecture will create persistent connections that actually deliver information in real time. These “user streams” were first announced in April at Twitter’s Chirp developer conference. With them, every time a user launches her client, it will open a unique connection to a Twitter server that stays open and transmits incoming data until she shuts the client down. (The next time she logs in, a new connection will form.)
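The key difference from polling is that the client makes one request and then keeps reading. Below is a minimal Python sketch of that client side. It assumes a hypothetical wire format (one JSON object per line, with blank lines as keep-alives) and uses an in-memory buffer in place of a real network connection; `read_stream` and the sample messages are illustrative, not Twitter's client code or documented protocol.

```python
import io
import json
from typing import BinaryIO, Iterator


def read_stream(conn: BinaryIO) -> Iterator[dict]:
    """Yield each message sent down one long-lived connection.

    Assumed wire format (hypothetical): one JSON object per line, with blank
    lines sent as keep-alives while there is nothing new to deliver.
    """
    for raw_line in conn:  # on a real socket this waits for the next line
        line = raw_line.strip()
        if not line:
            continue  # keep-alive: the connection is healthy, no new data
        yield json.loads(line)
    # The loop ends only when the connection closes, e.g. when the user quits
    # the client; the next launch would open a fresh connection.


# An in-memory stand-in for the open connection, so the sketch runs offline.
fake_connection = io.BytesIO(
    b'{"text": "first tweet"}\r\n'
    b"\r\n"
    b'{"text": "second tweet"}\r\n'
)
for message in read_stream(fake_connection):
    print(message["text"])
```

Nothing is requested between messages: the client simply blocks on the open connection, so data arrives as soon as the server has it instead of at the next polling interval.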
Twitter head of product Jason Goldman compares the user streams functionality to a Bloomberg terminal, the specialized computer setup that stock market watchers use as a dashboard for real-time financial information. In addition to delivering tweets from the users and search terms an individual follows, Twitter will also tell the client about other relevant actions in a user’s network, such as retweets, favorites and follows. And not just for the people she’s directly connected to, but for her connections’ connections, one level further out in the social graph. Instead of just new tweets, a user might see whom her friends have recently followed, her network’s favorite tweets, or who has favorited something she said. Twitter, then, will not only get faster; it will also get more interesting.
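A Bloomberg-style dashboard means one connection carrying several kinds of messages, so the client has to route each one. The Python sketch below shows one way a client might do that. The message shapes are assumptions invented for illustration (a `text` field for tweets; an `event` field with `source` and `target` for retweets, favorites and follows), not Twitter's documented format.

```python
from typing import Optional


def describe(message: dict) -> Optional[str]:
    """Render one stream message as a timeline line, or None to skip it.

    The message shapes here are hypothetical: tweets carry "user" and "text";
    network activity carries "event", "source" and "target".
    """
    event = message.get("event")
    if event is None and "text" in message:
        return f'{message["user"]}: {message["text"]}'
    if event == "retweet":
        return f'{message["source"]} retweeted {message["target"]}'
    if event == "favorite":
        return f'{message["source"]} favorited a tweet by {message["target"]}'
    if event == "follow":
        return f'{message["source"]} followed {message["target"]}'
    # Unknown kinds are skipped, so new event types don't break old clients.
    return None


stream = [
    {"user": "alice", "text": "Watching the match"},
    {"event": "follow", "source": "bob", "target": "carol"},
    {"event": "favorite", "source": "dave", "target": "alice"},
    {"event": "list_update"},
]
for message in stream:
    line = describe(message)
    if line is not None:
        print(line)
```

Returning `None` for unrecognized messages is a deliberate choice in this sketch: a server that later adds new kinds of activity would not break clients written before those kinds existed.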
The new user streams API is a departure from the REST-based API used for almost everything at Twitter. However, Twitter’s existing REST and search APIs aren’t going anywhere; in many cases they are more effective than the new streaming API. For instance, answering a complex search query along the lines of “tweets that say this but not that, and were posted in the vicinity of this place” would not be a good fit for the streaming API, said Raffi Krikorian, tech lead for the Twitter platform. For users, the way content is posted, stored and searched will stay the same. But the streaming API should make the way data is received more real-time, richer and more reliable.
Twitter at the Speed of “Wow”
Twitter head of platform Ryan Sarver said he hopes one side benefit of the launch will be that third-party Twitter clients can stop focusing so much of their effort on building a competitive advantage around rate limits. With the new system, every client will get all the Twitter data it needs, almost instantly. And since Twitter has become important to so many people as a communication and news tool, these developments mean a faster and much more useful way of sending and receiving information.
User streams are “almost feature-complete,” Sarver said, and they will be the next big launch for his team. Twitter said Wednesday it hopes to get the product out before the end of the year. Sarver admitted the project was progressing “slower than hoped,” in part because of other infrastructure concerns, like the World Cup. “It’s a big shift for us in how network will shepherd this data around so we need to do it prudently,” he said.
User streams are coming first to desktop clients, and after that to “site streams,” which move data between servers for services such as the business Twitter platform CoTweet. Eventually they could come to mobile, possibly to the BlackBerry first, because unlike the iPhone it is built to hold a connection to the server, said Krikorian.
Currently TweetDeck, Seesmic, Echofon and Ubertwitter have access to a prototype of the functionality, and as of Wednesday, TweetDeck and Echofon are rolling out user streams to a limited set of beta testers. In a blog post, TweetDeck’s Richard Barley called the upgrade “Twitter at the speed of ‘Wow!’”
The Lesson: Scale Is Scary
Sarver said user streams are expected to “have a measurable impact on reducing infrastructure demands and costs” because they will eliminate constant and unnecessary polling. Even though the streaming connection is always open, it’s event-driven, transmitting information only when there’s something new to say. Sarver acknowledged, however, that streaming could end up generating more demand than REST.
Twitter didn’t invent the concept of a persistent connection to a server; it’s common in communications technologies. Chat clients like Meebo, for instance, hold persistent server connections open in the browser, and enterprise mail systems like Exchange use them to deliver messages to mobile devices. Though it might seem unwise to take infrastructure advice from Twitter, the fact is the service has experienced unprecedented growth and usage. The company built its service around an architecture that was designed to perform a set function but lacked the capacity to scale. Now its engineering team has to retain (and improve) that function and still scale. It’s a lesson other web startups should be so lucky as to need.