Dissecting Twitter’s Capacity Issues
So a witty and original thought pops into your head and you need to share it with the world that instant in 140 characters or less. Where do you turn? Twitter, of course, but this time is different. Instead of being greeted by your inspiring background and a feed of the latest news from those you follow, you are welcomed by a friendly whale being hoisted out of the ocean by several cute birds. Besides being completely unrealistic (there is no way eight birds would ever be able to carry a whale, that’s just silly), this can become quite frustrating and trigger many questions. First, where are the birds taking the “fail whale”? A diagnostic review of the situation reveals that the whale looks cheerful. Maybe the birds are helping the whale escape from the BP oil spill to safer waters. The waves beneath the whale are partially orange, not necessarily the color you would like water to be. The birds could also be helping the marine creature break free from SeaWorld. Free Willy still makes me tear up. The second question you may ask is: why is this happening at all?
If you are remotely intelligent (and I know you are), you have realized that the problem is simply too much traffic for servers that cannot handle the hits. Twitter’s “real time” nature, along with users constantly streaming and checking the site, causes the servers to become overcrowded quickly. Especially during peak times, such as the World Cup and presidential elections, Twitter has been plagued by capacity mishaps (handling 3,000 messages per second is no easy task). Twitter does recognize this predicament and is working feverishly to fix the complications. However, this is not the end of the story.
During peak usage times, Twitter has been shutting off key features (such as trend tracking) in order to keep up with the traffic. Although this helps cure the issue momentarily, it ultimately restricts the user. In doing so, Twitter has also uncovered bigger problems at hand. Twitter reports that it did expect capacity issues during the World Cup, but did not expect them to complicate its efforts to fix the system before and after the fact. More than just capacity issues, Twitter has discovered significant structural problems, which it has addressed by doubling the internal network capacity.
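For the curious, here is roughly what switching off features under load could look like in code. This is only a minimal sketch and not Twitter’s actual setup: the feature names other than trend tracking, the thresholds, and the `enabled_features` function are all assumptions made up for illustration.

```python
# Hypothetical sketch of shedding features under load. The names and
# thresholds are illustrative assumptions, not Twitter's configuration.

# Each feature is switched off once load (fraction of capacity in use)
# reaches its threshold. Less essential features go first.
SHED_THRESHOLDS = {
    "trend_tracking": 0.80,  # the example feature named in this post
    "search": 0.90,          # assumed feature, for illustration only
    "posting": 1.01,         # threshold above 100%, so never shed here
}


def enabled_features(load: float) -> list[str]:
    """Return the features that stay on at the given load (0.0 to 1.0)."""
    return [name for name, limit in SHED_THRESHOLDS.items() if load < limit]


if __name__ == "__main__":
    for load in (0.50, 0.85, 0.95):
        print(f"load {load:.0%}: {enabled_features(load)}")
```

The trade-off is the one described above: the core service stays up, but the user loses features exactly when the site is busiest.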
This is not a long-term solution, and Twitter has for some time been exploring a better way of doing business. At Velocity 2010, Twitter engineer John Adams announced that Twitter’s main weapon against capacity problems is enhanced metrics. An advanced measurement system allows Twitter to graph the weakest points in the system and address these complications quickly. Adams reported that if Twitter does “fail”, it would rather “fail fast” than “fail slow”. For example, if the site is over capacity, Twitter wants you to see the “fail whale” quickly, so you can hit refresh and be redirected to another server in a short amount of time with as little annoyance as possible. Ultimately, Twitter would like to eliminate capacity issues altogether, which loyalists speculate the company will achieve by purchasing its own data center in the near future.
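To make “fail fast” a little more concrete, here is a minimal sketch of the idea: a server with a bounded queue that rejects a request immediately when it is full, so the caller can move on to another server instead of waiting. Everything here (`FailFastServer`, `OverCapacity`, `send_with_retry`, the queue sizes) is a hypothetical illustration, not how Twitter actually implements it.

```python
import queue


class OverCapacity(Exception):
    """Raised immediately when a server cannot take more work."""


class FailFastServer:
    """Hypothetical server that rejects work at once when its queue is full."""

    def __init__(self, max_pending: int):
        # A bounded queue: put_nowait() raises queue.Full instead of blocking.
        self.pending = queue.Queue(maxsize=max_pending)

    def accept(self, request: str) -> None:
        try:
            self.pending.put_nowait(request)
        except queue.Full:
            # Fail fast: tell the caller right away rather than making it wait.
            raise OverCapacity("over capacity, try again") from None


def send_with_retry(servers, request):
    """Try each server in turn; return the index of the one that accepted."""
    for index, server in enumerate(servers):
        try:
            server.accept(request)
            return index
        except OverCapacity:
            continue  # this server is full, move on to the next one
    raise OverCapacity("all servers are over capacity")


if __name__ == "__main__":
    servers = [FailFastServer(max_pending=1), FailFastServer(max_pending=2)]
    for tweet in ("first", "second", "third"):
        print(tweet, "-> server", send_with_retry(servers, tweet))
```

The “fail slow” alternative would be to let requests pile up and block until a slot opens, which keeps the user staring at a loading page instead of a whale.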
As with any emerging company, growing too fast is sometimes much more dangerous than growing too slowly. Twitter is growing very rapidly, but with a current 210-person team, a global presence, and venture capital still sitting in the bank, I don’t think anyone should be too concerned. It is only a matter of time until Twitter rids itself of peak-time capacity issues. One can only predict that this time will come shortly.