How Facebook keeps 100 petabytes of Hadoop data online

It’s no secret that Facebook (s fb) stores a lot of data — 100 petabytes, in fact — in Hadoop, but how it keeps that data available whenever it needs it isn’t necessarily common knowledge. Today at the Hadoop Summit, however, Facebook Engineer Andrew Ryan highlighted that solution, which Facebook calls AvatarNode. (I’m at Hadoop Summit, but didn’t attend Ryan’s talk; thankfully, he also summarized it in a blog post.)

For those unfamiliar with the availability problem Facebook solved with AvatarNode, here’s the 10,000-foot explanation: The NameNode service in Hadoop’s architecture handles all metadata operations with the Hadoop Distributed File System, but it also just runs on a single node. If that node goes down, so does, for all intents and purposes, Hadoop because nothing that relies on HDFS will run properly.

As Ryan explains, Facebook began building AvatarNode about two years ago (hence its James Cameron-inspired name) and it’s now in production. Put simply, AvatarNode replaces the NameNode with a two-node architecture in which one acts as a standby version if the other goes down. Currently, the failover process is manual but, Ryan writes, “we’re working to improve AvatarNode further and integrate it with a general high-availability framework that will permit unattended, automated, and safe failover.”

AvatarNode isn’t a panacea for Hadoop availability, however. Ryan notes that only 10 percent of Facebook’s unplanned downtime would have been preventable with AvatarNode in place, but the architecture will allow Facebook to eliminate an estimated 50 percent of future planned downtime.

Facebook isn’t the only company to solve this problem, by the way. Appistry (which has since changed its business focus) released a fully distributed file system a couple years ago, and MapR’s Hadoop distribution also provides a highly available file system. In Apache Hadoop version 2.0, which underpins the latest version of Cloudera’s distribution, the NameNode is also eliminated as a single point of failure.