Hadoop Kills Zombies Too! Is There Anything It Can’t Solve?

Using Hadoop to analyze big data to deliver better ad metrics or to personalize web sites is pretty familiar, but an Atlanta-based company called ipTrust is using it to target botnets. The service uses Hadoop’s ability to process trillions of log files per minute to identify IP addresses that might be doing a botnet’s bidding.

Internet botnets wreak much of their havoc because of the sheer number of infected machines running their workloads. IpTrust is trying to fight fire with fire by using many terabytes of data, stored within Hadoop and Cassandra, to combat these hordes of zombie PCs in real time. By analyzing security-event information to determine “reputation scores” for the world’s IP addresses, the company says it can make existing security products, such as web firewalls, more intelligent by giving them the knowledge to restrict activity on a machine-by-machine basis as traffic hits the network.

IpTrust is a cloud service that delivers information to companies’ existing security software and appliances. According to Dan Ingevaldson, ipTrust founder and SVP of products, the service works its magic by storing trillions of events per minute from IP addresses across the globe within its Cassandra NoSQL database running atop the Hadoop Distributed File System, then processing that information using Hadoop MapReduce.

Much as social networks such as Twitter use social graphs to map everyone a particular user is connected to, ipTrust maps the relationships among IP addresses, log files and event types, and gives each address a reputation score between zero and one. A score of zero means no risk or unknown risk, while a score of one means the machine attached to that IP address is infected. Both IBM and HP use ipTrust within their families of security products.

Ingevaldson said ipTrust gathers terabytes of new data per day. That data is reduced via a MapReduce workflow and stored in the Cassandra cluster. When a new IP address comes knocking at the firewall, an end user’s security software makes an API call, and ipTrust returns a reputation score for that address in real time from its Cassandra cluster. According to Ingevaldson, ipTrust keeps all the data it has collected, not just the reduced data, because of how botnet investigations typically unfold: forensic investigators might start working on a case a month after the incident, so ipTrust wants to be able to give them all the available data from the day in question to help determine what happened.
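To make the reduce-then-lookup flow concrete, here is a minimal sketch in Python. It is not ipTrust’s implementation: the event types, their weights, the 1 - exp(-x) squashing function and the helper names (EVENT_WEIGHTS, map_events, reduce_scores, lookup) are all hypothetical, and a plain dict stands in for the Cassandra table. It only illustrates the general shape of mapping raw events to per-IP evidence, reducing that evidence into a score between zero and one, and serving the score at lookup time.

```python
import math
from collections import defaultdict

# Hypothetical event types and weights; ipTrust's real schema is not public.
EVENT_WEIGHTS = {
    "botnet_c2_contact": 3.0,
    "spam_relay": 1.0,
    "port_scan": 0.5,
}

def map_events(events):
    """Map step: emit an (ip, weight) pair for every recognized event."""
    for ip, event_type in events:
        weight = EVENT_WEIGHTS.get(event_type)
        if weight is not None:  # ignore event types with no assigned weight
            yield ip, weight

def reduce_scores(pairs):
    """Reduce step: sum weights per IP, then squash each total into [0, 1)."""
    totals = defaultdict(float)
    for ip, weight in pairs:
        totals[ip] += weight
    # 1 - exp(-x) is 0 with no evidence and approaches 1 as evidence accumulates.
    return {ip: 1.0 - math.exp(-total) for ip, total in totals.items()}

def lookup(score_store, ip):
    """Real-time lookup: addresses with no evidence get 0.0 (no or unknown risk)."""
    return score_store.get(ip, 0.0)

events = [
    ("203.0.113.7", "botnet_c2_contact"),
    ("203.0.113.7", "spam_relay"),
    ("198.51.100.2", "port_scan"),
    ("192.0.2.1", "login_success"),  # not a weighted event type, so it is dropped
]
store = reduce_scores(map_events(events))  # dict stands in for the Cassandra table
print(round(lookup(store, "203.0.113.7"), 3))  # 0.982
print(lookup(store, "192.0.2.1"))  # 0.0
```

In this sketch the score only approaches one asymptotically; a real service would more likely calibrate scores against confirmed infections, which is exactly the kind of detail the article does not describe.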

Ultimately, as with all uses of big data tools, ipTrust’s goal is to make users more intelligent by giving them more-granular control. For example, Ingevaldson said it’s well known that much botnet activity comes from infected machines in China, and some corporate firewalls simply block all traffic to and from China. When security personnel can narrow the threats down to specific ISPs or IP addresses, though, they can tighten their cordons without closing off an entire country or region. Ingevaldson and the ipTrust team have firsthand knowledge of how difficult it is for traditional security products to be intelligent at this low a level, since he and much of the management team were executives at intrusion-protection vendor Internet Security Systems, which IBM bought for $1.3 billion in 2006.

Going forward, Ingevaldson thinks big data will become even more important in battling botnets, because they’re getting more numerous and more dangerous. Large ones like the Zeus trojan and the Conficker worm get lots of attention, he explained, but many more are coming online all the time. Many of those, he said, are Zeus-like malware designed for login harvesting or for more-targeted attacks such as industrial espionage. Ingevaldson likens the uptick in botnet activity on the Internet to the way the human body naturally resists certain levels of known bacteria and viruses but gets ill when those levels get too high or when new strains are introduced. As botnet activity picks up, security software will need to analyze even more data to determine where new threats are coming from.

ipTrust is a division of Endgame Systems, which raised $29 million from Bessemer Venture Partners, Columbia Capital, Kleiner Perkins Caufield & Byers and TechOperators in conjunction with the ipTrust launch in October 2010.