When the Microsoft-Yahoo search deal became a reality this summer, it looked like the era of Yahoo-led Hadoop innovation might come to an end. When Hadoop creator Doug Cutting left the web giant for startup Cloudera 11 days later, Yahoo’s exit from Hadoop seemed inevitable. Not so fast.
With the MapReduce world converging on New York late this week, Yahoo Senior VP of Cloud Computing Shelton Shugar has been telling anybody who will listen that Yahoo is fully committed to continuing its work with Hadoop. In fact, he says, with search soon to be under Microsoft’s control, Hadoop innovation might actually increase across Yahoo’s dozens of other services. Among the other services running on the company’s 25,000-server cloud are operational storage, online serving, large-scale caching and targeted content, and most use Hadoop to some degree. Word from a lead Hadoop developer at Yahoo is that the company is moving as much of its analytics work as possible to Hadoop, with the goal of having Hadoop manage tens of petabytes of data about a year from now. Much of the work being done within Yahoo’s Hadoop ranks will be open-sourced, including its forthcoming Pig SQL programming language.
And with Cutting joining fellow ex-Yahooer Amr Awadallah at Cloudera, Yahoo’s numerous projects have their best chance yet of evolving into business-ready products. Like Red Hat for Linux, Cloudera exists to make Hadoop easier to use for mainstream companies that might not have the development resources to fully leverage the project versions. Thus far, Cloudera has focused on making its Hadoop distribution widely available (including to both EC2 and vSphere users), but it will have to release more enterprise-ready tools if it wants to grow. Cloudera’s in-house (and in-depth) knowledge of Yahoo’s Hadoop projects should make working on them much easier and much more appealing, and the resulting products could take Cloudera to the next level.
Cloudera might want to hurry, too, because Hadoop innovation is not slowing down outside its walls. Yes, the company took advantage of the spotlight at its inaugural Hadoop Summit by releasing its Cloudera Desktop tool, but it was not alone: Karmasphere released its own desktop-based development tool. Others getting into the Hadoop product-release spirit this week were Aster Data Systems, Amazon Web Services and, a bit surprisingly, IBM, which announced its M2 enterprise data analysis platform.