VMware aims for Hadoop on VMs with ‘Serengeti’ project

VMware is launching a new open-source project, called “Serengeti,” that aims to let the Hadoop data-processing platform run on the virtualization leader’s vSphere hypervisor. The company apparently sees a lucrative opportunity in growing enterprise interest in Hadoop and is not about to miss out on it. Serengeti is just one of several moves VMware has made lately to make big data and virtualization software play nice together.

The company explained the thinking behind Serengeti in a press release:

By decoupling Apache Hadoop nodes from the underlying physical infrastructure, VMware can bring the benefits of cloud infrastructure – rapid deployment, high-availability, optimal resource utilization, elasticity, and secure multi-tenancy – to Hadoop.

That sounds great, and most Hadoop distributions currently lack all of those features. But there are some significant limitations to running Hadoop on virtual resources; a tutorial on Apache’s Hadoop Wiki lays out the pros and cons as they currently stand.

Hence Serengeti: If VMware can make Hadoop on VMs a legitimate option for companies that prefer to run applications on virtual resources for all the reasons VMware suggests, those companies get their way and VMware isn’t shut out of licensing revenue on potentially large Hadoop clusters.

Serengeti is just VMware’s latest attempt to secure a piece of the Hadoop market, though. It launched its Spring Hadoop project in February to help developers write big data applications using Spring, and in April it bought big data startup Cetas. On Tuesday, VMware rolled out a reference architecture (along with Hortonworks) for making Hadoop highly available by running the NameNode and JobTracker services on VMs.

The message from VMware seems clear enough. VMware is nearing ubiquity in corporate data centers, and Hadoop is set to become a neighbor in many of those deployments. The two platforms can either co-exist and each do fine on its own, or work together and thrive like the ecosystem after which VMware’s project is named.

Image courtesy of Shutterstock user Putut.