
Hadoop, the open-source file system and MapReduce implementation for massive-scale data, was the main topic Wednesday at our Structure Big Data conference in New York. From new Hadoop distributions to end-customers’ plans, Hadoop was all anyone could talk about. One of the companies whose name cropped up in conversations was a stealth-mode company called Mapr, which is building a proprietary version of Hadoop and is likely to launch later this year.
Mapr, based in San Jose, Calif., has been in the works for nearly two years. Securities and Exchange Commission filings show the company has raised about $9 million in funding from Barry Eggers of Lightspeed Venture Partners and Peter Sonsini of New Enterprise Associates. On its web site, the company says it’s “engineering game changing Map/Reduce related technologies.” Its ambitions aren’t limited by that somewhat ambiguous statement.
People Behind Mapr:
- M.C. Srivas, an ex-Googler (s goog), is the founder and CTO of the company.
- John Schroeder, formerly of Lightspeed VC and former CEO of Calista Technologies (acquired by Microsoft (s msft)) and Rainfinity (acquired by EMC (s emc)), is the CEO and co-founder of Mapr.
- The company has close to 30 employees, many of them based in India.
- Ted Dunning, chief scientist at Site Tuner and Veoh Networks, is the chief application architect at Mapr. He created the recommendation engine for Musicmatch, a music service that was popular before iTunes (s aapl) came on the scene. He is also one of the key guys behind the Apache Mahout data-mining project.
What Is Mapr Doing?
Mapr is said to be building a proprietary replacement for the Hadoop Distributed File System (HDFS) that’s reportedly three times faster than the current open-source version. It comes with snapshots and no NameNode single point of failure (SPOF), and is supposed to be API-compatible with HDFS, so it can be a drop-in replacement.
The Road Ahead
Mapr might have an edge over Apache Hadoop in the interim, but Apache is working to improve the HDFS architecture in its distribution, and should have its own snapshot feature sometime in 2012. Also, Appistry sells a NameNode-free HDFS alternative based on its distributed CloudIQ Storage offering. As for the speed advantage, I don’t have any details for now, but if you have some thoughts, please share them with us.
On a broader canvas, I think Mapr is up against a whole lot of major competitors. Cloudera has a lead in the commercial marketplace, and the Apache Hadoop distribution on which it’s based keeps improving thanks to upgrades from contributors like Facebook and Yahoo (s yhoo). Apache Hadoop gives companies more control over their data, as they are not held hostage by a vendor, and surveys and anecdotal evidence alike suggest that Apache Hadoop is still the most widely used version.