MapR is stepping up the feature set of its Hadoop software, announcing on Tuesday the addition of support for the YARN resource manager and the ability to run HP’s Vertica analytics software directly atop the MapR file system. The latter feature, in particular, is emblematic of MapR’s approach to keeping up with, or in some areas even passing, Hadoop mindshare (and presumably market share) leaders Cloudera and Hortonworks.
The addition of YARN support is important, but also something MapR had to do eventually. YARN is the resource management technology that became part of Apache Hadoop with its 2.0 release in 2012 (and which reached general availability only in 2013). YARN lets multiple computing frameworks run on the same Hadoop cluster using the same underlying storage. So, for example, a company could process data using MapReduce, a graph processing engine, Spark and the MapR-backed Drill SQL-on-Hadoop technology, all without operating four separate clusters.
However, MapR’s YARN support is distinctive in two ways. First, it takes advantage of the file system and other components that were (and, it would argue, still are) MapR’s biggest point of distinction from other Hadoop distributions. Second, MapR’s software allows users to run the Hadoop 1.0-based version they’ve been using for production workloads on the same cluster as the YARN version. The idea is to make the transition less risky by letting current jobs keep doing what they’re doing without disruption.
Building HP Vertica support directly into the MapR distribution, on the other hand, was unnecessary but potentially very smart. MapR has always placed the functionality of its products above a strict adherence to open source code, a decision that has fueled competitors’ criticisms but has also made some Hadoop users happy.
Presently, users are excited about running SQL queries on Hadoop data, and everyone under the sun is working on projects or products to let them. MapR is actually spearheading the Apache Drill project for this purpose, but seems to have decided there’s a big enough demand for running Vertica’s columnar analytic database atop Hadoop that it engineered this capability. Why not, if enough (or big enough) users are willing to pay for it?
None of this means that MapR is in an inherently better position than Cloudera or Hortonworks when it comes to selling Hadoop to large enterprises, but it does mean their CIOs need to take it seriously as an option. All three companies are trying to do similar things, only using different technologies in certain areas, presenting different messages and seemingly employing different strategies to win deals. It’s also a very contentious space as the companies involved fight for billions of dollars from IT budgets.
I suspect attendees of our Structure Data conference in March will get a sense of these differences in strategy, and opinion, during sessions with the CEOs of both Cloudera and Hortonworks. But in the meantime you can just check out this post from December, along with its comments, which highlights how each vendor views its chances compared with the rest.
Feature image courtesy of Shutterstock user Graeme Shannon.