5 things everyone should know about Hadoop

It wasn’t too long ago that Hadoop was a shiny new technology — familiar to large web companies but foreign (and fascinating) to everyone else. Things changed fast and Hadoop is now a billion-dollar IT market underpinning big data efforts by companies of all stripes. Mike Olson, co-founder and chief strategy officer (and former CEO) of Cloudera, came on the Structure Show podcast this week to tell us where Hadoop is now and where it’s headed.

Here are the highlights of that interview, but anyone interested in Hadoop — especially in how the underlying technologies will evolve — should listen to the whole thing. Hadoop market watchers will also want to attend our Structure Data conference next month in New York, which will feature interviews with three important CEOs: Tom Reilly of Cloudera, Rob Bearden of Hortonworks and Paul Maritz of Pivotal. Big data applications are advancing fast, and these execs will explain how their companies plan to keep up and win in the market as a result.

Big data is no place for the weak

“If we had to identify the single defining characteristic of the [Hadoop] market this year and going forward, it’s that shift in the competitive dynamic,” Olson explained. “It’s no longer a band of hearty, wild-eyed visionaries, venture-backed companies battling for market share with one another, but really the entrance of large and well-capitalized companies with very large installed bases and very good field relations with those guys who are going to shape how we — Cloudera — does business and really are going to shape how the market develops over the coming seven years.”

The big companies he’s talking about: IBM, Microsoft, Pivotal (which spun out of EMC and VMware) and Oracle, among others.

There is no love lost among Hadoop vendors

Cloudera touts its product lineup as an “enterprise data hub” in order to distinguish it from competitive offerings from vendors such as Hortonworks and MapR. Here was Olson’s response when asked why Cloudera considers itself so different from those companies, which are also making advances around security, search and other capabilities:

“We deliver a production product to market today. And we are proud to say that we have been the first vendor to bring that stuff to market consistently. So lots of announcements happen, because I think our vision is right and is widely recognized as right. But the question is ‘Who’s driving the platform forward and who is making that innovation available to customers first?’ Yeah, other vendors are going to announce future availability of products, not yet in GA, and while they do that we continue to innovate in our own way.”

It’s worth noting that MapR has been pretty aggressive itself about adding new features, many of which are shipping. And Hortonworks, although somewhat more measured on the innovation front, is a close partner with many of the big companies (including Microsoft and Red Hat) that Olson says will help reshape the market. Also, Hadoop is a highly competitive space and the companies involved are quick to criticize their peers.

Mike Olson at Structure Data 2012. (c) Pinar Ozger

At least part of the database market is safe

“[T]here’s really no answer in the Hadoop space today for the kind of OLTP and very advanced workloads that run on the traditional larger databases,” Olson acknowledged, adding, “I don’t think that any of those vendors looks at the platform, Hadoop, or even the more-capable enterprise data hub that we bring to market right now, with that much trepidation.”

However, he said, it might be a different case for companies that make their money selling analytics software, especially as technologies such as Cloudera’s Impala and other SQL-on-Hadoop offerings mature: “[A]s we’ve driven real-time capabilities into the platform…some of the more traditional analytic database workloads get to move over pretty easily right now….I think that trend is going to continue, and that the query language that the platform supports is going to get more interesting over time, and that more workload optimization is in front of us.”

Hadoop is coming to the mid-market

Olson said it’s true that most Hadoop adoption today is coming from technologically savvy web companies and large enterprises making big investments. But, he added, “That’s going to get better…it really is. As the platform matures and as the tools and applications that run on top of it get easier to use and more diverse, you’re not going to need to be a data scientist anymore to buy and use this platform. You’re going to just be able to get a shrink-wrapped application that solves your problem.”

That’s “exactly what happened with relational databases back in the day. When nobody used SQL and there were no apps, man, you needed to be a genius,” he continued. “But that changed in a pretty fundamental way, and we expect that will happen in this market.”

He noted, though, that Cloudera sees mid-market customers benefiting more from that ecosystem of technology vendors and applications than from cloud-hosted Hadoop, at least if Cloudera is the one hosting it. “We started the business with a hosted offering. We discovered that our customers loved us to run their Hadoop, but in addition they wanted us to run their other data infrastructure,” Olson said. “…It turns out we’re really good at running the new scale-out Hadoop-based platform, but that other stuff — not our wheelhouse.”

Cloudera’s Jeff Hammerbacher talking about making data science easier at Structure Data 2013. (c) Albert Chau

MapReduce will fade away as innovation flourishes

“I do believe that in time, the original implementation — disk-based, batch-mode MapReduce — will diminish in importance,” Olson said. “It’ll probably never go away, because there’s a bunch of installed base running on that and you don’t get to even retire your mainframes 50 years later, but if you think about where future workloads are going to be built, we think Spark is super interesting.”

Spark is a processing framework that is faster, easier to use and more efficient than MapReduce. It was developed at the University of California, Berkeley, and is currently being commercialized by a startup called Databricks. But it’s far from the only innovative thing happening in the big data world. Olson said his job is to keep an eye on what’s happening and to use good curatorial judgment in deciding which pieces to bring into the Cloudera platform and when.

“I believe that the most interesting data management work happening on the planet right now is happening in the consumer internet, in general, and at Google in particular,” he said. “We watch very carefully what is happening at the big scale-out web properties as basically a prediction of what more traditional enterprises are going to want in the future….This has been for the first 25 years of my career a very lucrative career, but a pretty dull one. I would not claim that boredom is a problem for me today.”