Cloudera’s plan to become the center of your data universe

Cloudera wants to be bigger than Hadoop so badly that it created a new term for what it’s selling: the enterprise data hub. Think of it as a cross between the enterprise data warehouse and the so-called “data lake” or “data reservoir.” Also think of it as Cloudera’s plan to transcend Hadoop and become a new kind of data-management bigwig.

When Cloudera CEO Tom Reilly told me last week he considers IBM and Pivotal to be his competitors more than Hortonworks and MapR, this is what he was talking about. The former class of companies, which arguably also includes the likes of Oracle and HP, can sell customers everything from storage up to, in some cases, the software to analyze their data. They can certainly cover customers anywhere along the database or data warehouse front.

And now, Reilly said, “We believe the EDH is going to become the center of most enterprise’s data architectures.”

He wants to replace significant portions of companies’ data infrastructure with Cloudera’s platform. The idea is that cheap, high-volume storage bolstered by technologies around analysis, security and discovery is good enough for most enterprise data. If customers really need to put stuff into a data warehouse or some other specialized system, they can do so easily enough thanks to integrations with most of those systems.


One could think of the enterprise data warehouse as a DSLR camera, as in “(a DSLR is) really good at taking pictures, but the only thing it can do is take pictures,” Cloudera co-founder and CTO Amr Awadallah explained during a separate interview, noting that it’s also quite expensive. An enterprise data hub is more like a smartphone, he added: “A unified integrated experience that can take pretty good pictures.”

“Essentially, we’re saying there is a new industry now being formed,” Awadallah said.

Technically, though, the move away from being a Hadoop vendor and toward becoming an EDH vendor is more about messaging than technology. Awadallah compares the old approach of bragging about the software versions in a big data vendor’s Hadoop distribution to a tablet maker bragging about the specs in its new device. Upgraded components aren’t the destination; they just enable the platform to do new things.

“It’s the same thing with Apple and its new announcements (focusing on things like the 64-bit processor),” he said. “My mom is like, ‘What does that mean? What does that mean for me?'”

Amr Awadallah (center) at Structure Europe 2012. (c) JULIADEBOER PHOTOGRAPHY

The new messaging is all about describing what the new technologies in Cloudera Enterprise 5 (the latest edition of the company’s whole-hog software-and-services suite) mean, rather than what they are. Navigator, Data Manager and Sentry, for example, mean better data discovery and data lineage, easier deployment of applications or tools on top of Hadoop, and finer-grained security. YARN combined with Cloudera Manager means better and more efficient resource management.

As thorough as Cloudera’s vision might be from a data-management perspective, though, Reilly is holding firm on his predecessor Mike Olson’s stance that Cloudera should stick to the platform layer and avoid creating its own applications. He believes that companies interested in buying open source technology also want the freedom to choose the best technologies rather than purchase the kind of top-to-bottom stack that a vendor like IBM might sell.

“I think our opportunity is big enough that we are better off working with partners on that than building the end applications ourselves,” Reilly said. “… I think it would be a huge mistake, today.”

Feature image courtesy of Shutterstock user for you design.