Jumping on the big data bandwagon, cloud IaaS provider GoGrid and managed hosting company Sungard have unveiled big data services based on Hadoop. They join Amazon Web Services, Microsoft, IBM and Mortar Data, all of which offer Hadoop as a service. These offerings are a good start at removing some of the hassle of implementing Hadoop at the infrastructure layer, but to be really useful to enterprises they need to integrate rich visualization and reporting capabilities at the top of the stack, where business users will see the most value.
In other words, making it easier to run Hadoop is helpful but not really the issue. The smart money is on the analytics that help companies draw conclusions from big data, conclusions that in turn drive business decisions. What the market really needs is something like a Dropbox for big data BI.
Meanwhile, GoGrid’s offering, called GoGrid Big Data Solution, tackles the infrastructure layer, something AWS has been doing since April 2009 with its Elastic MapReduce service. GoGrid’s service comes as a package that includes pre-configured hardware to match the requirements for running Apache Hadoop, in this case Cloudera’s distribution of Hadoop (CDH).
GoGrid’s bundle comes with pre-configured hardware because, although Hadoop runs on commodity boxes, it’s fussy about how those boxes are specified and configured. GoGrid followed Cloudera’s best practices, such as making sure the NameNode machine has enough RAM to support large numbers of files. Running Hadoop efficiently also requires other clustering techniques that will be unfamiliar to most IT pros. A recent GigaOM Pro survey of just over 300 enterprise IT decision makers found that 61 percent were investigating using a third-party service provider for their big data needs. That’s not surprising, given the fiddly nature of Hadoop.
So there’s certainly value in sorting out the infrastructure layer for customers, but I think they’ll quickly start asking for help generating reports on the data inside Hadoop.
GoGrid isn’t the only one focusing exclusively on the infrastructure layer. Sungard is all about making Hadoop run well in the cloud. It’s working with the University of Texas on research into optimizing Hadoop to run atop a multitenant infrastructure. That’s a tricky thing to do today, because Hadoop’s NameNode is a centralized metadata repository that can constrain performance and is a single point of failure.
It’s all good work, but my hunch is that these early cloud-based Hadoop services will appeal to financial services firms and government agencies with pent-up demand for big data solutions. After that, adoption will be slow and disappointing until service providers integrate the analytics piece on top.