Survey: Hadoop is Great, but Challenges Remain

Source: Karmasphere

From the I-was-going-to-conduct-this-research-but-someone-beat-me-to-it department, commercial Hadoop startup Karmasphere today released the results of a survey (PDF) of 102 Hadoop developers regarding adoption, use and future plans. The results provide some interesting insights into how Hadoop grows within organizations and underscore its status as an extremely valuable, but none-too-simple analytics tool. Of course, the latter characterization is why ISVs like Karmasphere, Cloudera and Datameer exist: to make millions by reducing the Hadoop learning curve.

Among the key results:

  • Sixty-eight percent of deployments begin as skunkworks projects, with 86 percent advancing to active development or production environments within a year.
  • The top three reasons for using Hadoop are data mining for business intelligence (19 percent), lowering the cost of data analysis (15 percent) and performing log analysis (13 percent), although uses like ETL (11 percent), scientific research (10 percent) and better utilizing unstructured data (9 percent) aren’t far behind. The longer organizations use Hadoop, the more valuable they find it and the more uses they find for it.
  • The number of Hadoop developers is expected to rise by 50 to 60 percent within the next year.
  • Java is the dominant language (86 percent), with Pig and Hive sharing the No. 2 spot at 44 percent each (multiple responses were allowed).
  • The steep learning curve (44 percent) and hiring qualified people (34 percent) top the list of general challenges, while debugging Hadoop jobs (63 percent) and monitoring Hadoop jobs (47 percent) top the list of programming challenges. Seventy percent of respondents feel that these challenges will have a major-to-moderate effect on growing or expediting their Hadoop deployments.

Based on what I’ve seen and heard about Hadoop, these numbers seem accurate. They also help explain why the above-mentioned startups are receiving a lot of attention from all types of users and vendors, and why the ecosystem of commercial products supporting Hadoop keeps growing. Hadoop is arguably the most mature tool for analyzing large volumes of unstructured data, and these numbers suggest it’s also the most capable. When commercial products evolve enough to mitigate the learning curve and the shortage of qualified people, watch out.
