Hadoop, analytics continue Enterprise rites of passage

This week brings a pair of tuck-in acquisitions, a new release and a partnership that deliver fit, finish and further Enterprise-mainstreaming of Hadoop and predictive analytics. The week’s events also include an important milestone in post-MapReduce Hadoop.

Discount shopping
Let’s start with the acquisitions. Data warehousing veteran Teradata continued its metamorphosis into a full-fledged Big Data company by announcing Tuesday (with terms undisclosed) that it had acquired the assets of two Hadoop startups: Hadapt and Revelytix. Hadapt was arguably the pioneer in the SQL-on-Hadoop movement, and Revelytix was bravely pushing the concept of data governance and Enterprise Information Management (EIM) into the Hadoop world.

Hadapt sold what was essentially a Postgres-based massively parallel processing (MPP) data warehouse product that was physically overlaid onto a Hadoop cluster (typically one running MapR or Cloudera). The problem was that most other Hadoop and data warehousing vendors (including Teradata) have been pushing their own SQL-on-Hadoop solutions, which put Hadapt in a tight spot. The technology was still good, though, and had recently been extended to include the ability to query semi-structured data with SQL.

Revelytix’s claim to fame was a product called Loom, which provided metadata management and data preparation for data in Hadoop. The two companies were bringing BI technology (in Hadapt’s case) and rigor (in Revelytix’s case) into the Hadoop world. Maybe the Hadoop world wasn’t ready for it. But a company like Teradata, whose own transition parallels that convergence, can use the personnel and intellectual property of companies like Hadapt and Revelytix, and that’s exactly what it’s getting from the deals.

Teradata will merge Hadapt’s and Revelytix’s people and assets into its Labs unit and will likely benefit from doing so. The two deals were announced, somewhat quietly, on Tuesday, though the transactions had in fact taken place – even more quietly – the week prior. Teradata was unwilling to provide a briefing on the deals, preferring instead to issue a press release. I’m not sure why Teradata is keeping such a low profile on these deals, but we may learn more in due course.

Predicting to the chorus
In the world of predictive analytics, meanwhile, maverick player Alpine Data Labs announced the release of Alpine Chorus 4.0 on Tuesday. Chorus, an open source project that was at one time an EMC product, provides SharePoint-like collaboration between members of analytics teams. Alpine took over stewardship of the Chorus project and has integrated its Alpine Now visual predictive analytics product into the Chorus UI. The two products are now consolidated under the Chorus name.

Chorus’ combination of enterprise-style social sharing and collaborative communications may help bring predictive analytics into the Enterprise. The visual UI on the predictive analytics side is also more Enterprise-ready than some other platforms, which are essentially workbenches for highly paid data analytics professionals.

There’s a lot more to the release than what I’ve discussed here. Alpine’s press release has the details.

MapR teams with IT services firm
Back on the Hadoop ranch: on Tuesday, Hadoop distribution provider MapR announced a partnership with one of India’s largest technology services firms, Tata Consultancy Services (TCS). TCS provides practices and professionals around virtually every development platform, database and enterprise technology genre imaginable.

That it is teaming with MapR to provide products like “TCS Sensor Data Analytics” and “TCS BigData Desktop” shows just how far Hadoop in the Enterprise has come, especially in the last six months. If global SI (system integration) firms do Hadoop, then so will their customers.

Tez goes top-level
And as Hadoop 2.0 and its independence from MapReduce mature, solutions are shaking out around running Hadoop stack components, like Hive and Pig, on Hadoop’s newest cluster management layer (YARN). On the one hand, Cloudera, Databricks, MapR, Intel and IBM announced they will engineer most Hadoop stack components to run atop Apache Spark. On the other, Apache Tez will eventually run many of these same workloads on YARN. You can read more about Spark and Tez in my recent article on gigaom.com.

Tez is backed chiefly by Hortonworks, whose engineering team built it to run on YARN (which it also developed). But Tez has been gaining industry endorsements, and support, from Hadoop-related projects (like Cascading and Pig). And on Tuesday, the Apache Software Foundation announced that Tez had graduated from “incubator” status to top-level project, achieving parity in that regard with Spark, which had achieved its own top-level project status about five months ago.

Enterprise or bust
Each of this week’s announcements (which, for some reason, were all on Tuesday) provides measurable evidence that Hadoop and predictive analytics are working their way into the workflows and procurement habits of Enterprise customers. There’s no surprise in this; the Enterprise-ation of analytics is simply proceeding as expected.