This past week in the Big Data world, though perhaps less news-laden than the last, nonetheless brought more funding announcements and new releases, with Apache Spark as the thread running through nearly all of them.
Riding on the opening day of Spark Summit in San Francisco, Databricks, the commercial entity founded by the creators of Apache Spark, the distributed, in-memory data processing engine, announced $33M in Series B funding. The round was led by New Enterprise Associates, with a lesser role played by Andreessen Horowitz (Databricks’ initial funder), and the company will likely use the money to expand rapidly. Databricks also announced a new Spark-based product offering, which we’ll discuss shortly. Details on both matters are covered nicely by Gigaom’s own Derrick Harris.
The other prominent funding round this week, also covered by Harris, consisted of $110M that went to Hadoop vendor MapR. $80M of that sum came in the form of venture capital from new investors Google Capital and Qualcomm Ventures, with participation from existing investors Lightspeed Venture Partners, Mayfield Fund, NEA and Redpoint Ventures. The other $30M was debt financing provided by Silicon Valley Bank.
Guavus and DataStax
It should be noted that MapR’s distribution of Hadoop includes Spark. Many other companies support Spark, and additional ones are in the pipeline. Two companies that just moved into the “currently support Spark” column are Guavus, with its release of Reflex 2.0, and DataStax, with its DataStax Enterprise 4.5 release, both announced Monday at Spark Summit.
Guavus Reflex is a streaming data operational intelligence platform built on Hadoop. The 2.0 release adds explicit support for Spark and YARN (a component of Hadoop 2.0 that I’ve discussed in recent posts).
DataStax is the dominant commercial entity behind Apache Cassandra, a wide column store NoSQL database patterned after Google’s Bigtable and Amazon’s Dynamo. The DataStax Enterprise 4.5 release also adds support for Spark, and includes automated diagnostics and performance tuning, as well as enhanced visual management.
Meanwhile, the folks at Actian have some news as well. The company began as Ingres Corporation, when that foundational relational database product was spun off by Computer Associates. It then acquired several firms including Vectorwise, Versant, ParAccel and Pervasive Software. That gives the company quite a portfolio of products.
That portfolio was broadened in early June when the company announced the new Hadoop SQL Edition of the Actian Analytics Platform product at Hadoop Summit. Hadoop SQL Edition seems to be an adaptation of Actian Vector (derived from the Vectorwise acquisition), running in Hadoop, over data stored in HDFS. Such an approach is similar to that of Pivotal’s HAWQ product in its Pivotal HD Hadoop distribution, which also runs on Hadoop and is derived from technology present in Greenplum, the company’s MPP data warehouse.
The comparison of Actian to Pivotal doesn’t end with SQL-on-Hadoop solutions. Just as Pivotal added an all-you-can-eat license this year, now so too has Actian. On Monday, Actian announced the availability of its “all-in-one platform pricing” license. Specifically, customers can opt for capacity-based or subscription models, as well as a “Right-to-Deploy” option that allows unlimited consumption of Actian’s technologies “over a period of one, two or three years and the right to continue using what has been deployed in perpetuity.”
We end where we began: with Databricks. In addition to new funding, the company announced a new Spark-based offering, called Databricks Cloud. The product offers a cloud-based platform that includes Hadoop with Spark as well as tooling for the analysis of data.
By offering such an integrated suite as a service, Databricks is hoping that fast, simplified provisioning of Hadoop, along with the in-memory performance of Spark, will make the Hadoop + Spark combo more accessible to power users who are less than passionate about infrastructure and cluster design.
Finger to the wind
Yes, in-memory darling Spark continues its rapid growth in popularity and ubiquity. But, as this week’s funding and new releases show, disk-based Hadoop is no less a contender, and a range of SQL technologies is doing well and gaining momentum. Will one be supreme? The likelier scenario is that the three will co-exist, perhaps as different layers in a unified stack. Or several such stacks.
This post was updated on July 3rd, 2014 to cite Amazon Dynamo (in addition to Google Bigtable) as an inspiration for Apache Cassandra.