In the world of NoSQL databases, the products that have dominated the conversation are MongoDB and DataStax Enterprise, a leading distribution of Apache Cassandra. But a couple of headlines this week bring into focus a perhaps less-splashy, though rather tenacious player: Apache HBase, which is included with most major Hadoop distributions.
The important stories? The seven-year-old MongoDB named its third CEO, and HBase-focused startup Splice Machine received $3M in new funding. There’s nothing in either of these developments, on its own or even in combination, that proves HBase is gaining ground on MongoDB. After all, outgoing MongoDB CEO Max Schireson attributes his stepping down to the personal toll of travel between the company’s dual headquarters in Palo Alto and New York, and other demands of the job.
But the occurrence of these two news items in the same week, at the very least, provides food for thought about the NoSQL scene.
MongoDB’s fast growth has seemingly introduced growing pains, not only managerially, but also perhaps technologically. I’m hearing more often from developer and industry friends – anecdotally, to be sure – that Mongo has been letting them down in situations of large scale, be it in cluster size or data ingestion volumes.
When the other shoe drops in those conversations, it’s DataStax and Cassandra that are usually presented as the counterpoint. This tends to leave HBase out of the conversation.
But HBase’s momentum is growing, and that has little to do with any growth issues over at MongoDB. While HBase may not have a corporate champion behind it the way Mongo and Cassandra do, it has a lot going for it:
- HBase, as part of Hadoop, has incumbent status. Its tables are stored as files in the Hadoop Distributed File System (HDFS), which means it can process data from, or output data to, other Hadoop workloads, or it can work on its own.
- Apache Hive can be used to query data in HBase, providing a SQL interface to the NoSQL database.
- MapR has long been promoting the use of HBase for operational applications. The company’s customized read/write version of HDFS helps there, and a C++ based, HBase-compatible database in the company’s M7 Hadoop distribution is especially designed for operational workloads.
- Continuuity’s Reactor product provides a developer platform designed around the combination of Hadoop and HBase.
- Apache Knox, Hortonworks XA Secure and Zettaset Orchestrator all provide security services around HBase data.
- Microsoft (the company behind leading relational database SQL Server) is now offering cloud-based clusters, specially configured for HBase, as a preview in its Azure HDInsight cloud Hadoop service. In this implementation HBase works atop Azure blob storage.
- As mentioned above, Splice Machine has successfully raised new funding for its HBase offering which, interestingly, is a relational database. This demonstrates, at least to a point, the versatility of HBase as scale-out database infrastructure whose use need not be limited to NoSQL applications.
Enough to go around
The interesting thing about HBase, made especially clear by the Microsoft and Splice Machine developments, is that it’s a NoSQL database that augments other data technologies well. HBase’s success isn’t about zero-sum competition and displacement, and it’s not about any one company’s industry prowess.
HBase’s success looks to be about utility and standards. It’s also about HBase’s ability to work as a standalone database while remaining compatible with other Hadoop technologies, and about the growing interest in the “data lake” architecture. Keep an eye out for HBase’s continued momentum.