In just the last week, the Apache Software Foundation (ASF) has announced important, if somewhat subtle, new releases of Hadoop and two affiliated projects. Individually, the announcements are a bit of a study in minutiae; together, they illustrate Hadoop’s continued trajectory towards Enterprise adoption.
Second (and a half) time around
First off, Hadoop itself has now reached its version 2.5 release. The Hadoop project classifies this as a “minor” release, but that label is more specific than it sounds: it means the release is backward compatible with any applications built for its predecessor (v2.4.1). In other words, “minor” releases introduce no breaking changes, whereas major releases may.
Hadoop 2.5 does, in fact, introduce important new capabilities, several of them related to the file system. First, Hadoop 2.5 includes a fairly detailed spec for the Hadoop Compatible File System (HCFS). HCFS is a set of APIs that conventional file systems can implement to emulate the Hadoop Distributed File System (HDFS). This kind of emulation lets Hadoop integrate with existing storage infrastructure, which, at the risk of stating the obvious, is a key enabler of broader Enterprise adoption of the Hadoop platform.
Hadoop 2.5 also adds extended file system attributes: arbitrary key-value pairs attached to a file. The file system itself ignores them, but they can store metadata that other components and engines can use. The new Hadoop release also brings improvements to WebHDFS, the Web service layer on top of Hadoop’s file system. Even a list of these features makes for dry reading, but the key takeaway is that they add rigor and maturity to HDFS, whose cheap storage underpins the “data lake” approach and, in turn, Hadoop adoption.
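Because extended attributes are exposed over WebHDFS as well as through Hadoop's Java `FileSystem` API, one quick way to see what “attaching a key-value pair to a file” looks like is to build the REST calls. The Python sketch below only constructs URLs and sends nothing. The NameNode address, the file path and the helper names `setxattr_url` and `getxattrs_url` are hypothetical; the `op=SETXATTR` and `op=GETXATTRS` operations and the `xattr.name`, `xattr.value`, `flag` and `encoding` parameters follow the WebHDFS documentation for Hadoop 2.x as we understand it. Attribute names carry a namespace prefix such as `user.`.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical NameNode HTTP address (50070 was the default NameNode HTTP port in Hadoop 2.x).
NAMENODE = "http://namenode.example.com:50070"


def setxattr_url(path: str, name: str, value: bytes, flag: str = "CREATE") -> str:
    """Build a WebHDFS SETXATTR URL; a client would send it with HTTP PUT."""
    params = {
        "op": "SETXATTR",
        "xattr.name": name,  # must include a namespace prefix, e.g. "user."
        # Hex-encode the raw bytes with a 0x prefix so arbitrary metadata survives the query string.
        "xattr.value": "0x" + value.hex(),
        # CREATE fails if the attribute already exists; REPLACE fails if it does not.
        "flag": flag,
    }
    return f"{NAMENODE}/webhdfs/v1{path}?{urlencode(params)}"


def getxattrs_url(path: str, name: str, encoding: str = "text") -> str:
    """Build a WebHDFS GETXATTRS URL; a client would send it with HTTP GET."""
    params = {"op": "GETXATTRS", "xattr.name": name, "encoding": encoding}
    return f"{NAMENODE}/webhdfs/v1{path}?{urlencode(params)}"


if __name__ == "__main__":
    url = setxattr_url("/data/lake/events.csv", "user.source", b"crm-export")
    # Parse the query string back out to show exactly what would be sent.
    print(urlparse(url).path)
    print(parse_qs(urlparse(url).query))
    print(getxattrs_url("/data/lake/events.csv", "user.source"))
```

Sent against a live cluster, the first URL (via PUT) would attach the attribute and the second (via GET) would read it back. HDFS attaches no meaning to `user.source`, which is precisely what leaves it free for other tools to interpret.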
Sqoop and Sentry
On the affiliated project side, Apache Sqoop has been updated to version 1.4.5. Sqoop provides bulk import and export between relational database tables and HDFS files. Sqoop 1.4.5 is the project’s fourth release as a top-level ASF project; it improves support for IBM Netezza and Oracle and adds compatibility updates for the latest versions of Apache HCatalog and Hive.
Apache Sentry, an incubating project that provides a role-based access control layer for Apache Hive and for Cloudera’s SQL-on-Hadoop engine, Impala, has released version 1.4. While it doesn’t add any major features, Sentry 1.4’s change log lists 10 minor new features, 8 improvements and 116 bug fixes. As we at Gigaom have covered previously, security-related projects are important to Hadoop, and new releases like this one provide important forward momentum.
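In Sentry's model, as we understand it, privileges are granted to roles and roles are granted to groups of users, rather than directly to individual users. The Python below is a toy sketch of that group, role and privilege chain for illustration only; the names `Privilege`, `Policy`, `grant_role_to_group` and `is_allowed` are hypothetical and are not Sentry's API, which administrators drive through SQL statements in Hive and Impala.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout: a toy model of group -> role -> privilege checks
# in the style of Sentry's RBAC, not Sentry's actual API.


@dataclass(frozen=True)
class Privilege:
    action: str    # e.g. "SELECT" or "INSERT"
    database: str
    table: str     # "*" stands for every table in the database


@dataclass
class Policy:
    group_roles: dict[str, set[str]] = field(default_factory=dict)
    role_privileges: dict[str, set[Privilege]] = field(default_factory=dict)

    def grant_role_to_group(self, role: str, group: str) -> None:
        self.group_roles.setdefault(group, set()).add(role)

    def grant_privilege_to_role(self, privilege: Privilege, role: str) -> None:
        self.role_privileges.setdefault(role, set()).add(privilege)

    def is_allowed(self, user_groups: set[str], action: str, database: str, table: str) -> bool:
        # Walk every role reachable through the user's groups and look for a matching privilege.
        for group in user_groups:
            for role in self.group_roles.get(group, set()):
                for p in self.role_privileges.get(role, set()):
                    if p.action == action and p.database == database and p.table in (table, "*"):
                        return True
        return False


if __name__ == "__main__":
    policy = Policy()
    policy.grant_privilege_to_role(Privilege("SELECT", "sales", "orders"), "analyst")
    policy.grant_role_to_group("analyst", "bi_team")
    print(policy.is_allowed({"bi_team"}, "SELECT", "sales", "orders"))
    print(policy.is_allowed({"bi_team"}, "INSERT", "sales", "orders"))
```

Tying roles to groups rather than to individual users means access can follow the directory groups an organization already maintains, which is part of the model's appeal in the Enterprise.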
Boring is good
Taken together, these releases, which arrived within a single eight-day period, underscore Hadoop’s steadily improving Enterprise-readiness. The kind of down-and-dirty fit and finish they bring is exactly what Hadoop requires to become the hardened platform it needs to be. It’s not until the boring bits are tended to that technologies are ready for production in the Enterprise.