Why the cloud is the data, and the data is the cloud

Structure Data 2014

I attended the Gigaom Structure Data Conference held this week in NYC.  As a huge database geek from way back, I find it one of the most interesting events of the year.  Gigaom manages to compress a week’s worth of presentations into just two days, so the content is fast and interesting.

You might be asking yourself, why is the cloud guy writing about the evolution of databases?  Databases and clouds are now tightly coupled, in case you haven’t noticed.

The show had the usual suspects with purpose-built, typically big data-oriented systems, including vendors like Hortonworks and Cloudera.  The idea is that the database is changing.  The days of big SQL databases are rapidly coming to an end, at least for new systems development.  Today’s databases need to be big, a petabyte or more, and scale to performance levels that were science fiction just a few years ago.

How did we get to this point?

First, we have easy-to-provision, auto-scaling virtual machines that provide a platform for widely distributed “shared-nothing” database operations.  Each node works on its own slice of the data, and the partial results are then combined.  This divide-and-conquer approach works for data gathered from both structured and unstructured sources.  It’s the ‘secret sauce’ behind the newfound “Hadoop-y” speed that was really not there in the world of traditional relational databases.
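To make the divide-and-conquer idea concrete, here is a minimal sketch in Python.  It is not how Hadoop or any particular product is implemented; the function names (partition, local_aggregate, merge, total_by_customer) and the sample records are assumptions made up for illustration.  Each worker sees only its own shard, and the partial results are merged at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_workers):
    """Split records into n_workers disjoint shards (round-robin)."""
    return [records[i::n_workers] for i in range(n_workers)]


def local_aggregate(shard):
    """Sum amounts per customer using only the data in this shard."""
    totals = Counter()
    for customer, amount in shard:
        totals[customer] += amount
    return totals


def merge(partials):
    """Combine the per-worker partial totals into one result."""
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds values per key
    return combined


def total_by_customer(records, n_workers=4):
    """Hypothetical entry point: shard, aggregate in parallel, merge."""
    shards = partition(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(local_aggregate, shards))
    return merge(partials)


# Illustrative data only: (customer, purchase amount) pairs.
sample = [("acme", 10), ("globex", 5), ("acme", 7), ("initech", 2)]
totals = total_by_customer(sample, n_workers=2)
```

In a real cluster the shards would live on separate machines with their own disks and memory; threads stand in for those nodes here only to keep the sketch self-contained.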

Second, we have the ability to deliver data using data services that combine behavior and information.  This places the database operations behind a well-defined API or service.  For the most part, these are simple services that act very much like a traditional database query, or just produce data as requested from a single data source.

However, these cloud-based data services, or cloud APIs, are becoming complex.  They can mash up data from multiple sources and externalize that data using a single interface.  Thus, you may be able to ask a single question about the current state of the company and have a service that considers data in hundreds, perhaps thousands, of databases, using up-to-date operational data, to come back with a single meaningful answer.
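A composite data service of this kind can be sketched in a few lines of Python.  Everything here is hypothetical: the Source class, the company_snapshot function, and the field names "revenue" and "open_orders" are assumptions for illustration, and the in-memory lambdas stand in for what would really be calls to separate databases.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Source:
    """One backing data store, hidden behind a fetch function."""
    name: str
    fetch: Callable[[str], List[Dict]]  # region -> rows


def company_snapshot(sources: List[Source], region: str) -> Dict:
    """Single interface that combines rows from every source."""
    revenue = 0.0
    open_orders = 0
    for source in sources:
        for row in source.fetch(region):
            # Rows from different sources carry different fields.
            revenue += row.get("revenue", 0.0)
            open_orders += row.get("open_orders", 0)
    return {
        "region": region,
        "revenue": revenue,
        "open_orders": open_orders,
        "sources_consulted": [s.name for s in sources],
    }


# Stand-in sources; a real service would query live databases.
billing = Source("billing", lambda region: [{"revenue": 120.0}, {"revenue": 80.0}])
orders = Source("orders", lambda region: [{"open_orders": 3}])
snapshot = company_snapshot([billing, orders], "emea")
```

The caller asks one question and gets one answer; how many databases sit behind the interface is the service’s business, not the caller’s.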

Furthermore, since it is an API, we can embed this service within any business application or process where automated decisions need to be made.  Perhaps you want to check on the future production figures using massive amounts of distributed data and predictive analytics to optimize the management of suppliers.  Or, perhaps do a quick check on the current risk exposure around an insurance application, and automate the path it takes within a workflow.  When using this approach, applications have access to almost perfect information, instantly, often leveraging cloud-based APIs that can access complex and large datasets in any number of ways to support any number of business applications.
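As a rough sketch of the insurance example, the workflow step below takes a risk-scoring service as a plain callable and uses its answer to pick the next path.  The function name route_application, the threshold values, and the 0-to-1 score scale are all assumptions for illustration, not taken from any real underwriting system.

```python
from typing import Callable, Dict


def route_application(
    application: Dict,
    risk_score: Callable[[Dict], float],
    auto_approve_below: float = 0.3,
    reject_above: float = 0.8,
) -> str:
    """Choose the next workflow step from a risk score in [0, 1]."""
    score = risk_score(application)  # would be a cloud API call in practice
    if score < auto_approve_below:
        return "auto_approve"
    if score > reject_above:
        return "reject"
    return "manual_review"


# Stand-in scorer; a real one would sit in front of large datasets.
def toy_scorer(application: Dict) -> float:
    return 0.9 if application.get("prior_claims", 0) > 5 else 0.1


next_step = route_application({"prior_claims": 0}, toy_scorer)
```

Because the scorer is just a parameter, the same workflow code works whether the answer comes from a lookup table or from a service backed by thousands of databases.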

The database really is the killer application for the cloud.  If we eliminated the need to store and access data, either within traditional object stores or a true database, would the cloud see the same level of interest it enjoys today?

The use of cloud-based databases in support of these types of operations is currently exploding.  Whether columnar, SQL, or NoSQL, many types of databases using many types of architectures are ending up in the cloud.  The reasons are certainly economic.  However, the most compelling reason is that the cloud provides an enterprise with agility, meaning the ability to change, as well as the ability to scale.

In response to the use of public and private clouds as database platforms, those who provide the platforms are tuning their clouds for specific database technologies, such as Mongo or Couch, or optimizing storage systems to provide high-performance I/O.  IaaS with PaaS clouds, such as AWS, offer their own database services, such as Redshift and RDS.  What’s more, PaaS with IaaS-oriented clouds, such as those from Google and Microsoft, offer their own cloud-based databases as well, such as Google’s Cloud SQL and Cloud Datastore, or Microsoft’s SQL Database.

Clouds and databases are tightly coupled, and the links between them will only become tighter as time marches on.  Cloud providers that adapt to this trend support compelling database technology, as well as optimize their platforms in support of the database.  I suspect that trend will continue for some time.