How to manage data complexity in the cloud without killing anyone
The current patterns of cloud migration include simple “lift and shift,” which moves applications and data with as little rework as possible; refactoring, which redoes the applications and data so that they work more efficiently on a cloud-based platform; and rewriting the application from primitives on up so the data and the applications work and play better on the cloud platform.
While each approach has its own advantages and disadvantages, typically cost versus efficiency, the path to migrate both applications and data to cloud-based platforms, as well as the changing nature of data itself, leads enterprises to the problem of how to deal with increasing data complexity. Indeed, data complexity, not application migration and development to and on cloud computing platforms, could be the largest impediment to success in the cloud.
The reasons for the rising data complexity issues are fairly well known, including:
- The explosive growth of data (at least 50 percent year over year growth), and thus the need to distribute data on various storage systems inside and outside of the enterprise. In some instances, the data is tiered, or stored on different storage systems, depending on the type and the age of the data.
- The rising use of unstructured data that doesn’t have native schemas. Schemas are typically defined at access time, and there is rarely a pre-determined order as to how the data exists in storage. This data includes documents, audio and video files, and other binary data, anything that may have informational value, but does not exist in a traditional database.
- The rising use of streaming data that many businesses employ to gather information as it happens, and process it in flight.
- The rise of devices that spin off massive amounts of data, such as those defined around the concept of the Internet of Things, or IoT. This data is typically streaming and unstructured, and needs to be acted upon as it’s produced from the device, or stored and analyzed for trending information.
- The rising use of analytical databases, such as in-memory databases that can process a tremendous amount of data at an extremely high rate of speed.
- The rise of Hadoop, and products based on Hadoop. Enough said.
- The changing nature of transactional databases, moving to NoSQL and other non-relational models. However, at the same time, enterprises are not displacing their existing legacy relational databases.
- The continued practice of binding single purpose databases to applications. Those who design systems typically employ the best database for the job, for a specific application or applications, and do not use the existing databases that are already deployed and in production. As a result, the number of databases continues to increase within enterprises.
- Finally, and most importantly, the rise of as-a-service cloud-based and cloud-only databases, such as those now offered by Google, Microsoft, and AWS, that are emerging as the preferred databases for applications built both inside and outside of the public clouds.
So, those who must manage the rise of both cloud and new database technology within enterprises, and thus the growth of data and data complexity, are about to give up their control of enterprise data. Indeed, the ongoing work seems to be more about just keeping up than getting ahead of the data complexity, which spills over into data management and data governance issues.
The core task is to move toward application architectures that decouple the database from the applications, or even toward collections of services, so you can deal with the data at another layer of abstraction. The use of abstraction is not new, but we haven’t had the required capabilities until the last several years, such as the ability to provide master data management (MDM), data service enablement, and the ability to deal with the physical databases using a configuration mechanism that can place volatility and complexity into a single domain, not in the applications, the data, or all points in between.
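The decoupling described above can be sketched as an application that depends only on an abstract data-access interface, with the concrete database bound behind it. This is a minimal, hypothetical illustration; the class and method names are mine, not from any specific product:

```python
from abc import ABC, abstractmethod


class CustomerStore(ABC):
    """Abstract data-access interface; the application depends only on this."""

    @abstractmethod
    def get(self, customer_id: str) -> dict: ...

    @abstractmethod
    def put(self, customer: dict) -> None: ...


class InMemoryCustomerStore(CustomerStore):
    """One concrete binding; a relational or NoSQL store could replace it
    without touching application code."""

    def __init__(self) -> None:
        self._rows: dict = {}

    def get(self, customer_id: str) -> dict:
        return self._rows[customer_id]

    def put(self, customer: dict) -> None:
        self._rows[customer["id"]] = customer


def application_logic(store: CustomerStore) -> str:
    # The application sees only the abstraction, never the physical database.
    store.put({"id": "c1", "name": "Acme"})
    return store.get("c1")["name"]
```

The point of the sketch is that swapping the physical database becomes a matter of providing a different `CustomerStore` implementation, which is exactly the volatility we want isolated in one domain.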
As you can see in the figure below, what I suggest is that you create an architecture to deal with the database complexity issue. This architecture will deal with many different types of physical databases, both unstructured and structured, sometimes leveraging abstraction (or virtual databases), and sometimes direct reads and writes for database access.
The use of virtual databases, which are a feature of database middleware services provided by technology suppliers such as Red Hat and Informatica, serves to drive a configurable structure and management layer over existing physical databases, if that is indeed in the requirements. This means that you can alter the way the databases are accessed, creating common access mechanisms that are changeable within the middleware and do not require risky and expensive changes to the underlying physical database.
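One way to picture that configurable access layer is a router that maps logical names to physical backends through configuration, so changing a backend is a config edit rather than an application change. This is a toy sketch of the idea, not the actual API of Red Hat’s or Informatica’s middleware:

```python
class VirtualDatabase:
    """Toy middleware layer: routes logical table names to physical
    backends according to a configuration mapping."""

    def __init__(self, config: dict) -> None:
        # config maps logical name -> backend object with fetch()/store()
        self._config = dict(config)

    def rebind(self, logical_name: str, backend) -> None:
        # Swapping a physical database is a configuration change only;
        # applications keep using the same logical name.
        self._config[logical_name] = backend

    def fetch(self, logical_name: str, key: str):
        return self._config[logical_name].fetch(key)

    def store(self, logical_name: str, key: str, value) -> None:
        self._config[logical_name].store(key, value)


class DictBackend:
    """Stand-in for any physical database sitting behind the virtual layer."""

    def __init__(self) -> None:
        self._data: dict = {}

    def fetch(self, key: str):
        return self._data[key]

    def store(self, key: str, value) -> None:
        self._data[key] = value
```

Applications address `"orders"` or `"customers"` as logical names; which physical store answers is decided in the middleware configuration, which is where the risky change is absorbed.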
Moving up the stack, we have data orchestration and data management that, again, provide those charged with managing enterprise data, in the cloud or on premises, with the ability to provide services such as MDM, recovery, access management, performance management, etc., as core services that exist on top of the physical or virtual databases.
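As a rough illustration of one such service, MDM at this layer might reconcile the same logical entity from several physical stores into a single golden record. This is a toy merge rule assumed for the sketch; real MDM products do far more:

```python
def golden_record(records: list) -> dict:
    """Toy master-data merge: earlier sources win on conflicts, later
    sources fill in fields the earlier ones are missing, yielding one
    reconciled record per entity."""
    merged: dict = {}
    for record in records:
        for field, value in record.items():
            if field not in merged and value is not None:
                merged[field] = value
    return merged
```

A service like this lives above the physical and virtual databases, so the reconciliation logic is written once rather than repeated in every application.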
Moving up to the next layer, we have the externalization and management of core data services or microservices. These are managed, governed, and secured under common governance and security layers that can track, provision, control, and provide access to any number of requesting applications or users.
The use of services means that we control access to the underlying data stores (at the bottom of the diagram), as well as bind some behaviors to the use of data. An example would be to call out to an external credit check service to validate data coming from the physical database, produced from a data service. The applications merely consume the data using services, but have the option of going directly to the physical database, if there is a requirement to do so. Moreover, security and governance are considered systemic to every part of this structure, and each layer, in part or in whole, can run locally within the enterprise or on a public cloud service. The platform selection goes to both cost and functionality, but it’s a safe assumption that public cloud-based platforms will play an increasing role in these architectures.
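The credit-check example above can be sketched as a data service that binds a validation behavior to every read. The service and checker here are hypothetical placeholders of my own, standing in for a real data service and a real external credit-check call:

```python
from typing import Callable


class DataService:
    """Data service that fronts a physical store and binds a behavior
    (here, an external validation callout) to the data it serves."""

    def __init__(self, store: dict, validator: Callable[[dict], bool]) -> None:
        self._store = store          # stand-in for the physical database
        self._validator = validator  # stand-in for the external service

    def read(self, key: str) -> dict:
        record = self._store[key]
        # Bound behavior: validate each record before handing it to consumers.
        if not self._validator(record):
            raise ValueError(f"record {key!r} failed validation")
        return record


def fake_credit_check(record: dict) -> bool:
    # Hypothetical stand-in for calling out to an external credit-check
    # service; assumes a minimum acceptable score of 600.
    return record.get("credit_score", 0) >= 600
```

Consumers that go through `read` get the bound behavior for free; an application with a genuine requirement can still reach the physical store directly, as noted above, but then it forgoes the governance the service layer provides.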
While this document is not exhaustive, it does demonstrate the types of architectures needed to deal with the ongoing rise of data complexity. Your technology and approaches will vary somewhat, but the underlying patterns are the same.
The shame of this trend is that most enterprises are ignoring the rapid rise in data, as well as data complexity, and are hoping that something magical occurs, such as standards, that will solve the problem for them. Unfortunately, that won’t happen. Left unmanaged, things will only get worse, and even limit the value that core business systems will bring to the enterprise.
More proactive enterprises will find that these efforts pay significant dividends. But you’ll have to spend some money now to see any value this year. I figure that, for every dollar spent on dealing with data complexity, you’ll get 30 to 40 dollars back by 2016. That’s a good deal.
