A number of issues hinder the movement to cloud computing, including the perception that cloud computing lacks security, the NSA scandal, and even the need to see and hug your servers from time to time. However, the issue that truly hinders cloud computing is the lack of understanding about the data integration approaches and technology applicable to cloud-based applications and data.
Here is the dilemma. The best bang for the cloud computing buck is the use of public cloud resources, such as AWS, Rackspace, IBM, Microsoft… you get the idea. However, to get the true value of public cloud computing, you have to move some of your corporate data to the public cloud to take advantage of the cheap storage and pay-by-the-drink rental of compute cycles.
Yet when you do move a smidgen or more of data to the public cloud, you quickly discover that some sort of synchronization needs to exist back to on-premise enterprise systems. Otherwise, you're rekeying data, sending USB drives via overnight delivery, or resorting to other ugly approaches to data movement that are more prevalent than many think.
While some enterprises see these issues coming, many more won't deal with these data integration needs until there is a well-understood requirement. This typically surfaces when those who leverage cloud-based applications find they have no access to core data, such as customer and sales information, from systems that reside within the firewall.
So, what happens? In many enterprises, there are no further public cloud computing projects until IT figures this out. Indeed, the lack of data integration is holding back the migration to the cloud for a few core reasons:
- The cloud applications and databases are becoming additional silos within an IT infrastructure that really can’t afford to maintain yet another silo.
- Unexpected costs of leveraging ad hoc integration approaches, such as FTP, batch file transfers, or physical transport of data using common shipping services (not kidding).
- The movement to cloud-based platforms shines a much brighter light on the lack of a data integration strategy around the existing IT infrastructure, and thus IT must stop and solve that problem before proceeding to the cloud.
This issue is easily solvable with just a bit of planning. Data integration technology is in its fifth or sixth generation these days, and there are data integration solutions that can be consumed on demand as cloud services themselves, such as Dell's Boomi, Jitterbit, SnapLogic, Informatica, Actian, and many others.
So, how do you fix it? As with many of the architectural issues that need some face time when moving to the public clouds, including security, governance, performance, etc., integration requires some upfront thinking, as well as planning and selection of the right technology for the job.
Core requirements should be understood up front, such as:
- The type and frequency of data movement between the enterprise and the public cloud provider.
- Content and structure changes that are required along the way.
- Encryption and other security requirements.
- Logging and auditing requirements, and how to deal with exceptions that occur during operations.
From there, you model how the information should flow: cloud-to-cloud, cloud-to-enterprise, as well as many-to-one or many-to-many. This means you must define what pieces of information need to flow where, when, and why.
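To make this concrete, such a flow model can begin as a simple declarative mapping. The Python sketch below is purely illustrative; the system names, fields, and frequencies are invented stand-ins, not a prescription for any particular tool:

```python
# A minimal, hypothetical model of data flows between the enterprise and
# the public cloud: what moves, from where to where, how often, and which way.
FLOWS = [
    {
        "name": "customer-sync",
        "source": "on-prem CRM database",      # enterprise side
        "target": "cloud sales application",   # public cloud side
        "fields": ["customer_id", "name", "region"],
        "frequency": "hourly",
        "direction": "enterprise-to-cloud",
    },
    {
        "name": "order-feedback",
        "source": "cloud sales application",
        "target": "on-prem ERP",
        "fields": ["order_id", "customer_id", "status"],
        "frequency": "daily",
        "direction": "cloud-to-enterprise",
    },
]

def validate_flow(flow):
    """Check that a flow definition answers the what/where/when questions."""
    required = {"name", "source", "target", "fields", "frequency", "direction"}
    missing = required - flow.keys()
    if missing:
        raise ValueError(f"flow {flow.get('name', '?')} missing: {sorted(missing)}")
    return True

for f in FLOWS:
    validate_flow(f)
```

Even a model this simple forces the right conversations: if you cannot fill in every field for a flow, you have found a requirement you have not yet understood.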
Most public cloud providers make data integration easy, typically providing well-defined APIs (often RESTful web services) that allow data to be placed in the cloud or consumed from it. Moreover, most data integration technologies that are "cloud aware" provide pre-built connectors for the major cloud computing providers and the sub-services they offer.
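For example, placing a record in a cloud data service through a RESTful API often amounts to an authenticated HTTP POST with a JSON body. The endpoint, token, and payload shape below are hypothetical placeholders, not any specific provider's API:

```python
import json
import urllib.request

def build_upload_request(endpoint, token, record):
    """Build (but do not send) an HTTP request that places one record
    in a hypothetical cloud data service via a RESTful API."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # token-based auth is typical
        },
    )

# Example (never sent anywhere -- the URL and token are placeholders):
req = build_upload_request(
    "https://api.example-cloud.test/v1/customers",
    "EXAMPLE-TOKEN",
    {"customer_id": 42, "name": "Acme Corp"},
)
# To actually send it: urllib.request.urlopen(req)
```

A pre-built connector in a cloud-aware integration tool is essentially this pattern, wrapped with retry logic, authentication management, and mapping to the provider's specific endpoints.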
The best practice around data integration as it pertains to cloud-based implementations, including application and data migration, is to do your planning as early as possible. However, if you’re like most of the enterprises I run across, data integration is an afterthought.
If you've already relocated to the cloud and are looking to retrofit data integration approaches and technology, all is not lost. It's just a matter of defining the data and the core interfaces you plan to employ. Make sure to determine the structure at rest, that is, how the data is stored in the source or target application or database. Using that information, figure out how you should deal with the mediation of the data, including changing structure and content as it moves from system to system, such as from a database hosted in the public cloud to a traditional system within the enterprise.
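Such a mediation step can be as small as a field-level transform. The record shapes below are invented for illustration; in practice the mapping comes from the actual source and target schemas:

```python
from datetime import datetime

def cloud_to_enterprise(cloud_record):
    """Mediate a (hypothetical) cloud application record into the shape a
    (hypothetical) on-premise system expects: rename fields, flatten
    nested content, and reformat the date."""
    return {
        "CUST_ID": int(cloud_record["customerId"]),           # rename + type change
        "CUST_NAME": cloud_record["profile"]["displayName"],  # flatten nesting
        "CREATED": datetime.strptime(                         # ISO date -> legacy format
            cloud_record["createdAt"], "%Y-%m-%d"
        ).strftime("%d-%b-%Y").upper(),
    }

row = cloud_to_enterprise({
    "customerId": "1007",
    "profile": {"displayName": "Acme Corp"},
    "createdAt": "2014-03-01",
})
# row == {"CUST_ID": 1007, "CUST_NAME": "Acme Corp", "CREATED": "01-MAR-2014"}
```

Real integration tools express these mappings graphically or declaratively, but the underlying work is the same: structure and content change in flight so each system sees data in the shape it expects.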
Once all of this is understood, it’s just a matter of selecting the right data integration technologies for the job. There are a few choices here, including data integration technology providers that offer data integration services out of the cloud, on-demand. In some instances, more traditional data integration technology is a better fit, specifically when there are more traditional systems, such as enterprise applications and mainframe applications, that are part of the problem domain.
The core message here is that those who are successful with cloud computing consider security, governance, and, yes, data integration as they progress through the design, build, testing, and deployment of new or existing systems in public clouds. Skipping this effort means you'll likely have problems to solve down the road, and that just adds risk, and risk adds cost.