Table of Contents
- The Role of the Data Warehouse in a World with Data Lakes, Data Science and Decentralization
- Options for Provisioning the Data Warehouse and Why Multiple Clouds
- Cloudwashing – Cloud-Enabled/Hosted vs Cloud-Native vs Cloud-Owned
- Multi-Cloud Flexibility and Deployment Freedom
- Summary and Planning
- About William McKnight
- About GigaOm
The Role of the Data Warehouse in a World with Data Lakes, Data Science and Decentralization
Enterprises in every industry and at every scale continue working to leverage data toward their strategic objectives, whether those are to become more profitable, effective, risk tolerant, prepared, sustainable, or adaptable in an ever-changing world. For data to be leveraged, it must be accessed, combined, and governed efficiently and managed effectively. Historically, the data warehouse, a generalized multi-use and multi-source data store for the modern enterprise, has been vital to this objective.
Requirements are shifting from an exclusive reliance on data warehouses and extract, transform, load (ETL) to a combination of data warehouses, data lakes, data fabrics, AI, data science, and pipelines. Despite this, the data warehouse remains vital to this day. In fact, of all the constructs in enterprise information management, the data warehouse often delivers the greatest ROI from data.
Great data warehouses take the “build once, use many” value proposition as far as it can reasonably go. Multiple business projects can use the same data without each having to build a separate data layer. Allowing concurrent use of data at the data warehouse layer, or creating a mart off the data warehouse, takes far less work, reduces risk, and lowers overall costs compared with building from uncultivated, original source data. The shared data approach is worth pursuing.
In addition, many relational databases now let you combine the capabilities of the data warehouse with those traditionally associated with the data lake. They often do so using object storage, which can hold both the structured data of traditional data warehouses and “big data”: high-volume data, a wide variety of data structures, and rapidly streaming data. Today's analytical databases can often treat other data in object storage as extensions of themselves, known as “external tables.” The industry has dubbed this combination of capabilities a “lakehouse” or “unified analytics.”
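The external table idea described above can be illustrated with a small sketch. This is not how any particular analytical database implements it. It uses Python's built-in sqlite3 module as a stand-in warehouse and an in-memory CSV string as a stand-in for a file in object storage. Real engines typically read the external file in place; here it is staged into a temporary table to emulate that. The table names, column names, and sample values are all hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a file sitting in object storage (hypothetical contents).
CLICKSTREAM_CSV = """customer_id,page,views
1,/pricing,3
2,/docs,7
1,/docs,2
"""


def build_warehouse() -> sqlite3.Connection:
    """Create an in-memory 'warehouse' holding one curated, structured table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    return conn


def attach_external_clickstream(conn: sqlite3.Connection, csv_text: str) -> None:
    """Emulate an external table by staging the file into a temporary table.

    Assumption: a real engine would scan the object in place instead of copying it.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    conn.execute(
        "CREATE TEMP TABLE ext_clickstream "
        "(customer_id INTEGER, page TEXT, views INTEGER)"
    )
    conn.executemany(
        "INSERT INTO ext_clickstream VALUES (?, ?, ?)",
        [(int(r["customer_id"]), r["page"], int(r["views"])) for r in rows],
    )


def views_per_customer(conn: sqlite3.Connection) -> list:
    """Join warehouse data with the 'external' data in a single query."""
    return conn.execute(
        """
        SELECT c.name, SUM(e.views)
        FROM customers AS c
        JOIN ext_clickstream AS e ON e.customer_id = c.customer_id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


conn = build_warehouse()
attach_external_clickstream(conn, CLICKSTREAM_CSV)
result = views_per_customer(conn)
print(result)
```

The point of the sketch is that the query sees the curated warehouse table and the lake-style file as one queryable surface, which is the behavior the “lakehouse” label refers to.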
Yet, even that evolved data warehouse is morphing with the advent of multi-cloud and hybrid cloud as common deployment models.