Reducing Risk and Costs of Downtime in Petabyte-scale Environments

Table of Contents

  1. Summary
  2. The Problem
  3. A Deeper Look at the Problem
  4. The Solution
  5. How It Works
  6. Why It Is Important
  7. An Example of Uptime at Scale
  8. Final Notes
  9. About Enrico Signoretti

1. Summary

Modern scale-out architectures are designed to provide the highest uptime with minimum effort and cost. Highly automated through policies, these systems make every logical component resilient to multiple failures. When correctly deployed, these infrastructures tolerate disasters and automatically resume normal operations once failed resources become available again.

With users accessing applications and data at any time and from anywhere, the uptime of IT infrastructure is more critical than ever. Service disruption of even a few minutes, including scheduled maintenance, adversely affects the overall total cost of ownership (TCO). Traditional storage systems, with their small number of controllers and moderate incidence of failure, are no longer sufficient to support multi-petabyte-scale infrastructure requiring multiple nines of uptime. These infrastructures need a completely different approach if they are to deliver the best uptime at the lowest possible operational cost.
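To put "multiple nines" in concrete terms, the short sketch below (an illustrative calculation, not part of the original report) converts an availability target into the maximum downtime it allows per year.

    # Illustrative only: convert an availability target ("nines") into
    # the maximum downtime it permits over one year.
    MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

    for availability in (99.9, 99.99, 99.999):
        downtime_minutes = MINUTES_PER_YEAR * (1 - availability / 100)
        print(f"{availability}% uptime allows at most "
              f"{downtime_minutes:.1f} minutes of downtime per year")

At five nines (99.999%), the entire yearly downtime budget is roughly five minutes, which is why even a short maintenance window or a single controller failover can consume it.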

Metro-wide high-availability clusters and various replication mechanisms are sufficient to serve a few data volumes and applications concurrently, but once the quantity of data and the number of applications surpass a certain level, the complexity and constraints these protection techniques impose make them too rigid to be viable. Additionally, most traditional replication mechanisms used for business continuity must be tested frequently and carefully to ensure they are actually effective in the event of a failure.

