Google has made public the details of its Spanner database technology, which allows a database to store data across multiple data centers, millions of machines and trillions of rows. Spanner is not just larger than the average database; it also lets applications that use it dictate where specific data is stored, so as to reduce latency when retrieving it.
Making this whole concept work is what Google calls its TrueTime API, which combines atomic clocks and GPS clocks to timestamp data so it can then be synced across as many data centers and machines as needed. From the Google paper:
…Spanner has evolved from a Bigtable-like versioned key-value store into a temporal multi-version database. Data is stored in schematized semi-relational tables; data is versioned, and each version is automatically timestamped with its commit time; old versions of data are subject to configurable garbage-collection policies; and applications can read data at old timestamps. Spanner supports general-purpose transactions, and provides a SQL-based query language.
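To make the idea of timestamped versions concrete, here is a minimal Python sketch of a multi-version store. It is a toy under stated assumptions, not Spanner's implementation: the `VersionedStore` class and its method names are hypothetical, timestamps are plain integers supplied by the caller, and the garbage-collection policy is one simple possibility.

```python
import bisect
from collections import defaultdict


class VersionedStore:
    """Toy multi-version key-value store (hypothetical, not Spanner's code)."""

    def __init__(self):
        # key -> parallel lists of commit timestamps and values,
        # both kept in ascending timestamp order.
        self._timestamps = defaultdict(list)
        self._values = defaultdict(list)

    def write(self, key, value, commit_ts):
        # Insert the new version at its position in timestamp order.
        i = bisect.bisect_right(self._timestamps[key], commit_ts)
        self._timestamps[key].insert(i, commit_ts)
        self._values[key].insert(i, value)

    def read(self, key, at_ts):
        # Return the newest version committed at or before at_ts, if any.
        ts = self._timestamps.get(key, [])
        i = bisect.bisect_right(ts, at_ts)
        return self._values[key][i - 1] if i else None

    def collect_garbage(self, keep_after_ts):
        # Assumed policy: drop versions older than keep_after_ts, but keep
        # the newest one at or before it so reads at keep_after_ts still work.
        for key, ts in self._timestamps.items():
            i = bisect.bisect_right(ts, keep_after_ts)
            drop = max(i - 1, 0)
            del ts[:drop]
            del self._values[key][:drop]


store = VersionedStore()
store.write("profile", "v1", commit_ts=10)
store.write("profile", "v2", commit_ts=20)
old = store.read("profile", at_ts=15)   # sees the version committed at 10
new = store.read("profile", at_ts=25)   # sees the version committed at 20
```

A read at an old timestamp simply ignores any version committed later, which is why consistent timestamps across machines matter so much to the design.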
Because of the importance of the TrueTime API, Google has installed GPS antennas and atomic clocks in the data centers running Spanner technology. The approach is fairly unusual, but Google’s innovations have a way of spreading once they are publicized.
For the full walk-through on Spanner, Google’s paper delves into the specifics. Here are a few tidbits to help determine if Spanner is something you’d care about.
- Spanner automatically reshards data across machines, and it automatically migrates data across machines and data centers to balance load and in case of failures.
- This makes Spanner a good fit for applications that need high availability, as well as those that need a semi-relational database that handles reads and writes faster than Google’s Megastore option.
- Spanner exposes the following set of data features to applications: a data model based on schematized semi-relational tables, a query language, and general-purpose transactions.
- Spanner’s data model is not purely relational. Every row must have a name: each table needs an ordered set of one or more primary-key columns, a requirement familiar to people who work with key-value stores. The primary keys form the name for a row, and each table defines a mapping from the primary-key columns to the non-primary-key columns. The paper says imposing this structure is useful because it lets applications control data locality through their choices of keys.
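The last point, that primary keys name a row and that key choice affects locality, can be illustrated with a small Python sketch. Everything here is hypothetical: the `Table` class, the `user_id`/`album_id` columns and the `prefix_scan` helper are illustrative names, and "locality" is reduced to adjacency in sorted key order. How Spanner actually places data is described in the paper and is not modeled here.

```python
import bisect


class Table:
    """Toy table: ordered primary-key columns map to non-key columns."""

    def __init__(self, key_columns, value_columns):
        self.key_columns = tuple(key_columns)
        self.value_columns = tuple(value_columns)
        self._keys = []   # sorted list of primary-key tuples (row names)
        self._rows = {}   # primary-key tuple -> dict of non-key columns

    def upsert(self, **columns):
        # The ordered primary-key values together form the row's name.
        key = tuple(columns[c] for c in self.key_columns)
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = {c: columns.get(c) for c in self.value_columns}

    def prefix_scan(self, *prefix):
        # Rows sharing a key prefix sit next to each other in key order,
        # so one contiguous scan finds them all.
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for key in self._keys[start:]:
            if key[:len(prefix)] != prefix:
                break
            out.append((key, self._rows[key]))
        return out


albums = Table(key_columns=["user_id", "album_id"], value_columns=["name"])
albums.upsert(user_id=2, album_id=1, name="Travel")
albums.upsert(user_id=1, album_id=2, name="Pets")
albums.upsert(user_id=1, album_id=1, name="Family")
user_one = albums.prefix_scan(1)   # both albums for user 1, in key order
```

Because `user_id` comes first in the key, all of one user's albums end up adjacent; putting `album_id` first would scatter them. That is the sense in which an application's choice of keys steers where related data lives.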
Spanner is cool as a database tool for the current era of real-time data, but it also indicates how Google is thinking about building a compute infrastructure designed to run in a dynamic environment where the hardware, the software and the data being processed are all constantly changing.