Don’t bind too early

On Tuesday evening, Gigaom and Collective[i] hosted a dinner in Boston for area executives, with Ray Ozzie as the special guest and speaker. This post isn’t a recap of that dinner, but rather a discussion of a concept Ozzie raised that is very important in the worlds of big data and NoSQL: late binding.

Better late than never?
As it happens, Ozzie did not present the concept of late binding in the context of big data. In fact, the discussion in which it arose centered on competing social communication tools and the point at which organizations should standardize on one product in that category. Ozzie’s recommendation: don’t bind too early.

In other words, don’t rush to commit. Play the field. Use lots of tools, see which one works best for the greatest number of business units, and then make a decision. And if different groups are each getting value out of different tools, don’t be afraid to use more than one.

Romancing the code
“Late binding” is a programming term. It refers to a capability and technique whereby something, such as a variable’s data type, is determined at execution time (late) rather than at coding time (early). By declaring a generic variable and assigning it a text, numeric, Boolean or other value, the data type of that variable can be implicitly defined on the fly. You keep your options open until concrete values are assigned.
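
For a concrete feel, here is a minimal Python sketch of the variable-typing side of this (dynamic typing). The name `value` never has a declared type; the runtime reports whatever type the currently assigned value has. The helper `describe` is hypothetical and exists only for the illustration.

```python
# The name `value` carries no declared type; its type comes from whatever
# object it refers to at execution time.

def describe(value):
    """Return the type name the runtime sees for `value` right now."""
    return type(value).__name__


value = "forty-two"          # bound to a text value
first = describe(value)      # 'str'

value = 42                   # same name, now a numeric value
second = describe(value)     # 'int'

value = True                 # and now a Boolean
third = describe(value)      # 'bool'

print(first, second, third)  # str int bool
```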

So if late binding can be applied to both tool standardization and variable typing, why mention it in a weekly big data update? Because late binding is what so-called “unstructured” data is all about. In the world of Hadoop and NoSQL, schemas are determined on a just-in-time basis.

It’s not that the data is unstructured. Without structure, you couldn’t perform analytics on the data. But the structure is determined at query time rather than at the time the database is designed. It’s interpretive, and it’s not stored.
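
This approach is often called “schema on read.” The Python sketch below illustrates the idea under simple assumptions: records arrive as JSON text and are stored untouched, and a hypothetical `query` helper (not the API of Hadoop or any particular NoSQL product) applies a caller-supplied schema only at read time. Two callers can impose two different structures on the same raw records.

```python
import json

# Raw records are stored exactly as they arrived; nothing is enforced on write.
RAW_RECORDS = [
    '{"user": "ann", "amount": "19.99", "ts": "2012-06-05"}',
    '{"user": "bob", "amount": 5, "region": "east"}',
    '{"user": "cy", "note": "no amount recorded"}',
]


def query(raw_records, schema):
    """Apply `schema` (field name -> converter) at read time.

    Fields a record lacks come back as None; fields not named in the
    schema are ignored. The structure is interpretive and is not stored.
    """
    rows = []
    for line in raw_records:
        doc = json.loads(line)
        rows.append({
            name: (convert(doc[name]) if name in doc else None)
            for name, convert in schema.items()
        })
    return rows


# One analyst's interpretation: who spent how much, as floats.
spend = query(RAW_RECORDS, {"user": str, "amount": float})

# Another analyst imposes a different structure on the same raw data.
regions = query(RAW_RECORDS, {"user": str, "region": str})

print(spend)
print(regions)
```

Neither analyst had to negotiate a shared schema before the data was loaded; each interpretation lives only in the query.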

As it turns out, determining structure at the last minute allows more analyses to be performed and eliminates the inefficiency of negotiating a consensus schema. The power to impose structure is delegated to the people querying the data. It’s more inclusive that way, and it eliminates a lot of the bureaucratic process that would kick in if that power were centrally consolidated.

There are many programming languages out there that support both early and late binding of variables because they provide for both static and dynamic typing. So what does that say about databases? It says that the status quo today, where NoSQL databases supply late binding and relational databases are essentially early bound, with types fixed in the schema, is silly. Having a few pure-play products is fine, but just as many programming languages support early and late binding, so too should many databases.

Which is to say that if the major relational databases were to add late-bound structure capabilities, we could possibly eliminate the situation where relational and NoSQL engines can’t be combined or run on a common platform. We could eliminate data silos, respect existing skill sets, and get the industry to focus on quality rather than just competing in what is still an immature product market.
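
As a rough sketch of what supporting both could look like within one store, the Python below models a hypothetical hybrid row (the `Order` class and its `attr` method are illustrative names, not modeled on any specific database). Declared columns are type-checked when the row is written, as in a relational table, while a free-form JSON document is only required to be valid JSON; its structure is interpreted when someone queries it.

```python
import json
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical hybrid row.

    `order_id` and `total` are early bound: their types are fixed when the
    row is written. `extra` is late bound: a JSON document whose structure
    is interpreted only by whoever queries it.
    """
    order_id: int
    total: float
    extra: str = "{}"

    def __post_init__(self):
        # Early binding: enforce the declared columns at write time.
        if not isinstance(self.order_id, int):
            raise TypeError("order_id must be an int")
        self.total = float(self.total)
        # The free-form part must parse, but its shape is not checked.
        json.loads(self.extra)

    def attr(self, name, convert=str, default=None):
        # Late binding: interpret the schemaless part at read time.
        doc = json.loads(self.extra)
        return convert(doc[name]) if name in doc else default


orders = [
    Order(1, 20, '{"coupon": "SPRING", "gift": true}'),
    Order(2, 35.5),
]
coupons = [o.attr("coupon") for o in orders]
print(coupons)
```

The design keeps the early-bound columns predictable for existing relational skills while leaving room for structure that nobody has agreed on yet.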

In short, we could mitigate some of the fragmentation that now exists in the industry. While it’s lovely to have so many choices, eventually we have to commit to a database standard. The decision can be late bound, but it can’t be unbound.