WibiData, a Hadoop-based startup focused on making it easier to analyze user behavior, has raised $5 million from New Enterprise Associates. The company, formerly known as Odiago, launched in late 2011 already claiming Wikipedia and Atlassian among its early customers.
Details about how, exactly, WibiData goes about letting users do web analytics have been sparse, but co-founder Aaron Kimball, who will present at our Structure: Data conference next month, explained some of it in a blog post on Monday. The post is fairly technical, but the gist is that WibiData leverages Apache Hadoop, HBase and Avro, as well as ample proprietary code, to enable both real-time and batch processing of user data. This lets users model customer profiles based on historical data, but also adjust those models in reaction to real-time activity on the site.
Here’s how Kimball describes the problems WibiData addresses:
Data about users has challenges associated with it that you don’t necessarily see with other large-scale data.
- To analyze users, you need to digest large volumes of log-oriented transactional data as well as more concise profile data
- You need to serve recommendations and other derived data interactively
- There’s a mix of batch (offline) and on-the-fly calculations required to deliver recommendations at web speed
WibiData is designed to store this transactional data side-by-side with profile and other derived data attributes. Keeping data logically and physically close enables high-performance analysis of the entire data picture surrounding a user.
In some cases, as FoneDoktor’s Alex Loddengaard explained in a December blog post, WibiData can obviate the need to maintain both a Hadoop cluster and a separate online transaction processing (OLTP) system, because WibiData provides both capabilities. It does this by using HBase as the real-time data store for transactions, and by incorporating a programming framework that’s abstracted from MapReduce so users can perform either batch or real-time analyses.
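To make the idea concrete, here is a minimal sketch, not WibiData's actual code, of one per-user row that holds both raw events and derived profile attributes, updated by a batch pass and by a real-time path. An in-memory dictionary stands in for HBase, and every name here (`put`, `latest`, `batch_update_profiles`, `on_click`, and the column names) is a hypothetical chosen for illustration.

```python
from collections import defaultdict

# In-memory stand-in for an HBase-style table (an assumption for illustration):
# row key (user id) -> column name -> {timestamp: value}
table = defaultdict(lambda: defaultdict(dict))


def put(user_id, column, timestamp, value):
    """Write one timestamped cell into the user's row."""
    table[user_id][column][timestamp] = value


def latest(user_id, column):
    """Return the most recent value in a column, or None if it is empty."""
    cells = table[user_id][column]
    return cells[max(cells)] if cells else None


def batch_update_profiles(now):
    """Batch path: recompute every user's click count from the full event history."""
    for user_id in list(table):
        clicks = table[user_id]["events:click"]
        put(user_id, "profile:click_count", now, len(clicks))


def on_click(user_id, timestamp, url):
    """Real-time path: store the event and adjust the derived count in the same row."""
    put(user_id, "events:click", timestamp, url)
    count = latest(user_id, "profile:click_count") or 0
    put(user_id, "profile:click_count", timestamp, count + 1)


# Usage: load historical events, run the batch pass, then react to a live click.
put("u1", "events:click", 1, "/home")
put("u1", "events:click", 2, "/pricing")
batch_update_profiles(now=10)
on_click("u1", 11, "/signup")
print(latest("u1", "profile:click_count"))
```

Because the event log and the derived attributes share a row key, both paths read and write the same record, which is the property the passage above attributes to keeping a single store.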
Avro comes in when fields need to be added to data records, or schemas otherwise adjusted, without affecting existing processes that have to access that data. As Kimball explains, “Does your web site track a new cookie? This can be added as a new field. But even though you start collecting that new data, your existing analysis pipelines can treat records like they always did; programs that don’t yet know about the new cookie are still compatible with both the old records already collected, and the new records with the additional field.”
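The compatibility Kimball describes can be sketched in a few lines. The example below is a simplified imitation of Avro-style schema resolution written in plain Python; it does not use the Avro library, and the `PageView` schemas, the `promo_cookie` field and the `resolve` helper are all hypothetical. It assumes the rule that a reader ignores fields it does not know about and fills in a declared default for fields missing from older records.

```python
# Hypothetical schemas in Avro's JSON style; the second adds one optional field.
OLD_SCHEMA = {
    "type": "record",
    "name": "PageView",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "url", "type": "string"},
    ],
}

NEW_SCHEMA = {
    "type": "record",
    "name": "PageView",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "url", "type": "string"},
        # New cookie field; the default keeps old records readable.
        {"name": "promo_cookie", "type": ["null", "string"], "default": None},
    ],
}


def resolve(record, reader_schema):
    """Project a decoded record onto the reader's schema (simplified sketch).

    Fields the reader does not declare are dropped; fields the record lacks
    are filled from the reader's default, or rejected if there is none.
    """
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"record is missing required field {name!r}")
    return resolved


old_record = {"user_id": "u1", "url": "/home"}
new_record = {"user_id": "u2", "url": "/pricing", "promo_cookie": "spring-sale"}

# An existing pipeline (old schema) reads a new record and never sees the cookie.
print(resolve(new_record, OLD_SCHEMA))
# An updated pipeline (new schema) reads an old record and gets the default.
print(resolve(old_record, NEW_SCHEMA))
```

The design point is that the new field carries a default, so neither old programs nor old data need to change when the schema grows.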
Its data-management methods and machine-learning libraries for capabilities such as content recommendation make WibiData ideal for web-user data, but Kimball points out it’s also a good fit for “mobile, online gaming, healthcare, finance, and several other industries.” However, WibiData is just one of many startups looking to parlay its founders’ Hadoop expertise into a higher-level analytics product that does things Hadoop alone can’t, without requiring deep Hadoop or analytics expertise on the customer end. Good thing there’s plenty of data to go around.