For all the talk about big data and how it can help us track down needles in haystacks, there’s still a lot of work to do when it comes to issues like public health. When successful intervention might require timelines of minutes or hours rather than days, it takes a mighty keen eye to monitor lots of needles in lots of haystacks and, more importantly, spot new and important ones as they pop up.
We’ve been following news out of the Global Database of Events, Language, and Tone (GDELT) project for the past several months, and it’s very impressive as a tool for historical analysis of the world’s happenings. It ingests and indexes real-time streams from news sources around the world, and now includes hundreds of millions of data points spanning the past 35 years. It has been used for all sorts of analyses so far, ranging from tracking the spread of terrorist groups to comparing the activity patterns of today’s political uprisings with those of decades past.

But in a blog post published on Saturday, GDELT project leader Kalev Leetaru points out a major limitation of the database: It’s only as useful as the scope of data it includes and the analysts using it. Using analysis of the Ebola outbreak as an example, he explains how GDELT actually ingested an international news article referencing the Guinea government’s concern over hemorrhagic fever one day before Harvard’s HealthMap signaled an alert based on local social media activity. But without someone monitoring for that type of news in that part of the world, the single reference was very easy to miss.
Indeed, GDELT, like HealthMap, only picked up on the growing epidemic a day later as mentions ramped up on news and social media alike. Leetaru suggests some technical improvements, including broader machine-translation capabilities, that might help GDELT serve as a better real-time alerting system by letting it track even hyper-local sources within remote countries. The more events it picks up, and the earlier it picks them up, the harder they are for anyone paying attention to miss.

An expanded coverage footprint certainly would be a big help, both for GDELT and for commercially available services such as Dataminr. Dataminr monitors activity on Twitter and, when it identifies a meaningful situation taking place, sends alerts to journalists, government agents and first responders. It has already helped identify breaking news domestically but, as with GDELT, Dataminr could be even more valuable if it were able to monitor more social networks and more languages around the world.
However, if we want to make progress in responding to potential emergencies or other situations, we also need more people buying into the promise of these types of databases and alerting systems. More news sources should help GDELT, Dataminr and other services identify more trends, but each trend is still just a needle in a haystack that is itself expanding. Databases can ingest more data faster and algorithms can identify more trends faster, but reacting faster means we need more people paying attention to what the data is saying, as it’s being said.
For more thoughts from GDELT’s Leetaru on the promise of collecting and analyzing so much data, check out this Structure Show podcast interview with him.
[soundcloud url=”https://api.soundcloud.com/tracks/165051736?secret_token=s-YTgYs” params=”color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false” width=”100%” height=”166″ iframe=”true” /]