There are some things that just weren’t possible before the World Wide Web and cloud computing, and a recently launched emotion-quantification project called “We Feel” is one of them. The project, a partnership between Australia’s Black Dog Institute and the country’s Commonwealth Scientific and Industrial Research Organization (CSIRO), analyzes English-language Twitter posts from around the world in order to gauge how people are feeling.
Using data from Gnip, the social-media data provider that Twitter recently acquired, We Feel places tweets on a spectrum of emotions running from “joy” to “fear” (as well as “surprise”) and then breaks them down at a more granular level (e.g., from “joy” to “zest” to “invigorated”). It also captures metadata on each tweet: the country it came from, the sex of the person who posted it and its timestamp.
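One way to picture this drill-down structure is a lexicon in which each emotion word sits under a secondary emotion, which in turn sits under a primary one. The sketch below assumes exactly that. Apart from the “joy” to “zest” to “invigorated” chain, the entries are illustrative placeholders rather than We Feel’s actual vocabulary, and the names `LEXICON`, `emotion_path` and `drill_down_counts` are hypothetical.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical mini-lexicon: word -> (primary emotion, secondary emotion).
# Only "joy" -> "zest" -> "invigorated" comes from the article; the rest are placeholders.
LEXICON = {
    "invigorated": ("joy", "zest"),
    "thrilled": ("joy", "zest"),
    "terrified": ("fear", "horror"),
    "astonished": ("surprise", "amazement"),
}


def emotion_path(word: str) -> tuple[str, ...] | None:
    """Return (primary, secondary, word) for a known word, else None."""
    entry = LEXICON.get(word.lower())
    if entry is None:
        return None
    return (*entry, word.lower())


def drill_down_counts(words: list[str]) -> Counter:
    """Count matches at every depth of the hierarchy so totals can be browsed top-down."""
    counts: Counter = Counter()
    for word in words:
        path = emotion_path(word)
        if path is None:
            continue  # word is not in the lexicon
        for depth in range(1, len(path) + 1):
            counts[path[:depth]] += 1  # e.g. ("joy",), ("joy", "zest"), ...
    return counts


counts = drill_down_counts(["Invigorated", "thrilled", "terrified", "hello"])
print(counts[("joy",)])         # 2
print(counts[("joy", "zest")])  # 2
print(counts[("fear",)])        # 1
```

Keying the counter on path prefixes means a single pass produces totals for every level. A viewer can then start at “joy” and expand it without recounting.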
The project website includes an interactive visualization that lets visitors drill down as deeply as they want; they can also access the data via APIs or build their own tables sorted by time, place, sex and emotion. Because We Feel analyzes tweets in near real time, generating new summaries of Twitter activity every 15 minutes, the data is relatively current.
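The 15-minute summaries amount to bucketing tweets into fixed time windows and counting them per cell of time, place, sex and emotion. The sketch below assumes each tweet has already been classified and carries hypothetical `ts`, `country`, `sex` and `emotion` fields. The sample records are made up, and the helper names are illustrative rather than part of any We Feel API.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 15 * 60  # one summary per 15 minutes, as the article describes


def window_start(ts: datetime) -> datetime:
    """Floor a timezone-aware timestamp to the start of its 15-minute window (in UTC)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def summarize(tweets: list[dict]) -> Counter:
    """Count tweets per (window, country, sex, emotion) cell.

    The record keys 'ts', 'country', 'sex' and 'emotion' are hypothetical.
    """
    table: Counter = Counter()
    for t in tweets:
        table[(window_start(t["ts"]), t["country"], t["sex"], t["emotion"])] += 1
    return table


# Made-up, already-classified records for illustration only.
tweets = [
    {"ts": datetime(2000, 1, 1, 10, 7, tzinfo=timezone.utc), "country": "AU", "sex": "female", "emotion": "fear"},
    {"ts": datetime(2000, 1, 1, 10, 14, 59, tzinfo=timezone.utc), "country": "AU", "sex": "female", "emotion": "fear"},
    {"ts": datetime(2000, 1, 1, 10, 15, tzinfo=timezone.utc), "country": "AU", "sex": "male", "emotion": "anger"},
]
for cell, n in sorted(summarize(tweets).items()):
    print(cell[0].strftime("%H:%M"), cell[1:], n)
# 10:00 ('AU', 'female', 'fear') 2
# 10:15 ('AU', 'male', 'anger') 1
```

Flooring the Unix timestamp keeps the windows aligned to :00, :15, :30 and :45 in UTC, no matter when an individual tweet arrives.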
The Black Dog Institute has been trying to monitor Australians’ emotional well-being for years via quarterly snail-mail surveys, “but it’s very expensive, it’s very time consuming,” said David Milne, a postdoctoral research fellow at CSIRO. The institute also runs a small-scale research project focused on Australian Twitter users, which tries to determine whether tweets mentioning suicide or depression are genuine or merely sarcastic. But We Feel is its first attempt to monitor emotion on such a large scale.
Finding partners in the cloud
Ironically, although Milne said the partners “wouldn’t have considered trying to do this” without cloud computing resources, they almost declined when Amazon Web Services offered to help with their research, because they didn’t need its infrastructure for the work they were doing at the time. Only after AWS asked the Black Dog Institute to dream up an ideal project did the concept for We Feel take shape. That was in March.
AWS accepted the proposal and donated computing power and the time of its architects to help develop the system, which has been up and running since late April. Amazon’s Kinesis streaming service handles communication between the cloud servers that analyze the tweets, and the summaries available online are stored in Amazon’s DynamoDB database service.
We Feel gets its data, about 19,000 tweets per minute, from three sources: a 1 percent sample of all tweets via Twitter’s public API, a 10 percent sample of Gnip’s firehose feed and a targeted Gnip feed built around a “large vocabulary” of emotional words.
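At roughly 19,000 tweets per minute, that works out to about 27 million tweets a day (19,000 × 1,440 = 27,360,000). The three sources can overlap, since a tweet in the 1 percent sample may also turn up in the 10 percent sample or in the keyword feed, so a collector presumably has to deduplicate. The article doesn’t say how We Feel handles this. The sketch below assumes deduplication by tweet ID, and `merge_feeds` along with the `id` field are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def merge_feeds(*feeds: Iterable[dict]) -> Iterator[dict]:
    """Yield each tweet once, even if it appears in several feeds.

    Assumes each tweet is a dict with a hypothetical unique 'id' key.
    """
    seen: set = set()
    for feed in feeds:
        for tweet in feed:
            if tweet["id"] in seen:
                continue  # already delivered by an earlier feed
            seen.add(tweet["id"])
            yield tweet


public_sample = [{"id": 1}, {"id": 2}]  # stand-in for the 1 percent sample
gnip_sample = [{"id": 2}, {"id": 3}]    # stand-in for the 10 percent sample
keyword_feed = [{"id": 3}, {"id": 4}]   # stand-in for the emotion-word feed
print([t["id"] for t in merge_feeds(public_sample, gnip_sample, keyword_feed)])
# [1, 2, 3, 4]
```

This version reads the feeds one after another to keep the idea visible. A live collector would receive them concurrently and would need to expire old IDs from `seen` so the set doesn’t grow without bound.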
These kinds of goodwill gestures by web companies and cloud computing providers are not uncommon. AWS, Google and Microsoft have all hosted large datasets covering the economy, genomics, oncology and other important areas and made them available for free. One of the biggest to date came online in late May, when Google opened the GDELT database of more than 250 million sociopolitical events to public analysis using its BigQuery service. Twitter recently granted six research projects access to its entire historical collection of tweets.
Now, to prove there’s scientific gold in those tweets
The project hasn’t yet collected enough data to draw any meaningful conclusions, Milne explained, but it has produced some interesting results around individual events. Milne and his fellow researchers compared emotion on Tuesday, May 13, the day Australia’s federal budget was announced (it’s quite a big deal in the country, he said), with the previous Tuesday. On budget day they saw a 30 percent increase in fearful tweets and a 27 percent increase in angry tweets. The fearful tweets began just before the announcement and persisted for hours, while the angry tweets spiked that evening.
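The budget-day figures are relative changes against a baseline day. The article doesn’t say whether the counts were normalized by total Twitter volume. If the comparison is of raw tweet counts, then with hypothetical symbols N_e^budget and N_e^prev for the number of tweets classified as emotion e on budget day and on the previous Tuesday, the calculation works like this:

```latex
\Delta_e = \frac{N_e^{\text{budget}} - N_e^{\text{prev}}}{N_e^{\text{prev}}} \times 100\%,
\qquad
\Delta_{\text{fear}} = 30\% \;\Rightarrow\; N_{\text{fear}}^{\text{budget}} = 1.30\, N_{\text{fear}}^{\text{prev}},
\qquad
\Delta_{\text{anger}} = 27\% \;\Rightarrow\; N_{\text{anger}}^{\text{budget}} = 1.27\, N_{\text{anger}}^{\text{prev}}
```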
We Feel currently has computing resources to run for about a year, but it will need grant funding to keep the project staffed. Milne said he hopes that once We Feel has gathered enough data, researchers will be able to validate it against the results of traditional data-collection techniques and prove that it’s an effective way to gauge the public mood.
However, that’s no guarantee, he acknowledged. Twitter data has inherent biases that could affect its utility as a proxy for the entire English-speaking world, and the text analysis We Feel uses is pretty elementary. In the phrase “I’m not sad,” for example, it will pick up “sad” and ignore the “not.”
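The negation problem is easy to reproduce. The sketch below is a generic keyword matcher built on a small hypothetical word list, not We Feel’s actual classifier. It shows why bag-of-words matching counts “I’m not sad” as sadness. It also shows how a one-word negation check changes the result, which is one simple fix that the article does not say We Feel uses.

```python
from __future__ import annotations

import re

# Hypothetical word lists; the article does not publish We Feel's vocabulary.
EMOTION_WORDS = {"sad": "sadness", "happy": "joy", "afraid": "fear"}
NEGATORS = {"not", "no", "never", "isn't", "wasn't", "don't", "didn't"}


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens, keeping apostrophes (e.g. "i'm")."""
    return re.findall(r"[a-z']+", text.lower())


def naive_match(text: str) -> list[str]:
    """Bag-of-words matching: every emotion word counts, negated or not."""
    return [EMOTION_WORDS[t] for t in tokenize(text) if t in EMOTION_WORDS]


def negation_aware_match(text: str) -> list[str]:
    """Skip an emotion word when the token immediately before it is a negator."""
    tokens = tokenize(text)
    return [
        EMOTION_WORDS[t]
        for i, t in enumerate(tokens)
        if t in EMOTION_WORDS and not (i > 0 and tokens[i - 1] in NEGATORS)
    ]


print(naive_match("I'm not sad"))           # ['sadness']
print(negation_aware_match("I'm not sad"))  # []
```

The one-token lookback is itself crude. “Not very sad” would still be counted as sadness, so a fuller fix would have to model how far a negation reaches.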
Still, like many big data devotees before him, Milne hopes that the sheer volume of data We Feel is collecting will wash out its methodological shortcomings. “Do the simplest thing,” he said, “but do it at a stupidly large scale.”