The Obama administration’s open data mandate announced on Thursday was made all the better by the unveiling of the new ScraperWiki service on Friday. If you’re not familiar with ScraperWiki, it’s a web-scraping service that has been around for a while but has primarily focused on users with some coding chops or data journalists willing to pay to have someone scrape data sets for them. Its new service, though, currently in beta, also makes it possible for anyone to scrape Twitter to create a custom data set without having to write a single line of code.
Taken alone, ScraperWiki isn’t that big of a deal, but it’s part of a huge revolution that has been called the democratization of data. More data is becoming available all the time (whether from the government, corporations or even our own lives), only it’s not of much use unless you’re able to do something with it. ScraperWiki is now one of a growing list of tools dedicated to helping everyone, not just expert data analysts or coders, analyze (and, in its case, generate) the data that matters to them.
After noticing a particularly large number of tweets in my stream about flight delays yesterday, I thought I’d test out ScraperWiki’s new Twitter search function by gathering a bunch of tweets directed to @United. The results, from 1,697 tweets dating back to May 3, are pretty fun to play with, if not that surprising. (Also, I have no idea how far back the tweet search will go or how long it will take using the free account, which is limited to 30 minutes of compute time a day. I just stopped at some point so I could start digging in.)
First things first, I ran my query. Here’s what the data looks like viewed in a table in the ScraperWiki app.
Next, it’s a matter of analyzing it. ScraperWiki lets you view it in a table (like above), export it to Excel or query it using SQL, and will also summarize it for you. This being Twitter data, the natural thing to do seemed to be analyzing it for sentiment. One simple way to do this right inside the ScraperWiki table is to search for a particular term that might suggest joy or anger. I chose a certain four-letter word that begins with f.
Surprisingly, I only found eight instances. Here’s my favorite: “Your Customer Service is better than a hooker. I paid a bunch of money and you’re still…” (You probably get the idea.)
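For readers who would rather use the SQL option than the search box, the same kind of keyword search can be sketched in a few lines of Python against a SQLite database. Everything specific here is an assumption for illustration: the table name `tweets`, the column name `text`, the helper `find_tweets` and the three placeholder rows are made up, and the real ScraperWiki schema may look different.

```python
import sqlite3


def find_tweets(conn, term):
    """Return the text of every tweet containing `term`.

    Assumes a hypothetical table `tweets` with a `text` column.
    SQLite's LIKE is case-insensitive for ASCII letters by default.
    """
    cur = conn.execute(
        "SELECT text FROM tweets WHERE text LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return [row[0] for row in cur.fetchall()]


# Placeholder rows invented for this sketch, not real tweets.
sample = [
    "@united my flight is delayed again",
    "@united thanks for the upgrade!",
    "@united DELAYED two hours, thanks a lot",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany("INSERT INTO tweets (text) VALUES (?)", [(t,) for t in sample])

matches = find_tweets(conn, "delay")
print(len(matches), "matching tweets")
```

Passing the search term as a query parameter, rather than pasting it into the SQL string, keeps odd characters in the term from breaking the query.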
But if you read my “data for dummies” post from January, you know that we mere mortals have tools at our disposal for dealing with text data in a more refined way. IBM’s Many Eyes service won’t let me score tweets for sentiment, but I can get a pretty good idea overall by looking at how words are used. For this job, though, a simple word cloud won’t work, even after filtering out common words, @united and other obvious terms. Think of how “thanks” can be used sarcastically and you can see why.
Using the customized word tree, you can see that “thanks” sometimes means “thanks.” Other times, not so much. I know it’s easy to dwell on the negative, but consider this: “worst” had 28 hits while “best” had 15. One of those was referring to Tito’s vodka and at least three were referring to skyline views. (Click here to access it and search by whatever word you want.)
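If you can code a little, the two things I did here (counting hits for a word and looking at what follows "thanks") can be roughly approximated in Python. This is only a sketch of the idea, not how Many Eyes builds its word tree; the function names `tokenize`, `count_word` and `branches` are my own, and the sample tweets are invented placeholders rather than rows from the United data set.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_word(tweets, word):
    """Count how many times `word` appears across all tweets."""
    return sum(tokenize(t).count(word) for t in tweets)


def branches(tweets, root, depth=2):
    """Count the phrases (up to `depth` words) that follow `root`.

    This mimics one level of a word tree: the root word and the
    branches of text that come right after it.
    """
    found = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        for i, token in enumerate(tokens):
            if token == root:
                following = tokens[i + 1 : i + 1 + depth]
                if following:
                    found[" ".join(following)] += 1
    return found


# Placeholder tweets invented for this sketch.
sample = [
    "@united thanks for the upgrade",
    "@united thanks for nothing, worst airline",
    "Thanks for the best view of the skyline @united",
]

print("worst:", count_word(sample, "worst"))
print("best:", count_word(sample, "best"))
for phrase, n in branches(sample, "thanks").most_common():
    print(f"thanks -> {phrase} ({n})")
```

Reading the branches side by side is what makes the sarcastic uses stand out, which a plain word count can't do.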
Here’s a phrase net filtering the results by phrases where the word “for” connects two words.
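The idea behind a phrase net like this one can be sketched in Python as counting every pair of words that sits on either side of a connector such as "for". This is an illustrative approximation under my own assumptions, not Many Eyes' actual method; `phrase_net` is a hypothetical helper and the sample tweets are invented placeholders.

```python
import re
from collections import Counter


def phrase_net(tweets, connector="for"):
    """Count (left word, right word) pairs joined by `connector`.

    Each pair is one edge of the phrase net; the count is how often
    that edge appears in the data.
    """
    edges = Counter()
    for tweet in tweets:
        tokens = re.findall(r"[a-z']+", tweet.lower())
        # Skip the first and last token: a connector there has no pair.
        for i in range(1, len(tokens) - 1):
            if tokens[i] == connector:
                edges[(tokens[i - 1], tokens[i + 1])] += 1
    return edges


# Placeholder tweets invented for this sketch.
sample = [
    "@united thanks for waiting for hours",
    "Thanks for nothing @united",
    "waiting for hours at the gate",
]

for (left, right), n in phrase_net(sample).most_common():
    print(f"{left} -for-> {right} ({n})")
```

Walking the tokens one at a time, instead of using a single regular expression over the whole tweet, means back-to-back phrases such as "thanks for waiting for hours" contribute both of their pairs.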
Anyhow, this was just a fast, simple and fairly crude example of what ScraperWiki now allows users to do, and of how the resulting data can be fed into other tools for analysis and visualization. Obviously, it’s more powerful if you can code, but new tools are supposedly on the way (remember, this is just a beta version) that should make it easier to scrape data from even more sources.
In the long term, though, services like ScraperWiki should become a lot more valuable as tools for helping us generate and analyze data rather than just believe what we’re told. Want to improve your small business, put your life in context or perhaps just write the best book report your teacher has ever seen? It’s getting easier every day.