Twitter said on Tuesday that it has open sourced a tool called AnomalyDetection, which can spot unexpected activity that may negatively affect Twitter’s service, such as traffic spikes or spam bots.
AnomalyDetection, which is an R programming package, helps Twitter scan for the big traffic influxes that might take place during breaking news, major sports events like the World Series, and the holiday season. It can also unearth spammers who distort common Twitter metrics, like the number of favorites or followers a person might have.
Although this may sound similar to BreakoutDetection, the open-source tool that Twitter detailed in October, the two look for different things. A breakout is a sudden shift of activity that results in “two steady states and an intermediate transition period,” according to a Twitter posting on BreakoutDetection. An anomaly, on the other hand, is an unusual data point rather than a change of state. Twitter said the tools are complementary to each other.
Detecting anomalies presents an interesting challenge to Twitter because the social-networking service studies traffic trends across different lengths of time and locations. If Twitter were to scan for unusual activity over a long period (say, 12 months), anomalies that would stand out in a smaller chunk of time could be “masked” and “more difficult to detect in a robust fashion,” the post on AnomalyDetection explains.
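To make the masking effect concrete, here is a minimal Python sketch. It is not Twitter's algorithm and does not use the AnomalyDetection package; it assumes a simple median-based outlier score (the modified z-score with the conventional 3.5 cutoff), and the function names and the synthetic 360-day series are hypothetical, chosen only for illustration.

```python
from statistics import median


def robust_z_scores(values):
    """Modified z-scores: distance from the median, scaled by the MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread at all, so nothing can be scored as unusual.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def find_anomalies(values, threshold=3.5):
    """Indices whose modified z-score exceeds the threshold (assumed cutoff)."""
    scores = robust_z_scores(values)
    return [i for i, z in enumerate(scores) if abs(z) > threshold]


def find_anomalies_windowed(values, window, threshold=3.5):
    """Run the same check separately on consecutive chunks of `window` points."""
    found = []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        found.extend(start + i for i in find_anomalies(chunk, threshold))
    return found


# Synthetic data: traffic grows steadily for 360 days, with one spike on day 50.
series = [100.0 + day for day in range(360)]
series[50] += 60.0

print(find_anomalies(series))                      # [] (spike is masked)
print(find_anomalies_windowed(series, window=30))  # [50]
```

Scored against the whole year, the spike sits well inside the range created by the long-term growth, so nothing is flagged. Scored against its own 30-day chunk, the same point stands out clearly.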
And not all anomalies are created equal. Some are positive, like an increase in tweets during a popular event, and some are negative, like a “point-in-time decrease in QPS (queries per second),” which could indicate a hardware or data-collection problem.
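The positive-versus-negative distinction can be sketched in a few lines of Python. Again, this is an illustration under stated assumptions rather than how AnomalyDetection works: it uses a median-based outlier score with an assumed 3.5 cutoff, and the function name and the QPS readings are made up for the example.

```python
from statistics import median


def classify_anomalies(values, threshold=3.5):
    """Return (index, direction) pairs for points far from the median.

    Direction is "positive" for a spike above the typical level and
    "negative" for a dip below it. The 3.5 threshold is an assumption.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []
    labeled = []
    for i, v in enumerate(values):
        score = 0.6745 * (v - med) / mad
        if score > threshold:
            labeled.append((i, "positive"))
        elif score < -threshold:
            labeled.append((i, "negative"))
    return labeled


# Hypothetical QPS readings: one surge (index 6) and one sharp drop (index 9).
qps = [500, 505, 498, 502, 497, 503, 900, 501, 499, 120, 504, 500]

print(classify_anomalies(qps))  # [(6, 'positive'), (9, 'negative')]
```

In practice the two labels would likely be routed differently: a surge might be worth watching for capacity reasons, while a sudden drop is the kind of signal that could point to the hardware or data-collection problems mentioned above.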
AnomalyDetection is released under the GNU General Public License and is available to download via GitHub.