View publication

Title On-line Relevant Anomaly Detection in the Twitter Stream: An Efficient Bursty Keyword Detection Model
Authors Jheser Guzman, Bárbara Poblete
Publication date 2013
Abstract On-line social networks have become a massive communication and
information channel for users world-wide. In particular, the microblogging
platform Twitter is characterized by short-text message exchanges at
extremely high rates. In this type of scenario, the detection of emerging
topics in text streams becomes an important research area, essential for
identifying relevant new conversation topics, such as breaking news and
trends. Although emerging topic detection in text is a well-established
research area, its application to large volumes of streaming text data is
quite novel. This makes scalability, efficiency and speed the key aspects
of any emerging topic detection algorithm in this type of environment.

Our research addresses the aforementioned problem by focusing on detecting
significant and unusual bursts in keyword arrival rates, or bursty keywords.
We propose a scalable and fast on-line method that uses normalized
individual frequency signals per term and a windowing variation technique.
This method reports keyword bursts which can be composed of single or
multiple terms, ranked according to their importance. The average complexity
of our method is O(n log n), where n is the number of messages in the time
window. This complexity allows our approach to be scalable for large
streaming datasets. If bursts are only detected and not ranked, the
algorithm retains linear complexity O(n), making it the fastest among
current state-of-the-art methods. We validate our approach by
comparing our performance to similar systems using the TREC Tweet 2011
Challenge tweets, obtaining a 91% match rate with LDA, an off-line gold
standard used in similar evaluations. In addition, we study Twitter messages
related to the Super Bowl football events in 2011 and 2013.
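
The general idea of comparing normalized per-term frequencies across consecutive time windows can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names, the whitespace tokenization, the two-window comparison, and the min_increase threshold are all assumptions made for illustration, and the sketch handles single terms only.

```python
from collections import Counter


def relative_frequencies(messages):
    """Map each term to its share of all term occurrences in one window."""
    # Naive whitespace tokenization (an assumption, not taken from the paper).
    counts = Counter(term for msg in messages for term in msg.lower().split())
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()} if total else {}


def bursty_keywords(previous_window, current_window, min_increase=0.05):
    """Return (term, increase) pairs whose normalized frequency rose by at
    least min_increase between the two windows, largest increase first."""
    prev = relative_frequencies(previous_window)
    curr = relative_frequencies(current_window)
    # Detection: one linear pass over the terms of the current window.
    bursts = [
        (term, freq - prev.get(term, 0.0))
        for term, freq in curr.items()
        if freq - prev.get(term, 0.0) >= min_increase
    ]
    # Ranking: the sort is the only super-linear step in this sketch.
    return sorted(bursts, key=lambda pair: pair[1], reverse=True)
```

Calling bursty_keywords with the messages of two consecutive windows returns the terms whose share of the stream grew the most. Dropping the final sort leaves only the linear detection pass, which loosely mirrors the detect-only versus ranked distinction drawn in the abstract.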
Pages 31-39
Conference name ACM SIGKDD Workshop on Outlier Detection and Description
Publisher ACM Press (New York, NY, USA)