Affiliation:
1. Department of Computer Science, University of Houston, Houston, TX 77204-3010, USA
Abstract
Twitter has become one of the fastest-growing microblogging services, and analyzing its rich, continuously generated user content can reveal valuable knowledge. In this paper, we propose a novel two-stage system that detects and tracks events from tweets by integrating a Latent Dirichlet Allocation (LDA)-based approach with an efficient density–contour-based spatio-temporal clustering approach. The system first divides the geotagged tweet stream into time windows. Events are then identified as topics in tweets using an LDA-based topic discovery step, and each tweet is assigned an event label. Finally, a density–contour-based spatio-temporal clustering approach identifies spatio-temporal event clusters. Topic continuity is established by calculating KL-divergences between topics, and spatio-temporal continuity is established by a family of newly formulated spatial cluster distance functions. Moreover, the proposed density–contour clustering approach considers two types of density, “absolute” density and “relative” density, to identify event clusters where there is either a high density of event tweets or a high percentage of event tweets. We evaluate our approach using real-world data collected from Twitter, and the experimental results show that the proposed system can not only detect and track events effectively but also discover interesting patterns from geotagged tweets.
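To make the topic-continuity step concrete, the following is a minimal sketch of how topics from consecutive time windows might be linked through KL-divergence. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say whether the divergence is symmetrized, how zero probabilities are handled, or what cutoff is used, so the symmetric variant, the `eps` floor, the `threshold` value, and the function names `kl_divergence`, `symmetric_kl`, and `link_topics` are all hypothetical. Each topic is assumed to be a word-probability vector over a shared vocabulary, as an LDA model would produce.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary.

    eps floors q to avoid division by zero (an assumption, not from the paper).
    """
    if len(p) != len(q):
        raise ValueError("distributions must share the same vocabulary size")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:  # terms with p_i = 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))
    return total


def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p); symmetrization is an assumption."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def link_topics(prev_topics, curr_topics, threshold=0.1):
    """Map each current-window topic to its closest previous-window topic.

    A topic is linked only if the divergence is at most `threshold`
    (a hypothetical cutoff); otherwise it maps to None, i.e. a new event.
    """
    links = {}
    for j, cur in enumerate(curr_topics):
        best_i, best_d = None, float("inf")
        for i, prev in enumerate(prev_topics):
            d = symmetric_kl(prev, cur)
            if d < best_d:
                best_i, best_d = i, d
        links[j] = best_i if best_d <= threshold else None
    return links


# Toy word distributions over a three-word vocabulary (made-up values).
prev_window = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
curr_window = [[0.1, 0.25, 0.65], [0.34, 0.33, 0.33]]
links = link_topics(prev_window, curr_window)
```

In this toy input, the first current topic stays close to the second previous topic and is treated as a continuation, while the near-uniform second topic diverges from both earlier topics and is treated as new.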
Publisher
World Scientific Pub Co Pte Lt
Subject
Artificial Intelligence,Computer Networks and Communications,Computer Science Applications,Linguistics and Language,Information Systems,Software
Cited by
8 articles.