Affiliation:
1. Kunming University of Science and Technology, China
2. Beijing Institute of Technology, China
3. Beihang University, China
4. North China Electric Power University, China
5. University of Illinois at Chicago, USA
Abstract
Social Event Detection (SED) aims to identify significant events from social streams and has a wide range of applications, from public opinion analysis to risk management. In recent years, Graph Neural Network (GNN) based solutions have achieved state-of-the-art performance. However, GNN-based methods often struggle with missing and noisy edges between messages, which degrades the quality of the learned message embeddings. Moreover, these methods statically initialize node embeddings before training, which in turn limits their ability to learn from message texts and relations simultaneously. In this paper, we approach social event detection from a new perspective based on Pre-trained Language Models (PLMs), and present \(\mathrm{RPLM}_{SED}\) (Relational prompt-based Pre-trained Language Models for Social Event Detection). First, we propose a new pairwise message modeling strategy that constructs message pairs with multi-relational sequences from social messages. Second, we propose a new multi-relational prompt-based pairwise message learning mechanism that uses PLMs to learn more comprehensive message representations from message pairs with multi-relational prompts. Third, we design a new clustering constraint that optimizes the encoding process by enhancing intra-cluster compactness and inter-cluster dispersion, making the message representations more distinguishable. We evaluate \(\mathrm{RPLM}_{SED}\) on three real-world datasets, demonstrating that the model achieves state-of-the-art performance in offline, online, low-resource, and long-tail distribution scenarios for social event detection tasks.
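The abstract does not give the form of the clustering constraint, so the following is only an illustrative sketch of the general idea of rewarding intra-cluster compactness and inter-cluster dispersion. The function name `clustering_penalty`, the centroid-based distances, and the `margin` parameter are all assumptions made for this example and are not taken from the paper.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def clustering_penalty(embeddings, labels, margin=1.0):
    """Hypothetical penalty (an assumption, not the paper's formulation).

    intra: mean distance of each embedding to its own cluster centroid
           (smaller means more compact clusters).
    inter: mean hinge penalty over centroid pairs that lie closer than
           `margin` (smaller means better separated clusters).
    """
    groups = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        groups[label].append(vec)
    centroids = {label: centroid(vecs) for label, vecs in groups.items()}

    intra = sum(
        euclidean(vec, centroids[label]) for vec, label in zip(embeddings, labels)
    ) / len(embeddings)

    keys = sorted(centroids)
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
    inter = 0.0
    if pairs:
        inter = sum(
            max(0.0, margin - euclidean(centroids[a], centroids[b]))
            for a, b in pairs
        ) / len(pairs)
    return intra + inter


# Toy message embeddings: two tight, well-separated groups.
toy_embeddings = [[0.0, 0.0], [0.0, 0.0], [5.0, 5.0], [5.0, 5.0]]
good = clustering_penalty(toy_embeddings, [0, 0, 1, 1])
bad = clustering_penalty(toy_embeddings, [0, 1, 0, 1])
print(good < bad)
```

In this toy setting, labels that match the geometric grouping yield a lower penalty than labels that mix the two groups, which is the behaviour a compactness and dispersion objective is meant to encourage during encoding.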
Publisher
Association for Computing Machinery (ACM)