Affiliation:
1. IRIT, Université de Toulouse, CNRS, Toulouse INP, UT3, France
Abstract
The development of deep neural networks and the emergence of pre-trained language models such as BERT have increased performance on many NLP tasks. However, these models have not gained the same popularity for tweet stream summarization, probably because their input-length limitations require drastically truncating the textual input.
Our contribution in this article is threefold. First, we propose a neural model that automatically and incrementally summarizes huge tweet streams. This extractive model combines pre-trained language models with vocabulary-frequency-based representations in an original way to predict tweet salience. An additional advantage of the model is that it automatically adapts the size of the output summary to the input tweet stream. Second, we detail an original methodology for constructing tweet stream summarization datasets with little human effort. Third, we release the TES 2012-2016 dataset, constructed using this methodology. Baselines, oracle summaries, the gold standard, and qualitative assessments are made publicly available.
To evaluate our approach, we conducted extensive quantitative experiments on three different tweet collections, complemented by a qualitative evaluation. Results show that our method outperforms state-of-the-art approaches. We believe this work opens avenues of research for incremental summarization, which has received little attention so far.
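The abstract does not give implementation details, but its two key ideas can be illustrated in a toy sketch: scoring each tweet's salience from vocabulary frequencies over the stream, and selecting tweets with an adaptive threshold so the summary length follows the stream rather than being fixed. Everything below (function names, the mean-plus-half-standard-deviation threshold, the naive tokenizer) is a hypothetical simplification for illustration, not the paper's actual model, which also relies on pre-trained language model representations.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer with light punctuation stripping
    # (illustrative only; the paper's preprocessing is not specified here).
    return [t.lower().strip(".,!?#@") for t in text.split() if t.strip(".,!?#@")]

def frequency_salience(tweets):
    # Vocabulary frequency over the whole stream, used as a stand-in
    # for the frequency-based representation mentioned in the abstract.
    freq = Counter(tok for tw in tweets for tok in tokenize(tw))
    total = sum(freq.values())
    scores = []
    for tw in tweets:
        toks = tokenize(tw)
        # Average relative frequency of the tweet's tokens.
        scores.append(sum(freq[t] / total for t in toks) / max(len(toks), 1))
    return scores

def adaptive_summary(tweets, scores):
    # Keep tweets whose salience exceeds mean + 0.5 * std, so the
    # summary size adapts to the input stream instead of being fixed.
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    threshold = mean + 0.5 * std
    return [tw for tw, s in zip(tweets, scores) if s >= threshold]
```

On a stream dominated by one event, tweets reusing the stream's frequent vocabulary score higher and survive the adaptive cut, while off-topic tweets fall below the threshold.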
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications, General Business, Management and Accounting, Information Systems
Cited by: 7 articles.