Retweet Prediction Based on Heterogeneous Data Sources: The Combination of Text and Multilayer Network Features

Author:

Meštrović Ana, Petrović Milan, Beliga Slobodan

Abstract

Retweet prediction is an important task in problems such as information spreading analysis, automatic fake news detection, and social media monitoring. In this study, we explore retweet prediction based on heterogeneous data sources. To classify a tweet according to its number of retweets, we combine features extracted from a multilayer network with features extracted from text. More specifically, we introduce a framework for representing Twitter as a multilayer network. This formalism captures different user actions and complex relationships, as well as other key properties of communication on Twitter. Next, we select a set of local network measures from each layer and construct a set of multilayer network features. We also adopt a BERT-based language model, namely Cro-CoV-cseBERT, to capture the high-level semantics and structure of tweets as a set of text features. We then train six machine learning (ML) algorithms for the retweet-prediction task: random forest, multilayer perceptron, light gradient boosting machine, category-embedding model, neural oblivious decision ensembles, and an attentive interpretable tabular learning model. We compare the performance of all six algorithms in three setups: with text features only, with multilayer network features only, and with both feature sets. We evaluate every setup using standard evaluation measures. For this task, we prepared an empirical dataset of 199,431 tweets in Croatian posted between 1 January 2020 and 31 May 2021. Our results indicate that the prediction model performs better when multilayer network features are integrated with text features than when either feature set is used alone.
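As a rough illustration of the feature construction described in the abstract, the sketch below builds the three input setups for a single tweet author. It is not the authors' implementation. The layer names (`retweet`, `reply`, `mention`), the choice of in-degree and out-degree as the local measures, the toy user names, and the three-number text vector standing in for a Cro-CoV-cseBERT embedding are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical layers: each holds directed (source_user, target_user) edges.
LAYERS = {
    "retweet": [("alice", "bob"), ("carol", "bob"), ("bob", "alice")],
    "reply": [("alice", "carol"), ("bob", "carol")],
    "mention": [("carol", "alice")],
}


def degree_features(edges):
    """Return {user: (in_degree, out_degree)} for one directed layer."""
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    users = set(in_deg) | set(out_deg)
    return {u: (in_deg[u], out_deg[u]) for u in users}


def multilayer_features(user, layers):
    """Concatenate the per-layer local measures into one flat vector."""
    vector = []
    for name in sorted(layers):  # fixed layer order keeps columns aligned
        in_d, out_d = degree_features(layers[name]).get(user, (0, 0))
        vector.extend([in_d, out_d])
    return vector


def build_setups(user, text_embedding, layers):
    """Build the three feature setups: text only, network only, combined."""
    network = multilayer_features(user, layers)
    return {
        "text_only": list(text_embedding),
        "network_only": network,
        "combined": list(text_embedding) + network,
    }


# Placeholder text vector; a real one would come from the language model.
setups = build_setups("bob", [0.12, -0.40, 0.88], LAYERS)
for name, features in setups.items():
    print(name, features)
```

Each of the three vectors would then be passed to the same classifier, so that any difference in performance can be attributed to the feature set and not to the model.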

Funder

Croatian Science Foundation

University of Rijeka

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science


Cited by 2 articles.

1. First Insight into Social Media User Sentiment Spreading Potential to Enhance the Conceptual Model for Disinformation Detection;Data Science—Analytics and Applications;2024

2. Integrating Multi-Source Heterogeneous Fuzzy Spatiotemporal Data;2023 3rd International Conference on Mobile Networks and Wireless Communications (ICMNWC);2023-12-04
