ClickbaitTR: Dataset for clickbait detection from Turkish news sites and social media with a comparative analysis via machine learning algorithms

Author:

Genç Şura1,Surer Elif2ORCID

Affiliation:

1. Department of Cognitive Science, Graduate School of Informatics, Middle East Technical University, Turkey

2. Department of Modeling and Simulation, Graduate School of Informatics, Middle East Technical University, Turkey

Abstract

Clickbait is a strategy that aims to attract people’s attention and direct them to specific content. Clickbait titles, created by the information that is not included in the main content or using intriguing expressions with various text-related features, have become very popular, especially in social media. This study expands the Turkish clickbait dataset that we had constructed for clickbait detection in our proof-of-concept study, written in Turkish. We achieve a 48,060 sample size by adding 8859 tweets and release a publicly available dataset – ClickbaitTR – with its open-source data analysis library. We apply machine learning algorithms such as Artificial Neural Network (ANN), Logistic Regression, Random Forest, Long Short-Term Memory Network (LSTM), Bidirectional Long Short-Term Memory (BiLSTM) and Ensemble Classifier on 48,060 news headlines extracted from Twitter. The results show that the Logistic Regression algorithm has 85% accuracy; the Random Forest algorithm has a performance of 86% accuracy; the LSTM has 93% accuracy; the ANN has 93% accuracy; the Ensemble Classifier has 93% accuracy; and finally, the BiLSTM has 97% accuracy. A thorough discussion is provided for the psychological aspects of clickbait strategy focusing on curiosity and interest arousal. In addition to a successful clickbait detection performance and the detailed analysis of clickbait sentences in terms of language and psychological aspects, this study also contributes to clickbait detection studies with the largest clickbait dataset in Turkish.

Publisher

SAGE Publications

Subject

Library and Information Sciences,Information Systems

Cited by 6 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Fact or Fake? How News Title, Sentiment and Writing Style help AI to detect COVID-19 Fake News?;Applied Artificial Intelligence;2024-08-10

2. A Literature Review on Clickbait Detection Techniques for Social Media;2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO);2024-03-14

3. Using Deep Learning and Word Embedding to Detect Clickbait in Indonesian Headline News;2023 International Conference of Computer Science and Information Technology (ICOSNIKOM);2023-11-10

4. CA-CD: context-aware clickbait detection using new Chinese clickbait dataset with transfer learning method;Data Technologies and Applications;2023-08-29

5. Çift Yönlü Transformatör Kodlayıcı Temsilleriyle Gelişmiş Türkçe Sahte Haber Tahmini;Konya Journal of Engineering Sciences;2022-09-01

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3