Normalized effect size (NES): a novel feature selection model for Urdu fake news classification

Authors:

Wasim Muhammad (1), Cheema Sehrish Munawar (2), Pires Ivan Miguel (3, 4)

Affiliations:

1. Department of Computer Science, University of Management & Technology, Sialkot Campus, Sialkot, Pakistan

2. Department of Computer Science, University of Management and Technology, Lahore, Pakistan

3. Instituto de Telecomunicações, Covilhã, Portugal

4. Escola Superior de Tecnologia e Gestão de Águeda, Universidade de Aveiro, Águeda, Portugal

Abstract

Social media has become an essential source of news for everyday users. However, the rise of fake news on social media has made it harder for users to trust the information on these platforms. Most research focuses on fake news detection in English, and only a limited number of studies address fake news in resource-poor languages such as Urdu. This article proposes a globally weighted term selection approach, named normalized effect size (NES), to select highly discriminative features for Urdu fake news classification. The proposed model is based on the traditional term frequency-inverse document frequency (TF-IDF) weighting measure. TF-IDF transforms textual data into a weighted term-document matrix and is usually prone to the curse of dimensionality. Our novel statistical model filters the most discriminative terms to reduce the data’s dimensionality and improve classification accuracy. We compare the proposed approach with seven well-known feature selection and ranking techniques, namely normalized difference measure (NDM), bi-normal separation (BNS), odds ratio (OR), GINI, distinguished feature selector (DFS), information gain (IG), and chi-square (Chi). Our ensemble-based approach reaches accuracies of 88% and 90% on two benchmark datasets, BET and UFN, respectively.
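The general pipeline the abstract describes (TF-IDF weighting followed by filter-style term selection) can be illustrated with a minimal sketch. The abstract does not give the NES formula, so this example does not implement NES; it ranks terms with chi-square, one of the baseline measures named above. The toy English corpus, the function names, and the idf variant log(N/df) + 1 are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus: English tokens standing in for Urdu text.
texts = [
    "shocking miracle cure news",
    "miracle cure secret news",
    "government budget report news",
    "budget report statement news",
]
labels = ["fake", "fake", "real", "real"]


def tfidf_matrix(texts):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of term j in document i."""
    tokenized = [t.split() for t in texts]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(w for toks in tokenized for w in set(toks))
    # Assumed idf variant: log(N / df) + 1.
    idf = {w: math.log(n_docs / df[w]) + 1.0 for w in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        rows.append([tf[w] / len(toks) * idf[w] for w in vocab])
    return vocab, rows


def chi_square(term, token_sets, labels, positive):
    """Chi-square statistic of the 2x2 table (term present/absent vs. class)."""
    a = sum(1 for s, y in zip(token_sets, labels) if term in s and y == positive)
    b = sum(1 for s, y in zip(token_sets, labels) if term in s and y != positive)
    c = sum(1 for s, y in zip(token_sets, labels) if term not in s and y == positive)
    d = sum(1 for s, y in zip(token_sets, labels) if term not in s and y != positive)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # A term that appears in every document (or none) carries no class signal.
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_top_k(texts, labels, k, positive="fake"):
    """Keep the k highest-scoring terms and the matching TF-IDF columns."""
    vocab, rows = tfidf_matrix(texts)
    token_sets = [set(t.split()) for t in texts]
    scores = {w: chi_square(w, token_sets, labels, positive) for w in vocab}
    # Sort by descending score, breaking ties alphabetically for determinism.
    keep = sorted(vocab, key=lambda w: (-scores[w], w))[:k]
    idx = [vocab.index(w) for w in keep]
    reduced = [[row[j] for j in idx] for row in rows]
    return keep, reduced


keep, reduced = select_top_k(texts, labels, k=4)
print(keep)
```

In this toy corpus the term "news" occurs in every document, so it scores zero and is filtered out, while terms confined to one class rank highest. A selector such as NES would replace only the scoring function; the surrounding weighting and column-filtering steps would stay the same under these assumptions.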

Publisher

PeerJ

Subject

General Computer Science

