TwIdw—A Novel Method for Feature Extraction from Unstructured Texts

Published:2023-05-25 Issue:11 Volume:13 Page:6438
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Szabó Nagy Kitti¹, Kapusta Jozef¹,²

Affiliation:

1. Department of Informatics, Faculty of Natural Sciences and Informatics, Constantine the Philosopher University in Nitra, 949 01 Nitra, Slovakia

2. Institute of Computer Science, Pedagogical University of Cracow, 30-084 Kraków, Poland

Abstract

This research proposes a novel technique for fake news classification using natural language processing (NLP) methods. The proposed feature extraction technique, TwIdw (Term weight–inverse document weight), is based on TfIdf, with the term frequencies replaced by the depth of the words in documents. The effectiveness of TwIdw is compared with that of another feature extraction method, basic TfIdf. Classification models were built with random forest and feedforward neural networks, and each was evaluated on three different datasets. The feedforward neural network with the KaiDMML dataset showed an increase in accuracy of up to 3.9%. The random forest with TwIdw was less successful than the neural network and showed an increase in accuracy only with the KaiDMML dataset (1%). The feedforward neural network, by contrast, showed an increase in accuracy with TwIdw for all datasets. Precision and recall measures also confirmed good results, particularly for the neural network. TwIdw has the potential to be used in various NLP applications, including fake news classification and other NLP classification problems.
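The abstract describes TwIdw only at a high level, so the following is a minimal sketch of the idea and not the authors' implementation. It assumes that each token arrives with a precomputed numeric depth, that a term's weight in a document is the sum of the depths of its occurrences, and that the inverse document part is a plain log(N/df). How depth is measured, how it is aggregated, and the exact weighting formula are all assumptions here; the function name `twidw` and the input layout are hypothetical.

```python
import math
from collections import defaultdict


def twidw(docs):
    """Sketch of a TfIdf-style weighting with depth in place of term frequency.

    docs: list of documents, each a list of (term, depth) pairs.
    Returns one {term: weight} dict per document.
    The depth values and the sum aggregation are assumptions, not the paper's
    confirmed definition.
    """
    n_docs = len(docs)

    # Document frequency: number of documents in which each term appears.
    df = defaultdict(int)
    for doc in docs:
        for term in {t for t, _ in doc}:
            df[term] += 1

    weights = []
    for doc in docs:
        # Term weight: sum of the depths of all occurrences of the term
        # (assumed aggregation; this replaces the raw term count of TfIdf).
        tw = defaultdict(float)
        for term, depth in doc:
            tw[term] += depth
        # Multiply by an unsmoothed inverse document frequency, log(N / df).
        weights.append(
            {t: w * math.log(n_docs / df[t]) for t, w in tw.items()}
        )
    return weights


docs = [
    [("fake", 1), ("news", 2), ("fake", 3)],
    [("news", 1), ("report", 2)],
]
for row in twidw(docs):
    print(row)
```

With this unsmoothed formulation, a term that occurs in every document ("news" above) gets weight zero, exactly as in basic TfIdf; only the per-document factor changes.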

Funder

Slovak Research and Development Agency

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/11/6438/pdf

Cited by 3 articles.

1. Enhancing hydrological data completeness: A performance evaluation of various machine learning techniques using probabilistic fusion imputer with neural networks for streamflow data reconstruction;Journal of Hydrology;2024-08

2. Precognition of mental health and neurogenerative disorders using AI-parsed text and sentiment analysis;Acta Universitatis Sapientiae, Informatica;2023-12-01

3. Feature extraction from unstructured texts as a combination of the morphological and the syntactic analysis and its usage in fake news classification tasks;Neural Computing and Applications;2023-09-07