Affiliation:
1. Department of Computer Science, College of Computer Science, King Khalid University, Abha 62529, Saudi Arabia
Abstract
In text classification tasks, such as sentiment analysis (SA), feature representation and weighting schemes play a crucial role in classification performance. Traditional term weighting schemes depend on the term frequency within the entire document collection; therefore, they are called unsupervised term weighting (UTW) schemes. One of the most popular UTW schemes is term frequency–inverse document frequency (TF-IDF); however, this is not sufficient for SA tasks. Newer weighting schemes have been developed to take advantage of the membership of documents in their categories. These are called supervised term weighting (STW) schemes; however, most of them weigh the extracted features without considering the characteristics of some noisy features and data imbalances. Therefore, in this study, a novel STW approach was proposed, known as term frequency–term discrimination ability (TF-TDA). TF-TDA mainly presents the extracted features with different degrees of discrimination by categorizing them into several groups. Subsequently, each group is weighted based on its contribution. The proposed method was examined over four SA datasets using naive Bayes (NB) and support vector machine (SVM) models. The experimental results proved the superiority of TF-TDA over two baseline term weighting approaches, with improvements ranging from 0.52% to 3.99% in the F1 score. The statistical test results verified the significant improvement obtained by TF-TDA in most cases, where the p-value ranged from 0.0000597 to 0.0455.
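For readers unfamiliar with the unsupervised baseline named in the abstract, the following is a minimal sketch of one common textbook form of TF-IDF (raw term count multiplied by log(N/df)). It is not the paper's TF-TDA scheme, whose formula the abstract does not give. The function name `tf_idf` and the toy documents are illustrative assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = raw term count * log(N / df).
    Class labels are never consulted, which is what makes the
    scheme unsupervised.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical toy corpus, already tokenized.
docs = [
    "good movie good plot".split(),
    "bad movie".split(),
    "good acting".split(),
]
weights = tf_idf(docs)
```

Because the weights depend only on counts across the collection, a term such as "good" receives the same IDF factor regardless of whether it appears mostly in positive or negative documents. Supervised schemes aim to address this by using category membership.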
Funder
Deanship of Scientific Research at King Khalid University
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
Cited by
1 article.
1. Mental Health Detection with TF-IDF Feature Extraction;2024 IEEE International Conference on Artificial Intelligence and Mechatronics Systems (AIMS);2024-02-21