Textual Feature Extraction Using Ant Colony Optimization for Hate Speech Classification

Published:2023-03-06 Issue:1 Volume:7 Page:45
ISSN:2504-2289
Container-title:Big Data and Cognitive Computing
language:en
Short-container-title:BDCC

Author:

Gite Shilpa¹, Patil Shruti¹, Dharrao Deepak², Yadav Madhuri¹, Basak Sneha¹, Rajendran Arundarasi¹, Kotecha Ketan³

Affiliation:

1. Symbiosis Centre for Applied Artificial Intelligence, Department of Artificial Intelligence and Machine Learning, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune 412115, India

2. Department of Computer Science and Engineering, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune 412115, India

3. Symbiosis Centre for Applied Artificial Intelligence, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune 412115, India

Abstract

Feature selection and feature extraction have always been of utmost importance owing to their capability to remove redundant and irrelevant features, reduce the vector space size, control the computational time, and improve performance for more accurate classification tasks, especially in text categorization. These feature engineering techniques can further be optimized using optimization algorithms. This paper proposes a similar framework by implementing one such optimization algorithm, Ant Colony Optimization (ACO), incorporating different feature selection and feature extraction techniques on textual and numerical datasets using four machine learning (ML) models: Logistic Regression (LR), K-Nearest Neighbor (KNN), Stochastic Gradient Descent (SGD), and Random Forest (RF). The aim is to show the difference in the results achieved on both datasets with the help of comparative analysis. The proposed feature selection and feature extraction techniques assist in enhancing the performance of the machine learning model. This research article considers numerical and text-based datasets for stroke prediction and detecting hate speech, respectively. The text dataset is prepared by extracting tweets consisting of positive, negative, and neutral sentiments via the Twitter API. A maximum improvement in accuracy of 10.07% is observed for Random Forest with the TF-IDF feature extraction technique on the application of ACO. Besides, this study also highlights the limitations of text data that inhibit the performance of machine learning models, justifying the difference of almost 18.43% in accuracy compared to that of numerical data.
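
As a rough illustration of the idea in the abstract, ACO-based feature selection can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `aco_select_features`, its parameters, and the toy fitness function are all hypothetical, and the fitness is assumed to return a non-negative score. In a setting like the one described, the fitness would presumably be the validation accuracy of a classifier (for example Random Forest) trained on the selected TF-IDF columns.

```python
import random


def aco_select_features(n_features, k, fitness, n_ants=10, n_iters=30,
                        evaporation=0.2, alpha=1.0, seed=0):
    """Search for a k-feature subset with a simple ant colony scheme.

    `fitness` maps a list of feature indices to a non-negative score
    (higher is better). All names and defaults here are illustrative.
    """
    rng = random.Random(seed)
    pheromone = [1.0] * n_features  # one pheromone trail per feature
    best_subset, best_score = None, float("-inf")

    for _iteration in range(n_iters):
        for _ant in range(n_ants):
            # Each ant picks k distinct features, one at a time, with
            # probability proportional to pheromone ** alpha.
            candidates = list(range(n_features))
            subset = []
            for _pick in range(k):
                weights = [pheromone[f] ** alpha for f in candidates]
                chosen = rng.choices(candidates, weights=weights, k=1)[0]
                subset.append(chosen)
                candidates.remove(chosen)
            score = fitness(subset)
            if score > best_score:
                best_subset, best_score = sorted(subset), score

        # Evaporate every trail, then reinforce the best subset so far.
        pheromone = [(1.0 - evaporation) * p for p in pheromone]
        for f in best_subset:
            pheromone[f] += best_score

    return best_subset, best_score


# Toy fitness (an assumption for the demo): features 0 and 3 are the
# "informative" ones; the score is the fraction of the subset that hits them.
INFORMATIVE = {0, 3}


def toy_fitness(subset):
    return len(INFORMATIVE & set(subset)) / len(subset)


if __name__ == "__main__":
    best, score = aco_select_features(n_features=6, k=2, fitness=toy_fitness)
    print("selected features:", best, "score:", score)
```

Passing the fitness as a callable keeps the search loop independent of the classifier and feature extraction method, so the same loop could be reused across the four models and both dataset types mentioned in the abstract.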

Publisher

MDPI AG

Subject

Artificial Intelligence,Computer Science Applications,Information Systems,Management Information Systems

Link

https://www.mdpi.com/2504-2289/7/1/45/pdf

Cited by 15 articles.

1. An efficient method for disaster tweets classification using gradient-based optimized convolutional neural networks with BERT embeddings;MethodsX;2024-12

2. Complex hilly terrain agricultural UAV trajectory planning driven by Grey Wolf Optimizer with interference model;Applied Soft Computing;2024-07

3. Application of Natural Language Processing and Genetic Algorithm to Fine-Tune Hyperparameters of Classifiers for Economic Activities Analysis;Big Data and Cognitive Computing;2024-06-13

4. A Review of Metaheuristic Optimization Techniques in Text Classification;International Journal of Computational and Experimental Science and Engineering;2024-04-30

5. Towards understanding the role of content-based and contextualized features in detecting abuse on Twitter;Heliyon;2024-04