A novel filter feature selection method for text classification: Extensive Feature Selector

Published:2021-04-13 Issue: Volume: Page:016555152199103
ISSN:0165-5515
Container-title:Journal of Information Science
language:en

Author:

Parlak Bekir¹, Uysal Alper Kursat²

Affiliation:

1. Department of Computer Engineering Faculty of Technology, Amasya University, Turkey

2. Department of Computer Engineering, Faculty of Engineering, Canakkale Onsekiz Mart University, Turkey

Abstract

Because the high dimensionality of textual data limits classification accuracy, it is essential to apply feature selection (FS) methods as a dimension reduction step in the text classification (TC) domain. Most FS methods for TC are computed from several probabilities. In this study, we propose a new FS method named Extensive Feature Selector (EFS), which uses both corpus-based and class-based probabilities in its calculations. The performance of EFS is compared with nine well-known FS methods, namely, Chi-Squared (CHI2), Class Discriminating Measure (CDM), Discriminative Power Measure (DPM), Odds Ratio (OR), Distinguishing Feature Selector (DFS), Comprehensively Measure Feature Selection (CMFS), Discriminative Feature Selection (DFSS), Normalised Difference Measure (NDM) and Max–Min Ratio (MMR), using Multinomial Naive Bayes (MNB), Support-Vector Machine (SVM) and k-Nearest Neighbour (KNN) classifiers on four benchmark data sets: Reuters-21578, 20-Newsgroup, Mini 20-Newsgroup and Polarity. The experiments were carried out for six feature sizes: 10, 30, 50, 100, 300 and 500. Experimental results show that EFS outperforms the other nine methods in most cases according to micro-F1 and macro-F1 scores.
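As a rough illustration of what a filter FS method does, the sketch below scores terms with the Chi-Squared (CHI2) statistic, one of the nine baselines named in the abstract, and keeps the top k. It is not EFS, whose formula is not given here. The function names `chi2_score` and `select_top_k`, the whitespace tokenisation, the toy documents, and the choice to take the maximum score over classes as the global term score are all assumptions made for this example.

```python
from collections import Counter


def chi2_score(a, b, c, d):
    """Chi-squared statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_top_k(docs, labels, k):
    """Rank terms by their best per-class CHI2 score and keep the top k.

    Taking the maximum over classes is an assumed globalisation policy;
    a weighted average over classes is another common choice.
    """
    classes = sorted(set(labels))
    n_docs = len(docs)
    class_sizes = Counter(labels)

    # Document frequencies, overall and per (term, class) pair.
    df = Counter()
    df_class = Counter()
    for doc, label in zip(docs, labels):
        for term in set(doc.split()):
            df[term] += 1
            df_class[(term, label)] += 1

    scores = {}
    for term in df:
        best = 0.0
        for cls in classes:
            a = df_class[(term, cls)]
            b = df[term] - a
            c = class_sizes[cls] - a
            d = n_docs - a - b - c
            best = max(best, chi2_score(a, b, c, d))
        scores[term] = best

    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Toy corpus (hypothetical data, not from the benchmark data sets).
docs = [
    "cheap pills offer",
    "cheap offer now",
    "meeting agenda notes",
    "project meeting now",
]
labels = ["spam", "spam", "ham", "ham"]
selected = select_top_k(docs, labels, 3)
```

In the setting described in the abstract, k would take the values 10, 30, 50, 100, 300 and 500, and the selected terms would then be passed to a classifier such as MNB, SVM or KNN.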

Publisher

SAGE Publications

Subject

Library and Information Sciences, Information Systems

Link

http://journals.sagepub.com/doi/pdf/10.1177/0165551521991037

Cited by 46 articles.

1. An integrated approach to Bayesian weight regulations and multitasking learning methods for generating emotion-based content in the metaverse;Expert Systems with Applications;2025-01

2. A two-stage feature selection approach using hybrid elitist self-adaptive cat and mouse based optimization algorithm for document classification;Expert Systems with Applications;2024-11

3. Processing imbalanced medical data at the data level with assisted-reproduction data as an example;BioData Mining;2024-09-04

4. Integrating feature importance techniques and causal inference to enhance early detection of heart disease;2024-08-12

5. A RULE-BASED APPROACH USING THE ROUGH SET ON COVID-19 DATA;Eskişehir Osmangazi Üniversitesi Mühendislik ve Mimarlık Fakültesi Dergisi;2024-08-12