Hybrid supervised clustering based ensemble scheme for text classification

Published: 2017-02-06 Issue: 2 Volume: 46 Pages: 330-348
ISSN: 0368-492X
Container-title: Kybernetes
Language: en
Short-container-title: K

Author:

Onan Aytug

Abstract

Purpose: The immense quantity of available unstructured text documents serves as one of the largest sources of information. Text classification can be an essential task for many purposes in information retrieval, such as document organization, text filtering and sentiment analysis. Ensemble learning has been extensively studied to construct efficient text classification schemes with higher predictive performance and generalization ability. The purpose of this paper is to provide diversity among the classification algorithms of an ensemble, which is a key issue in ensemble design.

Design/methodology/approach: An ensemble scheme based on hybrid supervised clustering is presented for text classification. In the presented scheme, supervised hybrid clustering, which is based on the cuckoo search algorithm and k-means, is introduced to partition the data samples of each class into clusters so that training subsets with higher diversity can be provided. Each classifier is trained on the diversified training subsets, and the predictions of the individual classifiers are combined by the majority voting rule. The predictive performance of the proposed classifier ensemble is compared to conventional classification algorithms (such as Naïve Bayes, logistic regression, support vector machines and the C4.5 algorithm) and ensemble learning methods (such as AdaBoost, bagging and random subspace) using 11 text benchmarks.

Findings: The experimental results indicate that the presented classifier ensemble outperforms the conventional classification algorithms and ensemble learning methods for text classification.

Originality/value: The presented ensemble scheme is the first to use supervised clustering to obtain a diverse ensemble for text classification.
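The pipeline outlined in the abstract (cluster the samples of each class, train one classifier per diversified subset, combine predictions by majority vote) can be illustrated with a minimal Python sketch. This is not the paper's implementation, and several parts are assumptions made only for illustration: plain k-means stands in for the cuckoo search and k-means hybrid, a nearest-centroid rule stands in for the base classifiers, subset j is formed by taking cluster j of every class (the abstract does not say how subsets are assembled), and all function names are hypothetical.

```python
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumption: replaces the hybrid clustering).
    Requires len(points) >= k. Returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = mean(members)
    return assign

def build_subsets(X, y, k, seed=0):
    """Cluster each class separately; subset j receives cluster j of
    every class (assumed assembly rule, not taken from the paper)."""
    subsets = [([], []) for _ in range(k)]
    for label in sorted(set(y)):
        pts = [x for x, t in zip(X, y) if t == label]
        for p, c in zip(pts, kmeans(pts, k, seed=seed)):
            subsets[c][0].append(p)
            subsets[c][1].append(label)
    return [s for s in subsets if s[0]]  # drop empty subsets

def train_centroid(Xs, ys):
    """Stand-in base learner: one mean vector per class in the subset."""
    return {label: mean([x for x, t in zip(Xs, ys) if t == label])
            for label in set(ys)}

def predict_one(model, x):
    """Predict the class whose centroid is nearest to x."""
    return min(model, key=lambda label: sq_dist(x, model[label]))

def train_ensemble(X, y, k, seed=0):
    """Train one base learner per diversified training subset."""
    return [train_centroid(Xs, ys) for Xs, ys in build_subsets(X, y, k, seed)]

def ensemble_predict(models, x):
    """Majority vote over the base learners' predictions."""
    votes = Counter(predict_one(m, x) for m in models)
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Toy 2-D vectors standing in for document feature vectors.
    X = [[0, 0], [0, 1], [1, 0], [1, 1.5],
         [10, 10], [10, 11], [11, 10], [11, 11.5]]
    y = ["a"] * 4 + ["b"] * 4
    models = train_ensemble(X, y, k=2)
    print(ensemble_predict(models, [0.5, 0.5]))
```

In a real text classification setting the toy vectors would be replaced by document representations such as term-frequency vectors, and the stand-in learner by any of the classifiers named in the abstract. Ties in the vote are resolved here by first occurrence, which is another simplification.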

Publisher

Emerald

Subject

Computer Science (miscellaneous), Social Sciences (miscellaneous), Theoretical Computer Science, Control and Systems Engineering, Engineering (miscellaneous)

References: 58 articles.

1. A survey of text classification algorithms,2012

2. Adapting k-means for supervised clustering;Applied Intelligence,2006

3. RFBoost: an improved multi-label boosting algorithm and its application to text categorization;Knowledge-Based Systems,2016

4. Probabilistic topic models;Communications of the ACM,2012

5. Latent Dirichlet allocation;Journal of Machine Learning Research,2003

Cited by 99 articles.

1. Comic exploration and Insights: Recent trends in LDA-Based recognition studies;Expert Systems with Applications;2024-12

2. An assessment of heterogenous ensemble classifiers for analyzing change‐proneness in open‐source software systems;Journal of Software: Evolution and Process;2024-02-24

3. Machine learning in concept drift detection using statistical measures;International Journal of Computers and Applications;2023-12-15

4. Assisting pre-delivery firmware quality assessments using ensemble learning;Journal of the Chinese Institute of Engineers;2023-10-04

5. Cluster-based ensemble learning model for improving sentiment classification of Arabic documents;Natural Language Engineering;2023-06-01