A study on using data clustering for feature extraction to improve the quality of classification

Published:2021-05-04 Issue:7 Volume:63 Page:1771-1805
ISSN:0219-1377
Container-title:Knowledge and Information Systems
language:en
Short-container-title:Knowl Inf Syst

Author:

Piernik Maciej, Morzy Tadeusz

Abstract

There is a certain belief among data science researchers and enthusiasts alike that clustering can be used to improve classification quality. Insofar as this belief is fairly uncontroversial, it is also very general and therefore produces a lot of confusion around the subject. There are many ways of using clustering in classification and it obviously cannot always improve the quality of predictions, so a question arises, in which scenarios exactly does it help? Since we were unable to find a rigorous study addressing this question, in this paper, we try to shed some light on the concept of using clustering for classification. To do so, we first put forward a framework for incorporating clustering as a method of feature extraction for classification. The framework is generic w.r.t. similarity measures, clustering algorithms, classifiers, and datasets and serves as a platform to answer ten essential questions regarding the studied subject. Each answer is formulated based on a separate experiment on 16 publicly available datasets, followed by an appropriate statistical analysis. After performing the experiments and analyzing the results separately, we discuss them from a global perspective and form general conclusions regarding using clustering as feature extraction for classification.
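The abstract does not spell out the framework, so the following is only a minimal sketch of one common way to use clustering as feature extraction, not the authors' method. It assumes k-means as the clustering algorithm, Euclidean distance as the similarity measure, distance-to-centroid values as the extracted features, and a 1-nearest-neighbour classifier. All function names and the toy data are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns k centroids.

    Assumption: centroids are initialised from evenly spaced input points
    to keep the sketch deterministic (real code would randomise this).
    """
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: euclidean(p, centroids[j]))
            groups[nearest].append(p)
        for j, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def cluster_features(points, centroids):
    """Append each point's distance to every centroid as extra features."""
    return [list(p) + [euclidean(p, c) for c in centroids] for p in points]

def predict_1nn(train_rows, train_labels, row):
    """Label of the training row closest to `row` (1-nearest neighbour)."""
    best = min(range(len(train_rows)), key=lambda i: euclidean(train_rows[i], row))
    return train_labels[best]

# Hypothetical toy data: two well-separated groups with labels "a" and "b".
train = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]

centroids = kmeans(train, k=2)                  # clustering ignores the labels
augmented = cluster_features(train, centroids)  # original + cluster features

# A new example must be transformed with the same centroids before prediction.
query = cluster_features([(4.8, 5.2)], centroids)[0]
predicted = predict_1nn(augmented, labels, query)
```

Each piece of this sketch (distance function, clustering routine, classifier) can be swapped independently, which mirrors the abstract's statement that the framework is generic with respect to similarity measures, clustering algorithms, and classifiers.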

Funder

Narodowe Centrum Nauki

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence, Hardware and Architecture, Human-Computer Interaction, Information Systems, Software

Link

https://link.springer.com/content/pdf/10.1007/s10115-021-01572-6.pdf

References (35 articles)

1. Alapati YK, Sindhu K (2016) Combining clustering with classification: a technique to improve classification accuracy. Int J Comput Sci Eng 5(6):336–338

2. Bonet I, Saeys Y, Ábalo RG, García MM, Sanchez R, Van de Peer Y (2006) Feature extraction using clustering of protein. In: Martínez-Trinidad JF, Carrasco Ochoa JA, Kittler J (eds) Progress in pattern recognition. Image analysis and applications, Springer, Berlin, pp 614–623

3. Chakraborty T (2017) Ec3: Combining clustering and classification for ensemble learning. In: 2017 IEEE international conference on data mining (ICDM), pp 781–786

4. Cheng H, Yan X, Han J, Hsu CW (2007) Discriminative frequent pattern analysis for effective classification. In: Proceedings of the 2007 IEEE 23rd international conference on data engineering (ICDE ’07). IEEE Computer Society, Washington, DC, USA, pp 716–725

5. Cheng H, Yan X, Han J, Yu PS (2008) Direct discriminative pattern mining for effective classification. In: Proceedings of the 2008 IEEE 24th international conference on data engineering (ICDE ’08). IEEE Computer Society, Washington, DC, USA, pp 169–178

Cited by 23 articles.

1. Damage Detection and Localization Methodology Based on Strain Measurements and Finite Element Analysis: Structural Health Monitoring in the Context of Industry 4.0;Aerospace;2024-08-30

2. Deep K-Means Algorithm Based Autoencoder for Outlier Detection;2024 IEEE 6th Symposium on Computers & Informatics (ISCI);2024-08-10

3. Prediction of weld bead cross-sectional area in wire arc additive manufacturing using vision system integrated with machine learning approach;International Journal on Interactive Design and Manufacturing (IJIDeM);2024-07-04

4. Uncontrolled dataset clustering to improve the quality indicators of multilevel data processing models;VESTN TOMSK GOS U-UP;2024

5. CLCC-FS(OBWOA): an efficient hybrid evolutionary algorithm for motor imagery electroencephalograph classification;Multimedia Tools and Applications;2024-02-15