A systematical approach to classification problems with feature space heterogeneity

Published:2019-10-07 Issue:9 Volume:48 Page:2006-2029
ISSN:0368-492X
Container-title:Kybernetes
language:en
Short-container-title:K

Author:

Xiao Hongshan,Wang Yu

Abstract

Purpose: Feature space heterogeneity exists widely in various application fields of classification techniques, such as customs inspection decision, credit scoring and medical diagnosis. This paper aims to study the relationship between feature space heterogeneity and classification performance.

Design/methodology/approach: A measurement is first developed for measuring and identifying any significant heterogeneity that exists in the feature space of a data set. The main idea of this measurement is derived from meta-analysis. For a data set with significant feature space heterogeneity, a classification algorithm based on factor analysis and clustering is proposed to learn the data patterns, which, in turn, are used for data classification.

Findings: The proposed approach has two main advantages over previous methods. The first lies in the feature transform using orthogonal factor analysis, which results in new features without redundancy and irrelevance. The second rests on partitioning the samples to capture the feature space heterogeneity reflected by differences in factor scores. The validity and effectiveness of the proposed approach are verified on a number of benchmark data sets.

Research limitations/implications: The measurement should be used to guide the heterogeneity elimination process, which is an interesting topic for future research. In addition, developing a classification algorithm that enables scalable and incremental learning for large data sets with significant feature space heterogeneity is also an important issue.

Practical implications: Measuring and eliminating the feature space heterogeneity that may exist in the data are important for accurate classification. This study provides a systematical approach to feature space heterogeneity measurement and elimination for better classification performance, which is favorable for applications of classification techniques in real-world problems.

Originality/value: A measurement based on meta-analysis for measuring and identifying any significant feature space heterogeneity in a classification problem is developed, and an ensemble classification framework is proposed to deal with the feature space heterogeneity and improve the classification accuracy.
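The abstract does not spell out the heterogeneity measurement itself, only that it is derived from meta-analysis. As a rough illustration of that family of statistics, the sketch below computes Cochran's Q and the I² index, two standard meta-analysis heterogeneity measures. This is an assumption about the general idea, not the paper's actual formula; the function name `cochran_q_i2` and the inputs (one effect estimate and one variance per subgroup of samples) are hypothetical.

```python
from typing import Sequence, Tuple


def cochran_q_i2(effects: Sequence[float],
                 variances: Sequence[float]) -> Tuple[float, float]:
    """Return (Q, I2) for per-subgroup effect estimates and their variances.

    Standard meta-analysis statistics, used here only as an illustration;
    the paper's own measurement may differ.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two subgroups with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Inverse-variance weights: more precise subgroups count for more.
    weights = [1.0 / v for v in variances]

    # Fixed-effect pooled estimate across all subgroups.
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))

    # I2: share of total variation attributed to heterogeneity, floored at 0.
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


if __name__ == "__main__":
    # Hypothetical per-subgroup estimates of one feature's effect on the label.
    q, i2 = cochran_q_i2([0.0, 1.0], [0.25, 0.25])
    print(f"Q = {q:.2f}, I2 = {i2:.2f}")
```

Under this reading, a large Q relative to its degrees of freedom (or a high I²) would suggest that subgroups of samples behave differently in the feature space, which is the situation the clustering step described in the abstract is meant to handle.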

Publisher

Emerald

Subject

Computer Science (miscellaneous),Social Sciences (miscellaneous),Theoretical Computer Science,Control and Systems Engineering,Engineering (miscellaneous)

References: 52 articles.

1. Marketing models of consumer heterogeneity;Journal of Econometrics,1999

2. Evolving fuzzy classifiers using different model architectures;Fuzzy Sets and Systems,2008

3. Probabilistic modeling and visualization for bankruptcy prediction;Applied Soft Computing,2017

4. Decomposition of heterogeneous classification problems;Intelligent Data Analysis,1998

5. Aviles-Cruz, C., Guérin-Dugué, A., Voz, J.L. and Van Cappel, D. (1999), “Enhanced learning for evolutive neural architecture (ELENA)”, Technical Report R3-B1-P, available at: www.dice.ucl.ac.be/neural-nets/Research/Projects/ELENA/elena.htm

Cited by 4 articles.

1. Hierarchical visual semantic guidance for enhanced relationship recognition in domain knowledge graphs;Engineering Applications of Artificial Intelligence;2024-11

2. PD-BertEDL: An Ensemble Deep Learning Method Using BERT and Multivariate Representation to Predict Peptide Detectability;International Journal of Molecular Sciences;2022-10-16

3. Development and validity of computerized neuropsychological assessment devices for screening mild cognitive impairment: Ensemble of models with feature space heterogeneity and retrieval practice effect;Journal of Biomedical Informatics;2022-07

4. A Regression Model Tree Algorithm by Multi-task Learning;Industrial Engineering & Management Systems;2021-06-30