Abstract
Purpose
Feature space heterogeneity is common in many application fields of classification techniques, such as customs inspection decisions, credit scoring and medical diagnosis. This paper studies the relationship between feature space heterogeneity and classification performance.
Design/methodology/approach
A measure is first developed to detect and quantify any significant heterogeneity in the feature space of a data set. The main idea of this measure is derived from meta-analysis. For data sets with significant feature space heterogeneity, a classification algorithm based on factor analysis and clustering is proposed to learn the data patterns, which, in turn, are used for data classification.
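The abstract does not state the formula behind the measure. As a rough illustration of what a meta-analysis-style heterogeneity check can look like, the sketch below computes Cochran's Q and the I² statistic, two standard meta-analytic quantities, for one feature across several sample subgroups. The function name `cochran_q` and the choice of per-subgroup effect estimates and variances as inputs are assumptions made for this example, not the paper's definition.

```python
from typing import Sequence, Tuple


def cochran_q(effects: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Return (Q, I2) for per-subgroup effect estimates and their variances.

    Hypothetical helper: each subgroup contributes one effect estimate for a
    feature (for example, a standardized class difference) and its variance.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two subgroups with matching variances")
    # Inverse-variance weights: precise subgroups count more.
    weights = [1.0 / v for v in variances]
    # Pooled (fixed-effect) estimate across subgroups.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted squared deviation of each subgroup from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2: share of total variation attributed to heterogeneity rather than chance.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2
```

Under homogeneity, Q approximately follows a chi-square distribution with k - 1 degrees of freedom for k subgroups, so a large Q relative to that reference suggests significant heterogeneity for the feature in question.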
Findings
The proposed approach has two main advantages over previous methods. The first is a feature transform based on orthogonal factor analysis, which yields new features free of redundancy and irrelevance. The second is sample partitioning, which captures the feature space heterogeneity reflected in differences between factor scores. The validity and effectiveness of the proposed approach are verified on a number of benchmark data sets.
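The two steps described above (a factor-analysis transform followed by sample partitioning) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' algorithm: the class name `ClusterwiseClassifier`, the use of k-means for partitioning, and logistic regression as the per-partition classifier are all assumptions, since the abstract names none of them.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LogisticRegression


class ClusterwiseClassifier:
    """Factor scores -> k-means partitions -> one classifier per partition.

    Illustrative only; k-means and logistic regression are assumed choices.
    """

    def __init__(self, n_factors=3, n_clusters=2, random_state=0):
        self.fa = FactorAnalysis(n_components=n_factors, random_state=random_state)
        self.km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
        self.models = {}

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        # Fallback label (majority class) for partitions without a fitted model.
        values, counts = np.unique(y, return_counts=True)
        self.default_ = values[np.argmax(counts)]
        self.dtype_ = y.dtype
        scores = self.fa.fit_transform(X)     # factor scores replace raw features
        labels = self.km.fit_predict(scores)  # partition samples in factor space
        for c in np.unique(labels):
            mask = labels == c
            classes = np.unique(y[mask])
            if len(classes) == 1:
                # Single-class partition: store the constant label.
                self.models[int(c)] = classes[0]
            else:
                model = LogisticRegression(max_iter=1000)
                self.models[int(c)] = model.fit(scores[mask], y[mask])
        return self

    def predict(self, X):
        scores = self.fa.transform(np.asarray(X))
        labels = self.km.predict(scores)      # route each sample to a partition
        out = np.empty(len(scores), dtype=self.dtype_)
        for c in np.unique(labels):
            mask = labels == c
            model = self.models.get(int(c), self.default_)
            if isinstance(model, LogisticRegression):
                out[mask] = model.predict(scores[mask])
            else:
                out[mask] = model             # constant label for this partition
        return out
```

Fitting one model per partition is what lets the sketch adapt to subgroups whose factor scores differ; the number of factors and partitions would need to be chosen per data set.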
Research limitations/implications
The measure should be used to guide the heterogeneity elimination process, and how to do so is an interesting topic for future research. In addition, developing a classification algorithm that supports scalable and incremental learning on large data sets with significant feature space heterogeneity is an important open issue.
Practical implications
Measuring and eliminating any feature space heterogeneity present in the data is important for accurate classification. This study provides a systematic approach to feature space heterogeneity measurement and elimination for better classification performance, which benefits applications of classification techniques to real-world problems.
Originality/value
A measure based on meta-analysis is developed to detect and quantify any significant feature space heterogeneity in a classification problem, and an ensemble classification framework is proposed to deal with that heterogeneity and improve classification accuracy.
Subject
Computer Science (miscellaneous), Social Sciences (miscellaneous), Theoretical Computer Science, Control and Systems Engineering, Engineering (miscellaneous)