Abstract
When developing classification models with supervised learning methods (e.g., support vector machines, neural networks, and decision trees), feature selection is an essential pre-processing step for reducing computational cost and improving generalization performance. One feature selection algorithm suited to this purpose is the minimum reference set (MRS). The original MRS considers a feature subset effective if it allows all samples to be classified correctly by the 1-nearest neighbor algorithm using a small number of samples. However, the original MRS is applicable only to numerical features, and it cannot take the distances between different classes into account. We therefore propose a novel feature subset evaluation algorithm, referred to as the “E2H distance-weighted MRS,” which handles a mixture of numerical and categorical features and considers the distances between different classes in the evaluation. We also propose a Bayesian swap feature selection algorithm for identifying an effective feature subset. The effectiveness of the proposed methods is verified through experiments on artificially generated data comprising a mixture of numerical and categorical features.
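As a rough illustration of the evaluation idea described above, the following sketch checks whether a candidate feature subset lets a 1-nearest neighbor rule classify every sample correctly when numerical and categorical features are mixed. It is a simplified stand-in, not the authors' MRS or E2H distance-weighted MRS: the leave-one-out check, the function names `mixed_distance` and `loo_1nn_all_correct`, the distance definition (absolute difference on pre-scaled numerical features plus a 0/1 mismatch on categorical ones), and the toy data are all assumptions made for this example.

```python
import numpy as np

def mixed_distance(a_num, b_num, a_cat, b_cat):
    """Hypothetical mixed distance (an assumption, not the paper's E2H):
    absolute differences on pre-scaled numerical features plus a 0/1
    mismatch count on categorical features."""
    return np.abs(a_num - b_num).sum() + np.sum(a_cat != b_cat)

def loo_1nn_all_correct(X_num, X_cat, y):
    """Return True if every sample's nearest *other* sample has the same label."""
    n = len(y)
    for i in range(n):
        best_j, best_d = -1, np.inf
        for j in range(n):
            if j == i:
                continue  # leave the query sample out of the reference set
            d = mixed_distance(X_num[i], X_num[j], X_cat[i], X_cat[j])
            if d < best_d:
                best_j, best_d = j, d
        if y[best_j] != y[i]:
            return False
    return True

# Toy data: numerical column 0 and the categorical column follow the label,
# numerical column 1 does not.
X_num = np.array([[0.0, 0.9], [0.1, 0.1], [0.9, 0.8], [1.0, 0.2]])
X_cat = np.array([["a"], ["a"], ["b"], ["b"]])
y = np.array([0, 0, 1, 1])

# Candidate subset 1: numerical column 0 plus the categorical column.
informative = loo_1nn_all_correct(X_num[:, [0]], X_cat, y)
# Candidate subset 2: numerical column 1 only (no categorical features).
noisy_only = loo_1nn_all_correct(X_num[:, [1]], X_cat[:, :0], y)
print(informative, noisy_only)
```

Under these assumptions, the first subset passes the check and the second does not. A search procedure such as the proposed Bayesian swap feature selection would call an evaluation of this kind repeatedly on different candidate subsets.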
Funder
JSPS Grant-in-Aid for Scientific Research
Subject
General Economics, Econometrics and Finance