Affiliation:
1. Faculty of Information Technology, Ho Chi Minh City Open University, Ho Chi Minh City 700000, Vietnam
Abstract
In this paper, we propose a semisupervised feature selection approach that is based on feature clustering and hypothesis margin maximization. The aim is to improve the classification accuracy by choosing the right feature subset and to allow building more interpretable models. Our approach handles the two core aspects of feature selection, i.e., relevance and redundancy, and is divided into three steps. First, the similarity weights between features are represented by a sparse graph where each feature can be reconstructed from the sparse linear combination of the others. Second, features are then hierarchically clustered identifying groups of the most similar ones. Finally, a semisupervised margin-based objective function is optimized to select the most data discriminative feature from within each cluster, hence maximizing relevance while minimizing redundancy among features. Eventually, we empirically validate our proposed approach on multiple well-known UCI benchmark datasets in terms of classification accuracy and representation entropy, where it proved to outperform four other semisupervised and unsupervised methods and competed with two widely used supervised ones.
Funder
Ho Chi Minh City Open University
Subject
General Mathematics,General Medicine,General Neuroscience,General Computer Science
Cited by
3 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献