Affiliation:
1. Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
Abstract
Background:
Dimension disaster (the curse of dimensionality) often accompanies feature extraction. Extracted features may contain a large amount of redundant information, which increases the computational burden and causes overfitting.
Objective:
Feature selection is an important strategy for overcoming the problems caused by dimension disaster. In most machine learning tasks, the features determine the upper limit of model performance. More effective feature selection methods are therefore needed to optimize features.
Methods:
In this paper, we introduce a new technique for optimizing sequence features based on the binomial distribution (BD). First, the principle of the BD algorithm is described in detail. Then, the proposed algorithm is compared with other commonly used feature selection methods on three different types of datasets, using a Random Forest classifier with identical parameters in every comparison.
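To make the idea of binomial-distribution-based feature ranking concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the scoring formula, so the sketch assumes one common formulation: for feature i and class j, compute the binomial tail probability of seeing at least the observed count of the feature in that class by chance, and use one minus that probability as a confidence score. The function names (`binomial_tail`, `bd_confidence`, `rank_features`), the count-matrix layout, and the choice of taking the maximum over classes are all assumptions made for illustration.

```python
from math import comb


def binomial_tail(n_total, n_observed, q):
    """Return P(X >= n_observed) for X ~ Binomial(n_total, q)."""
    return sum(
        comb(n_total, m) * q**m * (1 - q) ** (n_total - m)
        for m in range(n_observed, n_total + 1)
    )


def bd_confidence(counts):
    """Score each feature from a class-by-feature count matrix.

    counts[j][i] is the number of occurrences of feature i in class j
    (hypothetical layout). The score of a feature is the largest value,
    over classes, of 1 - P(X >= observed count), where the chance
    probability q_j is the share of all occurrences that fall in class j.
    """
    n_classes = len(counts)
    n_features = len(counts[0])
    class_totals = [sum(row) for row in counts]
    grand_total = sum(class_totals)

    scores = []
    for i in range(n_features):
        n_i = sum(counts[j][i] for j in range(n_classes))
        best = 0.0
        for j in range(n_classes):
            q_j = class_totals[j] / grand_total
            p_ij = binomial_tail(n_i, counts[j][i], q_j)
            best = max(best, 1.0 - p_ij)
        scores.append(best)
    return scores


def rank_features(counts):
    """Return feature indices sorted from most to least class-specific."""
    scores = bd_confidence(counts)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy data: features 0 and 2 are class-specific, feature 1 is not.
    toy_counts = [
        [30, 10, 0],   # class 0
        [0, 10, 30],   # class 1
    ]
    print(bd_confidence(toy_counts))
    print(rank_features(toy_counts))
```

In this toy example the evenly distributed feature receives the lowest score and is ranked last. The exact summation is only practical for small counts; for large counts a library survival function such as `scipy.stats.binom.sf` would be the more robust choice.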
Results and Conclusion:
The results confirm that BD delivers a promising improvement in feature selection and classification accuracy. Finally, we provide the source code and an executable program package (http://lin-group.cn/server/BDselect/), with which users can easily apply our algorithm in their own research.
Publisher
Bentham Science Publishers Ltd.
Subject
Computational Mathematics, Genetics, Molecular Biology, Biochemistry
Cited by
18 articles.