A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data

Published:2017 Volume:2017 Page:1-18
ISSN:1748-670X
Container-title:Computational and Mathematical Methods in Medicine
language:en

Author:

Bommert Andrea¹, Rahnenführer Jörg¹, Lang Michel¹

Affiliation:

1. Department of Statistics, TU Dortmund University, 44221 Dortmund, Germany

Abstract

Finding a good predictive model for a high-dimensional data set can be challenging. For genetic data, it is important not only to find a model with high predictive accuracy but also that this model uses only a few features and that the selection of these features is stable. This is because, in bioinformatics, models are used not only for prediction but also for drawing biological conclusions, which makes the interpretability and reliability of the model crucial. We suggest using three target criteria when fitting a predictive model to a high-dimensional data set: the classification accuracy, the stability of the feature selection, and the number of chosen features. As it is unclear which measure is best for evaluating the stability, we first compare a variety of stability measures. We conclude that the Pearson correlation has the best theoretical and empirical properties. We also find that, for how a measure assesses stability, it matters most that the measure contains a correction for chance or for large numbers of chosen features. Then, we analyse Pareto fronts and conclude that it is possible to find models with a stable selection of few features without losing much predictive accuracy.
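To make the two main ideas of the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the Pearson-correlation stability is computed as the mean pairwise correlation between binary selection vectors (one vector per resampling run), and that each candidate model is summarised by three values to be minimised: classification error, 1 − stability, and number of chosen features. The function names `pearson_stability` and `pareto_front` and the example numbers are hypothetical.

```python
from itertools import combinations

import numpy as np


def pearson_stability(selections, n_features):
    """Mean pairwise Pearson correlation between binary selection vectors.

    selections: list of sets of selected feature indices, one per run.
    Assumption: no run selects zero or all features, otherwise the
    correlation is undefined (NaN).
    """
    indicators = np.zeros((len(selections), n_features))
    for row, chosen in enumerate(selections):
        indicators[row, list(chosen)] = 1.0
    scores = [
        np.corrcoef(indicators[i], indicators[j])[0, 1]
        for i, j in combinations(range(len(selections)), 2)
    ]
    return float(np.mean(scores))


def pareto_front(points):
    """Indices of non-dominated rows; every column is to be minimised."""
    points = np.asarray(points, dtype=float)
    front = []
    for i, p in enumerate(points):
        # A row dominates p if it is no worse everywhere and better somewhere.
        no_worse = np.all(points <= p, axis=1)
        better = np.any(points < p, axis=1)
        if not np.any(no_worse & better):
            front.append(i)
    return front


# Hypothetical models: (classification error, 1 - stability, n_features).
models = [
    [0.10, 0.2, 5],
    [0.12, 0.3, 6],   # worse than the first model on all three criteria
    [0.08, 0.5, 20],  # most accurate, but unstable and not sparse
]
front = pareto_front(models)
```

In this made-up example the second model is dominated by the first, so only the first and third remain on the front. Choosing between them is the trade-off the abstract describes: a small loss in accuracy against a sparser and more stable feature selection.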

Funder

Deutsche Forschungsgemeinschaft

Publisher

Hindawi Limited

Subject

Applied Mathematics; General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology; Modeling and Simulation; General Medicine

Link

http://downloads.hindawi.com/journals/cmmm/2017/7907163.pdf
