Affiliation:
1. Department of Mathematics Grinnell College Grinnell Iowa USA
2. Department of Biostatistics University of Iowa Iowa City Iowa USA
Abstract
Penalized regression methods such as the lasso are a popular approach to analyzing high‐dimensional data. One attractive property of the lasso is that it naturally performs variable selection. An important area of concern, however, is the reliability of these selections. Motivated by local false discovery rate methodology from the large‐scale hypothesis testing literature, we propose a method for calculating a local false discovery rate for each variable under consideration by the lasso model. These rates can be used to assess the reliability of an individual feature, or to estimate the model's overall false discovery rate. The method can be used for any level of regularization. This is particularly useful for models with a few highly significant features but a high overall false discovery rate, a relatively common occurrence when using cross validation to select a model. It is also flexible enough to be applied to many varieties of penalized likelihoods including generalized linear models and Cox regression, and a variety of penalties, including the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD) penalty. We demonstrate the validity of this approach and contrast it with other inferential methods for penalized regression as well as with local false discovery rates for univariate hypothesis tests. Finally, we show the practical utility of our method by applying it to a case study involving gene expression in breast cancer patients.
Subject
Statistics and Probability,Epidemiology
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献