Affiliation:
1. Department of Mathematics, University of Geneva, Geneva, Switzerland
2. School of Science, National University of Defense Technology, Changsha, China
3. Securities Institute for Financial Studies, Shandong University, Jinan, China
Abstract
In multiple regression, when covariates are numerous, it is often reasonable to assume that only a small number of them carry predictive information. In some medical applications, for instance, it is believed that only a few genes out of thousands are responsible for cancer. In that case, the aim is not only to provide a good fit but also to select the relevant covariates (genes). We propose to perform model selection with additive models in high dimensions, that is, with both a large sample size and a large number of covariates. Our approach is computationally efficient thanks to fast wavelet transforms, does not rely on cross-validation, and solves a convex optimization problem for a prescribed penalty parameter, called the quantile universal threshold. We also propose a second rule, geared toward prediction, based on Stein's unbiased risk estimate (SURE). We use Monte Carlo simulations and real data to compare various methods in terms of false discovery rate (FDR), true positive rate (TPR), and mean squared error. Our approach is the only one of the compared methods able to handle high dimensions, and it achieves a good FDR–TPR trade-off.
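The abstract does not state how the quantile universal threshold is computed, and the paper applies it to wavelet-based additive models. As a minimal sketch of the general idea only, assuming the ordinary lasso without intercept and a known noise level σ = 1, the threshold can be estimated by Monte Carlo. Under the null model y = ε, the lasso solution is exactly zero when ‖Xᵀy‖∞ ≤ λ, so taking λ as the (1 − α) quantile of this statistic recovers the null model with probability about 1 − α. The function names, the choice α = 0.05, and the plain ISTA solver (a generic proximal-gradient method swapped in for illustration) are all hypothetical and are not the authors' algorithm.

```python
import numpy as np

def quantile_universal_threshold(X, sigma=1.0, alpha=0.05, n_sim=2000, seed=0):
    """Hypothetical helper: Monte Carlo QUT for the plain lasso
    min_b 0.5*||y - X b||^2 + lam*||b||_1 (no intercept, known sigma).

    Under the null y = eps, eps ~ N(0, sigma^2 I), the lasso estimate is zero
    iff ||X^T y||_inf <= lam, so QUT is the (1 - alpha) quantile of that statistic.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    eps = sigma * rng.standard_normal((n_sim, n))   # n_sim simulated null responses
    stats = np.abs(eps @ X).max(axis=1)             # ||X^T eps||_inf for each draw
    return np.quantile(stats, 1.0 - alpha)

def lasso_ista(X, y, lam, n_iter=500):
    """Hypothetical helper: ISTA (proximal gradient) for the same lasso problem."""
    L = np.linalg.norm(X, 2) ** 2                   # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = b + X.T @ (y - X @ b) / L               # gradient step on the squared loss
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return b

# Illustrative setting: p > n, three active covariates (assumed values).
rng = np.random.default_rng(1)
n, p = 200, 500
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[3, 50, 400]] = 5.0
y = X @ beta + rng.standard_normal(n)

lam = quantile_universal_threshold(X, sigma=1.0, alpha=0.05)
b_hat = lasso_ista(X, y, lam)
selected = np.flatnonzero(b_hat)                    # indices of selected covariates
print(lam, selected)
```

The point of this choice of λ is that no cross-validation loop is needed: λ depends only on X, σ, and α, and the convex problem is then solved once.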
Funder
China Scholarship Council
Subject
Statistics, Probability and Uncertainty, Statistics and Probability