Abstract
In this paper, we consider prediction and variable selection in misspecified binary classification models in the high-dimensional setting. We focus on two approaches to classification that are computationally efficient but lead to model misspecification. The first applies penalized logistic regression to classification data that may not follow the logistic model. The second is even more radical: we treat the class labels of objects as if they were numbers and apply penalized linear regression. We investigate both approaches thoroughly and provide conditions that guarantee their success in prediction and variable selection. Our results hold even if the number of predictors is much larger than the sample size. The paper concludes with experimental results.
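The second approach can be illustrated with a small sketch. What follows is not the authors' implementation: it is a minimal pure-Python example that assumes an L1 (Lasso) penalty, a plain coordinate-descent solver, and simulated data whose labels come from a probit-type rule, so that a linear model fitted to the 0/1 labels is misspecified. All names (`simulate`, `lasso_cd`, `lam_max`, the sample sizes, and the choice `lam = 0.3 * lam_max`) are hypothetical and chosen only for illustration.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=50):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_sq = [sum(v * v for v in c) / n for c in cols]
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            c = cols[j]
            # correlation of column j with the partial residual
            rho = sum(c[i] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * c[i]
                beta[j] = new
    return beta

def objective(X, y, beta, lam):
    """Penalized least-squares objective minimized by lasso_cd."""
    n = len(X)
    rss = sum((y[i] - sum(x * b for x, b in zip(X[i], beta))) ** 2 for i in range(n))
    return rss / (2 * n) + lam * sum(abs(b) for b in beta)

def simulate(n=60, p=150, seed=0):
    """Assumed toy data: more predictors than samples, 3 relevant predictors."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # Probit-type labels (not logistic, not linear): threshold a noisy signal.
    y = [1.0 if 2.0 * r[0] - 2.0 * r[1] + 1.5 * r[2] + rng.gauss(0.0, 1.0) > 0 else 0.0
         for r in X]
    return X, y

X, y = simulate()
y_mean = sum(y) / len(y)
y_c = [v - y_mean for v in y]  # treat labels as numbers; centering replaces an intercept

# Smallest penalty for which the all-zero vector is a solution.
n = len(X)
lam_max = max(abs(sum(X[i][j] * y_c[i] for i in range(n))) / n for j in range(len(X[0])))

lam = 0.3 * lam_max  # arbitrary illustrative choice, not a tuned value
beta_hat = lasso_cd(X, y_c, lam)
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print("selected predictors:", selected)
```

Variable selection here simply reads off the nonzero coefficients. The first approach would follow the same pattern with the squared loss replaced by the logistic loss. Whether the selected set matches the truly relevant predictors depends on conditions of the kind the paper studies, which this sketch does not check.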
Subject
General Physics and Astronomy
Cited by
5 articles.