Abstract
Data with a relatively small number of study individuals and a very large number of potential explanatory features arise particularly, but by no means only, in genomics. A powerful method of analysis, the lasso [Tibshirani R (1996) J Roy Stat Soc B 58:267–288], takes account of an assumed sparsity of effects, that is, that most of the features are nugatory. Standard criteria for model fitting, such as the method of least squares, are modified by imposing a penalty for each explanatory variable used. There results a single model, leaving open the possibility that other sparse choices of explanatory features fit virtually equally well. The method suggested in this paper aims to specify simple models that are essentially equally effective, leaving detailed interpretation to the specifics of the particular study. The method hinges on the ability to make initially a very large number of separate analyses, allowing each explanatory feature to be assessed in combination with many other such features. Further stages allow the assessment of more complex patterns such as nonlinear and interactive dependences. The method has formal similarities to so-called partially balanced incomplete block designs introduced 80 years ago [Yates F (1936) J Agric Sci 26:424–455] for the study of large-scale plant breeding trials. The emphasis in this paper is strongly on exploratory analysis; the more formal statistical properties obtained under idealized assumptions will be reported separately.
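As a rough, hypothetical illustration of the cross-classification idea sketched in the abstract (not the authors' actual procedure, which is more elaborate and higher dimensional), the Python sketch below arranges p candidate features in a k × k square, fits an ordinary least-squares regression along each row and each column, and retains features that look influential in both of the small analyses containing them. The simulated data, the |t| > 2 cutoff, and the requirement of being flagged twice are all illustrative assumptions.

```python
# Minimal sketch: each feature is assessed in combination with two different
# sets of companion features (its row and its column), echoing the partially
# balanced arrangement described in the abstract. Illustrative only.
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

n, k = 50, 10                      # n study individuals, k*k = 100 candidate features
p = k * k
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[3, 47, 90]] = 2.0            # a sparse set of genuinely active features
y = X @ beta + rng.standard_normal(n)

grid = np.arange(p).reshape(k, k)  # index each feature by a (row, column) cell

def strong_features(idx, threshold=2.0):
    """Least-squares fit of y on the k features in idx; return those with |t| > threshold."""
    Xs = X[:, idx]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ coef
    sigma2 = resid @ resid / (n - len(idx))
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xs.T @ Xs)))
    return idx[np.abs(coef / se) > threshold]

# Assess every feature twice, once with its row companions and once with its
# column companions; keep those flagged in both of their analyses.
counts = Counter()
for block in list(grid) + list(grid.T):
    counts.update(strong_features(block).tolist())
retained = sorted(j for j, c in counts.items() if c >= 2)
print("retained features:", retained)
```

In contrast to the lasso, which penalizes a single overall fit and returns one model, this style of analysis deliberately keeps every feature that survives its several small regressions, leaving open a set of comparably simple candidate explanations for subject-matter interpretation.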
Publisher
Proceedings of the National Academy of Sciences
Cited by
25 articles.