Affiliation:
1. Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104;
2. Department of Data Sciences and Operations, Marshall School of Business, University of Southern California, Los Angeles, California 90089.
Abstract
Due to rapid technological advances, researchers are now able to collect and analyze ever larger data sets. Statistical inference for big data often requires solving thousands or even millions of parallel inference problems simultaneously. This poses significant challenges and calls for new principles, theories, and methodologies. This review provides a selective survey of some recently developed methods and results for large-scale statistical inference, including detection, estimation, and multiple testing. We begin with the global testing problem, where the goal is to detect the existence of sparse signals in a data set, and then move to the problem of estimating the proportion of nonnull effects. Finally, we focus on multiple testing with false discovery rate (FDR) control. The FDR provides a powerful and practical approach to large-scale multiple testing and has been successfully used in a wide range of applications. We discuss several effective data-driven procedures and also present efficient strategies to handle various grouping, hierarchical, and dependency structures in the data.
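For illustration, the canonical FDR-controlling method in this literature is the Benjamini-Hochberg step-up procedure. The sketch below is a minimal implementation with simulated data; the function name and simulation settings are illustrative and not drawn from the review itself, which surveys more refined data-driven procedures.

```python
import numpy as np
from scipy.stats import norm

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for FDR control.

    Rejects the hypotheses with the k smallest p-values, where k is the
    largest index with p_(k) <= (k / m) * alpha. Returns a boolean mask
    of rejected hypotheses in the original order.
    """
    pvals = np.asarray(pvals)
    m = pvals.size
    order = np.argsort(pvals)                  # ranks, smallest p-value first
    ranked = pvals[order]
    thresholds = alpha * np.arange(1, m + 1) / m
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])       # largest rank meeting its threshold
        reject[order[: k + 1]] = True          # step-up: reject all ranks <= k
    return reject

# Illustrative simulation: 1,000 parallel tests with 50 sparse nonnull effects.
rng = np.random.default_rng(0)
z = rng.normal(size=1000)
z[:50] += 3.0                                  # shift the nonnull means
p = 2 * norm.sf(np.abs(z))                     # two-sided p-values
print(benjamini_hochberg(p, alpha=0.05).sum(), "rejections at FDR level 0.05")
```

Under independence, this procedure controls the FDR at level alpha times the proportion of true nulls, which is why estimating the proportion of nonnull effects, discussed in the abstract, can be used to sharpen the threshold.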
Subject
Economics and Econometrics