Author:
Dey Kushal K., Stephens Matthew
Abstract
Estimation of correlation matrices and correlations among variables is a ubiquitous problem in statistics. In many cases – especially when the number of observations is small relative to the number of variables – some kind of shrinkage or regularization is necessary to improve estimation accuracy. Here, we propose an Empirical Bayes shrinkage approach, CorShrink, which adaptively learns how much to shrink correlations by combining information across all pairs of variables. One key feature of CorShrink, which distinguishes it from most existing methods, is its flexibility in dealing with missing data: CorShrink explicitly accounts for varying amounts of missingness among pairs of variables. Numerical studies suggest CorShrink is competitive with other popular correlation shrinkage methods, even when there are no missing data. We illustrate CorShrink on gene expression data from the GTEx project, which suffers from extensive missing observations and where existing methods struggle. We also illustrate its flexibility by applying it to estimate cosine similarities between word vectors from word2vec models, thereby generating more accurate word similarity rankings.
Publisher
Cold Spring Harbor Laboratory
Cited by
7 articles.