AdaReg: data adaptive robust estimation in linear regression with application in GTEx gene expressions
Author:
Wang Meng1, Jiang Lihua1, Snyder Michael P.1
Affiliation:
1. Department of Genetics, Stanford University, Stanford, 94305, USA
Abstract
The Genotype-Tissue Expression (GTEx) project provides a valuable resource of large-scale gene expression data across multiple tissue types. In the presence of technical noise and unknown or unmeasured factors, robustly estimating the major tissue effect is challenging. Moreover, genes exhibit heterogeneous expression across tissue types. A robust method that adapts to this heterogeneity is therefore needed to improve estimation of the tissue effect. We followed the robust estimation approach based on the γ-density-power weight in the works of Fujisawa, H. and Eguchi, S. (2008). Robust parameter estimation with a small bias against heavy contamination. J. Multivariate Anal. 99: 2053–2081 and Windham, M.P. (1995). Robustifying model fitting. J. Roy. Stat. Soc. B: 599–609, where γ is the exponent of the density weight and controls the balance between bias and variance. To our knowledge, our work is the first to propose a procedure for tuning γ to balance the bias-variance trade-off under mixture models. We constructed a robust likelihood criterion based on weighted densities in a mixture model in which a Gaussian population distribution is mixed with an unknown outlier distribution, and developed a data-adaptive γ-selection procedure embedded in the robust estimation. We provided a heuristic analysis of the selection criterion and found, in simulation studies under a series of settings, that the average trend of our practical selection criterion across values of γ captures the minimizing γ about as well as the trend of the mean squared error (MSE), which cannot be estimated in practice. Our data-adaptive robustifying procedure for linear regression (AdaReg) showed a significant advantage over the fixed-γ procedure and other robust methods, both in simulation studies and in a real-data application estimating the tissue effect of heart samples from the GTEx project.
Finally, we discussed some limitations of the method and directions for future work.
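To make the role of γ concrete, the following is a minimal sketch of density-power-weighted estimation in linear regression with a fixed γ, in the spirit of the Windham (1995) and Fujisawa and Eguchi (2008) weighting cited above. It is not the AdaReg procedure: the data-adaptive selection of γ is omitted, and the function name `gamma_weighted_regression`, the starting values, and the simulated data are assumptions made only for illustration. Each observation is weighted by its Gaussian density raised to the power γ, the coefficients are refit by weighted least squares, and the weighted residual variance is rescaled by (1 + γ) to undo the shrinkage the weights induce under the Gaussian model.

```python
import numpy as np


def gamma_weighted_regression(X, y, gamma=0.5, n_iter=500, tol=1e-10):
    """Fixed-gamma density-power-weighted fit of y = X @ beta + Gaussian noise.

    Illustrative sketch only; gamma is supplied by the caller, not selected.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Assumed starting values: ordinary least squares for beta and a
    # MAD-based scale for the residual variance.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    mad = np.median(np.abs(resid - np.median(resid)))
    sigma2 = max((1.4826 * mad) ** 2, 1e-12)

    for _ in range(n_iter):
        # Weight of each point: Gaussian density to the power gamma
        # (constant factors cancel in the weighted averages below).
        w = np.exp(-gamma * resid**2 / (2.0 * sigma2))

        # Weighted least squares via row scaling by sqrt(w).
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        resid = y - X @ beta_new

        # Under the Gaussian model the weighted variance equals
        # sigma^2 / (1 + gamma), so rescale by (1 + gamma).
        sigma2_new = (1.0 + gamma) * np.sum(w * resid**2) / np.sum(w)
        sigma2_new = max(sigma2_new, 1e-12)

        converged = (
            np.max(np.abs(beta_new - beta)) < tol
            and abs(sigma2_new - sigma2) < tol
        )
        beta, sigma2 = beta_new, sigma2_new
        if converged:
            break

    return beta, sigma2


# Simulated example (assumed data): true intercept 1, slope 2, noise sd 0.5,
# with 10% of the responses shifted upward to act as outliers.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[:20] += 10.0

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_hat, sigma2_hat = gamma_weighted_regression(X, y, gamma=0.5)
```

In this sketch a larger γ downweights outlying residuals more aggressively, which reduces bias from contamination at the cost of higher variance, while γ = 0 recovers ordinary least squares. That trade-off is what a γ-selection step would have to balance.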
Publisher
Walter de Gruyter GmbH
Subject
Computational Mathematics, Genetics, Molecular Biology, Statistics and Probability