Abstract
Integrative analysis of multiple data sets has the potential to fully leverage the vast amount of high-throughput biological data being generated. In particular, such analysis will be powerful for making inferences from publicly available collections of genetic, transcriptomic and epigenetic data sets that are designed to study shared biological processes but that vary in their target measurements, biological variation, unwanted noise, and batch variation. Methods that enable the joint analysis of multiple data sets are therefore needed to gain insights into shared biological processes that would otherwise be hidden by unwanted intra-data set variation. Here, we propose a method called two-stage linked component analysis (2s-LCA) to jointly decompose multiple biologically related experimental data sets whose biological and technological relationships can be structured into the decomposition. The consistency of the proposed method is established, and its empirical performance is evaluated via simulation studies. We apply 2s-LCA to jointly analyze four data sets focused on human brain development and identify meaningful patterns of gene expression in human neurogenesis that have shared structure across these data sets.
Publisher
Cold Spring Harbor Laboratory