Two-stage linked component analysis for joint decomposition of multiple biologically related data sets

Author:

Chen Huan1, Caffo Brian1, Stein-O’Brien Genevieve2, Liu Jinrui3, Langmead Ben4, Colantuoni Carlo5, Xiao Luo6

Affiliation:

1. Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA

2. Department of Neuroscience, Johns Hopkins University, Baltimore, MD, 21205, USA

3. Department of Neurology, Johns Hopkins University, Baltimore, MD, 21287, USA

4. Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA

5. Department of Neuroscience, Johns Hopkins University, Baltimore, MD, 21205, USA; Department of Neurology, Johns Hopkins University, Baltimore, MD, 21287, USA; and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA

6. Department of Statistics, North Carolina State University, Raleigh, NC, 27607, USA

Abstract

Integrative analysis of multiple data sets has the potential to fully leverage the vast amount of high-throughput biological data being generated. In particular, such analysis will be powerful for making inferences from publicly available collections of genetic, transcriptomic, and epigenetic data sets that are designed to study shared biological processes but vary in their target measurements, biological variation, unwanted noise, and batch variation. Thus, methods that enable the joint analysis of multiple data sets are needed to gain insights into shared biological processes that would otherwise be hidden by unwanted intra-data-set variation. Here, we propose a method called two-stage linked component analysis (2s-LCA) to jointly decompose multiple biologically related experimental data sets whose biological and technological relationships can be built into the structure of the decomposition. We establish the consistency of the proposed method and evaluate its empirical performance via simulation studies. We apply 2s-LCA to jointly analyze four data sets focused on human brain development and identify meaningful patterns of gene expression in human neurogenesis that share structure across these data sets.
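The abstract does not spell out the estimator, so the following is only a generic illustration of the two-stage idea of linking data sets through a common feature (gene) dimension, and should not be read as the authors' 2s-LCA procedure. Stage 1 runs a separate PCA on each data set; stage 2 looks for directions that lie in every per-data-set subspace by taking the leading eigenvectors of the averaged projection matrix (an AJIVE-style construction swapped in for illustration). The function names `leading_basis` and `shared_directions`, the ranks, and the simulated data are all hypothetical.

```python
import numpy as np


def leading_basis(Y, r):
    """Orthonormal p x r basis for the top-r left singular subspace of Y.

    Y is a features x samples matrix (e.g. genes x samples); each feature is
    centered across samples before the SVD, so this is a per-data-set PCA.
    """
    Yc = Y - Y.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(Yc, full_matrices=False)
    return U[:, :r]


def shared_directions(datasets, ranks, n_shared):
    """Hypothetical two-stage sketch (not the paper's 2s-LCA estimator)."""
    # Stage 1: separate PCA for each data set on the common feature dimension.
    bases = [leading_basis(Y, r) for Y, r in zip(datasets, ranks)]
    # Stage 2: average the projection matrices B B^T. Their eigenvalues lie in
    # [0, 1], and an eigenvalue of 1 means the direction lies in every subspace.
    P = sum(B @ B.T for B in bases) / len(bases)
    evals, evecs = np.linalg.eigh(P)  # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_shared]
    return evecs[:, order], evals[order]


# Simulated example (made-up data): 3 data sets share the same 50 features and
# one common loading vector, plus one data-set-specific loading each.
rng = np.random.default_rng(0)
p = 50
Q, _ = np.linalg.qr(rng.standard_normal((p, 4)))  # 4 orthonormal loadings
shared = Q[:, 0]
datasets = []
for k, n in enumerate([40, 60, 80]):
    loadings = np.column_stack([shared, Q[:, k + 1]])
    scores = rng.standard_normal((2, n)) * np.array([[5.0], [4.0]])
    datasets.append(loadings @ scores + 0.1 * rng.standard_normal((p, n)))

V, lam = shared_directions(datasets, ranks=[2, 2, 2], n_shared=1)
# Alignment with the true shared loading; should be close to 1.
print(abs(V[:, 0] @ shared), lam)
```

In this toy setting the averaged projection has eigenvalue near 1 for the shared loading and near 1/3 for each data-set-specific loading, so comparing the leading eigenvalues against 1 is one simple way to separate shared from individual structure; how the paper actually handles this step is not described in the abstract.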

Funder

National Institutes of Health (NIH)

National Institute of Biomedical Imaging and Bioengineering (NIBIB)

National Institute of Neurological Disorders and Stroke

Kavli NDS Distinguished Postdoctoral Fellowship and Johns Hopkins Provost Postdoctoral Fellowship

Johns Hopkins University Discovery Award 2019

Publisher

Oxford University Press (OUP)

Subject

Statistics, Probability and Uncertainty, General Medicine, Statistics and Probability

