Uncovering Cross-Cohort Molecular Features with Multi-Omics Integration Analysis

Author:

Jiang Min-ZhiORCID,Aguet François,Ardlie Kristin,Chen Jiawen,Cornell Elaine,Cruz Dan,Durda Peter,Gabriel Stacey B.,Gerszten Robert E.,Guo Xiuqing,Johnson Craig W.,Kasela Silva,Lange Leslie A.,Lappalainen Tuuli,Liu Yongmei,Reiner Alex P.,Smith Josh,Sofer TamarORCID,Taylor Kent D.,Tracy Russell P.,VanDenBerg David J.,Wilson James G.,Rich Stephen S.,Rotter Jerome I.ORCID,Love Michael I.,Raffield Laura M.ORCID,Li YunORCID, ,

Abstract

AbstractIntegrative approaches that simultaneously model multi-omics data have gained increasing popularity because they provide holistic system biology views of multiple or all components in a biological system of interest. Canonical correlation analysis (CCA) is a correlation-based integrative method. It was initially designed to extract latent features shared between two assays by finding the linear combinations of features – referred to as canonical vectors (CVs) – within each assay that achieve maximal across-assay correlation. Sparse multiple CCA (SMCCA), a widely-used derivative of CCA, allows more than two assays but can result in non-orthogonal CVs when applied to high-dimensional data. Here, we incorporated a variation of the Gram-Schmidt (GS) algorithm with SMCCA to improve orthogonality among CVs. Applying our SMCCA-GS method to proteomics and methylomics data from the Multi-Ethnic Study of Atherosclerosis (MESA) and Jackson Heart Study (JHS), we identified strong associations between blood cell counts and protein abundance. This finding suggests that adjustment of blood cell composition should be considered in protein-based association studies. Importantly, CVs obtained from two independent cohorts demonstrate transferability across the cohorts. For example, proteomic CVs learned from JHS explain similar amounts of blood cell count phenotypic variance in MESA, explaining 39.0% ~ 50.0% variation in JHS and 38.9% ~ 49.1% in MESA, similar transferability was observed for other omics-CV-trait pairs. This suggests that biologically meaningful and cohort-agnostic variation is captured by CVs. We further developed Sparse Supervised Multiple CCA (SSMCCA) to allow supervised integration analysis for more than two assays. We anticipate that applying our SMCCA-GS and SSMCCA on various cohorts would help identify cohort-agnostic biologically meaningful relationships between multi-omics data and phenotypic traits.Author SummaryComprehensive understanding of human complex traits may benefit from incorporation of molecular features from multiple biological layers such as genome, epigenome, transcriptome, proteome, and metabolome. CCA is a correlation-based method for multi-omics data which reduces the dimension of each omic assay to several orthogonal components – commonly referred to as canonical vectors (CVs). The widely-used SMCCA method allows effective dimension reduction and integration of multi-omics data, but suffers from potentially highly correlated CVs when applied to high-dimensional omics data. Here, we improve the statistical independence among the CVs by adopting a variation of the GS algorithm. We applied our SMCCA-GS method to proteomic and methylomic data from two cohort studies, MESA and JHS. Our results reveal a pronounced effect of blood cell counts on protein abundance, strongly suggesting blood cell composition adjustment in protein-based association studies may be necessary. Finally, we present SSMCCA which allows supervised CCA analysis for the association between one phenotype of interest and more than two assays. We anticipate that SMCCA-GS would help reveal meaningful system-level factors from biological processes involving features from multiple assays; and SSMCCA would further empower interrogation of these factors for phenotypic traits related to health and diseases.

Publisher

Cold Spring Harbor Laboratory

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3