Authors:
Vacher Michael, Canovas Rodrigo, Laws Simon M., Doecke James D.
Abstract
Background: Complex disorders, such as Alzheimer’s disease (AD), result from the combined influence of multiple biological and environmental factors. Integrating high-throughput data from multiple omics platforms can provide a system-level overview and improve our understanding of the complex biological processes underlying human disease. In this study, integrated data from four omics platforms were used to characterise biological signatures of AD.
Method: The study cohort consisted of 455 participants (controls: 148; cases: 307) from the Religious Orders Study and Memory and Aging Project (ROSMAP). Genotype (SNP), methylation (CpG), RNA and proteomics data were collected, quality-controlled and pre-processed (SNP = 130; CpG = 83; RNA = 91; Proteomics = 119). Using a combined diagnosis of Mild Cognitive Impairment (MCI)/AD as the target phenotype, we first used Partial Least Squares Regression as a supervised classification framework to assess the predictive capability of each omics dataset individually. We then used a variation of sparse generalized canonical correlation analysis (sGCCA) to assess predictions from the combined datasets and to identify multi-omics signatures characterising each group of participants.
Results: Analysing the datasets individually, we found that methylation data provided the best predictions, with an accuracy of 0.63 (95% CI = [0.54–0.71]), followed by RNA, 0.61 (95% CI = [0.52–0.69]), SNP, 0.59 (95% CI = [0.51–0.68]) and proteomics, 0.58 (95% CI = [0.51–0.67]). After integration of the four datasets, predictions improved markedly, with a resulting accuracy of 0.95 (95% CI = [0.89–0.98]).
Conclusion: The integration of data from multiple platforms is a powerful approach to explore biological systems and better characterise the biological signatures of AD. The results suggest that integrative methods can identify biomarker panels with improved predictive performance compared to individual platforms alone. Validation in independent cohorts is required to confirm and refine the results presented in this study.