Affiliation:
1. Intelligent Data Center, School of Mathematics, Sun Yat-Sen University, Guangzhou, 510275, China
Abstract
With diverse types of omics data now widely available, many computational methods have recently been developed to integrate these heterogeneous data and provide a comprehensive understanding of diseases and biological mechanisms. However, most of them barely account for noise effects. Patterns specific to individual data types also make it challenging to uncover consistent patterns and learn a compact representation of multi-omics data. Here we present DeFusion, a multi-omics integration method that addresses these issues. We explicitly model the error term in data reconstruction and simultaneously consider noise effects and data-specific patterns. We use a denoised network regularization, in which a fused network is built with a denoising procedure to suppress noise effects and data-specific patterns. The error term works together with the denoised network regularization to capture data-specific patterns. We solve the optimization problem with an inexact alternating minimization algorithm. A comparative simulation study shows the method's superiority in discovering patterns common to the data types at three noise levels. Integration of transcriptomics and epigenomics data in seven cancer cohorts from The Cancer Genome Atlas demonstrates that the integrative representation, learned in an unsupervised manner, captures survival information. In liver hepatocellular carcinoma in particular, the learned integrative representation attains an average Harrell's C-index of 0.78 across 10 repetitions of 3-fold cross-validation for survival prediction, far exceeding competing methods. With this latent representation we also discover an aggressive subtype of liver hepatocellular carcinoma, which is validated on an external dataset, GSE14520. We also show that DeFusion is applicable to the integration of other omics types.
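For readers unfamiliar with the survival metric quoted in the abstract, the following is a minimal sketch of how Harrell's C-index can be computed for right-censored data. It is not the authors' evaluation code: the function name `harrell_c_index` and the toy inputs are hypothetical, a higher risk score is assumed to mean shorter expected survival, and pairs with tied observation times are simply skipped.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs whose predicted risks are
    ordered consistently with their observed survival times.

    times       -- observed follow-up times
    events      -- 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores -- predicted risk; higher is assumed to mean shorter survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A pair is comparable only if the earlier time is an observed event.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (illustrative only, not from the paper).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.8, 0.1]
print(harrell_c_index(times, events, risks))  # 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a sense of scale for the reported 0.78.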
Funder
National Natural Science Foundation of China
Publisher
Oxford University Press (OUP)
Subject
Molecular Biology, Information Systems
Cited by
16 articles.