Clustering and variable selection evaluation of 13 unsupervised methods for multi-omics data integration

Published:2019-12-03 Issue:6 Volume:21 Page:2011-2030
ISSN:1477-4054
Container-title:Briefings in Bioinformatics
language:en
Author:

Pierre-Jean Morgane¹, Deleuze Jean-François², Le Floch Edith³, Mauger Florence³

Affiliation:

1. CNRGH, Evry, France

2. Institute of Biology François Jacob, CEA

3. CNRGH

Abstract

Recent advances in next-generation sequencing (NGS), microarrays and mass spectrometry for omics data production have enabled the generation and collection of different modalities of high-dimensional molecular data. Integrating multiple omics datasets is a statistical challenge because of the limited number of individuals, the high number of variables and the heterogeneity of the datasets to integrate. Many tools have recently been developed to integrate omics data, including canonical correlation analysis, matrix factorization and SM. These commonly used techniques aim to analyze two or more types of omics simultaneously. In this article, we compare a panel of 13 unsupervised methods based on these different approaches to integrate various types of multi-omics datasets: iClusterPlus, regularized generalized canonical correlation analysis, sparse generalized canonical correlation analysis, multiple co-inertia analysis (MCIA), integrative-NMF (intNMF), SNF, MoCluster, mixKernel, CIMLR, LRAcluster, ConsensusClustering, PINSPlus and multi-omics factor analysis (MOFA). We evaluate the ability of the methods to recover the subgroups and the variables that drive the clustering on eight simulated benchmarks. MOFA does not provide any results on these benchmarks. For clustering, SNF, MoCluster, CIMLR, LRAcluster, ConsensusClustering and intNMF provide the best results. For variable selection, MoCluster outperforms the others. However, the performance of the methods seems to depend on the heterogeneity of the datasets (especially for MCIA, intNMF and iClusterPlus). Finally, we apply the methods to three real studies with heterogeneous data and various phenotypes. We conclude that MoCluster is the best method to analyze these omics data. Availability: An R package named CrIMMix is available on GitHub at https://github.com/CNRGH/crimmix to reproduce all the results of this article.

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems

Link

http://academic.oup.com/bib/article-pdf/21/6/2011/34672159/bbz138.pdf

Cited by 67 articles.

1. Molecular precision medicine: Multi-omics-based stratification model for acute myeloid leukemia;Heliyon;2024-09

2. COPS: A novel platform for multi-omic disease subtype discovery via robust multi-objective evaluation of clustering algorithms;PLOS Computational Biology;2024-08-05

3. Peer victimization in adolescence alters gene expression and cytokine profiles;2024-07-26

4. Comprehensive evaluation of disulfidptosis in intestinal immunity and biologic therapy response in Ulcerative Colitis;Heliyon;2024-07

5. Unveiling divergent treatment prognoses in IDHwt-GBM subtypes through multiomics clustering: a swift dual MRI-mRNA model for precise subtype prediction;Journal of Translational Medicine;2024-06-18