Consistency and overfitting of multi-omics methods on experimental data

Published:2019-07-04 Issue:4 Volume:21 Page:1277-1284
ISSN:1477-4054
Container-title:Briefings in Bioinformatics
language:en
Author:

McCabe Sean D¹, Lin Dan-Yu², Love Michael I¹

Affiliation:

1. Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA

2. Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA

Abstract

Knowledge of the relationship between different biological modalities (RNA, chromatin, etc.) can help further our understanding of the processes through which biological components interact. The ready availability of multi-omics datasets has led to the development of numerous methods for identifying sources of common variation across biological modalities. However, evaluation of the performance of these methods, in terms of consistency, has been difficult because most methods are unsupervised. We present a comparison of sparse multiple canonical correlation analysis (Sparse mCCA), angle-based joint and individual variation explained (AJIVE) and multi-omics factor analysis (MOFA) using a cross-validation approach to assess overfitting and consistency. Both large and small-sample datasets were used to evaluate performance, and a permuted null dataset was used to identify overfitting through the application of our framework and approach. In the large-sample setting, we found that all methods demonstrated consistency and lack of overfitting; however, in the small-sample setting, AJIVE provided the most stable results. We provide an R package so that our framework and approach can be applied to evaluate other methods and datasets.
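
The general idea of using cross-validation and a permuted null dataset to detect overfitting can be illustrated with a toy sketch. This is not the authors' R package, and it does not implement Sparse mCCA, AJIVE or MOFA. As a stand-in "method" it simply picks the pair of features (one per modality) with the largest absolute correlation on the training folds, then checks that correlation on the held-out fold. All function names, the simulated data and the parameter values are assumptions made for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_samples, n_features, signal, rng):
    """Two toy 'modalities' that share one latent factor in feature 0."""
    X, Y = [], []
    for _ in range(n_samples):
        z = rng.gauss(0, 1)  # shared latent factor (hypothetical)
        x = [rng.gauss(0, 1) for _ in range(n_features)]
        y = [rng.gauss(0, 1) for _ in range(n_features)]
        x[0] += signal * z
        y[0] += signal * z
        X.append(x)
        Y.append(y)
    return X, Y


def column(M, j, rows):
    """Values of feature j restricted to the given sample indices."""
    return [M[i][j] for i in rows]


def best_pair(X, Y, rows):
    """Stand-in 'method': feature pair with the largest |correlation| on rows."""
    best = None
    for j in range(len(X[0])):
        xj = column(X, j, rows)
        for k in range(len(Y[0])):
            r = pearson(xj, column(Y, k, rows))
            if best is None or abs(r) > abs(best[2]):
                best = (j, k, r)
    return best


def cv_train_test(X, Y, n_folds, rng):
    """Mean training and held-out correlation of the selected pair across folds."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    train_r, test_r = [], []
    for f in range(n_folds):
        test = folds[f]
        train = [i for g in range(n_folds) if g != f for i in folds[g]]
        j, k, r = best_pair(X, Y, train)
        sign = 1 if r >= 0 else -1
        train_r.append(abs(r))
        # Orient the held-out correlation by the sign learned in training.
        test_r.append(sign * pearson(column(X, j, test), column(Y, k, test)))
    return sum(train_r) / n_folds, sum(test_r) / n_folds


rng = random.Random(0)
X, Y = simulate(n_samples=200, n_features=10, signal=2.0, rng=rng)
real_train, real_test = cv_train_test(X, Y, n_folds=5, rng=rng)

# Permuted null: shuffle the sample order of one modality to break the pairing.
Y_null = Y[:]
rng.shuffle(Y_null)
null_train, null_test = cv_train_test(X, Y_null, n_folds=5, rng=rng)

print("real data: train", round(real_train, 3), "test", round(real_test, 3))
print("permuted : train", round(null_train, 3), "test", round(null_test, 3))
```

On the real pairing, training and held-out correlations should be similar, which indicates consistency. On the permuted data, any training correlation is produced by selection alone, so a held-out value that falls toward zero is the signature of overfitting.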

Funder

National Institutes of Health

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems

Link

http://academic.oup.com/bib/article-pdf/21/4/1277/33583468/bbz070.pdf
