Authors:
Gupta Ayushi, Ahmad Saad, Sune Atharva, Gupta Chandan, Kaur Harleen, Kutum Rintu, Sethi Tavpritesh
Abstract
High-throughput screening technologies have created a fundamental challenge for statistical and machine learning analyses: the curse of dimensionality. Gene expression data are a quintessential example, high-dimensional in variables (large P) and comparatively much smaller in samples (small N). However, these many variables are not independent. This understanding is reflected in systems biology approaches that treat the transcriptome as a network of coordinated biological functioning or describe it through principal axes of variation underlying gene expression. Recent advances in generative deep learning offer a new paradigm to tackle the curse of dimensionality by generating new data from the underlying latent space captured as a deep representation of the observed data. These advances have led to widespread applications of approaches such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), especially in domains where millions of data points exist, such as computer vision and single-cell data. Very few studies have focused on generative modeling of bulk transcriptomic data and microarrays, despite these being one of the largest types of publicly available biomedical data, which may limit the potential to yield hundreds of novel biomarkers. Here we review the potential of generative models in recapitulating and extending biomedical knowledge from microarray data and conduct a comparative analysis of a VAE, a GAN and a Gaussian mixture model (GMM) on a dataset focused on tuberculosis. We further review whether previously known axes genes can be used as an effective strategy to employ domain knowledge while designing generative models, as a means to further reduce biological noise and enhance signals that can be validated by standard enrichment approaches or functional experiments.
Publisher
Cold Spring Harbor Laboratory