Evaluating Sample Augmentation in Microarray Datasets with Generative Models: A Comparative Pipeline and Insights in Tuberculosis

Authors:

Gupta Ayushi, Ahmad Saad, Sune Atharva, Gupta Chandan, Kaur Harleen, Kutum Rintu, Sethi Tavpritesh

Abstract

High-throughput screening technologies have created a fundamental challenge for statistical and machine learning analyses: the curse of dimensionality. Gene expression data are a quintessential example, being high dimensional in variables (large P) and comparatively small in samples (small N). However, these many variables are not independent. This is reflected in systems biology approaches that treat the transcriptome as a network of coordinated biological functions, or that describe it through principal axes of variation underlying gene expression. Recent advances in generative deep learning offer a new paradigm for tackling the curse of dimensionality: generating new data from a latent space learned as a deep representation of the observed data. This has led to widespread application of approaches such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), especially in domains where millions of data points exist, such as computer vision and single-cell data. Very few studies have focused on generative modeling of bulk transcriptomic data and microarrays, despite these being among the largest types of publicly available biomedical data, which may limit the potential of these data to yield hundreds of novel biomarkers. Here we review the potential of generative models to recapitulate and extend biomedical knowledge from microarray data, and conduct a comparative analysis of a VAE, a GAN and a Gaussian mixture model (GMM) on a dataset focused on tuberculosis. We further examine whether genes from previously known axes of variation can be used to incorporate domain knowledge into the design of generative models, as a means to further reduce biological noise and enhance signals that can be validated by standard enrichment approaches or functional experiments.
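The core idea of generating new samples from a low-dimensional representation of the observed data can be illustrated with a deliberately simple stand-in. The sketch below is not the authors' pipeline. It projects a small-N, large-P expression matrix onto its top principal axes, fits a single full-covariance Gaussian in that latent space (a one-component relative of the GMM compared in the study), samples new latent points and maps them back to gene space. A VAE or GAN would replace the linear PCA projection with learned nonlinear encoders and decoders. The function name `augment_with_pca_gaussian` and its parameters are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def augment_with_pca_gaussian(X, n_components, n_new, seed=0):
    """Draw synthetic samples from a Gaussian fitted in a PCA latent space.

    Hypothetical helper for illustration only, not the study's pipeline.
    X has shape (n_samples, n_genes): few samples (small N), many genes (large P).
    Returns an array of shape (n_new, n_genes).
    """
    rng = np.random.default_rng(seed)

    # Center each gene; the means are added back when decoding.
    gene_means = X.mean(axis=0)
    Xc = X - gene_means

    # Thin SVD of the centered data: rows of Vt are the principal axes of
    # variation in gene space, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    axes = Vt[:n_components]                 # shape (k, n_genes)

    # "Encode": project every observed sample onto the top-k axes.
    Z = Xc @ axes.T                          # shape (n_samples, k)

    # Fit one full-covariance Gaussian in the latent space
    # (a single-component stand-in for a GMM).
    mu = Z.mean(axis=0)
    cov = np.atleast_2d(np.cov(Z, rowvar=False))

    # Sample new latent points, then "decode" them back to gene space.
    Z_new = rng.multivariate_normal(mu, cov, size=n_new)
    return Z_new @ axes + gene_means
```

In a two-class setting such as tuberculosis versus controls, one would presumably call this separately on each class so that every synthetic sample inherits a label. Restricting the columns of `X` to a predefined gene set, for example genes from known axes of variation, is one simple way to inject domain knowledge; whether this matches the study's exact design is an assumption.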

Publisher

Cold Spring Harbor Laboratory
