Abstract
Motivation
RNA-seq data is used in precision medicine (e.g., cancer prediction), which benefits from deep learning approaches to analyze complex gene expression data. However, transcriptomics datasets often contain few samples by deep learning standards, so synthetic data generation is being explored to address this data scarcity. So far, only deep generative models such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) have been used for this purpose. Given the recent success of diffusion models (DMs) in image generation, we propose the first transcriptomics generation pipeline that leverages diffusion models.

Results
This paper presents two state-of-the-art diffusion models (DDPM and DDIM) and adapts them to the transcriptomics field. DM-generated data for the L1000 landmark genes show better predictive performance on the TCGA and GTEx datasets. We also compare linear and nonlinear reconstruction methods for recovering the complete transcriptome. Results show that such reconstruction methods can boost the performance of diffusion models, as well as of VAEs and GANs. Overall, an extensive comparison of generative models using data quality indicators shows that diffusion models rank best and second-best, making them promising synthetic transcriptomics generators.

Availability and implementation
Data processing and full code available at: https://forge.ibisc.univevry.fr/alacan/rna-diffusion.git

Contact
alice.lacan@univ-evry.fr

Supplementary information
Supplementary data are available at BioRxiv online.
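To make the adaptation of DDPMs to expression data concrete, the following is a minimal sketch, not the authors' implementation: a standard epsilon-prediction DDPM training step applied to standardized landmark-gene expression vectors (the 978 L1000 landmark genes). The noise schedule, the MLP denoiser, and all names and hyperparameters are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code): DDPM training step on
# L1000 landmark-gene profiles, assumed standardized to ~zero mean / unit variance.
import torch
import torch.nn as nn

T = 1000                                    # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)       # linear noise schedule (Ho et al., 2020)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class Denoiser(nn.Module):
    """Small MLP predicting the noise added to a 978-dim expression vector."""
    def __init__(self, n_genes=978, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_genes + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, n_genes),
        )

    def forward(self, x_t, t):
        # Timestep appended as a scalar feature (a simplification of the
        # sinusoidal embeddings typically used in image DDPMs).
        t_feat = (t.float() / T).unsqueeze(-1)
        return self.net(torch.cat([x_t, t_feat], dim=-1))

def ddpm_loss(model, x0):
    """Standard epsilon-prediction objective: noise x0, predict the noise."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))
    eps = torch.randn_like(x0)
    a_bar = alphas_bar[t].unsqueeze(-1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps   # forward noising q(x_t | x_0)
    return nn.functional.mse_loss(model(x_t, t), eps)

# Usage on a dummy batch of 64 standardized landmark-gene profiles:
model = Denoiser()
x0 = torch.randn(64, 978)
loss = ddpm_loss(model, x0)
loss.backward()
```

After sampling synthetic landmark profiles from such a model, the complete transcriptome would be recovered with a separate linear or nonlinear reconstruction step, as compared in the paper.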
Publisher
Cold Spring Harbor Laboratory