Adversarial deconfounding autoencoder for learning robust gene expression embeddings

Published: 2020-12
Issue: Supplement_2
Volume: 36
Page: i573-i582
ISSN: 1367-4803
Container-title: Bioinformatics
Language: en

Author:

Dincer Ayse B¹, Janizek Joseph D¹,², Lee Su-In¹

Affiliation:

1. Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA

2. Medical Scientist Training Program, University of Washington, Seattle, WA 98195, USA

Abstract

Motivation: The increasing number of gene expression profiles has enabled the use of complex models, such as deep unsupervised neural networks, to extract a latent space from these profiles. However, expression profiles, especially when collected in large numbers, inherently contain variation introduced by technical artifacts (e.g. batch effects) and uninteresting biological variables (e.g. age) in addition to the true signals of interest. These sources of variation, called confounders, produce embeddings that fail to transfer to different domains, i.e. an embedding learned from one dataset with a specific confounder distribution does not generalize to different distributions. To remedy this problem, we attempt to disentangle confounders from true signals to generate biologically informative embeddings.

Results: In this article, we introduce the Adversarial Deconfounding AutoEncoder (AD-AE) approach to deconfounding gene expression latent spaces. The AD-AE model consists of two neural networks: (i) an autoencoder to generate an embedding that can reconstruct original measurements, and (ii) an adversary trained to predict the confounder from that embedding. We jointly train the networks to generate embeddings that can encode as much information as possible without encoding any confounding signal. By applying AD-AE to two distinct gene expression datasets, we show that our model can (i) generate embeddings that do not encode confounder information, (ii) conserve the biological signals present in the original space and (iii) generalize successfully across different confounder domains. We demonstrate that AD-AE outperforms a standard autoencoder and other deconfounding approaches.

Availability and implementation: Our code and data are available at https://gitlab.cs.washington.edu/abdincer/ad-ae.

Supplementary information: Supplementary data are available at Bioinformatics online.
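
The two-network setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes linear maps in place of the neural networks, a continuous confounder scored with squared error, and an autoencoder objective of the form "reconstruction loss minus a weighted adversary loss", which is one common way to write an adversarial trade-off. The names `adae_objectives`, `W_enc`, `W_dec`, `W_adv` and `lam` are hypothetical.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mse(P, T):
    """Mean squared error over all entries of two equally shaped matrices."""
    diffs = [(p - t) ** 2 for rp, rt in zip(P, T) for p, t in zip(rp, rt)]
    return sum(diffs) / len(diffs)


def adae_objectives(X, C, W_enc, W_dec, W_adv, lam=1.0):
    """Return (reconstruction loss, adversary loss, autoencoder objective).

    X      : samples x genes expression matrix
    C      : samples x 1 continuous confounder (e.g. age)
    W_enc, W_dec, W_adv : linear stand-ins (assumption) for the encoder,
                          decoder and adversary networks
    lam    : hypothetical weight trading reconstruction against deconfounding
    """
    Z = matmul(X, W_enc)                 # embedding
    recon = mse(matmul(Z, W_dec), X)     # how well the embedding reconstructs X
    adv = mse(matmul(Z, W_adv), C)       # how well the adversary predicts C from Z
    # In alternating training, the adversary step would minimise `adv`,
    # while the autoencoder step would minimise the combined objective below.
    ae_objective = recon - lam * adv
    return recon, adv, ae_objective


# Toy data: gene 1 is a pure copy of the confounder, gene 2 is unrelated to it.
X = [[1.0, 0.0], [0.0, 1.0]]
C = [[1.0], [0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
drop_gene1 = [[0.0, 0.0], [0.0, 1.0]]    # encoder that discards the confounded gene
W_adv = [[1.0], [0.0]]                   # adversary reading the first latent unit

leaky = adae_objectives(X, C, identity, identity, W_adv)
deconfounded = adae_objectives(X, C, drop_gene1, identity, W_adv)
print(leaky, deconfounded)
```

In this toy case the identity encoder reconstructs perfectly but lets the adversary recover the confounder exactly, whereas the encoder that drops the confounded gene pays some reconstruction error and scores lower on the combined objective for this fixed adversary.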

Funder

National Institutes of Health

National Science Foundation

CAREER

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Link

https://academic.oup.com/bioinformatics/article-pdf/36/Supplement_2/i573/35337257/btaa796.pdf

References: 53 articles.

1. Exploring single-cell data with deep multitasking neural networks;Amodio;Nat. Methods,2019

2. Adjustment of systematic microarray data biases;Benito;Bioinformatics,2004

3. Integrating structured biological data by Kernel Maximum Mean Discrepancy;Borgwardt;Bioinformatics,2006

Cited by 39 articles.

1. Multi-task deep latent spaces for cancer survival and drug sensitivity prediction;Bioinformatics;2024-09-01

2. scCross: a deep generative model for unifying single-cell multi-omics with seamless integration, cross-modal generation, and in silico exploration;Genome Biology;2024-07-29

3. Cross-Domain Feature Disentanglement for Interpretable Modeling of Tumor Microenvironment Impact on Drug Response;IEEE Journal of Biomedical and Health Informatics;2024-07

4. Computing linkage disequilibrium aware genome embeddings using autoencoders;Bioinformatics;2024-05-22

5. Enhancing Gene Expression Representation and Drug Response Prediction with Data Augmentation and Gene Emphasis;2024-05-18