A variational autoencoder trained with priors from canonical pathways increases the interpretability of transcriptome data

Author:

Liu Bin, Rosenhahn Bodo, Illig Thomas, DeLuca David S.

Abstract

Interpreting transcriptome data is an important yet challenging aspect of bioinformatic analysis. While gene set enrichment analysis is a standard tool for interpreting regulatory changes, we utilize deep learning techniques, specifically autoencoder architectures, to learn latent variables that drive transcriptome signals. We investigate whether simple autoencoders, variational autoencoders (VAEs), and beta-weighted VAEs are capable of learning reduced representations of transcriptomes that retain critical biological information. We propose a novel VAE that utilizes priors from biological data to direct the network to learn a representation of the transcriptome that is based on understandable biological concepts.

After training five different autoencoder architectures on 22310 transcriptomes, we benchmarked their performance on organ and disease classification tasks on a separate selection of 5577 test samples. Every tested architecture succeeded in reducing the transcriptomes to 50 latent dimensions, which captured enough variation for accurate reconstruction. The simple, fully connected autoencoder performs best across the benchmarks but lacks directly interpretable latent dimensions. The beta-weighted, prior-informed VAE implementation is able to solve the benchmarking tasks and provides semantically accurate latent features equating to biological pathways.

This study opens a new direction for differential pathway analysis in transcriptomics with increased transparency and interpretability.

Author summary

The ability to measure the human transcriptome has been a critical tool for studying health and disease. However, transcriptome data sets are too large and complex for direct human interpretation. Deep learning techniques such as autoencoders are capable of distilling high-level features from complex data. However, even if deep learning models find patterns, these patterns are not necessarily represented in a way that humans can easily understand. By bringing in the prior knowledge of biological pathways, we have trained the model to "speak the language" of the biologist and represent complex transcriptomes in simpler concepts that are already familiar to biologists. We can then apply the tool to compare, for example, samples from lung cancer cells to healthy cells, and show which biological processes are perturbed.
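
As a rough illustration of what "beta-weighted" means for a VAE, the sketch below computes the textbook beta-VAE objective for a single sample: a squared-error reconstruction term plus beta times the KL divergence between a diagonal Gaussian posterior and a standard normal prior. This is the generic formulation only. The abstract does not state the authors' exact loss or how the pathway priors enter it, so the function name `beta_vae_loss`, the squared-error reconstruction term, and the toy values are all assumptions made for illustration.

```python
import math

def beta_vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """Generic beta-VAE loss for one sample (illustrative, not the paper's code).

    x, x_recon : input and reconstructed expression values (lists of floats)
    mu, logvar : mean and log-variance of the latent Gaussian posterior
    beta       : weight on the KL term; beta = 1 gives the standard VAE loss
    """
    # Reconstruction term: sum of squared errors (an assumed choice).
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    # Closed-form KL divergence from N(mu, exp(logvar)) to N(0, 1),
    # summed over latent dimensions.
    kl = -0.5 * sum(1.0 + lv - m ** 2 - math.exp(lv)
                    for m, lv in zip(mu, logvar))
    return recon + beta * kl

# Toy values: 3 hypothetical genes, 2 latent dimensions
# (the study itself uses 50 latent dimensions).
x = [1.0, 2.0, 3.0]
x_recon = [1.0, 2.0, 2.0]
mu = [1.0, 0.0]
logvar = [0.0, 0.0]

loss_vae = beta_vae_loss(x, x_recon, mu, logvar, beta=1.0)
loss_beta = beta_vae_loss(x, x_recon, mu, logvar, beta=4.0)
```

Raising `beta` penalizes latent codes that drift from the prior more heavily, which is the general mechanism by which beta-weighting trades reconstruction accuracy for a more constrained latent space.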

Publisher

Cold Spring Harbor Laboratory
