Abstract
The increasing number of publicly available bacterial gene expression data sets provides an unprecedented resource for the study of gene regulation in diverse conditions, but emphasizes the need for self-supervised methods for the automated generation of new hypotheses. One approach for inferring coordinated regulation from bacterial expression data is through the use of neural networks known as denoising autoencoders (DAEs), which encode large datasets in a reduced bottleneck layer. We have generalized this application of DAEs to include deep networks and explore the effects of network architecture on gene set inference using deep learning. We developed a DAE-based pipeline to extract gene sets from a large compendium of transcriptomic data in Escherichia coli, independently of the DAE network parameters and architecture. We validate our method by identifying many of the inferred gene sets with known pathways in E. coli, and have subsequently used this pipeline to explore how the choice of network architecture impacts gene set recovery. We find that increasing network depth leads the DAEs to explain gene expression in terms of fewer, more concisely defined gene sets, and that adjusting the network compression results in a trade-off between generalizability and overall biological inference. Finally, leveraging our understanding of the impact of DAE architecture choices on gene set inference, we apply our pipeline to an independent uropathogenic E. coli dataset collected directly from infected patients to identify genes that are uniquely induced during human colonization.
Publisher
Cold Spring Harbor Laboratory
Cited by
2 articles.