Abstract
The availability of thousands of assays of epigenetic activity necessitates compressed representations of these data sets that summarize the epigenetic landscape of the genome. Until recently, most such representations were cell-type-specific, applying to a single tissue or cell state. More recently, neural networks have made it possible to summarize data across tissues to produce a pan-cell-type representation. In this work, we propose Epi-LSTM, a deep long short-term memory (LSTM) recurrent neural network autoencoder, to capture the long-term dependencies in epigenomic data. The latent representations from Epi-LSTM capture a variety of genomic phenomena, including gene expression, promoter-enhancer interactions, replication timing, frequently interacting regions, and evolutionary conservation. These representations outperform existing methods in a majority of cell types, while yielding smoother representations along the genomic axis due to their sequential nature.
Publisher
Cold Spring Harbor Laboratory