Abstract
Deciphering the genomic regulatory code of enhancers is a key challenge in biology because this code underlies cellular identity. A better understanding of how enhancers work will improve the interpretation of noncoding genome variation and empower the generation of cell type–specific drivers for gene therapy. Here, we explore the combination of deep learning and cross-species chromatin accessibility profiling to build explainable enhancer models. We apply this strategy to decipher the enhancer code in melanoma, a relevant case study owing to the presence of distinct melanoma cell states. We trained and validated a deep learning model, called DeepMEL, using chromatin accessibility data of 26 melanoma samples across six different species. We show the accuracy of DeepMEL predictions on the CAGI5 challenge, where it significantly outperforms existing models on the melanoma enhancer of IRF4. Next, we exploit DeepMEL to analyze enhancer architectures and identify accurate transcription factor binding sites for the core regulatory complexes in the two different melanoma states, with distinct roles for each transcription factor, in terms of nucleosome displacement or enhancer activation. Finally, DeepMEL identifies orthologous enhancers across distantly related species, where sequence alignment fails, and the model highlights specific nucleotide substitutions that underlie enhancer turnover. DeepMEL can be used from the Kipoi database to predict and optimize candidate enhancers and to prioritize enhancer mutations. In addition, our computational strategy can be applied to other cancer or normal cell types.
Funder
European Research Council Consolidator
KU Leuven
Foundation Against Cancer
Fonds Wetenschappelijk Onderzoek
Kom op tegen Kanker
Stand up to Cancer
Flemish Cancer Society
Stichting tegen Kanker
Belgian Cancer Society
CRB-Anim PIA1
Publisher
Cold Spring Harbor Laboratory
Subject
Genetics (clinical), Genetics
Cited by
75 articles.