Abstract
Sequence-based deep learning models, particularly convolutional neural networks (CNNs), have shown superior performance on a wide range of genomic tasks. A key limitation of these models is the lack of interpretability, slowing down their adoption by the genomics community. Current approaches to model interpretation do not readily reveal how a model makes predictions, can be computationally intensive, and depend on the implemented architecture. Here, we introduce ExplaiNN, an adaptation of neural additive models [1] for genomic tasks wherein predictions are computed as a linear combination of multiple independent CNNs, each consisting of a single convolutional filter and fully connected layers. This approach brings together the expressiveness of CNNs with the interpretability of linear models, providing global (cell state level) as well as local (individual sequence level) biological insights into the data. We use ExplaiNN to predict transcription factor (TF) binding and chromatin accessibility states, demonstrating performance levels comparable to state-of-the-art methods, while providing a transparent view of the model’s predictions in a straightforward manner. Applied to de novo motif discovery, ExplaiNN identifies equivalent motifs to those obtained from specialized algorithms across a range of datasets. Finally, we present ExplaiNN as a plug-and-play platform in which pretrained TF binding models and annotated position weight matrices from reference databases can be easily combined. We expect that ExplaiNN will accelerate the adoption of deep learning by biological domain experts in their daily genomic sequence analyses.
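The additive structure described in the abstract (a linear combination of independent units, each with one convolutional filter followed by fully connected layers) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the class names `MotifUnit` and `AdditiveModel`, the ReLU activations, the global max pooling, the layer sizes, and the random untrained weights are all assumptions made for illustration only.

```python
import math
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats (A, C, G, T); other symbols become all zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


class MotifUnit:
    """Hypothetical unit: one convolutional filter, global max pooling, one hidden layer."""

    def __init__(self, filter_len, hidden, rng):
        # Random, untrained weights (assumption: a real model would learn these).
        self.filt = [[rng.gauss(0, 0.1) for _ in range(4)] for _ in range(filter_len)]
        self.w1 = [rng.gauss(0, 0.1) for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0, 0.1) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        k = len(self.filt)
        if len(x) < k:
            raise ValueError("sequence is shorter than the filter")
        # Slide the single filter along the sequence.
        scores = [
            sum(self.filt[j][c] * x[i + j][c] for j in range(k) for c in range(4))
            for i in range(len(x) - k + 1)
        ]
        # ReLU followed by global max pooling (an assumed simplification).
        pooled = max(0.0, max(scores))
        # Small fully connected network mapping the pooled score to a scalar.
        hidden = [max(0.0, w * pooled + b) for w, b in zip(self.w1, self.b1)]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2


class AdditiveModel:
    """Prediction = sigmoid(bias + sum of weight_i * unit_i(sequence))."""

    def __init__(self, n_units, filter_len=8, hidden=4, seed=0):
        rng = random.Random(seed)
        self.units = [MotifUnit(filter_len, hidden, rng) for _ in range(n_units)]
        self.weights = [rng.gauss(0, 0.1) for _ in range(n_units)]
        self.bias = 0.0

    def contributions(self, seq):
        """Per-unit additive terms for one sequence (the local, sequence-level view)."""
        x = one_hot(seq)
        return [w * u.forward(x) for w, u in zip(self.weights, self.units)]

    def predict(self, seq):
        logit = sum(self.contributions(seq)) + self.bias
        return 1.0 / (1.0 + math.exp(-logit))
```

Because the units never interact before the final linear layer, the logit of a prediction decomposes exactly into the values returned by `contributions`. In this sketch, inspecting the final-layer weights plays the role of the global view, and the per-sequence contributions play the role of the local view.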
Publisher
Cold Spring Harbor Laboratory
References (76 articles)
1. Agarwal R, Melnick L, Frosst N, Zhang X, Lengerich B, Caruana R, et al. Neural Additive Models: Interpretable Machine Learning with Neural Nets. arXiv:2004.13912 [cs, stat]. 2021 [cited 2022 Apr 4]. Available from: http://arxiv.org/abs/2004.13912
2. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. Nature Publishing Group; 2013.
3. Genome-Wide Mapping of in Vivo Protein-DNA Interactions. Science. American Association for the Advancement of Science; 2007.
4. Machine learning applications in genetics and genomics. Nat Rev Genet. Nature Publishing Group; 2015.
5. Deep learning: new computational modelling techniques for genomics. Nat Rev Genet. Nature Publishing Group; 2019.
Cited by
7 articles.