Abstract
Chromatin interactions play an important role in genome architecture and regulation. The Hi-C assay generates such interaction maps genome-wide, but at relatively low resolution (e.g., 5-25 kb), which is substantially coarser than the resolution of transcription factor binding sites or open chromatin sites that are potential sources of such interactions. To predict the sources of Hi-C-identified interactions at high resolution (e.g., 100 bp), we developed a computational method that integrates ChIP-seq data of transcription factors and histone marks with DNase-seq data. Our method, χ-SCNN, uses these data to first train a Siamese Convolutional Neural Network (SCNN) to discriminate between called Hi-C interactions and non-interactions. χ-SCNN then predicts the high-resolution source of each Hi-C interaction using a feature attribution method. We show that these predictions, when extended to a coarser resolution, recover the original Hi-C peaks. We also show that χ-SCNN predictions are enriched for evolutionarily conserved bases, eQTLs, and CTCF motifs, supporting their biological significance. χ-SCNN provides an approach for analyzing important aspects of genome architecture and regulation at a higher resolution than previously possible. χ-SCNN software is available on GitHub (https://github.com/ernstlab/X-SCNN).
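To make the two-step idea concrete, the following is a minimal NumPy sketch, not the χ-SCNN implementation. It assumes each side of a Hi-C interaction is a matrix of signal tracks over 100 bp bins, scores a pair with one encoder whose weights are shared by both sides (the Siamese part), and then ranks bins by occlusion, a simple attribution technique swapped in here for illustration because the abstract does not name the attribution method used. All names, sizes, and the random untrained weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
N_BINS = 50      # assumed: 100 bp bins covering one 5 kb Hi-C anchor
N_TRACKS = 4     # assumed: number of ChIP-seq / DNase-seq input tracks
KERNEL = 5       # assumed: convolution width, in bins
N_FILTERS = 3    # assumed: number of convolution filters

# One set of parameters, shared by both sides of the interaction.
# They are random here; a real model would learn them from labeled pairs.
W = rng.normal(scale=0.1, size=(N_FILTERS, N_TRACKS, KERNEL))
b = np.zeros(N_FILTERS)
v = rng.normal(scale=0.1, size=N_FILTERS)


def encode(anchor):
    """Map a (N_TRACKS, N_BINS) anchor to a (N_FILTERS,) embedding.

    Applies a 1D convolution along the bin axis, a ReLU, and a global max pool.
    """
    # windows has shape (N_TRACKS, N_BINS - KERNEL + 1, KERNEL)
    windows = np.lib.stride_tricks.sliding_window_view(anchor, KERNEL, axis=1)
    conv = np.einsum("tpk,ftk->fp", windows, W) + b[:, None]
    return np.maximum(conv, 0.0).max(axis=1)


def interaction_score(left, right):
    """Probability-like score for a pair; symmetric in its two arguments."""
    z = v @ (encode(left) * encode(right))
    return 1.0 / (1.0 + np.exp(-z))


def occlusion_attribution(left, right):
    """Score drop when each bin of the left anchor is zeroed out in turn."""
    base = interaction_score(left, right)
    attr = np.zeros(N_BINS)
    for i in range(N_BINS):
        masked = left.copy()
        masked[:, i] = 0.0
        attr[i] = base - interaction_score(masked, right)
    return attr


# Toy inputs standing in for real signal tracks.
left = rng.random((N_TRACKS, N_BINS))
right = rng.random((N_TRACKS, N_BINS))
left[:, 10] = 0.0  # a bin with no signal on any track

score = interaction_score(left, right)
attr = occlusion_attribution(left, right)
top_bin = int(np.argmax(attr))  # bin whose removal lowers the score most
```

Because the same `encode` function is applied to both anchors and the embeddings are combined by an elementwise product, swapping the two sides leaves the score unchanged, which matches the unordered nature of a Hi-C contact. The bin with the largest attribution plays the role of the predicted high-resolution source on that side in this toy setting.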
Publisher
Cold Spring Harbor Laboratory