An Empirical Bayes approach for the identification of long-range chromosomal interaction from Hi-C data
Authors:
Zhang Qi¹, Xu Zheng², Lai Yutong³
Affiliations:
1. Department of Mathematics and Statistics, University of New Hampshire, Durham, NH 03824, USA
2. Department of Mathematics and Statistics, Wright State University, Dayton, OH 45435, USA
3. ClinChoice, Fort Washington, PA 19034, USA
Abstract
Hi-C experiments have become a popular tool for studying 3D genome structure in recent years. Identifying long-range chromosomal interactions, i.e., peak detection, is crucial for Hi-C data analysis, but it remains challenging because of the high dimensionality, sparsity, and over-dispersion inherent in the Hi-C count matrix. We propose EBHiC, an empirical Bayes approach for peak detection from Hi-C data. The framework models over-dispersion flexibly by explicitly including the "true" interaction intensities as latent variables. To carry out peak identification via an empirical Bayes test, we estimate the overall distributions of the observed counts semiparametrically using a smoothed expectation-maximization (EM) algorithm, and we estimate the empirical null based on the zero assumption. We conducted extensive simulations to validate and evaluate the performance of the proposed approach, and we applied it to real datasets. Our results suggest that EBHiC identifies better peaks in terms of accuracy, biological interpretability, and consistency across biological replicates. The source code is available on GitHub (https://github.com/QiZhangStat/EBHiC).
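The abstract names the generic ingredients of the test (latent "true" intensities, a semiparametric estimate of the overall count distribution via smoothed EM, and an empirical null under the zero assumption) but not EBHiC's exact model. The Python block below is only a minimal illustrative sketch of those ingredients for a single pool of counts; it is not the EBHiC implementation, which lives in the repository above. Assumptions, all hypothetical: the latent intensity takes values on a fixed grid; the smoothing is a simple 3-point kernel applied after each EM update (EMS-style); the zero region is A0 = {y <= c}, and the null Poisson(lam0) is fitted to the counts in A0 as a right-truncated Poisson; a count is flagged when its local false discovery rate pi0 * f0(y) / f(y) is at most 0.2. All function names and parameters are illustrative.

```python
import math
from collections import Counter


def poisson_pmf(y, lam):
    """P(Y = y) for Y ~ Poisson(lam), lam > 0, computed on the log scale."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def smoothed_em(counts, grid, n_iter=200, kernel=(0.25, 0.5, 0.25)):
    """Mixing weights of a latent intensity restricted to a fixed grid.

    Each iteration is one EM update of the weights followed by a smoothing
    step with a 3-point kernel and renormalization (EMS-style).
    """
    k, n = len(grid), len(counts)
    w = [1.0 / k] * k
    tally = Counter(counts)  # identical counts share one E-step
    lik = {y: [poisson_pmf(y, g) for g in grid] for y in tally}
    for _ in range(n_iter):
        # E-step and M-step: average posterior probability of each grid point
        new_w = [0.0] * k
        for y, m in tally.items():
            joint = [wj * lj for wj, lj in zip(w, lik[y])]
            f_y = sum(joint)
            for j in range(k):
                new_w[j] += m * joint[j] / (f_y * n)
        # Smoothing step: edges reuse the boundary weight
        smoothed = [
            kernel[0] * new_w[max(j - 1, 0)]
            + kernel[1] * new_w[j]
            + kernel[2] * new_w[min(j + 1, k - 1)]
            for j in range(k)
        ]
        total = sum(smoothed)
        w = [v / total for v in smoothed]
    return w


def marginal_pmf(y, grid, w):
    """Overall count distribution f(y) = sum_k w_k * Poisson(y; grid_k)."""
    return sum(wk * poisson_pmf(y, g) for wk, g in zip(w, grid))


def truncated_poisson_mle(values, c, n_iter=100):
    """MLE of lam when counts are only observed on {0, ..., c}.

    The mean of a right-truncated Poisson increases with lam, so the MLE
    (truncated mean = sample mean) can be found by bisection.
    """
    target = sum(values) / len(values)

    def truncated_mean(lam):
        p = [poisson_pmf(y, lam) for y in range(c + 1)]
        return sum(y * py for y, py in enumerate(p)) / sum(p)

    lo, hi = 1e-6, 10.0 * (c + 1)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def empirical_null_lfdr(counts, grid, c, n_iter=200):
    """Local fdr of each distinct count under a zero-assumption null.

    Zero assumption: counts in A0 = {y <= c} are (almost) all null, so the
    null Poisson(lam0) is fitted to them alone and
    pi0 = (share of counts in A0) / P_null(A0), capped at 1.
    """
    w = smoothed_em(counts, grid, n_iter)
    zero_region = [y for y in counts if y <= c]
    lam0 = truncated_poisson_mle(zero_region, c)
    p_null_a0 = sum(poisson_pmf(y, lam0) for y in range(c + 1))
    pi0 = min(1.0, (len(zero_region) / len(counts)) / p_null_a0)
    lfdr = {
        y: min(1.0, pi0 * poisson_pmf(y, lam0) / marginal_pmf(y, grid, w))
        for y in set(counts)
    }
    return lam0, pi0, lfdr


if __name__ == "__main__":
    # Toy counts (made up, not from the paper): mostly background, three large
    counts = [0, 1, 1, 2, 2, 2, 3, 3, 1, 0, 2, 4, 1, 2, 3, 15, 18, 20]
    grid = [0.5 * i for i in range(1, 61)]  # latent intensities 0.5 ... 30.0
    lam0, pi0, lfdr = empirical_null_lfdr(counts, grid, c=3)
    peaks = sorted(y for y, v in lfdr.items() if v <= 0.2)
    print(f"lam0={lam0:.2f} pi0={pi0:.2f} flagged counts: {peaks}")
```

On the toy counts, the bulk of small counts is not flagged, while the three large counts (15, 18 and 20) are. Real Hi-C contact counts decay strongly with genomic distance, which this single-pool sketch ignores.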
Publisher
Walter de Gruyter GmbH
Subject
Computational Mathematics, Genetics, Molecular Biology, Statistics and Probability