Abstract
Accurately identifying enhancer-promoter interactions (EPIs) is challenging because enhancers usually act on the promoters of distant target genes. Although a variety of machine learning and deep learning models have been developed, many of them are not designed for, or cannot be readily applied to, predicting EPIs in cell types that differ from those in the training data. In this study, we develop the TransEPI model for EPI prediction based on datasets derived from Hi-C and ChIA-PET data. TransEPI compiles genomic features from large intervals harboring the enhancer-promoter pair and adopts a Transformer-based architecture to capture long-range dependencies. TransEPI can thus achieve more accurate prediction by accounting for the impact of other genomic loci that may competitively interact with the enhancer-promoter pair. We evaluate TransEPI in a challenging scenario in which the independent test samples are predicted by models trained on data from different cell types and chromosomes. TransEPI robustly predicts cross-cell-type EPIs, achieving comparable performance in cross-validation and on the independent test. More importantly, TransEPI significantly outperforms state-of-the-art EPI models on the independent test datasets, with the area under the precision-recall curve (auPRC) increasing by 48.84% on average. Hence, TransEPI is applicable to accurate EPI prediction in cell types without chromatin structure data. Moreover, we find that the TransEPI framework can also be extended to identify the target genes of non-coding mutations, which may facilitate the study of pathogenic non-coding mutations.
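For readers unfamiliar with the headline metric, the following is a minimal sketch of how an auPRC score and a percentage gain between two models might be computed. It is not the TransEPI evaluation code: the function names `average_precision` and `relative_increase_percent` and the toy labels and scores are hypothetical, the step-wise average-precision estimator is one common choice among several, and the sketch assumes the reported increase is a relative one, which the abstract does not state.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    # Rank samples from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    true_pos = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Each positive adds precision-at-this-rank times a recall step of 1/n_pos.
            area += (true_pos / rank) / n_pos
    return area


def relative_increase_percent(new, baseline):
    """Percentage by which `new` exceeds `baseline` (assumed relative gain)."""
    return 100.0 * (new - baseline) / baseline


# Toy data (hypothetical, not from the paper): 1 = interacting pair, 0 = non-interacting.
labels = [1, 0, 1, 0, 0, 1]
baseline_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
improved_scores = [0.9, 0.3, 0.8, 0.4, 0.2, 0.7]

baseline_auprc = average_precision(labels, baseline_scores)
improved_auprc = average_precision(labels, improved_scores)
gain = relative_increase_percent(improved_auprc, baseline_auprc)
print(f"baseline={baseline_auprc:.3f} improved={improved_auprc:.3f} gain={gain:.1f}%")
```

auPRC is often preferred over accuracy for tasks like EPI prediction, where interacting pairs are typically far rarer than non-interacting ones, because it focuses on how well the positives are ranked.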
Publisher
Cold Spring Harbor Laboratory
Cited by
2 articles.