DeepHPV: a deep learning model to predict human papillomavirus integration sites

Authors:

Tian Rui (1), Zhou Ping (2), Li Mengyuan (3), Tan Jinfeng (4), Cui Zifeng (4), Xu Wei (3), Wei Jingyue (3), Zhu Jingjing (5), Jin Zhuang (4), Cao Chen (6), Fan Weiwen (7), Xie Weiling (4), Huang Zhaoyue (4), Xie Hongxian (8), You Zeshan (4), Niu Gang (5), Wu Canbiao (9), Guo Xiaofang (10), Weng Xuchu (9), Tian Xun (11), Yu Fubing (2), Yu Zhiying (12), Liang Jiuxing (9), Hu Zheng (13)

Affiliations:

1. Translational Medicine of the First Affiliated Hospital, Sun Yat-sen University

2. Dongguan Maternal and Child Health Care Hospital

3. Department of Obstetrics and Gynecology at the First Affiliated Hospital, Sun Yat-sen University

4. First Affiliated Hospital, Sun Yat-sen University

5. Department of Obstetrics and Gynecology of the First Affiliated Hospital, Sun Yat-sen University

6. Central Hospital of Wuhan, China

7. College of Medicine at the Sun Yat-sen University

8. GeneRulor Company Bio-X Lab

9. Institute for Brain Research and Rehabilitation at the South China Normal University

10. Department of Medical Oncology of the Eastern Hospital at the First Affiliated Hospital, Sun Yat-sen University

11. Central Hospital of Wuhan

12. Department of Gynecology, Shenzhen Second People's Hospital/the First Affiliated Hospital of Shenzhen University Health Science Center

13. Gynecological Oncology of the First Affiliated Hospital, Precision Medicine Institute, Sun Yat-sen University

Abstract

Human papillomavirus (HPV) integration into the human genome is the main cause of cervical carcinogenesis. HPV integration site selection depends strongly on the local genomic environment, which makes it possible in principle to predict HPV integration sites. However, no published bioinformatic tool for this task is available to date. We therefore developed DeepHPV, an attention-based deep learning model that predicts HPV integration sites by automatically learning features of the local genomic environment. In total, 3608 known HPV integration sites were used to train the model, and 584 reviewed HPV integration sites were used as the testing dataset. DeepHPV achieved an area under the receiver operating characteristic curve (AUROC) of 0.6336 and an area under the precision-recall curve (AUPR) of 0.5670. Adding RepeatMasker peaks and TCGA Pan Cancer peaks improved performance to an AUROC of 0.8464 and 0.8501 and an AUPR of 0.7985 and 0.8106, respectively. We then tested the trained models on the independent database VISDB and found that the model with TCGA Pan Cancer peaks performed better (AUROC: 0.7175, AUPR: 0.6284) than the model with RepeatMasker peaks (AUROC: 0.6102, AUPR: 0.5577). Moreover, using the attention mechanism in DeepHPV, we found transcription factor binding sites enriched near attention-intensive sites, including BHLHA15, CHR, COUP-TFII, DMRTA2, E2A, HIC1, INR, NPAS, Nr5a2, RARa, SCL, Snail1, Sox10, Sox3, Sox4, Sox6, STAT6, Tbet, Tbx5, TEAD, Tgif2, ZNF189 and ZNF416. Together, DeepHPV is a robust and explainable deep learning model that provides new insights into HPV integration preference and mechanism.

Availability: DeepHPV is available as open-source software and can be downloaded from https://github.com/JiuxingLiang/DeepHPV.git

Contact: huzheng1998@163.com, liangjiuxing@m.scnu.edu.cn, lizheyzy@163.com
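
For readers less familiar with the two metrics reported in the abstract, the following is a minimal Python sketch of how AUROC and AUPR can be computed from binary labels and predicted scores. It is not taken from the DeepHPV code: the function names `auroc` and `average_precision` and the toy labels and scores are illustrative assumptions, and AUPR is approximated here by average precision, which is one common convention and may differ from the one the authors used.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive scores above a
    random negative; tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of the precision values measured at the rank
    of each positive example, a step-wise estimate of the area under the
    precision-recall curve. Assumes scores contain no ties."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data: 1 = integration site, 0 = background region.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.1]

print(f"AUROC: {auroc(labels, scores):.4f}")
print(f"AUPR (average precision): {average_precision(labels, scores):.4f}")
```

The pairwise AUROC loop is quadratic in the number of examples, which is fine for a few thousand sites; a rank-based formulation would be preferable for much larger datasets.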

Funder

National Science and Technology

Ministry of Science and Technology of China

National Postdoctoral Program for Innovative Talents

China Postdoctoral Science Foundation

National Natural Science Foundation of China

Guangzhou Science and Technology Programme

National Ten Thousands Plan for Young Top Talents and Key Realm R&D Program of Guangdong Province

Gynecologic Malignant Tumors

Foundation of Health Commission of Hubei Province of China

Foundation of Wuhan Municipal Health Commission

Social Science and Technology Development

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems
