Affiliation:
1. Department of Computer Science, The University of Hong Kong, Hong Kong, 999077, China
Abstract
Relation extraction (RE) is a fundamental task for extracting gene–disease associations from biomedical text. Many state-of-the-art tools are limited in scope: they can extract gene–disease associations only from single sentences or abstracts. A few studies have explored extraction from full-text articles, but there is substantial room for improvement. In this work, we propose RENET2, a deep learning-based RE method that implements Section Filtering and ambiguous relations modeling to extract gene–disease associations from full-text articles. To address the scarcity of labels for full-text articles, we designed a novel iterative training data expansion strategy to build an annotated full-text dataset. In our experiments, RENET2 achieved an F1-score of 72.13% for extracting gene–disease associations from an annotated full-text dataset, which was 27.22%, 30.30%, 29.24% and 23.87% higher than BeFree, DTMiner, BioBERT and RENET, respectively. We applied RENET2 to (i) ∼1.89M full-text articles from PubMed Central, finding ∼3.72M gene–disease associations; and (ii) the LitCovid articles, ranking the top 15 proteins associated with COVID-19, supported by recent articles. RENET2 is an efficient and accurate method for full-text gene–disease association extraction. The source code, manually curated abstract/full-text training data, and results of RENET2 are available on GitHub.
Publisher
Oxford University Press (OUP)
Cited by
13 articles.