Affiliation:
1. National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health, Bethesda, MD, USA
Abstract
Objective Identifying disease-mutation relationships is a significant challenge in the advancement of precision medicine. The aim of this work is to design a tool that automates the extraction of disease-related mutations from biomedical text to advance database curation for the support of precision medicine.
Materials and Methods We developed a machine-learning (ML) based method to automatically identify the mutations mentioned in the biomedical literature related to a particular disease. In order to predict a relationship between the mutation and the target disease, several features, such as statistical features, distance features, and sentiment features, were constructed. Our ML model was trained with a pre-labeled dataset consisting of manually curated information about mutation-disease associations. The model was subsequently used to extract disease-related mutations from larger biomedical literature corpora.
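As a rough illustration of what statistical, distance, and sentiment features for a mutation-disease pair might look like, the following minimal sketch computes one of each from an abstract. The function name `pair_features`, the cue-word list `NEGATIVE_CUES`, and the specific feature definitions are assumptions made for illustration; the abstract does not specify how the features were actually constructed. The sketch also assumes single-token mutation and disease mentions.

```python
import re

# Hypothetical negation cues; the source does not describe its sentiment lexicon.
NEGATIVE_CUES = {"no", "not", "absence", "lack", "without"}


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", text.lower())


def pair_features(abstract, mutation, disease):
    """Return illustrative features for one (mutation, disease) pair.

    Assumes both mentions are single tokens, which is a simplification.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", abstract) if s]
    m_tok, d_tok = mutation.lower(), disease.lower()

    cooccur = 0        # statistical: sentences mentioning both entities
    min_dist = None    # distance: smallest token gap within a sentence
    negative = 0       # sentiment: negation cues in co-occurring sentences

    for sentence in sentences:
        toks = tokenize(sentence)
        m_pos = [i for i, t in enumerate(toks) if t == m_tok]
        d_pos = [i for i, t in enumerate(toks) if t == d_tok]
        if m_pos and d_pos:
            cooccur += 1
            gap = min(abs(i - j) for i in m_pos for j in d_pos)
            min_dist = gap if min_dist is None else min(min_dist, gap)
            negative += sum(1 for t in toks if t in NEGATIVE_CUES)

    return {
        "cooccurring_sentences": cooccur,
        "min_token_distance": min_dist,
        "negative_cues": negative,
    }


example = (
    "The V600E mutation is frequent in melanoma. "
    "No V600E was found in controls."
)
features = pair_features(example, "V600E", "melanoma")
```

A feature dictionary of this kind could then be passed to any standard classifier trained on curated mutation-disease pairs; the choice of classifier is not stated in the abstract.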
Results The performance of the proposed approach was assessed using a benchmarking dataset. The approach significantly improves on the previous state of the art, obtaining F-measures of 0.880 and 0.845 for prostate and breast cancer mutations, respectively.
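For readers unfamiliar with the metric, the sketch below shows how an F-measure is computed from counts of true positives, false positives, and false negatives. It assumes the balanced F1 variant (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` and the counts in the example are illustrative and are not taken from the paper.

```python
def f_measure(tp, fp, fn):
    """Balanced F-measure (F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 8 correct extractions, 2 spurious, 2 missed.
score = f_measure(tp=8, fp=2, fn=2)
```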
Discussion To demonstrate its utility, we applied our approach to all abstracts in PubMed for 3 diseases (including a non-cancer disease). The mutations extracted were then manually validated against human-curated databases. The validation results show that the proposed approach is useful in a real-world setting to extract uncurated disease mutations from the biomedical literature.
Conclusions The proposed approach improves the state of the art for mutation-disease extraction from text. It is scalable and generalizable to identify mutations for any disease at a PubMed scale.
Publisher
Oxford University Press (OUP)