Functional evaluation of out-of-the-box text-mining tools for data-mining tasks

Published: 2014-10-21 Volume: 22 Issue: 1 Pages: 121-131
ISSN: 1527-974X
Container-title: Journal of the American Medical Informatics Association
Language: en

Author:

Jung Kenneth¹, LePendu Paea², Iyer Srinivasan², Bauer-Mehren Anna², Percha Bethany¹, Shah Nigam H²

Affiliation:

1. Program in Biomedical Informatics, Stanford University, Stanford, California, USA

2. Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA

Abstract

Objective: The trade-off between the speed and simplicity of dictionary-based term recognition and the richer linguistic information provided by more advanced natural language processing (NLP) is an area of active discussion in clinical informatics. In this paper, we quantify this trade-off among text processing systems that balance speed and linguistic understanding differently. We tested both types of systems in three clinical research tasks: phase IV safety profiling of a drug, learning adverse drug–drug interactions, and learning used-to-treat relationships between drugs and indications.

Materials: We first benchmarked the accuracy of the NCBO Annotator and REVEAL on a manually annotated, publicly available dataset from the 2008 i2b2 Obesity Challenge. We then applied the NCBO Annotator and REVEAL to 9 million clinical notes from the Stanford Translational Research Integrated Database Environment (STRIDE) and used the resulting data for the three research tasks.

Results: With large datasets, there is no significant difference between the results obtained with the NCBO Annotator and with REVEAL on the three research tasks. In one subtask, REVEAL achieved higher sensitivity with smaller datasets.

Conclusions: For a variety of tasks, using simple term recognition methods instead of advanced NLP methods has little or no impact on accuracy when datasets are large. Simpler dictionary-based methods have the advantage of scaling well to very large datasets. Promoting the use of simple, dictionary-based methods for population-level analyses can advance the adoption of NLP in practice.
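As a rough illustration of what "dictionary-based term recognition" means in this trade-off, here is a minimal Python sketch of greedy longest-match lookup of terms in a note. The dictionary, the concept labels, and the `annotate` function are hypothetical and chosen for illustration; this is not the NCBO Annotator's or REVEAL's implementation.

```python
import re

# Hypothetical toy dictionary: lowercase term -> concept label.
# Real annotators draw terms from large ontologies; these entries are illustrative only.
TERM_DICTIONARY = {
    "myocardial infarction": "CONCEPT:MI",
    "heart attack": "CONCEPT:MI",
    "aspirin": "CONCEPT:ASPIRIN",
    "obesity": "CONCEPT:OBESITY",
}


def tokenize(text):
    """Split text into lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def annotate(text, dictionary=TERM_DICTIONARY):
    """Greedy longest-match lookup of dictionary terms over the token stream.

    Returns a list of (matched_term, concept) pairs in order of occurrence.
    """
    tokens = tokenize(text)
    # The longest dictionary entry, in tokens, bounds the window we need to try.
    max_len = max(len(term.split()) for term in dictionary)
    matches = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest window first so multi-word terms win over shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in dictionary:
                matches.append((candidate, dictionary[candidate]))
                i += n  # skip past the matched tokens
                matched = True
                break
        if not matched:
            i += 1
    return matches


if __name__ == "__main__":
    note = "Pt with obesity, hx of heart attack; started on aspirin."
    for term, concept in annotate(note):
        print(term, "->", concept)
```

The sketch only does string lookup: it has no notion of negation or context, so "denies heart attack" would still yield a match. That missing linguistic information is the kind of thing more advanced NLP systems add, at the cost of speed.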

Publisher

Oxford University Press (OUP)

Subject

Health Informatics

Link

http://academic.oup.com/jamia/article-pdf/22/1/121/34145305/amiajnl-2014-002902.pdf

Cited by 34 articles.

1. Network medicine and systems pharmacology approaches to predicting adverse drug effects;British Journal of Pharmacology;2024-09-11

2. Applying Natural Language Processing to Textual Data From Clinical Data Warehouses: Systematic Review;JMIR Medical Informatics;2023-12-15

3. Named Entity Recognition in Electronic Health Records: A Methodological Review;Healthcare Informatics Research;2023-10-31

4. Natural Language Processing and Graph Theory: Making Sense of Imaging Records in a Novel Representation Frame;JMIR Medical Informatics;2022-12-21

5. Applying Natural Language Processing to Textual Data From Clinical Data Warehouses: Systematic Review (Preprint);2022-09-05