Affiliations:
1. Science and Engineering Faculty, Queensland University of Technology, Brisbane 4000, Queensland, Australia.
2. The Australian e-Health Research Centre, CSIRO, Brisbane 4029, Queensland, Australia.
Abstract
Objective
This paper presents an automatic, active learning-based system for extracting medical concepts from clinical free-text reports. Specifically, it determines (1) how much active learning reduces the annotation effort and (2) how robust the incremental active learning framework is across different selection criteria and data sets.
Materials and methods
The performance of an active learning framework was compared with that of a fully supervised approach to study how much active learning reduces the annotation effort while achieving the same effectiveness as supervised learning. Conditional random fields were used as the supervised method, and least confidence and information density were used as the 2 selection criteria of the active learning framework. The effect of incremental learning vs standard learning on the robustness of the models within the active learning framework was also investigated for each selection criterion. The following 2 clinical data sets were used for evaluation: the Informatics for Integrating Biology and the Bedside/Veteran Affairs (i2b2/VA) 2010 natural language processing challenge and the Shared Annotated Resources/Conference and Labs of the Evaluation Forum (ShARe/CLEF) 2013 eHealth Evaluation Lab.
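The two selection criteria named above can be illustrated with a minimal Python sketch. It assumes that a trained sequence model (for example a conditional random field) has already produced, for each unlabelled sequence, the probability of its most likely label sequence, and that each sequence has been mapped to a non-negative feature vector. The function names, the cosine similarity measure, and the `beta` exponent follow a common formulation of information density (uncertainty weighted by average similarity to the rest of the pool) and are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import List, Sequence


def least_confidence(best_path_prob: float) -> float:
    """Least-confidence uncertainty: 1 - P(y* | x), where y* is the most
    likely label sequence for x under the current model."""
    return 1.0 - best_path_prob


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def information_density(idx: int, best_path_probs: Sequence[float],
                        vectors: Sequence[Sequence[float]],
                        beta: float = 1.0) -> float:
    """Least confidence of sequence `idx`, weighted by its average similarity
    to the other pool sequences raised to `beta` (hypothetical parameter)."""
    others = [v for j, v in enumerate(vectors) if j != idx]
    if not others:
        return least_confidence(best_path_probs[idx])
    # Clamp at 0 so a fractional beta never sees a negative base.
    density = max(0.0, sum(cosine(vectors[idx], v) for v in others) / len(others))
    return least_confidence(best_path_probs[idx]) * density ** beta


def select_batch(best_path_probs: Sequence[float],
                 vectors: Sequence[Sequence[float]],
                 k: int, criterion: str = "lc") -> List[int]:
    """Indices of the k pool sequences with the highest informativeness score,
    to be sent to the annotator in the next active learning round."""
    n = len(best_path_probs)
    if criterion == "lc":
        scores = [least_confidence(p) for p in best_path_probs]
    elif criterion == "id":
        scores = [information_density(i, best_path_probs, vectors) for i in range(n)]
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
```

With toy inputs `probs = [0.9, 0.5, 0.6]` and `vecs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]`, least confidence picks sequences `[1, 2]`, while information density picks `[1, 0]`: sequence 2 is uncertain but dissimilar to the rest of the pool, so density weighting demotes it in favour of a more representative sequence.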
Results
The annotation effort saved by active learning while achieving the same effectiveness as supervised learning is up to 77%, 57%, and 46% of the total number of sequences, tokens, and concepts, respectively. Compared with the random sampling baseline, active learning at least doubles this saving.
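One plausible reading of "annotation effort saved" is the share of the full training set that active learning did not need to label before matching the supervised model's effectiveness. The sketch below computes that quantity from a learning curve; the function name and the curve format are hypothetical, and any numbers passed to it here are toy values, not results from the paper.

```python
from typing import Optional, Sequence, Tuple


def annotation_saving(curve: Sequence[Tuple[int, float]],
                      total_units: int,
                      target_score: float) -> Optional[float]:
    """Fraction of annotation units (sequences, tokens, or concepts) left
    unlabelled when active learning first reaches `target_score`.

    `curve` holds (units annotated so far, effectiveness) points in the order
    they were labelled; `target_score` is the effectiveness of a supervised
    model trained on all `total_units`. Returns None if the target is never met.
    """
    for units, score in curve:
        if score >= target_score:
            return 1.0 - units / total_units
    return None
```

For example, a toy curve `[(100, 0.70), (200, 0.80), (300, 0.85)]` over 1000 units with a supervised target of 0.85 yields a saving of about 0.7. Computing the same quantity for a random-sampling curve gives the baseline against which the savings above are compared.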
Conclusion
Incremental active learning is a promising approach for building effective and robust medical concept extraction models while significantly reducing the burden of manual annotation.
Publisher
Oxford University Press (OUP)
Cited by
41 articles.