Abstract
An enormous amount of clinical free-text information, such as pathology reports, progress reports, clinical notes and discharge summaries, has been collected at hospitals and medical care clinics. These data provide an opportunity to develop many useful machine learning (ML) applications, provided they can be transformed into a learnable structure with appropriate labels for supervised learning. The annotation must be performed by qualified clinical experts, and the resulting high cost limits the use of these data. Active learning (AL), an underutilised ML technique that can label new data, is a promising candidate for addressing the high cost of labelling. AL has been successfully applied to speech recognition and text classification; however, there is a lack of literature investigating its use for clinical purposes. We performed a comparative investigation of various AL techniques using ML and deep learning (DL)-based strategies on three unique biomedical datasets. We investigated the random sampling (RS), least confidence (LC), informative diversity and density (IDD), margin, and maximum representativeness-diversity (MRD) AL query strategies. Our experiments show that AL has the potential to significantly reduce the cost of manual labelling. Furthermore, pre-labelling performed using AL expedites the labelling process by reducing the time required for labelling.
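To make the simpler query strategies named above concrete, the following is a minimal Python sketch of how RS, LC and margin sampling might rank an unlabelled pool. It uses the textbook definitions of these strategies and is not taken from the paper's implementation; the function names, the `pool_probs` values and the `seed` parameter are illustrative assumptions. IDD and MRD are omitted because the abstract does not define them.

```python
import random

def least_confidence(probs):
    """Uncertainty score: 1 minus the highest predicted class probability."""
    return 1.0 - max(probs)

def margin(probs):
    """Gap between the two most probable classes (needs at least two classes).
    A small gap means the model is unsure between them."""
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]

def select_queries(pool_probs, k, strategy="LC", seed=0):
    """Return indices of the k pool items to send to the annotator.

    pool_probs: one list of class probabilities per unlabelled item
    (hypothetical model outputs, assumed here for illustration).
    """
    indices = list(range(len(pool_probs)))
    if strategy == "RS":
        # Random sampling baseline: ignores the model's predictions.
        return sorted(random.Random(seed).sample(indices, k))
    if strategy == "LC":
        # Highest least-confidence score first.
        ranked = sorted(indices, key=lambda i: least_confidence(pool_probs[i]), reverse=True)
    elif strategy == "margin":
        # Smallest margin first.
        ranked = sorted(indices, key=lambda i: margin(pool_probs[i]))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return ranked[:k]

# Illustrative predicted probabilities for four unlabelled documents.
pool_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.50, 0.49, 0.01],
    [0.70, 0.20, 0.10],
]

lc_picks = select_queries(pool_probs, k=2, strategy="LC")
margin_picks = select_queries(pool_probs, k=2, strategy="margin")
rs_picks = select_queries(pool_probs, k=2, strategy="RS")
```

In this toy pool, LC and margin pick the same two documents but order them differently, which illustrates that the two criteria measure uncertainty in related but distinct ways.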
Subject
Artificial Intelligence, Applied Mathematics, Industrial and Manufacturing Engineering, Human-Computer Interaction, Information Systems, Control and Systems Engineering