Iterative Annotation of Biomedical NER Corpora with Deep Neural Networks and Knowledge Bases

Published:2022-06-07 Issue:12 Volume:12 Page:5775
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Silvestri Stefano, Gargiulo Francesco, Ciampi Mario

Abstract

The wide availability of clinical natural language documents, such as clinical narratives or diagnoses, calls for smart automatic systems to process and analyze them. However, the lack of annotated corpora in the biomedical domain, especially in languages other than English, makes it difficult to exploit state-of-the-art machine-learning systems to extract information from such documents. As a result, healthcare professionals miss significant opportunities that could arise from the analysis of this data. In this paper, we propose a methodology to reduce the manual effort needed to annotate a biomedical named entity recognition (B-NER) corpus. It exploits both active learning and distant supervision, based respectively on deep learning models (e.g., Bi-LSTM, word2vec, FastText, ELMo and BERT) and biomedical knowledge bases, to speed up the annotation task and limit class imbalance issues. We assessed this approach by creating an Italian-language electronic health record corpus annotated with biomedical domain entities in a small fraction of the time required for fully manual annotation. The resulting corpus was used to train a B-NER deep neural network whose performance is comparable with the state of the art, with F1-scores of 0.9661 and 0.8875 on two test sets.
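
The abstract names two ingredients, distant supervision from knowledge bases and active learning, without giving implementation details. The following is only a minimal sketch of how such ingredients are commonly combined, not the authors' implementation. The toy dictionary `KB`, the entity labels, the function names and the per-token confidence values are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: lower-cased term tokens -> entity label.
# A real system would draw these terms from a biomedical knowledge base.
KB: Dict[Tuple[str, ...], str] = {
    ("diabete", "mellito"): "DISEASE",
    ("insulina",): "DRUG",
}


def distant_bio_tags(tokens: List[str],
                     kb: Dict[Tuple[str, ...], str] = KB) -> List[str]:
    """Pre-annotate tokens with BIO tags by greedy longest dictionary match."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    max_len = max((len(term) for term in kb), default=0)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = kb.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                break
        else:
            # No dictionary term starts at this position.
            i += 1
    return tags


def least_confident(sentence_probs: Dict[str, List[float]], k: int) -> List[str]:
    """Select the k sentences whose weakest token prediction is least confident.

    sentence_probs maps a sentence id to the model's per-token confidence
    (assumed here to be the probability of the predicted tag).
    """
    ranked = sorted(sentence_probs, key=lambda s: min(sentence_probs[s]))
    return ranked[:k]


tokens = ["Paziente", "con", "diabete", "mellito", "in", "terapia", "con", "insulina"]
pre_annotated = distant_bio_tags(tokens)
to_review = least_confident({"s1": [0.9, 0.95], "s2": [0.99, 0.4], "s3": [0.8, 0.85]}, k=2)
```

In a loop of this kind, the dictionary pass would supply cheap initial labels, and the sentences returned by `least_confident` would be the ones handed to a human annotator before the model is retrained.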

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/12/12/5775/pdf

References: 56 articles.

1. Mining Electronic Health Records (EHRs)

2. A Big Data Architecture for the Extraction and Analysis of EHR Data

3. Assessment of DistilBERT performance on Named Entity Recognition task for the detection of Protected Health Information and medical concepts

4. Natural Language Processing Supporting Interoperability in Healthcare

Cited by 15 articles.

1. Special Issue on eHealth Innovative Approaches and Applications;Applied Sciences;2024-03-19

2. Extracting adverse drug events from clinical Notes: A systematic review of approaches used;Journal of Biomedical Informatics;2024-03

3. The Smart Improving of Translation Models Using Recurrent Neural Networks;2024 International Conference on Optimization Computing and Wireless Communication (ICOCWC);2024-01-29

4. A survey on semantic processing techniques;Information Fusion;2024-01

5. A study of deep active learning methods to reduce labelling efforts in biomedical relation extraction;PLOS ONE;2023-12-15