We are not ready yet: limitations of state-of-the-art disease named entity recognizers

Published: 2022-10-27; Volume: 13; Issue: 1
ISSN: 2041-1480
Journal: Journal of Biomedical Semantics (J Biomed Semant)
Language: en

Authors:

Kühnel Lisa, Fluck Juliane

Abstract

Background: Intense research has been done in biomedical natural language processing. Since the breakthrough of transfer learning-based methods, BERT models have been used in a variety of biomedical and clinical applications. On the available data sets, these models show excellent results, in some cases exceeding inter-annotator agreement. However, biomedical named entity recognition applied to COVID-19 preprints shows a drop in performance compared with the results on test data. This raises the question of how well trained models can predict on completely new data, i.e., how well they generalize.

Results: Using disease named entity recognition as an example, we investigate the robustness of different machine learning-based methods, including transfer learning, and show that current state-of-the-art methods work well on a given training set and its corresponding test set but show a significant lack of generalization when applied to new data.

Conclusions: We argue that larger annotated data sets are needed for training and testing. We therefore foresee the curation of further data sets and, moreover, the investigation of continual learning processes for machine learning-based models.

Funder

Deutsche Zentralbibliothek für Medizin (ZBMED)

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications, Health Informatics, Computer Science Applications, Information Systems

Link

https://link.springer.com/content/pdf/10.1186/s13326-022-00280-6.pdf

References (30 articles)

1. Harvard Medical School. n2c2: National NLP Clinical Challenges. https://n2c2.dbmi.hms.harvard.edu/. Accessed 20 June 2021.

2. Doğan RI, Leaman R, Lu Z. The NCBI Disease Corpus. https://www.ncbi.nlm.nih.gov/CBBresearch/Dogan/DISEASE/. Accessed 11 July 2021.

3. Li J, Sun Y, Johnson RJ, Sciaky D, Wei C-H, Leaman R, Davis AP, Mattingly CJ, Wiegers TC, Lu Z. BioCreative V CDR task corpus: a resource for chemical disease relation extraction. 2016. https://doi.org/10.1093/database/baw068. Accessed 11 July 2021.

4. The NCBI Disease Corpus Guidelines. https://www.ncbi.nlm.nih.gov/CBBresearch/Dogan/DISEASE/Guidelines.html. Accessed 12 July 2021.

5. The BC5CDR Corpus Guidelines. https://biocreative.bioinformatics.udel.edu/media/store/files/2015/bc5_CDR_data_guidelines.pdf. Accessed 12 July 2021.

Cited by (6 articles)

1. From zero to hero: Harnessing transformers for biomedical named entity recognition in zero- and few-shot contexts. Artificial Intelligence in Medicine. 2024-10.

2. The Future of Orthodontics: Deep Learning Technologies. Cureus. 2024-06-10.

3. Integrating deep learning architectures for enhanced biomedical relation extraction: a pipeline approach. Database. 2024.

4. Dataset of miRNA–disease relations extracted from textual data using transformer-based neural networks. Database. 2024.

5. Parallel-Based Corpus Annotation for Malay Health Documents. Applied Sciences. 2023-12-09.