Improving deep learning method for biomedical named entity recognition by using entity definition information

Published:2021-12 Issue:S1 Volume:22 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Xiong Ying, Chen Shuai, Tang Buzhou, Chen Qingcai, Wang Xiaolong, Yan Jun, Zhou Yi

Abstract

Background: Biomedical named entity recognition (NER) is a fundamental task in biomedical text mining: it finds the boundaries of entity mentions in biomedical text and determines their entity types. To accelerate the development of biomedical NER techniques for Spanish, the PharmaCoNER organizers launched a competition to recognize pharmacological substances, compounds, and proteins. Biomedical NER is usually treated as a sequence labeling task, and almost all state-of-the-art sequence labeling methods ignore the meaning of the different entity types. In this paper, we investigate methods for introducing the meaning of entity types into deep learning methods for biomedical NER and apply them to the PharmaCoNER 2019 challenge. The meaning of each entity type is represented by its definition information.

Material and method: We investigate how to use entity definition information in two kinds of methods: (1) SQuAD-style machine reading comprehension (MRC) methods, which treat entity definition information as the query and biomedical text as the context and predict answer spans as entities; and (2) span-level one-pass (SOne) methods, which predict entity spans one type at a time and introduce entity type meaning, represented by entity definition information. All models are trained and tested on the PharmaCoNER 2019 corpus, and their performance is evaluated by strict micro-averaged precision, recall, and F1-score.

Results: Entity definition information improves both the SQuAD-style MRC and SOne methods by about 0.003 in micro-averaged F1-score. The SQuAD-style MRC model using entity definition information as the query achieves the best performance, with a micro-averaged precision of 0.9225, a recall of 0.9050, and an F1-score of 0.9137. It outperforms the best model of the PharmaCoNER 2019 challenge by 0.0032 in F1-score. Compared with the state-of-the-art model that does not use manually crafted features, our model obtains a significant 1% improvement in F1-score. These results indicate that entity definition information is useful for deep learning methods on biomedical NER.

Conclusion: Our models enhanced with entity definition information achieve a state-of-the-art micro-averaged F1-score of 0.9137, which suggests that entity definition information has a positive impact on biomedical NER. In the future, we will explore more entity definition information from knowledge graphs.
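
The MRC framing and the strict micro-averaged evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the entity type names, the definition texts, the example sentence, and the helper names `build_mrc_inputs` and `strict_micro_prf` are all assumptions made for illustration. Here "strict" is taken to mean that a predicted span counts as correct only if its boundaries and its type both match a gold span exactly.

```python
from typing import Dict, List, Set, Tuple

# (doc_id, start_offset, end_offset, entity_type); layout chosen for illustration.
Span = Tuple[str, int, int, str]

# Hypothetical definitions: the definition texts used in the paper are not
# reproduced here, and these type names are placeholders.
DEFINITIONS: Dict[str, str] = {
    "PROTEIN": "Mentions of proteins, peptides, enzymes and genes.",
    "CHEMICAL": "Mentions of pharmacological substances and chemical compounds.",
}


def build_mrc_inputs(text: str, definitions: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Pair each entity-type definition (the query) with the same text (the context)."""
    return [(etype, query, text) for etype, query in definitions.items()]


def strict_micro_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact (boundary + type) matches."""
    tp = len(gold & pred)  # exact matches pooled over all types and documents
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


text = "Se detecta paracetamol y hemoglobina en la muestra."
mrc_inputs = build_mrc_inputs(text, DEFINITIONS)  # one (type, query, context) per type

gold = {("d1", 11, 22, "CHEMICAL"), ("d1", 25, 36, "PROTEIN")}
# The second prediction is one character short, so it does not count under strict matching.
pred = {("d1", 11, 22, "CHEMICAL"), ("d1", 25, 35, "PROTEIN")}
precision, recall, f1 = strict_micro_prf(gold, pred)
```

As a consistency check on the reported figures, the harmonic mean of the stated precision (0.9225) and recall (0.9050) is 2 × 0.9225 × 0.9050 / (0.9225 + 0.9050) ≈ 0.9137, which agrees with the F1-score given in the abstract.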

Funder

National Natural Science Foundations of China

Special Foundation for Technology Research Program of Guangdong Province

National Natural Science Foundations of Guangdong, China

Guangdong Province Covid-19 Pandemic Control Research Fund

Strategic Emerging Industry Development Special Funds of Shenzhen

Innovation Fund of Harbin Institute of Technology

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/s12859-021-04236-y.pdf

References: 52 articles.

1. Gonzalez-Agirre A, Marimon M, Intxaurrondo A, Rabal O, Villegas M, Krallinger M. PharmaCoNER: Pharmacological Substances, Compounds and proteins Named Entity Recognition track. In: Proceedings of The 5th Workshop on BioNLP Open Shared Tasks. Hong Kong, China: Association for Computational Linguistics; 2019. p. 1–10. doi:https://doi.org/10.18653/v1/D19-5701.

2. Lyu C, Chen B, Ren Y, Ji D. Long short-term memory RNN for biomedical named entity recognition. BMC Bioinform. 2017;18:462.

3. Sun W, Rumshisky A, Uzuner Ö. Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. J Am Med Inform Assoc. 2013;20:806–13.

4. Stubbs A, Kotfila C, Uzuner Ö. Automated systems for the de-identification of longitudinal clinical narratives: overview of 2014 i2b2/UTHealth shared task Track 1. J Biomed Inform. 2015;58:S11–9.

5. Smith L, Tanabe LK, nee Ando RJ, Kuo C-J, Chung I-F, Hsu C-N, et al. Overview of BioCreative II gene mention recognition. Genome Biol. 2008;9:S2.

Cited by 8 articles.

1. Exploring Biomedical Named Entity Recognition via SciSpaCy and BioBERT Models;The Open Biomedical Engineering Journal;2024-06-05

2. Data Expansion for Named Entity Recognition based on migration learning;2024 5th International Conference on Computer Vision, Image and Deep Learning (CVIDL);2024-04-19

3. Integrated Deep Learning with Attention Layer Based Approach for Precise Biomedical Named Entity Recognition;Journal of Advances in Information Technology;2024

4. Named Entity Recognition from Chinese Medical Literature Based on Deep Learning Method;2023 China Automation Congress (CAC);2023-11-17

5. A Dataset for Entity Recognition of COVID-19 Public Opinion in Social Media;2023 10th International Conference on Behavioural and Social Computing (BESC);2023-10-30