Deep learning with language models improves named entity recognition for PharmaCoNER

Published: 2021-12
Issue: S1
Volume: 22
ISSN: 1471-2105
Container-title: BMC Bioinformatics
Language: en
Short-container-title: BMC Bioinformatics

Authors:

Sun Cong, Yang Zhihao, Wang Lei, Zhang Yin, Lin Hongfei, Wang Jian

Abstract

Background: The recognition of pharmacological substances, compounds and proteins is essential for biomedical relation extraction, knowledge graph construction, drug discovery, and medical question answering. Although considerable effort has gone into recognizing biomedical entities in English texts, only a few limited attempts have been made so far to recognize them in biomedical texts written in other languages. PharmaCoNER is a named entity recognition challenge that targets pharmacological entities in Spanish texts. Because abundant resources are currently available in the field of natural language processing, how to leverage them for the PharmaCoNER challenge is a question worth studying.

Methods: Inspired by the success of deep learning with language models, we compare and explore various representative BERT models to advance the PharmaCoNER task.

Results: The experimental results show that deep learning with language models can effectively improve model performance on the PharmaCoNER dataset. Our method achieves state-of-the-art performance on the PharmaCoNER dataset, with a maximum F1-score of 92.01%.

Conclusion: For the BERT models on the PharmaCoNER dataset, biomedical domain knowledge has a greater impact on model performance than the native language (i.e., Spanish). The BERT models can obtain competitive performance by using WordPiece to alleviate the out-of-vocabulary (OOV) limitation, and their performance can be further improved by constructing a specific vocabulary based on domain knowledge. Moreover, character case also has some impact on model performance.
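
The Results above are reported as an F1-score. As a minimal sketch of how such a score is typically computed for NER, the code below assumes micro-averaged precision, recall and F1 under strict matching: a predicted mention counts only if its document id, offsets and type all equal a gold mention. This is a common convention, not a reproduction of the official PharmaCoNER evaluation script. The function name `micro_prf`, the mentions and the offsets are hypothetical; the type labels only imitate PharmaCoNER-style categories for illustration.

```python
from typing import Iterable, Set, Tuple

# A mention: (document id, start offset, end offset, entity type).
Mention = Tuple[str, int, int, str]


def micro_prf(gold: Iterable[Mention],
              pred: Iterable[Mention]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 with strict matching."""
    gold_set: Set[Mention] = set(gold)
    pred_set: Set[Mention] = set(pred)
    tp = len(gold_set & pred_set)  # exact span-and-type matches
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example data (illustrative only).
GOLD = [
    ("doc1", 0, 11, "NORMALIZABLES"),
    ("doc1", 20, 29, "NORMALIZABLES"),
    ("doc1", 40, 48, "PROTEINAS"),
]
PRED = [
    ("doc1", 0, 11, "NORMALIZABLES"),   # correct
    ("doc1", 20, 29, "PROTEINAS"),      # right span, wrong type
    ("doc1", 40, 48, "PROTEINAS"),      # correct
    ("doc1", 50, 55, "NORMALIZABLES"),  # spurious prediction
]

if __name__ == "__main__":
    p, r, f = micro_prf(GOLD, PRED)
    print(f"P={p:.4f} R={r:.4f} F1={f:.4f}")
```

Running it prints `P=0.5000 R=0.6667 F1=0.5714`. Under strict matching the mention with the right span but the wrong type counts both as a false positive and as a false negative; because sets are used, duplicate predictions are counted once.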

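The Conclusion credits WordPiece with alleviating the out-of-vocabulary limitation and notes that a domain-specific vocabulary and character case also matter. A minimal sketch of BERT-style greedy longest-match-first WordPiece splitting illustrates all three points. It assumes two hypothetical toy vocabularies (`GENERAL_VOCAB` and `DOMAIN_VOCAB`), not any real BERT vocabulary, and the function name `wordpiece_tokenize` is also hypothetical.

```python
from typing import List, Set

# Hypothetical toy vocabularies, NOT taken from any real BERT model.
# "##" marks a piece that continues a word (BERT convention).
GENERAL_VOCAB: Set[str] = {
    "para", "##ce", "##ta", "##mol",
    "o", "##me", "##pra", "##zo", "##l",
}
# Same vocabulary extended with whole domain terms (hypothetical).
DOMAIN_VOCAB: Set[str] = GENERAL_VOCAB | {"paracetamol", "omeprazol"}


def wordpiece_tokenize(word: str, vocab: Set[str],
                       unk_token: str = "[UNK]",
                       max_chars: int = 100) -> List[str]:
    """Split one word with greedy longest-match-first WordPiece.

    Returns [unk_token] if the word is too long or if some position
    cannot be matched by any vocabulary piece.
    """
    if len(word) > max_chars:
        return [unk_token]
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


if __name__ == "__main__":
    for w in ["paracetamol", "omeprazol", "Omeprazol"]:
        print(w, wordpiece_tokenize(w, GENERAL_VOCAB),
              wordpiece_tokenize(w, DOMAIN_VOCAB))
```

With the general toy vocabulary, "omeprazol" is split into five subword pieces instead of collapsing to `[UNK]`; with the domain-extended vocabulary it stays a single token. "Omeprazol" maps to `[UNK]` in both cases only because these toy vocabularies hold lowercase pieces, which is one simple way character case can affect a model; an uncased model would lowercase the input before splitting.
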
Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/s12859-021-04260-y.pdf

Cited by 4 articles

1. Assessing the performance of large language models in literature screening for pharmacovigilance: a comparative study. Frontiers in Drug Safety and Regulation. 2024-06-27.

2. Improving biomedical entity linking for complex entity mentions with LLM-based text simplification. Database. 2024.

3. Integrating domain knowledge for biomedical text analysis into deep learning: A survey. Journal of Biomedical Informatics. 2023-07.

4. Review of Natural Language Processing in Pharmacology. Pharmacological Reviews. 2023-03-17.