Parallel sequence tagging for concept recognition-Reference-Cited by-同舟云学术

Parallel sequence tagging for concept recognition

Published:2021-12 Issue:S1 Volume:22 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Furrer Lenz^ORCID,Cornelius Joseph,Rinaldi Fabio^ORCID

Abstract

Abstract Background Named Entity Recognition (NER) and Normalisation (NEN) are core components of any text-mining system for biomedical texts. In a traditional concept-recognition pipeline, these tasks are combined in a serial way, which is inherently prone to error propagation from NER to NEN. We propose a parallel architecture, where both NER and NEN are modeled as a sequence-labeling task, operating directly on the source text. We examine different harmonisation strategies for merging the predictions of the two classifiers into a single output sequence. Results We test our approach on the recent Version 4 of the CRAFT corpus. In all 20 annotation sets of the concept-annotation task, our system outperforms the pipeline system reported as a baseline in the CRAFT shared task, a competition of the BioNLP Open Shared Tasks 2019. We further refine the systems from the shared task by optimising the harmonisation strategy separately for each annotation set. Conclusions Our analysis shows that the strengths of the two classifiers can be combined in a fruitful way. However, prediction harmonisation requires individual calibration on a development set for each annotation set. This allows achieving a good trade-off between established knowledge (training set) and novel information (unseen concepts).

Funder

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung

Innosuisse - Schweizerische Agentur für Innovationsförderung

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/s12859-021-04511-y.pdf

Reference51 articles.

1. Campos D, Matos S, Oliveira JL. Gimli: open source and high-performance biomedical name recognition. BMC Bioinformatics. 2013;14(1):54. https://doi.org/10.1186/1471-2105-14-54.

2. Xu D, Zhang M, Xie Y, Wang F, Chen M, Zhu KQ, Wei J. DTMiner: identification of potential disease targets through biomedical literature mining. Bioinformatics. 2016. https://doi.org/10.1093/bioinformatics/btw503.

3. Weber L, Münchmeyer J, Rocktäschel T, Habibi M, Leser U. HUNER: improving biomedical NER with pretraining. Bioinformatics. 2019;36(1):295–302. https://doi.org/10.1093/bioinformatics/btz528.

4. Giorgi JM, Bader GD. Towards reliable named entity recognition in the biomedical domain. Bioinformatics. 2019;36(1):280–6. https://doi.org/10.1093/bioinformatics/btz504.

5. Hong SK, Lee J-G. DTranNER: biomedical named entity recognition with deep learning-based label-label transition model. BMC Bioinformatics. 2020;21:53. https://doi.org/10.1186/s12859-020-3393-1.

Cited by 2 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Data augmentation and transfer learning for cross-lingual Named Entity Recognition in the biomedical domain;Language Resources and Evaluation;2024-05-10

2. Cancer-Alterome: a literature-mined resource for regulatory events caused by genetic alterations in cancer;Scientific Data;2024-03-02