BELHD: improving biomedical entity linking with homonym disambiguation

Published:2024-07-27 Issue:8 Volume:40
ISSN:1367-4811
Container-title:Bioinformatics
language:en

Author:

Garda Samuele¹, Leser Ulf¹

Affiliation:

1. Computer Science, Humboldt-Universität zu Berlin, Berlin 12489, Germany

Abstract

Motivation: Biomedical entity linking (BEL) is the task of grounding entity mentions to a given knowledge base (KB). Recently, neural name-based methods, i.e. systems that identify the most appropriate name in the KB for a given mention using a neural network (via either dense retrieval or autoregressive modeling), have achieved remarkable results on the task without requiring manual tuning or the definition of domain- or entity-specific rules. However, because name-based methods directly return KB names, they cannot cope with homonyms, i.e. different KB entities sharing the exact same name. This significantly affects their performance for KBs in which homonyms account for a large share of entity mentions (e.g. UMLS and NCBI Gene).

Results: We present BELHD (Biomedical Entity Linking with Homonym Disambiguation), a new name-based method that copes with this challenge. BELHD builds upon the BioSyn model with two crucial extensions. First, it preprocesses the KB, expanding each homonym with a specifically constructed disambiguating string and thus enforcing unique linking decisions. Second, it introduces candidate sharing, a novel strategy that strengthens the overall training signal by including similar mentions from the same document as positive or negative examples, according to their corresponding KB identifier. Experiments with 10 corpora and 5 entity types show that BELHD improves upon current neural state-of-the-art approaches, achieving the best results in 6 out of 10 corpora with an average improvement of 4.55 percentage points (pp) in recall@1. Furthermore, the KB preprocessing is orthogonal to the prediction model and can therefore also improve other neural methods, which we exemplify for GenBioEL, a generative name-based BEL approach.

Availability and implementation: The code to reproduce our experiments can be found at: https://github.com/sg-wbi/belhd.
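The homonym problem and the idea of expanding homonyms with a disambiguating string can be illustrated with a minimal Python sketch. This is not the BELHD implementation: the function name `expand_homonyms`, the toy entity identifiers, and the rule used to pick the disambiguating string (another name of the same entity that is not itself a homonym, falling back to the identifier) are assumptions made purely for illustration. The abstract does not specify how BELHD constructs the string.

```python
from collections import defaultdict


def expand_homonyms(kb):
    """Return (entity_id, name) pairs in which homonymous names are expanded.

    kb: list of (entity_id, name) pairs; an entity may have several names.
    """
    ids_by_name = defaultdict(set)
    names_by_id = defaultdict(list)
    for entity_id, name in kb:
        ids_by_name[name].add(entity_id)
        names_by_id[entity_id].append(name)

    expanded = []
    for entity_id, name in kb:
        if len(ids_by_name[name]) == 1:
            # The name already identifies a single entity: keep it unchanged.
            expanded.append((entity_id, name))
            continue
        # Hypothetical rule (an assumption, not the paper's): use the entity's
        # first non-homonymous name as the hint, else fall back to its ID.
        hint = next(
            (n for n in names_by_id[entity_id] if len(ids_by_name[n]) == 1),
            entity_id,
        )
        expanded.append((entity_id, f"{name} ({hint})"))
    return expanded


# Toy KB with made-up identifiers: "cold" is shared by two entities.
toy_kb = [
    ("E1", "cold"),
    ("E1", "common cold"),
    ("E2", "cold"),
    ("E3", "fever"),
]
for entity_id, name in expand_homonyms(toy_kb):
    print(entity_id, name)
```

In this toy KB every expanded name maps to exactly one identifier, so a model that returns a name also returns an unambiguous entity. That is the property the abstract calls enforcing unique linking decisions.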

Funder

Deutsche Forschungsgemeinschaft

Publisher

Oxford University Press (OUP)

Link

https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btae474/58667435/btae474.pdf

References (33 articles)

1. An overview of biomedical entity linking throughout the years;French;J Biomed Inform,2022