Improving biomedical entity linking for complex entity mentions with LLM-based text simplification

Journal: Database (ISSN 1758-0463)
Published: 2024, Volume 2024
Language: English

Authors:

Florian Borchert¹, Ignacio Llorca¹, Matthieu-P. Schapranow¹

Affiliation:

1. Hasso Plattner Institute for Digital Engineering, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, Potsdam 14482, Germany

Abstract

Large amounts of important medical information are captured in free-text documents in biomedical research and within healthcare systems, which can be made accessible through natural language processing (NLP). A key component in most biomedical NLP pipelines is entity linking, i.e. grounding textual mentions of named entities to a reference of medical concepts, usually derived from a terminology system, such as the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). However, complex entity mentions, spanning multiple tokens, are notoriously hard to normalize due to the difficulty of finding appropriate candidate concepts. In this work, we propose an approach to preprocess such mentions for candidate generation, building upon recent advances in text simplification with generative large language models. We evaluate the feasibility of our method in the context of the entity linking track of the BioCreative VIII SympTEMIST shared task. We find that instructing the latest Generative Pre-trained Transformer model with a few-shot prompt for text simplification results in mention spans that are easier to normalize. Thus, we can improve recall during candidate generation by 2.9 percentage points compared to our baseline system, which achieved the best score in the original shared task evaluation. Furthermore, we show that this improvement in recall can be fully translated into top-1 accuracy through careful initialization of a subsequent reranking model. Our best system achieves an accuracy of 63.6% on the SympTEMIST test set. The proposed approach has been integrated into the open-source xMEN toolkit, which is available online via https://github.com/hpi-dhc/xmen.
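The abstract describes a two-stage pipeline. Candidate generation retrieves possible concept IDs for a mention, and a reranker then picks the top-1 concept. Complex mentions are first simplified by a few-shot-prompted generative model. The Python sketch below illustrates only one point: why querying with a simplified span can raise recall@k, the fraction of mentions whose gold concept appears among the top-k candidates. It is a toy built on stated assumptions. The mini terminology `TOY_KB`, the word-overlap retriever, the prompt wording in `build_prompt`, and the canned `fake_llm` stand-in are all hypothetical. None of them is the xMEN API or the authors' implementation. Merging candidates from the original and the simplified span is also an assumption of this sketch.

```python
"""Toy sketch: LLM-simplified mentions as extra queries for candidate generation.

Hypothetical throughout: TOY_KB, FEW_SHOT, build_prompt, fake_llm and the
word-overlap retriever are illustrative stand-ins, not the xMEN API.
"""

# Hypothetical mini terminology: concept ID -> preferred term.
TOY_KB = {
    "C1": "headache",
    "C2": "fever",
    "C3": "abdominal pain",
    "C4": "cough",
    "C5": "chest pain",
}

# Hypothetical few-shot demonstrations: complex mention -> simplified mention.
FEW_SHOT = [
    ("sharp stabbing pain in the left side of the chest", "chest pain"),
    ("recurrent episodes of headaches over several weeks", "headache"),
]

# Hypothetical annotated mentions: (mention text, gold concept ID).
GOLD = [
    ("persistent coughing episodes at night", "C4"),
    ("elevated body temperature since yesterday", "C2"),
    ("pain in the lower abdominal region", "C3"),
]


def build_prompt(mention: str) -> str:
    """Assemble a few-shot simplification prompt (the wording is an assumption)."""
    parts = ["Rewrite the symptom mention as a short, standard clinical term."]
    for source, target in FEW_SHOT:
        parts.append(f"Mention: {source}\nSimplified: {target}")
    parts.append(f"Mention: {mention}\nSimplified:")
    return "\n\n".join(parts)


def fake_llm(prompt: str) -> str:
    """Stand-in for a generative LLM call that returns canned simplifications."""
    canned = {
        "persistent coughing episodes at night": "cough",
        "elevated body temperature since yesterday": "fever",
    }
    # The query mention follows the last "Mention: " marker in the prompt.
    query = prompt.rsplit("Mention: ", 1)[1].removesuffix("\nSimplified:")
    return canned.get(query, query)  # unknown mentions come back unchanged


def tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens of a span."""
    return set(text.lower().split())


def generate_candidates(mention: str, k: int) -> list[tuple[str, float]]:
    """Rank concepts by word-level Jaccard overlap; keep the top k with score > 0."""
    query = tokens(mention)
    scored = []
    for cid, term in TOY_KB.items():
        term_tokens = tokens(term)
        score = len(query & term_tokens) / len(query | term_tokens)
        if score > 0:
            scored.append((cid, score))
    scored.sort(key=lambda item: (-item[1], item[0]))  # best score first, then ID
    return scored[:k]


def candidates_with_simplification(mention: str, k: int) -> list[tuple[str, float]]:
    """Query with the original and the simplified span; keep each concept's best score."""
    simplified = fake_llm(build_prompt(mention))
    best: dict[str, float] = {}
    for span in (mention, simplified):
        for cid, score in generate_candidates(span, k):
            best[cid] = max(score, best.get(cid, 0.0))
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


def recall_at_k(gold, candidate_fn, k: int) -> float:
    """Fraction of mentions whose gold concept appears among the top-k candidates."""
    hits = sum(
        any(cid == gold_id for cid, _ in candidate_fn(mention, k))
        for mention, gold_id in gold
    )
    return hits / len(gold)


if __name__ == "__main__":
    print("baseline recall@1:", recall_at_k(GOLD, generate_candidates, 1))
    print("with simplification:", recall_at_k(GOLD, candidates_with_simplification, 1))
```

For these three hypothetical mentions, baseline recall@1 is 1/3. Only the abdominal-pain mention shares words with a terminology entry. With simplification, recall@1 rises to 1.0. In this sketch, candidates from both spans are merged rather than letting the simplified span replace the original. As a result, a mention the model cannot simplify usefully still keeps its original candidates. A reranker, which is not shown here, would then reorder the merged list to produce the top-1 prediction.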

Funder

German Federal Ministry of Education and Research

Publisher

Oxford University Press (OUP)

Link

https://academic.oup.com/database/article-pdf/doi/10.1093/database/baae067/58662801/baae067.pdf
