Using clinical text to refine unspecific condition codes in Dutch general practitioner EHR data

Published: 2024-01-05

Author:

Seinen Tom M, Kors Jan A, van Mulligen Erik M, Fridgeirsson Egill, Verhamme Katia MC, Rijnbeek Peter R

Abstract

Objective: Observational studies using electronic health record (EHR) databases often face challenges due to unspecific clinical codes that can obscure detailed medical information, hindering precise data analysis. In this study, we aimed to assess the feasibility of refining these unspecific condition codes into more specific codes in a Dutch general practitioner (GP) EHR database by leveraging the available clinical free text.

Methods: We utilized three approaches for text classification (search queries, semi-supervised learning, and supervised learning) to improve the specificity of ten unspecific International Classification of Primary Care (ICPC-1) codes. Two text representations and three machine learning algorithms were evaluated for the (semi-)supervised models. Additionally, we measured the improvement achieved by the refinement process on all code occurrences in the database.

Results: The classification models performed well for most codes. In general, no single classification approach consistently outperformed the others. However, there were variations in the relative performance of the classification approaches within each code and in the use of different text representations and machine learning algorithms. Class imbalance and limited training data affected the performance of the (semi-)supervised models, yet the simple search queries remained particularly effective. Ultimately, the developed models improved the specificity of over half of all the unspecific code occurrences in the database.

Conclusions: Our findings show the feasibility of using information from clinical text to improve the specificity of unspecific condition codes in observational healthcare databases, even with a limited range of machine-learning techniques and modest annotated training sets. Future work could investigate transfer learning, integration of structured data, alternative semi-supervised methods, and validation of models across healthcare settings. The improved level of detail enriches the interpretation of medical information and can benefit observational research and patient care.
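
The search-query approach named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the labels, the Dutch keywords, and the function names `refine_code` and `refined_fraction` are hypothetical and chosen only for illustration. The sketch assumes that a note attached to an unspecific code is refined only when exactly one query matches, and it ignores negation and spelling variants.

```python
import re
from typing import List, Optional

# Hypothetical search queries: one regex per candidate specific label.
# Labels and Dutch keywords are illustrative assumptions, not taken from the study.
QUERIES = {
    "cerebral_infarction": re.compile(r"\b(herseninfarct|infarct)\b", re.IGNORECASE),
    "cerebral_haemorrhage": re.compile(r"\b(hersenbloeding|bloeding)\b", re.IGNORECASE),
}


def refine_code(note: str) -> Optional[str]:
    """Return the specific label if exactly one query matches the note, else None."""
    matches = [label for label, pattern in QUERIES.items() if pattern.search(note)]
    # Zero matches or several conflicting matches leave the code unrefined.
    return matches[0] if len(matches) == 1 else None


def refined_fraction(notes: List[str]) -> float:
    """Fraction of notes whose unspecific code could be refined."""
    if not notes:
        return 0.0
    refined = sum(1 for note in notes if refine_code(note) is not None)
    return refined / len(notes)


if __name__ == "__main__":
    example_notes = [
        "Opname wegens herseninfarct links",
        "Controle na hersenbloeding",
        "Geen bijzonderheden",
        "Infarct of bloeding nog onduidelijk",
    ]
    for note in example_notes:
        print(note, "->", refine_code(note))
    print("refined fraction:", refined_fraction(example_notes))
```

Requiring a single unambiguous match trades coverage for precision; a supervised or semi-supervised classifier, as evaluated in the study, could instead be trained on annotated notes to handle the ambiguous cases.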

Publisher

Cold Spring Harbor Laboratory
