Aiding ICD-10 Encoding of Clinical Health Records Using Improved Text Cosine Similarity and PLM-ICD

Author:

Silva Hugo 1, Duque Vítor 2, Macedo Mário 3, Mendes Mateus 1

Affiliation:

1. Polytechnic Institute of Coimbra, Coimbra Institute of Engineering, Rua Pedro Nunes–Quinta da Nora, 3030-199 Coimbra, Portugal

2. Department of Infectious Diseases, Coimbra Hospital and University Centre, 3000-075 Coimbra, Portugal

3. RCM2+ Research Centre for Asset Management and Systems Engineering, ISEC/IPC, Rua Pedro Nunes, 3030-199 Coimbra, Portugal

Abstract

The International Classification of Diseases, 10th edition (ICD-10), is widely used to classify patient diagnostic information. This classification is usually performed by dedicated physicians with specific coding training, and it is a laborious task. Automatic classification is a challenging task in natural language processing, so automatic methods have been proposed to aid the classification process. This paper proposes a method that combines cosine text similarity with a pretrained language model, PLM-ICD, to increase the number of potentially useful ICD-10 code suggestions, using the Medical Information Mart for Intensive Care (MIMIC)-IV dataset. The results show that using multiple runs and bucket category search in the cosine method improves the results, providing more useful suggestions. In addition, a strategy that combines the cosine method with PLM-ICD, called PLM-ICD-C, provides better results than PLM-ICD alone.
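To make the cosine text similarity idea concrete, the following is a minimal sketch of how a clinical note could be ranked against ICD-10 code descriptions. It is not the authors' implementation: the abstract does not describe their vectorization, the multiple-run strategy, the bucket category search, or how suggestions are merged with PLM-ICD, so none of those are reproduced here. The function names (`tokenize`, `cosine_similarity`, `suggest_codes`), the raw term-frequency weighting, the three-code dictionary, and the sample note are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity of two texts using raw term-frequency vectors.

    Assumption: plain term counts; the paper may weight terms differently.
    """
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def suggest_codes(note, code_descriptions, top_k=3):
    """Return the top_k (code, score) pairs, most similar description first."""
    scored = [
        (code, cosine_similarity(note, description))
        for code, description in code_descriptions.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical miniature code table; a real system would load the full ICD-10 list.
CODES = {
    "I10": "essential primary hypertension",
    "E11.9": "type 2 diabetes mellitus without complications",
    "J18.9": "pneumonia unspecified organism",
}

# Hypothetical note text, invented for this example.
suggestions = suggest_codes(
    "Patient admitted with pneumonia, organism unspecified.", CODES, top_k=2
)
for code, score in suggestions:
    print(code, round(score, 3))
```

In this toy setting, the note shares three tokens with the J18.9 description and none with the others, so J18.9 is ranked first. Suggestions from such a ranking could then be offered to a human coder alongside those of a model such as PLM-ICD; how the two lists are combined in PLM-ICD-C is not specified in the abstract.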

Funder

FCT and FEDER

Publisher

MDPI AG

