Automatic multilabel detection of ICD10 codes in Dutch cardiology discharge letters using neural networks

Published:2021-02-26 Issue:1 Volume:4 Page:
ISSN:2398-6352
Container-title:npj Digital Medicine
language:en
Short-container-title:npj Digit. Med.

Author:

Sammani Arjan, Bagheri Ayoub, van der Heijden Peter G. M., te Riele Anneline S. J. M., Baas Annette F., Oosters C. A. J., Oberski Daniel, Asselbergs Folkert W.

Abstract

Standard reference terminology of diagnoses and risk factors is crucial for billing, epidemiological studies, and inter/intranational comparisons of diseases. The International Classification of Disease (ICD) is a standardized and widely used method, but the manual classification is an enormously time-consuming endeavor. Natural language processing together with machine learning allows automated structuring of diagnoses using ICD-10 codes, but the limited performance of machine learning models, the necessity of gigantic datasets, and poor reliability of terminal parts of these codes restricted clinical usability. We aimed to create a high performing pipeline for automated classification of reliable ICD-10 codes in the free medical text in cardiology. We focussed on frequently used and well-defined three- and four-digit ICD-10 codes that still have enough granularity to be clinically relevant such as atrial fibrillation (I48), acute myocardial infarction (I21), or dilated cardiomyopathy (I42.0). Our pipeline uses a deep neural network known as a Bidirectional Gated Recurrent Unit Neural Network and was trained and tested with 5548 discharge letters and validated in 5089 discharge and procedural letters. As in clinical practice discharge letters may be labeled with more than one code, we assessed the single- and multilabel performance of main diagnoses and cardiovascular risk factors. We investigated using both the entire body of text and only the summary paragraph, supplemented by age and sex. Given the privacy-sensitive information included in discharge letters, we added a de-identification step. The performance was high, with F1 scores of 0.76–0.99 for three-character and 0.87–0.98 for four-character ICD-10 codes, and was best when using complete discharge letters. Adding variables age/sex did not affect results. For model interpretability, word coefficients were provided and qualitative assessment of classification was manually performed. Because of its high performance, this pipeline can be useful to decrease the administrative burden of classifying discharge diagnoses and may serve as a scaffold for reimbursement and research applications.

Publisher

Springer Science and Business Media LLC

Subject

Health Information Management,Health Informatics,Computer Science Applications,Medicine (miscellaneous)

Link

http://www.nature.com/articles/s41746-021-00404-9.pdf

Reference: 31 articles.

1. Jensen, P. B., Jensen, L. J. & Brunak, S. Mining electronic health records: towards better research applications and clinical care. Nat. Rev. Genet. 13, 395–405 (2012).

2. Bagheri, A., Sammani, A., van der Heijden, P. G. M., Asselbergs, F. W. & Oberski, D. L. Automatic ICD-10 classification of diseases from Dutch discharge letters. in BIOINFORMATICS 2020—11th International Conference on Bioinformatics Models, Methods and Algorithms, Proceedings; Part of 13th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2020 vol. BIOSTEC202. 281–289 (SCITEPRESS—Science and Technology Publications, 2020).

3. Hirsch, J. A. et al. ICD-10: History and context. Am. J. Neuroradiol. 37, 596–599 (2016).

4. Atutxa, A., de Ilarraza, A. D., Gojenola, K., Oronoz, M. & Perez-de-Viñaspre, O. Interpretable deep learning to map diagnostic texts to ICD-10 codes. Int. J. Med. Inform. 129, 49–59 (2019).

5. Stausberg, J., Lehmann, N., Kaczmarek, D. & Stein, M. Reliability of diagnoses coding with ICD-10. Int. J. Med. Inform. 77, 50–57 (2008).

Cited by 17 articles.

1. Prediabetes: An overlooked risk factor for major adverse cardiac and cerebrovascular events in atrial fibrillation patients;World Journal of Diabetes;2024-01-15

2. Using clinical text to refine unspecific condition codes in Dutch general practitioner EHR data;2024-01-05

3. Artificial intelligence: revolutionizing cardiology with large language models;European Heart Journal;2024-01-03

4. Social Risk Factors are Associated with Risk for Hospitalization in Home Health Care: A Natural Language Processing Study;Journal of the American Medical Directors Association;2023-12

5. Deep spectral network for time series clustering;ICST Transactions on Scalable Information Systems;2023-09-25