Abstract
Proper codification of medical diagnoses and procedures is essential for optimized health care management, quality improvement, research, and reimbursement within large healthcare systems. Assigning diagnostic or procedure codes is a tedious manual process that is prone to human error. Natural Language Processing (NLP) has been suggested as a way to facilitate this process, yet little is known about best practices for applying NLP to it. With Large Language Models (LLMs) becoming ubiquitous in daily life, it is critical to remember that not every task requires that level of resources and effort. Here we comprehensively assessed the performance of common NLP techniques in predicting Current Procedural Terminology (CPT) codes from operative notes. CPT codes are commonly used to track surgical procedures and interventions and are the primary means for reimbursement. Our analysis of the 100 most common musculoskeletal CPT codes suggests that traditional approaches can significantly outperform more resource-intensive approaches such as BERT (P-value = 4.4e-17), with an average AUROC of 0.96 and accuracy of 0.97, while also providing interpretability, which can be very helpful and even crucial in the clinical domain. We also proposed a measure to quantify the complexity of a classification task and examined how this measure could influence the effect of dataset size on a model’s performance. Finally, we provide preliminary evidence that NLP can help minimize codification errors, including mislabeling due to human error.
Funder
Children’s Orthopaedic Surgery Foundation
Boston Children’s Hospital Research Faculty Council
NVIDIA Basic Research Accelerator Program
Publisher
Public Library of Science (PLoS)