
Using Natural Language Processing to Evaluate the Quality of Supervisor Narrative Comments in Competency-Based Medical Education

Published: 2024-01-12; Volume: 99; Issue: 5; Pages: 534-540
ISSN: 1040-2446
Journal: Academic Medicine (Acad Med)
Language: English

Authors:

Maxwell Spadafore, Yusuf Yilmaz, Veronica Rally, Teresa M. Chan, Mackenzie Russell, Brent Thoma, Sim Singh, Sandra Monteiro, Alim Pardhan, Lynsey Martin, Seetha U. Monrad, Rob Woods

Abstract

Purpose: Learner development and promotion rely heavily on narrative assessment comments, but the quality of these comments is rarely evaluated in medical education. Educators have developed tools such as the Quality of Assessment for Learning (QuAL) tool to evaluate the quality of narrative assessment comments; however, scoring by hand the comments generated by medical education assessment programs is time intensive. The authors developed a natural language processing (NLP) model to apply the QuAL score to narrative supervisor comments.

Method: A sample of 2,500 Entrustable Professional Activity assessments was randomly extracted and deidentified from the McMaster (1,250 comments) and Saskatchewan (1,250 comments) emergency medicine (EM) residency training programs for the 2019–2020 academic year. Twenty-five EM faculty members and 25 EM residents rated the comments using the QuAL score. The ratings were used to develop and test an NLP model that predicts the overall QuAL score and the QuAL subscores.

Results: All 50 raters completed the rating exercise. Approximately 50% of the comments had perfect agreement on the QuAL score; the study authors resolved the remainder. A meaningful suggestion for improvement was the key differentiator between high- and moderate-quality feedback. The overall QuAL model predicted either the exact human-rated score or a score within 1 point of it in 87% of instances. Overall model performance was excellent, especially on the subtasks for suggestions for improvement and for the link between resident performance and improvement suggestions, which achieved balanced accuracies of 85% and 82%, respectively.

Conclusions: This model could save considerable time for programs that want to rate the quality of supervisor comments, because it has the potential to score a large volume of comments automatically. It could be used to give faculty real-time feedback or as a tool to quantify and track the quality of assessment comments at the faculty, rotation, program, or institution level.
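The two headline metrics reported in the Results can be made concrete with a short sketch. This is not the authors' code: it simply computes (a) the share of predictions that equal the human-rated score or fall within 1 point of it, and (b) balanced accuracy, using its standard definition as the mean of per-class recall (scikit-learn's `balanced_accuracy_score` computes the same quantity). It assumes integer-valued scores and subtask labels; the function names and all example values are hypothetical toy data, not study data.

```python
from collections import defaultdict


def within_one_accuracy(y_true, y_pred):
    """Fraction of predictions equal to the human score or off by at most 1 point."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= 1)
    return hits / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: for each true class, the share predicted correctly."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = defaultdict(int)  # correct predictions per true class
    total = defaultdict(int)    # number of examples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical overall scores: human rating vs. model prediction.
human_overall = [5, 3, 4, 1, 0, 2]
model_overall = [4, 3, 2, 1, 1, 2]
print(within_one_accuracy(human_overall, model_overall))  # 5 of 6 within 1 point

# Hypothetical binary subtask (1 = suggestion present, 0 = absent).
human_sugg = [1, 1, 1, 0, 0, 0, 0, 0]
model_sugg = [1, 1, 0, 0, 0, 0, 0, 1]
print(balanced_accuracy(human_sugg, model_sugg))  # (2/3 + 4/5) / 2
```

Balanced accuracy is the natural choice for a binary subtask when one class is much rarer than the other: in the toy data, plain accuracy is 6/8 = 0.75, while balanced accuracy weights the minority class's recall (2/3) equally with the majority class's recall (4/5), giving about 0.733.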

Publisher

Ovid Technologies (Wolters Kluwer Health)


Cited by 3 articles.

1. Natural Language Processing in medicine and ophthalmology: A review for the 21st-century clinician; Asia-Pacific Journal of Ophthalmology; 2024-07

2. Trainees' Perspectives on the Next Era of Assessment and Precision Education; Acad Med; 2024

3. Considering the Secondary Use of Clinical and Educational Data to Facilitate the Development of Artificial Intelligence Models; Acad Med; 2024