Validity evidence supporting clinical skills assessment by artificial intelligence compared with trained clinician raters

Author:

Vilma Johnsson (1,2), Morten Bo Søndergaard (3,4), Kulamakan Kulasegaram (5,6), Karin Sundberg (1), Eleonor Tiblad (7,8), Lotta Herling (7,9), Olav Bjørn Petersen (1,10), Martin G. Tolsgaard (1,3,10)

Affiliation:

1. Center for Fetal Medicine, Department of Obstetrics, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark

2. Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark

3. Copenhagen Academy for Medical Education and Simulation, Copenhagen, Denmark

4. Department of Computer Science, University of Copenhagen, Copenhagen, Denmark

5. Department of Family and Community Medicine and Scientist, Wilson Centre, Toronto, Ontario, Canada

6. Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada

7. Center for Fetal Medicine, Karolinska University Hospital, Stockholm, Sweden

8. Clinical Epidemiology Division, Department of Medicine Solna, Karolinska Institutet, Stockholm, Sweden

9. Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden

10. Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark

Abstract

Background: Artificial intelligence (AI) is increasingly used in medical education, but our understanding of the validity of AI-based assessments (AIBA) compared with traditional clinical expert-based assessments (EBA) is limited. In this study, the authors aimed to compare and contrast the validity evidence for the assessment of a complex clinical skill based on scores generated by an AI and by trained clinical experts, respectively.

Methods: The study was conducted between September 2020 and October 2022. The authors used Kane's validity framework to prioritise and organise their evidence according to four inferences: scoring, generalisation, extrapolation and implications. The context of the study was chorionic villus sampling performed in a simulated setting. AIBA and EBA were used to evaluate the performances of experts, intermediates and novices based on video recordings. The clinical experts used a scoring instrument developed in a previous international consensus study. The AI used convolutional neural networks to capture features from the video recordings, motion tracking and eye movements, arriving at a final composite score.

Results: A total of 45 individuals participated in the study (22 novices, 12 intermediates and 11 experts). The authors demonstrated validity evidence for scoring, generalisation, extrapolation and implications for both EBA and AIBA. They examined the plausibility of assumptions related to scoring, evidence of reproducibility and the relation of scores to different training levels. Construct underrepresentation, lack of explainability and threats to robustness were identified as potential weak links in the AIBA validity argument compared with the EBA validity argument.

Conclusion: There were weak links in the use of AIBA compared with EBA, mainly in the representation of the underlying construct but also regarding explainability and the ability to transfer to other datasets. However, combining AI and clinical expert-based assessments may offer complementary benefits, which is a promising subject for future research.
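
The methods summary describes the AI rater as convolutional neural networks over video recordings, combined with motion-tracking and eye-movement signals, fused into a single composite score. As a rough illustration of what such a fusion model can look like, here is a minimal, hypothetical PyTorch sketch; it is not the authors' implementation, and every module name, layer size and feature dimension below is an assumption.

```python
# Hypothetical sketch (not the authors' code): fusing per-frame CNN video
# features with motion-tracking and eye-movement features into one composite
# skill score, in the spirit of the abstract's description.
import torch
import torch.nn as nn


class CompositeSkillScorer(nn.Module):
    def __init__(self, n_motion_feats=16, n_gaze_feats=8):
        super().__init__()
        # Small CNN extracting spatial features from each video frame.
        self.frame_cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (batch, 32)
        )
        # Fusion head: concatenated video, motion and gaze features -> one score.
        self.head = nn.Sequential(
            nn.Linear(32 + n_motion_feats + n_gaze_feats, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, frames, motion, gaze):
        # frames: (batch, time, 3, H, W); CNN features are averaged over time.
        b, t = frames.shape[:2]
        feats = self.frame_cnn(frames.flatten(0, 1)).view(b, t, -1).mean(dim=1)
        return self.head(torch.cat([feats, motion, gaze], dim=1)).squeeze(1)


# Toy forward pass on random tensors in place of real recordings.
model = CompositeSkillScorer()
scores = model(torch.randn(2, 8, 3, 64, 64), torch.randn(2, 16), torch.randn(2, 8))
print(scores.shape)  # torch.Size([2])
```

Averaging frame features over time is the simplest temporal pooling choice; a real system could equally use recurrent or attention-based pooling, and the paper does not specify which the authors chose.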

Funder

Novo Nordisk Fonden

Publisher

Wiley

Subject

Education, General Medicine
