How suitable are clinical vignettes for the evaluation of symptom checker apps? A test theoretical perspective

Published: 2023-01
Volume: 9
ISSN: 2055-2076
Container-title: DIGITAL HEALTH
Language: en

Author:

Kopka Marvin¹,², Feufel Markus A¹, Berner Eta S³, Schmieding Malte L²

Affiliation:

1. Department of Psychology and Ergonomics (IPA), Division of Ergonomics, Technische Universität Berlin, Berlin, Germany

2. Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany

3. Department of Health Services Administration, University of Alabama at Birmingham, Birmingham, AL, USA

Abstract

Objective: To evaluate the ability of case vignettes to assess the performance of symptom checker applications and to suggest refinements to the methodology used in case vignette-based audit studies.

Methods: We re-analyzed the publicly available data of two prominent case vignette-based symptom checker audit studies by calculating common metrics of test theory. Furthermore, we developed a new metric, the Capability Comparison Score (CCS), which compares symptom checker capability while controlling for the difficulty of the set of cases each symptom checker evaluated. We then scrutinized whether applying test theory and the CCS altered the performance ranking of the investigated symptom checkers.

Results: In both studies, most symptom checkers changed their rank order when adjusting the triage capability for item difficulty (ID) with the CCS. The previously reported triage accuracies commonly overestimated the capability of symptom checkers because they did not account for the fact that symptom checkers tend to selectively appraise easier cases (i.e., with high ID values). Also, many case vignettes in both studies showed insufficient (very low and even negative) values of item-total correlation (ITC), suggesting that individual items or the composition of item sets are of low quality.

Conclusions: A test-theoretic perspective helps identify previously undetected threats to the validity of case vignette-based symptom checker assessments and provides guidance and specific metrics to improve the quality of case vignettes, in particular by controlling for the difficulty of the vignettes an app was (not) able to evaluate correctly. Such measures might prove more meaningful than accuracy alone for the competitive assessment of symptom checkers. Our approach helps elaborate and standardize the methodology used for appraising symptom checker capability, which, ultimately, may yield more reliable results.
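The two test-theory metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the textbook definitions: item difficulty (ID) as the share of symptom checkers that solved a vignette, and a corrected item-total correlation (ITC) that correlates each vignette's score with the mean score on the remaining vignettes. The score matrix, the function names, and the handling of unevaluated cases (`None`) are hypothetical choices made for illustration. The CCS is deliberately left out because this record does not state its formula.

```python
from math import sqrt
from typing import List, Optional, Sequence

# 1 = correct triage advice, 0 = incorrect, None = case not evaluated by the app
Score = Optional[int]


def item_difficulty(item_scores: Sequence[Score]) -> Optional[float]:
    """Share of symptom checkers that solved the vignette (higher = easier)."""
    answered = [s for s in item_scores if s is not None]
    return sum(answered) / len(answered) if answered else None


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation; None when undefined (fewer than 2 points or no variance)."""
    n = len(x)
    if n < 2:
        return None
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def item_total_correlation(matrix: List[List[Score]], item: int) -> Optional[float]:
    """Corrected ITC: item score vs. mean score on all other evaluated items.

    Assumption: using the mean (not the sum) of the remaining items, so that
    apps which evaluated different numbers of cases stay comparable.
    """
    xs: List[float] = []
    ys: List[float] = []
    for row in matrix:
        if row[item] is None:
            continue  # this app did not evaluate the vignette
        rest = [s for j, s in enumerate(row) if j != item and s is not None]
        if not rest:
            continue  # no other items to form a rest score
        xs.append(row[item])
        ys.append(sum(rest) / len(rest))
    return pearson(xs, ys)


# Hypothetical toy data: rows are symptom checkers, columns are vignettes.
scores: List[List[Score]] = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, None],
    [0, 0, 0, 1],
]

for j in range(len(scores[0])):
    column = [row[j] for row in scores]
    print(j, item_difficulty(column), item_total_correlation(scores, j))
```

In this toy matrix the last vignette is solved mostly by the apps that do worse on the other vignettes, so its ITC comes out negative, which is the kind of item the abstract flags as low quality.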

Publisher

SAGE Publications

Subject

Health Information Management,Computer Science Applications,Health Informatics,Health Policy

Link

http://journals.sagepub.com/doi/pdf/10.1177/20552076231194929

References: 41 articles.

1. Evaluation of symptom checkers for self diagnosis and triage: audit study

2. Accuracy of online symptom checkers and the potential impact on service utilisation

3. Characteristics of Users and Nonusers of Symptom Checkers in Germany: Cross-Sectional Survey Study

4. Web Use for Symptom Appraisal of Physical Health Conditions: A Systematic Review

5. EPatient Analytics GmbH. EPatient Survey 2020, https://www.hcm-magazin.de/epatient-survey-2020-digital-health-studie/150/10992/407743 (2020, accessed 6 March 2021).

Cited by 8 articles.

1. Ubie Symptom Checker: A Clinical Vignette Simulation Study;2024-08-31

2. Statistical refinement of case vignettes for digital health research;2024-08-30

3. Software symptomcheckR: an R package for analyzing and visualizing symptom checker triage performance;BMC Digital Health;2024-07-22

4. Controlling Inputter Variability in Vignette Studies Assessing Web-Based Symptom Checkers: Evaluation of Current Practice and Recommendations for Isolated Accuracy Metrics;JMIR Formative Research;2024-05-31

5. Evaluating self-triage accuracy of laypeople, symptom-assessment apps, and large language models: A framework for case vignette development using a representative design approach (RepVig);2024-04-03