How accurate are digital symptom assessment apps for suggesting conditions and urgency advice? A clinical vignettes comparison to GPs

Journal: BMJ Open
Published: 2020-12
Volume: 10, Issue: 12, Article: e040269
ISSN: 2044-6055
Language: English

Authors:

Gilbert Stephen, Mehl Alicia, Baluch Adel, Cawley Caoimhe, Challiner Jean, Fraser Hamish, Millen Elizabeth, Montazeri Maryam, Multmeier Jan, Pick Fiona, Richter Claudia, Türk Ewelina, Upadhyay Shubhanan, Virani Vishaal, Vona Nicola, Wicks Paul, Novorol Claire

Abstract

Objectives: To compare the breadth of condition coverage, the accuracy of suggested conditions and the appropriateness of urgency advice of eight popular symptom assessment apps.

Design: Vignettes study.

Setting: 200 primary care vignettes.

Intervention/comparator: For eight apps and seven general practitioners (GPs), breadth of coverage, condition-suggestion accuracy and urgency-advice accuracy were measured against the vignettes' gold standard.

Primary outcome measures: (1) Proportion of conditions 'covered' by an app, that is, not excluded because the user was too young/old or pregnant, or because the condition was not modelled; (2) proportion of vignettes with the correct primary diagnosis among the top 3 conditions suggested; (3) proportion of 'safe' urgency advice (ie, at the gold-standard level, more conservative, or no more than one level less conservative).

Results: Condition-suggestion coverage was highly variable, with some apps not offering a suggestion for many users. In alphabetical order: Ada: 99.0%; Babylon: 51.5%; Buoy: 88.5%; K Health: 74.5%; Mediktor: 80.5%; Symptomate: 61.5%; WebMD: 93.0%; Your.MD: 64.5%. Top-3 suggestion accuracy was: GPs (average): 82.1%±5.2%; Ada: 70.5%; Babylon: 32.0%; Buoy: 43.0%; K Health: 36.0%; Mediktor: 36.0%; Symptomate: 27.5%; WebMD: 35.5%; Your.MD: 23.5%. Some apps excluded certain user demographics or conditions, and their performance was generally higher when the corresponding vignettes were excluded. For safe urgency advice, the tested GPs averaged 97.0%±2.5%. For the vignettes with advice provided, only three apps had safety performance within 1 SD of the GPs: Ada: 97.0%; Babylon: 95.1%; Symptomate: 97.8%. One app had safety performance within 2 SDs of the GPs: Your.MD: 92.6%. Three apps had safety performance outside 2 SDs of the GPs: Buoy: 80.0% (p<0.001); K Health: 81.3% (p<0.001); Mediktor: 87.3% (p=1.3×10⁻³).

Conclusions: The utility of digital symptom assessment apps relies on coverage, accuracy and safety. While no digital tool outperformed GPs, some came close, and because software can be improved iteratively, such improvements offer a scalable route to better care.
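The three primary outcome measures above reduce to simple proportions. The following is a minimal Python sketch, not the authors' code: it assumes a hypothetical four-level urgency scale (the abstract does not name the study's actual levels), and it assumes that a vignette with no suggestion counts as a miss for top-3 accuracy, while safe-advice rates are computed only over vignettes where advice was given, as the abstract states. The helper `sd_band` only mirrors the "within 1 SD / 2 SDs of the GPs" wording as a plain distance-from-mean check; it is not the statistical test behind the reported p-values. All function names are illustrative.

```python
from typing import Optional, Sequence

# Hypothetical urgency scale, ordered least to most conservative.
# Assumption: the abstract does not list the study's real urgency levels.
URGENCY_LEVELS = ["self-care", "primary care", "urgent care", "emergency"]
RANK = {level: i for i, level in enumerate(URGENCY_LEVELS)}


def coverage(offered: Sequence[bool]) -> float:
    """Percentage of vignettes for which the app offered any suggestion."""
    return 100.0 * sum(offered) / len(offered)


def top3_accuracy(suggestions: Sequence[Optional[list]], gold: Sequence[str]) -> float:
    """Percentage of all vignettes whose gold primary diagnosis is in the top 3.

    Assumption: a vignette with no suggestion (None) counts as a miss, so the
    denominator is every vignette, not only the covered ones.
    """
    hits = sum(1 for s, g in zip(suggestions, gold) if s is not None and g in s[:3])
    return 100.0 * hits / len(gold)


def is_safe(advice: str, gold: str) -> bool:
    """Safe = at gold level, more conservative, or at most one level less conservative."""
    return RANK[advice] >= RANK[gold] - 1


def safe_advice_rate(advice: Sequence[Optional[str]], gold: Sequence[str]) -> float:
    """Percentage of safe advice among vignettes where advice was provided."""
    pairs = [(a, g) for a, g in zip(advice, gold) if a is not None]
    return 100.0 * sum(is_safe(a, g) for a, g in pairs) / len(pairs)


def sd_band(value: float, ref_mean: float, ref_sd: float) -> str:
    """Classify a score by its distance from the reference (GP) mean in SD units."""
    z = abs(value - ref_mean) / ref_sd
    if z <= 1:
        return "within 1 SD"
    if z <= 2:
        return "within 2 SDs"
    return "outside 2 SDs"


if __name__ == "__main__":
    # Safe-advice rates as reported in the abstract; GP mean 97.0%, SD 2.5%.
    reported = {"Ada": 97.0, "Babylon": 95.1, "Symptomate": 97.8, "Your.MD": 92.6,
                "Buoy": 80.0, "K Health": 81.3, "Mediktor": 87.3}
    for app, rate in reported.items():
        print(f"{app}: {rate}% -> {sd_band(rate, 97.0, 2.5)}")

    # Three invented vignettes (illustrative only, not study data):
    # one top-3 hit, one vignette with no suggestion, one wrong suggestion.
    gold_dx = ["migraine", "appendicitis", "otitis media"]
    app_dx = [["tension headache", "migraine", "sinusitis"], None, ["otitis externa"]]
    print(top3_accuracy(app_dx, gold_dx))
```

Applied to the reported figures, this simple distance check reproduces the abstract's grouping: Ada, Babylon and Symptomate fall within 1 SD, Your.MD within 2 SDs, and Buoy, K Health and Mediktor outside 2 SDs.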

Publisher

BMJ

Subject

General Medicine

Cited by 122 articles.

1. Supporting parents with acutely ill children: Environment scan and user evaluation of mobile applications (the SuPa kids project). International Journal of Medical Informatics, 2024-09.

2. Family member and service provider experiences and perspectives of a digital surveillance and service navigation approach in multicultural context: a qualitative study in identifying the barriers and enablers to Watch Me Grow-Electronic (WMG-E) program with a culturally diverse community. BMC Health Services Research, 2024-08-24.

3. Comparison of Two Symptom Checkers (Ada and Symptoma) in the Emergency Department: Randomized, Crossover, Head-to-Head, Double-Blinded Study. Journal of Medical Internet Research, 2024-08-20.

4. Evaluating the diagnostic and triage performance of digital and online symptom checkers for the presentation of myocardial infarction; A retrospective cross-sectional study. PLOS Digital Health, 2024-08-05.

5. Evaluation of a Musculoskeletal Digital Assessment Routing Tool (DART): Crossover Noninferiority Randomized Pilot Trial. JMIR Formative Research, 2024-07-30.