Abstract
Background
AI-driven symptom checkers (SCs) are increasingly adopted in healthcare for their potential to provide users with accessible and immediate preliminary health education. These tools, powered by advanced artificial intelligence algorithms, help patients assess their symptoms quickly. Previous studies using clinical vignette approaches have evaluated SC accuracy, highlighting both strengths and areas for improvement.

Objective
This study aims to evaluate the performance of the Ubie Symptom Checker (Ubie SC) using an innovative large language model (LLM)-assisted simulation method.

Methods
The study employed a three-phase methodology: gathering 400 publicly available clinical vignettes; medical entity linking of these vignettes to the Ubie SC using LLMs under physician supervision; and evaluation of accuracy metrics. The analysis focused on the 328 vignettes within the scope of the Ubie SC, with accuracy measured by Top-5 hit rates.

Results
Ubie achieved a Top-5 hit accuracy of 63.4% and a Top-10 hit accuracy of 71.6%, indicating its effectiveness in providing relevant information based on symptom input. The system performed particularly well in domains such as nervous system and respiratory conditions, though accuracy varied across ICD groupings, highlighting areas for further refinement. When compared with physicians and comparator SCs evaluated on the same clinical vignette set, Ubie compared favorably to the median physician hit accuracy.

Conclusions
The Ubie Symptom Checker shows considerable promise as a supportive education tool in healthcare. While the study highlights the system's strengths, it also identifies areas for improvement, suggesting that continued refinement and real-world testing are essential to fully realize Ubie's potential in AI-assisted healthcare.
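For clarity, the Top-k hit rate reported in the Results can be sketched as follows. This is an illustrative assumption about the metric's form, not the authors' implementation; the function name and data layout are hypothetical, assuming each vignette has a single ground-truth diagnosis and a ranked list of suggested conditions.

    def top_k_hit_rate(ranked_predictions, ground_truths, k=5):
        """Fraction of vignettes whose ground-truth diagnosis appears
        among the top-k conditions suggested by the symptom checker."""
        hits = sum(
            truth in preds[:k]
            for preds, truth in zip(ranked_predictions, ground_truths)
        )
        return hits / len(ground_truths)

    # Example: 2 of 3 vignettes have the true diagnosis in the top 2.
    preds = [["flu", "cold"], ["migraine", "tension headache"], ["asthma", "GERD"]]
    truths = ["cold", "cluster headache", "asthma"]
    print(top_k_hit_rate(preds, truths, k=2))  # ~0.667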