Abstract
When judging the quality of a computational system for a pathological screening task, several factors are important, such as sensitivity, specificity, and accuracy. As machine learning approaches show promise in the multi-label paradigm, they are being widely adopted in diagnostics and digital therapeutics. Metrics are usually borrowed from the machine learning literature, and the current consensus is to report results on a diverse set of metrics. However, it is infeasible to compare the efficacy of computational systems that have been evaluated on different sets of metrics. From a diagnostic utility standpoint, the current metrics themselves are far from perfect: they are often biased by the prevalence of negative samples or other statistical factors and, importantly, they are designed to evaluate general-purpose machine learning tasks. In this paper, we outline the parameters that are important in constructing a clinical metric aligned with diagnostic practice, and demonstrate their incompatibility with existing metrics. We propose a new metric, MedTric, that takes into account several factors of clinical importance. MedTric is built from the ground up with the unique context of computational diagnostics and the principle of risk minimization in mind, penalizing missed diagnoses more harshly than over-diagnosis. MedTric is a unified metric for evaluating medical or pathological screening systems. We compare MedTric against other widely used metrics and demonstrate how it outperforms them in key areas of medical relevance.
Publisher
Public Library of Science (PLoS)