Abstract
Background
Trustability is crucial for AI models in clinical settings. Conformal prediction, a robust uncertainty quantification framework, has received increasing attention as a tool for improving model trustability. How best to calculate the non-conformity score for conformal prediction is an area of active research.
Method
We propose deep conformal supervision (DCS), which leverages the intermediate outputs of deep supervision for non-conformity score calculation via a weighted average whose weights are based on the inverse of the mean calibration error of each stage. We benchmarked our method on two publicly available medical image classification datasets: a pneumonia chest radiography dataset and a preprocessed version of the 2019 RSNA Intracranial Hemorrhage dataset.
Results
Our method achieved mean coverage errors of 16e-4 (CI: 1e-4, 41e-4) and 5e-4 (CI: 1e-4, 10e-4), compared to baseline mean coverage errors of 28e-4 (CI: 2e-4, 64e-4) and 21e-4 (CI: 8e-4, 3e-4), on the two datasets, respectively.
Conclusion
In this non-inferiority study, we observed that baseline conformal prediction already exhibits small coverage errors. Our method shows a relative improvement, most noticeable with smaller datasets or smaller acceptable error levels, although this improvement is not statistically significant.
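The weighting idea in the Method can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the per-stage calibration errors are taken as given inputs (the abstract does not define how they are computed), the non-conformity score is assumed to be one minus the combined class probability, and the threshold uses the standard split conformal quantile.

```python
import numpy as np

def stage_weights(stage_errors, eps=1e-12):
    """Weights proportional to the inverse of each stage's calibration error (assumed given)."""
    inv = 1.0 / (np.asarray(stage_errors, dtype=float) + eps)
    return inv / inv.sum()

def combined_scores(stage_probs, weights):
    """Non-conformity scores from stage outputs.

    stage_probs has shape (K, n, C): K supervision stages, n samples, C classes.
    Returns an (n, C) array of scores, assumed here to be 1 - weighted mean probability.
    """
    return 1.0 - np.tensordot(weights, stage_probs, axes=1)

def conformal_threshold(cal_scores, labels, alpha):
    """Split conformal quantile of the true-class scores on the calibration set."""
    n = len(labels)
    true_scores = cal_scores[np.arange(n), labels]
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(true_scores, level, method="higher")

def prediction_sets(test_scores, qhat):
    """Boolean (n, C) mask: class c is in the set when its score is at most qhat."""
    return test_scores <= qhat
```

In this sketch a stage with a smaller calibration error contributes more to the combined probability, so the final score leans on whichever supervision stages were better calibrated.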
Publisher
Cold Spring Harbor Laboratory