Author:
Hekler Achim,Brinker Titus J.,Buettner Florian
Abstract
Communicating the predictive uncertainty of deep neural networks transparently and reliably is important in many safety-critical applications such as medicine. However, modern neural networks tend to be poorly calibrated, resulting in wrong predictions made with a high confidence. While existing post-hoc calibration methods like temperature scaling or isotonic regression yield strongly calibrated predictions in artificial experimental settings, their efficiency can significantly reduce in real-world applications, where scarcity of labeled data or domain drifts are commonly present. In this paper, we first investigate the impact of these characteristics on post-hoc calibration and introduce an easy-to-implement extension of common post-hoc calibration methods based on test time augmentation. In extensive experiments, we demonstrate that our approach results in substantially better calibration on various architectures. We demonstrate the robustness of our proposed approach on a real-world application for skin cancer classification and show that it facilitates safe decision-making under real-world uncertainties.
Publisher
Association for the Advancement of Artificial Intelligence (AAAI)
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献