Classification Confidence in Exploratory Learning: A User’s Guide
Published: 2023-07-21
Volume: 5
Issue: 3
Pages: 803-829
ISSN: 2504-4990
Container-title: Machine Learning and Knowledge Extraction
Short-container-title: MAKE
Language: en
Authors:
Peter Salamon 1 (ORCID), David Salamon 1, V. Adrian Cantu 2, Michelle An 3, Tyler Perry 2, Robert A. Edwards 4 (ORCID), Anca M. Segall 5 (ORCID)
Affiliations:
1. Department of Mathematics, San Diego State University, San Diego, CA 92182, USA
2. Computational Science Research Center, San Diego State University, San Diego, CA 92182, USA
3. Bioinformatics and Medical Informatics Program, San Diego State University, San Diego, CA 92182, USA
4. Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, SA 5001, Australia
5. Department of Biology, San Diego State University, San Diego, CA 92182, USA
Abstract
This paper investigates the post-hoc calibration of confidence for “exploratory” machine learning classification problems. The difficulty in these problems stems from the continuing desire, when curating datasets, to push the boundaries of which categories have enough examples to generalize from, and from confusion regarding the validity of those categories. We argue that for such problems the “one-versus-all” approach (top-label calibration) must be used rather than the “calibrate-the-full-response-matrix” approach advocated elsewhere in the literature. We introduce and test four new algorithms designed to handle the idiosyncrasies of category-specific confidence estimation using only the test set and the final model. Chief among these methods is the use of kernel density ratios for confidence calibration, including a novel algorithm for choosing the bandwidth. We test our claims and explore the limits of calibration on a bioinformatics application (PhANNs) as well as on the classic MNIST benchmark. Finally, our analysis argues that post-hoc calibration should always be performed, may be performed using only the test dataset, and should be sanity-checked visually.
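To make the density-ratio idea concrete, here is a minimal Python sketch, not the paper's exact algorithm: it estimates P(prediction correct | top-label score) as a ratio of kernel density estimates fit to the test-set scores of correct and incorrect predictions. The paper's novel bandwidth-selection rule is not reproduced here; scipy's gaussian_kde default (Scott's rule) stands in for it, and the synthetic beta-distributed scores in the demo are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

def top_label_confidence(scores, correct, query):
    """Estimate P(prediction correct | top-label score) as a ratio of
    kernel density estimates: one KDE fit to the scores of correct
    predictions, one to the scores of incorrect predictions."""
    scores = np.asarray(scores, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    p_hit = correct.mean()                     # prior P(prediction is correct)
    kde_hit = gaussian_kde(scores[correct])    # score density given a correct prediction
    kde_miss = gaussian_kde(scores[~correct])  # score density given an incorrect prediction
    num = p_hit * kde_hit(query)
    return num / (num + (1.0 - p_hit) * kde_miss(query))

# Synthetic demo: correct predictions tend to receive higher top-label scores.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.beta(8, 2, 800), rng.beta(3, 3, 200)])
correct = np.concatenate([np.ones(800, bool), np.zeros(200, bool)])
print(top_label_confidence(scores, correct, np.array([0.5, 0.8, 0.95])))
```

Because the two KDEs are one-dimensional and fit per category (or per top-label score), this style of estimator needs only the final model's test-set scores, consistent with the abstract's claim that calibration can be done using the test set alone.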
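The recommendation to sanity-check calibration visually is typically implemented with a reliability diagram, which bins stated confidences and plots empirical accuracy per bin against the diagonal. The sketch below is a generic illustration; the equal-width binning scheme and plot styling are our assumptions, not taken from the paper.

```python
import numpy as np
import matplotlib.pyplot as plt

def reliability_diagram(confidence, correct, n_bins=10):
    """Plot empirical accuracy against stated confidence in equal-width
    bins; a well-calibrated model tracks the diagonal."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(confidence, edges) - 1, 0, n_bins - 1)
    x, y = [], []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            x.append(confidence[mask].mean())  # mean stated confidence in bin
            y.append(correct[mask].mean())     # empirical accuracy in bin
    plt.plot([0, 1], [0, 1], "k--", label="perfect calibration")
    plt.plot(x, y, "o-", label="model")
    plt.xlabel("stated confidence")
    plt.ylabel("empirical accuracy")
    plt.legend()
    plt.show()
```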
Funder
NIDDK Computational and Experimental Resources for Virome Analysis in Inflammatory Bowel Disease
Subject
Artificial Intelligence, Engineering (miscellaneous)