Uncertainty Quantification of Machine Learning Model Performance via Anomaly-Based Dataset Dissimilarity Measures
Published:2024-02-29
Issue:5
Volume:13
Page:939
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics
Author:
Incorvaia Gabriele (1), Hond Darryl (1), Asgari Hamid (1)
Affiliation:
1. Thales UK - Research, Technology & Innovation, Reading RG2 6GF, UK
Abstract
The use of Machine Learning (ML) models as predictive tools has increased dramatically in recent years. However, data-driven systems (such as ML models) exhibit a degree of uncertainty in their predictions. In other words, they could produce unexpectedly erroneous predictions if the uncertainty stemming from the data, choice of model and model parameters is not taken into account. In this paper, we introduce a novel method for quantifying the uncertainty of the performance levels attained by ML classifiers. In particular, we investigate and characterize the uncertainty of model accuracy when classifying out-of-distribution data that are statistically dissimilar from the data employed during training. A main element of this novel Uncertainty Quantification (UQ) method is a measure of the dissimilarity between two datasets. We introduce an innovative family of data dissimilarity measures based on anomaly detection algorithms, namely the Anomaly-based Dataset Dissimilarity (ADD) measures. These dissimilarity measures process feature representations that are derived from the activation values of neural networks when supplied with dataset items. The proposed UQ method for classification performance employs these dissimilarity measures to estimate the classifier accuracy for unseen, out-of-distribution datasets, and to give an uncertainty band for those estimates. A numerical analysis of the efficacy of the UQ method is conducted using standard Artificial Neural Network (ANN) classifiers and public domain datasets. The results obtained generally demonstrate that the amplitude of the uncertainty band associated with the estimated accuracy values tends to increase as the data dissimilarity measure increases. Overall, this research contributes to the verification and run-time performance prediction of systems composed of ML-based elements.
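The idea of an anomaly-based dissimilarity between two datasets can be illustrated with a minimal sketch. This is not the authors' implementation: the k-nearest-neighbour anomaly score, the 95% threshold, the function names `knn_score` and `add_measure`, and the synthetic Gaussian vectors standing in for neural network activation features are all assumptions made for illustration. The sketch fits a simple anomaly detector on reference (training) features and reports the fraction of target items it flags as anomalous.

```python
import math
import random


def knn_score(point, reference, k, skip_self=False):
    """Mean Euclidean distance from `point` to its k nearest reference points."""
    dists = sorted(math.dist(point, r) for r in reference)
    if skip_self:
        dists = dists[1:]  # drop the zero distance of the point to itself
    return sum(dists[:k]) / k


def add_measure(reference, target, k=5, quantile=0.95):
    """Fraction of target items flagged anomalous relative to the reference set.

    Hypothetical stand-in for an anomaly-based dissimilarity measure:
    0.0 means no target item looks anomalous, 1.0 means all of them do.
    """
    # Calibrate the anomaly threshold on leave-one-out reference scores.
    ref_scores = sorted(
        knn_score(p, reference, k, skip_self=True) for p in reference
    )
    idx = min(len(ref_scores) - 1, int(quantile * len(ref_scores)))
    threshold = ref_scores[idx]
    # Count target items whose score exceeds the calibrated threshold.
    flagged = sum(knn_score(p, reference, k) > threshold for p in target)
    return flagged / len(target)


def gaussian_cloud(n, dim, shift, rng):
    """Synthetic feature vectors (assumed stand-in for network activations)."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
train_feats = gaussian_cloud(200, 4, 0.0, rng)  # reference features
near_feats = gaussian_cloud(100, 4, 0.0, rng)   # same distribution
far_feats = gaussian_cloud(100, 4, 3.0, rng)    # shifted distribution

d_near = add_measure(train_feats, near_feats)
d_far = add_measure(train_feats, far_feats)
print(f"dissimilarity, in-distribution target: {d_near:.2f}")
print(f"dissimilarity, shifted target:         {d_far:.2f}")
```

In this toy setting the shifted target set receives the larger dissimilarity value. Any anomaly detector that produces a per-item score could replace `knn_score`. Mapping the dissimilarity value to an accuracy estimate and an uncertainty band is the contribution of the paper and is not reproduced here.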
Funder
UK MoD DSTL Thales UK