Analysis of machine learning prediction reliability based on sampling distance evaluation with feature decorrelation

Published:2024-05-03 Issue:2 Volume:5 Page:025030
ISSN:2632-2153
Container-title:Machine Learning: Science and Technology
Short-container-title:Mach. Learn.: Sci. Technol.

Author:

Askanazi Evan, Grinberg Ilya

Abstract

Despite successful use in a wide variety of disciplines for data analysis and prediction, machine learning (ML) methods suffer from a poor understanding of the reliability of their predictions, owing to the lack of transparency and black-box nature of ML models. In materials science and other fields, typical ML model results include a significant number of low-quality predictions. This problem is known to be particularly acute for target systems that differ significantly from the data used for ML model training. However, to date, a general method for uncertainty quantification (UQ) of ML predictions has not been available. Focusing on the intuitive and computationally efficient similarity-based UQ, we show that a simple metric based on Euclidean feature space distance and sampling density, together with decorrelation of the features using Gram–Schmidt orthogonalization, allows effective separation of the accurately predicted data points from data points with poor prediction accuracy. To demonstrate the generality of the method, we apply it to support vector regression models for various small data sets in materials science and other fields. We also show that this metric is a more effective UQ tool than the standard approach of using the average distance of k nearest neighbors (k = 1–10) in feature space for similarity evaluation. Our method is computationally simple, can be used with any ML method and enables analysis of the sources of ML prediction errors. Therefore, it is suitable for use as a standard technique for the estimation of ML prediction reliability for small data sets and as a tool for data set design.
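The two ingredients named in the abstract, feature decorrelation by Gram–Schmidt orthogonalization and a feature-space distance used as a similarity measure, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`fit_decorrelation`, `apply_decorrelation`, `mean_knn_distance`) are hypothetical, the column ordering and the absence of any rescaling are assumptions, and the distance shown is the k-nearest-neighbor baseline the abstract compares against, not the paper's distance-and-density metric, whose exact form is not given here.

```python
import numpy as np

def fit_decorrelation(X_train, tol=1e-12):
    """Center the training features and orthogonalize the feature columns
    with modified Gram-Schmidt (assumption: columns are processed in order).

    Returns (mean, R, V) with X_train - mean == V @ R, where the columns of V
    are mutually orthogonal and R is unit upper triangular.
    """
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    V = X_train - mean
    d = V.shape[1]
    R = np.eye(d)
    for j in range(d):
        for i in range(j):
            denom = V[:, i] @ V[:, i]
            # Skip (near-)zero columns to avoid dividing by zero.
            coef = (V[:, i] @ V[:, j]) / denom if denom > tol else 0.0
            V[:, j] -= coef * V[:, i]   # remove the component along column i
            R[i, j] = coef
    return mean, R, V

def apply_decorrelation(X, mean, R):
    """Map new points into the same decorrelated coordinates: solve Z @ R = X - mean."""
    Xc = np.asarray(X, dtype=float) - mean
    return np.linalg.solve(R.T, Xc.T).T

def mean_knn_distance(Z_train, Z_query, k=5):
    """Average Euclidean distance from each query point to its k nearest
    training points (the kNN baseline mentioned in the abstract)."""
    diff = Z_query[:, None, :] - Z_train[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    k = min(k, Z_train.shape[0])
    nearest = np.partition(dist, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)
```

In use, one would fit the decorrelation on the training features, map both training and test points with `apply_decorrelation`, and treat test points with a large distance score as candidates for less reliable predictions. Any threshold on that score is a modeling choice that this sketch does not address.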

Funder

Army Research Laboratory

Israel Science Foundation

Publisher

IOP Publishing

Link

https://iopscience.iop.org/article/10.1088/2632-2153/ad4231/pdf

Cited by 1 article.

1. Aggregation and assessment of grape quality parameters with visible-near-infrared spectroscopy: Introducing a novel quantitative index;Postharvest Biology and Technology;2024-12