Machine Learning Regions of Reliability based on Sampling Distance Evaluation with Feature Decorrelation for Tabular Time Datasets

Published: 2024-06-20

Authors:

Askanazi Evan¹, Grinberg Ilya¹

Affiliation:

1. Bar Ilan University

Abstract

Despite their successful use for data analysis and prediction in a wide variety of disciplines, machine learning (ML) methods suffer from a limited understanding of the reliability of their predictions, owing to the lack of transparency and the black-box nature of ML models. In materials science and other fields, typical ML model results include a significant number of low-quality predictions. This problem is known to be particularly acute for target systems that differ significantly from the data used to train the ML model. However, to date, no general method for uncertainty quantification (UQ) of ML predictions has been available. Focusing on intuitive and computationally efficient similarity-based UQ, we show that a simple metric based on Euclidean feature-space distance and sampling density, combined with decorrelation of the features by Gram-Schmidt orthogonalization, effectively separates accurately predicted data points from those with poor prediction accuracy. To demonstrate the generality of the method, we apply it to LightGBM models trained on a set of tabular time-series datasets. We also show that this metric is a more effective UQ tool than the standard similarity measure, the average distance to the k nearest neighbors (k = 1–10) in feature space. The computational simplicity of this method, combined with its applicability to time-series datasets, allows it to be readily used in numerous real-world problems.
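The abstract names the ingredients of the metric but not the exact formula that combines distance with sampling density. The Python sketch below illustrates two ingredients under stated assumptions: Gram-Schmidt decorrelation of feature columns, fitted on training data and applied to new rows, and the k-nearest-neighbor mean-distance baseline that the metric is compared against. It assumes full-rank numeric features and uses only the standard library. The function names (`fit_gram_schmidt`, `transform`, `knn_mean_distance`, `neighbours_within`) are hypothetical. `neighbours_within` is only an illustrative density proxy and is not the authors' sampling-density term.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def fit_gram_schmidt(X: Sequence[Sequence[float]]) -> Tuple[Vector, List[Vector], List[Vector]]:
    """Fit a (modified) Gram-Schmidt decorrelation on training rows X.

    Returns the column means, the orthonormal columns Q (one list per feature),
    and per-feature coefficients R such that the centred column j equals
    sum_{k<j} R[j][k] * Q[k] + R[j][j] * Q[j].
    """
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Q: List[Vector] = []
    R: List[Vector] = []
    for j in range(d):
        v = [row[j] - means[j] for row in X]  # centred feature column
        coeffs: Vector = []
        for q in Q:  # remove the component along each earlier basis vector
            r = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - r * qi for vi, qi in zip(v, q)]
            coeffs.append(r)
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm < 1e-12:
            raise ValueError(f"feature {j} is linearly dependent on earlier features")
        Q.append([vi / norm for vi in v])
        coeffs.append(norm)  # diagonal entry R[j][j]
        R.append(coeffs)
    return means, Q, R


def transform(x: Sequence[float], means: Vector, R: List[Vector]) -> Vector:
    """Map one row into the decorrelated space by forward substitution."""
    z: Vector = []
    for j, coeffs in enumerate(R):
        acc = x[j] - means[j]
        for k in range(j):
            acc -= coeffs[k] * z[k]
        z.append(acc / coeffs[j])
    return z


def knn_mean_distance(query: Sequence[float], train: List[Vector], k: int) -> float:
    """Baseline: mean Euclidean distance to the k nearest training points."""
    dists = sorted(math.dist(query, row) for row in train)
    return sum(dists[:k]) / k


def neighbours_within(query: Sequence[float], train: List[Vector], radius: float) -> int:
    """Hypothetical density proxy: number of training points within `radius`."""
    return sum(1 for row in train if math.dist(query, row) <= radius)


# Usage on a toy training set with two strongly correlated features.
X_train = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2]]
means, Q, R = fit_gram_schmidt(X_train)
Z_train = [transform(row, means, R) for row in X_train]
z_new = transform([2.5, 5.0], means, R)
print(knn_mean_distance(z_new, Z_train, k=2))
print(neighbours_within(z_new, Z_train, radius=0.5))
```

Fitting the transform on the training data only and applying it to query rows by forward substitution keeps the decorrelated basis fixed. As a result, distances for new points are measured in the same space as the training set. Gram-Schmidt output depends on the order of the feature columns, so a fixed column order should be used.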

Publisher

Research Square Platform LLC
