Authors:
Raül Fabra-Boluda, Cèsar Ferri, José Hernández-Orallo, M. José Ramírez-Quintana, Fernando Martínez-Plumed
Abstract
The quest for transparency in black-box models has gained significant momentum in recent years. In particular, discovering the underlying machine learning technique (or model family) from the behaviour of a black-box model is an important problem, both for better understanding the model and for developing strategies to attack it by exploiting the weaknesses intrinsic to the learning technique. In this paper, we tackle the challenging task of identifying which kind of machine learning model is behind the predictions when we interact with a black-box model. Our method involves systematically querying a black-box model (oracle) to label an artificially generated dataset, which is then used to train surrogate models using machine learning techniques from different families (each one trying to partially approximate the oracle's behaviour). We present two approaches based on similarity measures: one selecting the most similar family and the other using a conveniently constructed meta-model. In both cases, we use both crisp and soft classifiers and their corresponding similarity metrics. By experimentally comparing all these methods, we gain valuable insights into the explanatory and predictive capabilities of our model family concept. This provides a deeper understanding of black-box models and increases their transparency and interpretability, paving the way for more effective decision making.
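To make the pipeline described in the abstract more concrete, the following is a minimal sketch of the "most similar family" approach: query a black-box oracle to label synthetic data, fit one surrogate per candidate model family, and report the family whose surrogate agrees most with the oracle on fresh queries. The query-generation scheme (uniform sampling), the candidate families, and the agreement metric (Cohen's kappa) are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of surrogate-based model-family identification.
# Assumptions: uniform query sampling, a small set of candidate families,
# and Cohen's kappa as the similarity metric between oracle and surrogate.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import cohen_kappa_score


def identify_family(oracle_predict, n_features, n_queries=5000, seed=0):
    rng = np.random.default_rng(seed)

    # 1. Generate an artificial query dataset (here: uniform in [0, 1]^d).
    X_query = rng.uniform(size=(n_queries, n_features))

    # 2. Label the queries with the black-box oracle.
    y_oracle = oracle_predict(X_query)

    # 3. Train one surrogate model per candidate family on the oracle's labels.
    families = {
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(random_state=seed),
        "svm": SVC(random_state=seed),
        "knn": KNeighborsClassifier(),
    }

    # 4. Score each surrogate's agreement with the oracle on fresh queries.
    X_test = rng.uniform(size=(n_queries, n_features))
    y_test = oracle_predict(X_test)
    scores = {}
    for name, clf in families.items():
        clf.fit(X_query, y_oracle)
        scores[name] = cohen_kappa_score(y_test, clf.predict(X_test))

    # 5. The family whose surrogate is most similar to the oracle is predicted.
    return max(scores, key=scores.get), scores


# Example usage with a toy "black box" (a hidden random forest):
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X, y = rng.uniform(size=(2000, 5)), rng.integers(0, 2, 2000)
    hidden = RandomForestClassifier(random_state=1).fit(X, y)
    print(identify_family(hidden.predict, n_features=5))
```

The meta-model variant mentioned in the abstract would instead use such similarity scores as features for a classifier trained to predict the family label; the sketch above only covers the simpler "select the most similar family" approach.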