Abstract
Shapley value regression with machine learning models has recently emerged as an axiomatic approach to developing diagnostic models. However, when large numbers of predictor variables have to be considered, these methods become infeasible owing to their prohibitive computational cost. In this paper, an approximate Shapley value approach with random forests is compared with a full Shapley model, as well as with other methods used in variable importance analysis. Three case studies are considered: one based on simulated data, one based on a model predicting throughput in a calcium carbide furnace as a function of operating variables, and one related to energy consumption in a steel plant. The approximate Shapley approach achieved results very similar to those of the full Shapley approach, but at a fraction of the computational cost. Moreover, although the variable importance measures considered in this study consistently identified the most influential predictors in the case studies, they yielded different results for the less influential predictors, and no single measure outperformed the others across all three case studies.
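The abstract does not describe how the approximation works, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It contrasts a full Shapley computation, which enumerates every subset of predictors, with a Monte Carlo permutation-sampling estimate, a common way of approximating Shapley values that is used here purely as an illustrative stand-in. The value function `toy_value` is a hypothetical placeholder for what, in Shapley value regression, would be the goodness of fit (e.g. R²) of a model such as a random forest trained on a given subset of predictors; `exact_shapley` and `sampled_shapley` are likewise illustrative names.

```python
import itertools
import math
import random
from typing import Callable, FrozenSet, List

# A value function maps a subset of predictor indices to a score. In Shapley
# value regression this would be, e.g., the R^2 of a model fitted on only those
# predictors; here it is a hypothetical placeholder supplied by the caller.
ValueFn = Callable[[FrozenSet[int]], float]


def exact_shapley(v: ValueFn, p: int) -> List[float]:
    """Full Shapley values: visits every coalition, so v is called p * 2**p times."""
    phi = [0.0] * p
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p! for coalitions S of this size
            weight = math.factorial(size) * math.factorial(p - size - 1) / math.factorial(p)
            for subset in itertools.combinations(others, size):
                s = frozenset(subset)
                phi[j] += weight * (v(s | {j}) - v(s))
    return phi


def sampled_shapley(v: ValueFn, p: int, n_perm: int, seed: int = 0) -> List[float]:
    """Monte Carlo estimate: average marginal contributions over random orderings.

    Each ordering costs p + 1 calls to v, so the total is n_perm * (p + 1).
    """
    rng = random.Random(seed)
    phi = [0.0] * p
    for _ in range(n_perm):
        order = list(range(p))
        rng.shuffle(order)
        coalition: FrozenSet[int] = frozenset()
        prev = v(coalition)
        for j in order:
            # Add predictor j to the growing coalition and record its gain
            coalition = coalition | {j}
            cur = v(coalition)
            phi[j] += cur - prev
            prev = cur
    return [total / n_perm for total in phi]


def toy_value(s: FrozenSet[int]) -> float:
    """Hypothetical stand-in for a model score: additive effects plus an
    interaction between predictors 0 and 1."""
    weights = [0.4, 0.2, 0.1, 0.0]
    score = sum(weights[k] for k in sorted(s))
    if 0 in s and 1 in s:
        score += 0.2
    return score


exact = exact_shapley(toy_value, 4)
approx = sampled_shapley(toy_value, 4, n_perm=2000)
print("exact: ", [round(x, 3) for x in exact])
print("approx:", [round(x, 3) for x in approx])
```

By construction, the exact values for this toy function are 0.5, 0.3, 0.1 and 0.0: predictors 0 and 1 split their 0.2 interaction equally, and the values sum to v(all) − v(∅) = 0.9. With only four predictors the sampler is actually more expensive (10,000 calls to `v` versus 64); the saving appears as the number of predictors p grows, since p·2^p exceeds 3·10^10 for p = 30, whereas 2,000 sampled orderings need only 62,000 calls.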
Funder
ARC Centre of Excellence for Enabling Eco-Efficient Beneficiation of Minerals
Subject
General Materials Science, Metals and Alloys
Cited by
3 articles.