Abstract
Estimating feature importance, that is, the contribution of a feature to one or several predictions, is an essential aspect of explaining data-based models. Besides explaining the model itself, an equally relevant question is which features are important in the underlying data-generating process. We present a Shapley-value-based framework for inferring the importance of individual features, including uncertainty in the estimator. We build upon the recently published model-agnostic feature importance score of SAGE (Shapley additive global importance) and introduce Sub-SAGE. For tree-based models, Sub-SAGE has the advantage that it can be estimated without computationally expensive resampling. We argue that for all model types the uncertainties in our Sub-SAGE estimator can be estimated using bootstrapping, and we demonstrate the approach for tree ensemble methods. The framework is exemplified both on synthetic data and on a large genotype data set, inferring feature importance with respect to obesity.
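As a rough illustration of the bootstrapping idea described in the abstract, the sketch below estimates uncertainty in a global feature importance score for a tree ensemble by resampling the evaluation data. It uses scikit-learn's permutation importance as a stand-in for the Sub-SAGE estimator, which is not reproduced here; the data set, model, and bootstrap settings are illustrative assumptions, not the paper's setup.

```python
# Illustrative sketch only: bootstrapped uncertainty for a global feature
# importance score of a tree ensemble. Permutation importance is used as a
# stand-in for the paper's Sub-SAGE estimator, which is not reproduced here.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data with a handful of informative features (assumed setup).
X, y = make_regression(n_samples=1000, n_features=10, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

rng = np.random.default_rng(0)
n_boot = 100
scores = np.empty((n_boot, X.shape[1]))
for b in range(n_boot):
    # Resample the evaluation set with replacement and recompute the score.
    idx = rng.integers(0, len(X_test), len(X_test))
    result = permutation_importance(model, X_test[idx], y_test[idx],
                                    n_repeats=5, random_state=b)
    scores[b] = result.importances_mean

# Point estimate and a 95% bootstrap percentile interval per feature.
mean = scores.mean(axis=0)
lower, upper = np.percentile(scores, [2.5, 97.5], axis=0)
for j in range(X.shape[1]):
    print(f"feature {j}: {mean[j]:.3f} [{lower[j]:.3f}, {upper[j]:.3f}]")
```

Features whose bootstrap interval excludes zero can then be flagged as important with some confidence, which is the spirit of the uncertainty-aware importance the abstract describes.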
Publisher
Public Library of Science (PLoS)
Subject
Computational Theory and Mathematics; Cellular and Molecular Neuroscience; Genetics; Molecular Biology; Ecology; Modeling and Simulation; Ecology, Evolution, Behavior and Systematics
Cited by
2 articles.