Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset-Reference-Cited by-同舟云学术

Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset

Published:2022-05-03 Issue:1 Volume:12 Page:
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Meng Chuizheng,Trinh Loc,Xu Nan,Enouen James,Liu Yan

Abstract

AbstractThe recent release of large-scale healthcare datasets has greatly propelled the research of data-driven deep learning models for healthcare applications. However, due to the nature of such deep black-boxed models, concerns about interpretability, fairness, and biases in healthcare scenarios where human lives are at stake call for a careful and thorough examination of both datasets and models. In this work, we focus on MIMIC-IV (Medical Information Mart for Intensive Care, version IV), the largest publicly available healthcare dataset, and conduct comprehensive analyses of interpretability as well as dataset representation bias and prediction fairness of deep learning models for in-hospital mortality prediction. First, we analyze the interpretability of deep learning mortality prediction models and observe that (1) the best-performing interpretability method successfully identifies critical features for mortality prediction on various prediction models as well as recognizing new important features that domain knowledge does not consider; (2) prediction models rely on demographic features, raising concerns in fairness. Therefore, we then evaluate the fairness of models and do observe the unfairness: (1) there exists disparate treatment in prescribing mechanical ventilation among patient groups across ethnicity, gender and age; (2) models often rely on racial attributes unequally across subgroups to generate their predictions. We further draw concrete connections between interpretability methods and fairness metrics by showing how feature importance from interpretability methods can be beneficial in quantifying potential disparities in mortality predictors. Our analysis demonstrates that the prediction performance is not the only factor to consider when evaluating models for healthcare applications, since high prediction performance might be the result of unfair utilization of demographic features. Our findings suggest that future research in AI models for healthcare applications can benefit from utilizing the analysis workflow of interpretability and fairness as well as verifying if models achieve superior performance at the cost of introducing bias.

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary

Link

https://www.nature.com/articles/s41598-022-11012-2.pdf

Reference88 articles.

1. Purushotham, S., Meng, C., Che, Z. & Liu, Y. Benchmarking deep learning models on large healthcare datasets. J. Biomed. Inf. 83, 112–134. https://doi.org/10.1016/j.jbi.2018.04.007 (2018).

2. Harutyunyan, H., Khachatrian, H., Kale, D. C., Ver Steeg, G. & Galstyan, A. Multitask learning and benchmarking with clinical time series data. Sci. Data 6, 1–18 (2019).

3. Wang, S. et al. Mimic-extract: A data extraction, preprocessing, and representation pipeline for mimic-iii. In Proceedings of the ACM Conference on Health, Inference, and Learning, 222–235 (2020).

4. Chen, I., Johansson, F. D. & Sontag, D. Why is my classifier discriminatory? In Adv. Neural Inf. Process. Syst., 3539–3550 (2018).

5. Johnson, A. et al. Mimic-iv (version 0.4). PhysioNet (2020).

Cited by 75 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Prediction of mortality events of patients with acute heart failure in intensive care unit based on deep neural network;Computer Methods and Programs in Biomedicine;2024-11

2. A survey of vision-based condition monitoring methods using deep learning: A synthetic fiber rope perspective;Engineering Applications of Artificial Intelligence;2024-10

3. Obtaining the Most Accurate, Explainable Model for Predicting Chronic Obstructive Pulmonary Disease: Triangulation of Multiple Linear Regression and Machine Learning Methods;JMIR AI;2024-08-29

4. Using Self-supervised Learning Can Improve Model Fairness;Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining;2024-08-24

5. Designing interpretable ML system to enhance trust in healthcare: A systematic review to proposed responsible clinician-AI-collaboration framework;Information Fusion;2024-08