Understanding the performance of machine learning models from data- to patient-level

Published: 2024-09-13
ISSN: 1936-1955
Container-title: Journal of Data and Information Quality
Language: en
Short-container-title: J. Data and Information Quality

Author:

Valeriano Maria¹, Matran-Fernandez Ana², Kiffer Carlos³, Lorena Ana Carolina⁴

Affiliation:

1. Instituto Tecnológico de Aeronáutica, Sao Jose dos Campos, Brazil

2. University of Essex, Colchester, United Kingdom of Great Britain and Northern Ireland

3. Universidade Federal de São Paulo, Sao Paulo, Brazil

4. Instituto Tecnológico de Aeronáutica, Sao Jose dos Campos, Brazil

Abstract

Machine Learning (ML) models have the potential to support decision-making in healthcare by grasping complex patterns within data. However, decisions in this domain are sensitive and require active involvement of domain specialists with deep knowledge of the data. To address this task, clinicians need to understand how predictions are generated so they can provide feedback for model refinement. There is usually a gap in the communication between data scientists and domain specialists that needs to be addressed. Specifically, many ML studies are only concerned with presenting average accuracies over an entire dataset, losing valuable insights that can be obtained from a more fine-grained, patient-level analysis of classification performance. In this paper, we present a case study aimed at explaining the factors that contribute to specific predictions for individual patients. Our approach takes a data-centric perspective, focusing on the structure of the data and its correlation with ML model performance. We utilize the concept of Instance Hardness, which measures the level of difficulty an instance poses in being correctly classified. By selecting the hardest- and easiest-to-classify instances, we analyze and contrast the distributions of specific input features and extract meta-features to describe each instance. Furthermore, we individually examine certain instances, offering valuable insights into why they pose challenges for classification, enabling a better understanding of both the successes and failures of the ML models. This opens up the possibility for discussions between data scientists and domain specialists, supporting collaborative decision-making.
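To make the notion of Instance Hardness concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common formulation in which the hardness of an instance is one minus the average probability that a pool of classifiers assigns to its true class. The function names, the two-classifier pool, and the toy probabilities are hypothetical and chosen only for illustration; the abstract does not state which classifiers or data the study used.

```python
from statistics import mean


def instance_hardness(true_labels, probs_by_classifier):
    """Return one hardness score per instance.

    true_labels: list of integer class labels, one per instance.
    probs_by_classifier: list with one entry per classifier; each entry is a
        list (one row per instance) of class probabilities indexed by label.
    Assumed definition: 1 - mean probability assigned to the true class.
    """
    scores = []
    for i, label in enumerate(true_labels):
        p_true = mean(probs[i][label] for probs in probs_by_classifier)
        scores.append(1.0 - p_true)
    return scores


def easiest_and_hardest(scores, k):
    """Return the indices of the k easiest and k hardest instances."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    easiest = order[:k]
    hardest = order[-k:][::-1]  # hardest first
    return easiest, hardest


# Hypothetical toy data: four patients, binary outcome, two classifiers.
labels = [1, 0, 1, 0]
clf_a = [[0.1, 0.9], [0.7, 0.3], [0.7, 0.3], [0.4, 0.6]]
clf_b = [[0.2, 0.8], [0.9, 0.1], [0.6, 0.4], [0.5, 0.5]]

hardness = instance_hardness(labels, [clf_a, clf_b])
easy_idx, hard_idx = easiest_and_hardest(hardness, k=2)
print("hardness:", [round(h, 2) for h in hardness])
print("easiest:", easy_idx, "hardest:", hard_idx)
```

In practice the probabilities would come from held-out (for example, cross-validated) predictions, so that no classifier scores an instance it was trained on. The selected indices can then be used to compare feature distributions between the easy and hard groups, or to pull out single patients for review.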

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3687267
