Measuring the prediction difficulty of individual cases in a dataset using machine learning

Published:2024-05-07 Issue:1 Volume:14
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Kwon Hyunjin,Greenberg Matthew,Josephson Colin Bruce,Lee Joon

Abstract

Different levels of prediction difficulty are one of the key factors that researchers encounter when applying machine learning to data. Although previous studies have introduced various metrics for assessing the prediction difficulty of individual cases, these metrics require specific dataset preconditions. In this paper, we propose three novel metrics for measuring the prediction difficulty of individual cases using fully-connected feedforward neural networks. The first metric is based on the complexity of the neural network needed to make a correct prediction. The second metric employs a pair of neural networks: one makes a prediction for a given case, and the other predicts whether the prediction made by the first model is likely to be correct. The third metric assesses the variability of the neural network’s predictions. We investigated these metrics using a variety of datasets, visualized their values, and compared them to fifteen existing metrics from the literature. The results demonstrate that the proposed case difficulty metrics were better able to differentiate various levels of difficulty than most of the existing metrics and show constant effectiveness across diverse datasets. We expect our metrics will provide researchers with a new perspective on understanding their datasets and applying machine learning in various fields.
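The abstract does not give the exact definitions of the metrics, so the following is only a minimal sketch of how the first and third ideas could be scored for a single case. It is not the authors' implementation. It assumes the networks have already been trained elsewhere and that only their outputs are available. The function names `complexity_difficulty` and `variability_difficulty` and the sample values are hypothetical.

```python
from statistics import pstdev


def complexity_difficulty(correct_by_size):
    """Score a case by the simplest network that predicts it correctly.

    correct_by_size: booleans ordered from the simplest to the most
    complex network (assumed input, produced by separately trained models).
    Returns the 0-based rank of the first correct network, or
    len(correct_by_size) if no network was correct.
    """
    for rank, is_correct in enumerate(correct_by_size):
        if is_correct:
            return rank
    return len(correct_by_size)


def variability_difficulty(true_class_probs):
    """Score a case by the spread of its predictions.

    true_class_probs: probability assigned to the true class by several
    independently fitted networks (assumed input).
    Returns the population standard deviation of those probabilities.
    """
    return pstdev(true_class_probs)


# Hand-made illustrative outputs for two hypothetical cases.
easy_case = {"correct": [True, True, True], "probs": [0.95, 0.96, 0.94]}
hard_case = {"correct": [False, False, True], "probs": [0.20, 0.80, 0.50]}

for name, case in (("easy", easy_case), ("hard", hard_case)):
    print(
        name,
        complexity_difficulty(case["correct"]),
        round(variability_difficulty(case["probs"]), 3),
    )
```

Under these assumptions a higher score on either function marks a harder case. The second metric, a paired network that predicts whether the first network is correct, needs a trained model and is left out of the sketch.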

Funder

Natural Sciences and Engineering Research Council of Canada

Publisher

Springer Science and Business Media LLC

Link

https://www.nature.com/articles/s41598-024-61284-z.pdf

Reference: 28 articles.

1. Sarker, I. H. Machine learning: algorithms, real-world applications and research directions. SN Comput. Sci. 2(3), 160 (2021).

2. Dusenberry, M. W., Tran, D., Choi, E., Kemp, J., Nixon, J., Jerfel, G., Heller, K. & Dai, A. M. Analyzing the role of model uncertainty for electronic health records. In Proceedings of the ACM Conference on Health, Inference, and Learning 204–213 (2020).

3. Kompa, B., Snoek, J. & Beam, A. L. Second opinion needed: communicating uncertainty in medical machine learning. npj Digital Med. https://doi.org/10.1038/s41746-020-00367-3 (2021).

4. Smith, M. R., Martinez, T. & Giraud-Carrier, C. An instance level analysis of data complexity. Machine Learn. 95, 225–256 (2013).

5. Arruda, J. L. M., Prudêncio, R. B. C. & Lorena, A. C. Measuring instance hardness using data complexity measures. In Intelligent Systems: 9th Brazilian Conference, BRACIS 2020, Rio Grande, Brazil, October 20–23, 2020, Proceedings, Part II (eds Cerri, R. & Prati, R. C.) 483–497 (Springer International Publishing, Cham, 2020). https://doi.org/10.1007/978-3-030-61380-8_33.