Impact of the Choice of Cross-Validation Techniques on the Results of Machine Learning-Based Diagnostic Applications

Published: 2021-07-31
Volume: 27, Issue: 3, Pages: 189-199
ISSN: 2093-369X
Journal: Healthcare Informatics Research (abbreviated: Healthc Inform Res)
Language: English

Authors:

Tougui Ilias, Jilbab Abdelilah, Mhamdi Jamal El

Abstract

Objectives: With advances in data availability and computing capabilities, artificial intelligence and machine learning technologies have evolved rapidly in recent years. Researchers have taken advantage of these developments in healthcare informatics and created reliable tools to predict or classify diseases using machine learning-based algorithms. To quantify the performance of these algorithms correctly, the standard approach is cross-validation: the algorithm is trained on a training set, and its performance is measured on a validation set. The two sets should be subject-independent, meaning that no subject contributes records to both, so that the evaluation simulates the expected behavior of a clinical study. This study compares two cross-validation strategies, the subject-wise and the record-wise techniques; the subject-wise strategy correctly mimics the process of a clinical study, while the record-wise strategy does not.

Methods: We started by creating a dataset of smartphone audio recordings from subjects with and without a diagnosis of Parkinson's disease. This dataset was then divided into training and holdout sets using subject-wise and record-wise divisions. The training set was used to compare six cross-validation techniques, each simulating either the subject-wise or the record-wise process, by measuring the performance of two classifiers (support vector machine and random forest). The holdout set was used to calculate the true error of the classifiers.

Results: The record-wise division and the record-wise cross-validation techniques overestimated the performance of the classifiers and underestimated the classification error.

Conclusions: In a diagnostic scenario, the subject-wise technique is the proper way of estimating a model's performance, and record-wise techniques should be avoided.
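To make the difference between the two strategies concrete, here is a minimal sketch in Python using only the standard library. It is not the authors' code: the toy data (20 hypothetical subjects with 5 recordings each) and the function names `record_wise_folds`, `subject_wise_folds` and `leaked_subjects` are assumptions made for illustration. The sketch counts how many subjects end up with recordings on both the training and the validation side of a split, which is the kind of leakage that can let record-wise validation overestimate performance.

```python
import random
from collections import defaultdict


def record_wise_folds(records, k, seed=0):
    """Record-wise k-fold: shuffle individual records and deal them into k folds,
    ignoring which subject each record came from."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def subject_wise_folds(records, k, seed=0):
    """Subject-wise k-fold: shuffle the subjects, assign them round-robin to k folds,
    and put every record of a subject into that subject's fold."""
    by_subject = defaultdict(list)
    for i, (subject, _) in enumerate(records):
        by_subject[subject].append(i)
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)
    folds = [[] for _ in range(k)]
    for n, subject in enumerate(subjects):
        folds[n % k].extend(by_subject[subject])
    return folds


def leaked_subjects(records, folds):
    """Count subjects that have records in both the validation fold and the
    training folds of at least one cross-validation round."""
    leaked = set()
    for f, val_idx in enumerate(folds):
        val_subjects = {records[i][0] for i in val_idx}
        train_subjects = {records[i][0] for g, fold in enumerate(folds) if g != f for i in fold}
        leaked |= val_subjects & train_subjects
    return len(leaked)


if __name__ == "__main__":
    # Hypothetical toy data: 20 subjects with 5 recordings each, stored as
    # (subject_id, recording_index) pairs; features and labels are omitted.
    records = [(f"S{s:02d}", r) for s in range(20) for r in range(5)]
    rw = record_wise_folds(records, k=5)
    sw = subject_wise_folds(records, k=5)
    print("record-wise leaked subjects:", leaked_subjects(records, rw))
    print("subject-wise leaked subjects:", leaked_subjects(records, sw))  # always 0
```

With record-wise folds, almost every subject has recordings on both sides of the split, so the model can be validated on a subject it has already seen during training; subject-wise folds rule this out by construction. In scikit-learn, the subject-wise variant corresponds to passing subject identifiers as the `groups` argument of `GroupKFold.split`, while plain `KFold` ignores subject identity and splits record-wise.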

Publisher

The Korean Society of Medical Informatics

Subject

Health Information Management, Health Informatics, Biomedical Engineering

Link

http://e-hir.org/upload/pdf/hir-2021-27-3-189.pdf

Cited by 62 articles.

1. Predicting adolescent psychopathology from early life factors: A machine learning tutorial. Global Epidemiology, 2024-12.

2. Wearable Sensor-Based Assessments for Remotely Screening Early-Stage Parkinson's Disease. Sensors, 2024-08-30.

3. Development and evaluation of predictive models for pregnancy risk in UK dairy cows. Journal of Dairy Science, 2024-08.

4. Predicting sunspot number from topological features in spectral images I: Machine learning approach. Astronomy and Computing, 2024-07.

5. A Novel Video-Based Methodology for Automated Classification of Dystonia and Choreoathetosis in Dyskinetic Cerebral Palsy During a Lower Extremity Task. Neurorehabilitation and Neural Repair, 2024-06-06.