Comparative Analysis of Machine-Learning Model Performance in Image Analysis: The Impact of Dataset Diversity and Size

Published: 2024-08-08
ISSN: 0003-2999
Container-title: Anesthesia & Analgesia
Language: en

Author:

Pelletier Eric D.¹, Jeffries Sean D.¹,², Song Kevin², Hemmerling Thomas M.¹,²

Affiliation:

1. Department of Experimental Surgery, McGill University Health Center, Montreal, Quebec, Canada

2. Department of Anesthesia, McGill University, Montreal, Quebec, Canada

Abstract

BACKGROUND: This study presents an analysis of machine-learning model performance in image analysis, with a specific focus on videolaryngoscopy procedures. The research aimed to explore how dataset diversity and size affect the performance of machine-learning models, an issue vital to the advancement of clinical artificial intelligence tools.

METHODS: A total of 377 videolaryngoscopy videos from YouTube were used to create 6 datasets, each differing in patient diversity and image count. Data augmentation techniques were also applied to enhance these datasets further. Two machine-learning models, YOLOv5-Small and YOLOv8-Small, were trained and evaluated on F1 score (the harmonic mean of precision and recall, which combines both into a single metric), precision, recall, mAP@50, and mAP@50–95.

RESULTS: The findings indicate a significant impact of dataset configuration on model performance, especially the balance between diversity and quantity. The Multi-25 × 10 dataset, featuring 25 images from 10 different patients, demonstrated superior performance, highlighting the value of a well-balanced dataset. The effects of data augmentation varied across the different types of datasets.

CONCLUSIONS: Overall, this study emphasizes the critical role of dataset structure in the performance of machine-learning models in medical image analysis. It underscores the necessity of striking an optimal balance between dataset size and diversity, thereby illuminating the complexities inherent in data-driven machine-learning development.
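
As an illustration of the F1 score named among the evaluation metrics, the following is a minimal Python sketch of how precision, recall, and F1 relate to one another. It is not taken from the study: the function name `precision_recall_f1` and the example counts are hypothetical, and it assumes detections have already been matched to ground truth to give true-positive, false-positive, and false-negative counts.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from detection counts.

    tp, fp, fn are hypothetical counts of true positives, false positives,
    and false negatives; they are not values reported by the study.
    """
    # Precision: fraction of predicted detections that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of ground-truth objects that were detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall (0.0 when both are zero).
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative counts only.
    p, r, f1 = precision_recall_f1(tp=30, fp=10, fn=30)
    print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Because F1 is a harmonic mean, it falls toward the lower of the two values, so a model cannot reach a high F1 by maximizing precision at the expense of recall or the reverse.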

Publisher

Ovid Technologies (Wolters Kluwer Health)
