Mask R-CNN based multiclass segmentation model for endotracheal intubation using video laryngoscope

Published:2023-01 Volume:9
ISSN:2055-2076
Container-title:DIGITAL HEALTH
language:en
Short-container-title:DIGITAL HEALTH

Author:

Choi Seung Jae¹, Kim Dae Kon²³⁴, Kim Byeong Soo⁵, Cho Minwoo¹, Jeong Joo²³, Jo You Hwan²³, Song Kyoung Jun³⁶, Kim Yu Jin²³, Kim Sungwan⁴⁷

Affiliation:

1. Transdisciplinary Department of Medicine and Advanced Technology, Seoul National University Hospital, Seoul, Republic of Korea

2. Department of Emergency Medicine, Seoul National University Bundang Hospital, Seongnam, Republic of Korea

3. Department of Emergency Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea

4. Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, Republic of Korea

5. Interdisciplinary Program in Bioengineering, Graduate School, Seoul National University, Seoul, Republic of Korea

6. Department of Emergency Medicine, Seoul Metropolitan Government-Seoul National University Boramae Medical Center, Seoul, Republic of Korea

7. Institute of Bioengineering, Seoul National University, Seoul, Republic of Korea

Abstract

Objective: Endotracheal intubation (ETI) is critical for securing the airway in emergent situations. Although artificial intelligence algorithms are frequently used to analyze medical images, their application to evaluating intraoral structures in images captured during emergent ETI remains limited. The aim of this study is to develop an artificial intelligence model for segmenting structures in the oral cavity using video laryngoscope (VL) images.

Methods: From 54 VL videos, clinicians manually labeled images, including images affected by motion blur, foggy vision, blood, mucus, and vomitus. The anatomical structures of interest were the tongue, epiglottis, vocal cord, and corniculate cartilage. Three models were compared: EfficientNet-B5 with DeepLabv3+, EfficientNet-B5 with U-Net, and a configured Mask region-based convolutional neural network (Mask R-CNN); EfficientNet-B5 was pretrained on ImageNet. The Dice similarity coefficient (DSC) was used to measure the segmentation performance of each model. Accuracy, recall, specificity, and F1 score were used to evaluate how well each model targeted a structure, based on the intersection over union (IoU) between the ground truth mask and the predicted mask.

Results: The DSCs for the tongue, epiglottis, vocal cord, and corniculate cartilage were 0.3351/0.7675/0.766/0.6539 for EfficientNet-B5 with DeepLabv3+, 0.0/0.7581/0.7395/0.6906 for EfficientNet-B5 with U-Net, and 0.1167/0.7677/0.7207/0.57 for the configured Mask R-CNN. The processing speeds of the three models were 3, 24, and 32 frames per second, respectively.

Conclusions: The algorithm developed in this study can assist medical providers performing ETI in emergent situations.
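The two overlap measures named in the abstract can be illustrated with a minimal sketch. The DSC and IoU formulas are the standard definitions; everything else here is an assumption made for illustration and is not taken from the paper: the function names, the representation of masks as nested lists of 0/1, the IoU threshold of 0.5, the handling of empty masks, and the rule in `outcome` for turning an IoU value into a true/false positive/negative.

```python
from typing import Dict, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask: 1 = structure pixel, 0 = background


def overlap_counts(truth: Mask, pred: Mask) -> Tuple[int, int, int]:
    """Return (intersection, truth area, prediction area) in pixels."""
    inter = truth_area = pred_area = 0
    for t_row, p_row in zip(truth, pred):
        for t, p in zip(t_row, p_row):
            truth_area += 1 if t else 0
            pred_area += 1 if p else 0
            inter += 1 if (t and p) else 0
    return inter, truth_area, pred_area


def dice(truth: Mask, pred: Mask) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    inter, a, b = overlap_counts(truth, pred)
    # Assumption: two empty masks count as perfect agreement.
    return 1.0 if a + b == 0 else 2 * inter / (a + b)


def iou(truth: Mask, pred: Mask) -> float:
    """Intersection over union: |A ∩ B| / |A ∪ B|."""
    inter, a, b = overlap_counts(truth, pred)
    union = a + b - inter
    return 1.0 if union == 0 else inter / union


def outcome(truth: Mask, pred: Mask, threshold: float = 0.5) -> str:
    """Hypothetical rule mapping one image to tp/fn/fp/tn via an IoU threshold."""
    _, a, b = overlap_counts(truth, pred)
    if a > 0:  # structure present in the ground truth
        return "tp" if iou(truth, pred) >= threshold else "fn"
    return "fp" if b > 0 else "tn"  # structure absent in the ground truth


def detection_scores(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, recall, specificity, and F1 from confusion counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "specificity": specificity, "f1": f1}
```

For a 2×2 ground truth mask `[[1, 1], [0, 0]]` and prediction `[[1, 0], [0, 0]]`, the intersection is 1 pixel, the areas are 2 and 1, so the DSC is 2/3 and the IoU is 1/2. For the same pair of masks the DSC is never lower than the IoU, which is worth keeping in mind when comparing reported values across metrics.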

Funder

National Research Foundation of Korea

AI Institute at Seoul National University

Publisher

SAGE Publications

Subject

Health Information Management,Computer Science Applications,Health Informatics,Health Policy

Link

http://journals.sagepub.com/doi/pdf/10.1177/20552076231211547
