A Deep Learning Approach for Quantifying Vocal Fold Dynamics During Connected Speech Using Laryngeal High-Speed Videoendoscopy

Author:

Yousef, Ahmed M. (1); Deliyski, Dimitar D. (1); Zacharias, Stephanie R. C. (2, 3); de Alarcon, Alessandro (4, 5); Orlikoff, Robert F. (6); Naghibolhosseini, Maryam (1)

Affiliation:

1. Department of Communicative Sciences and Disorders, Michigan State University, East Lansing

2. Head and Neck Regenerative Medicine Program, Mayo Clinic, Scottsdale, AZ

3. Department of Otolaryngology—Head and Neck Surgery, Mayo Clinic, Phoenix, AZ

4. Division of Pediatric Otolaryngology, Cincinnati Children's Hospital Medical Center, OH

5. Department of Otolaryngology—Head and Neck Surgery, University of Cincinnati, OH

6. College of Allied Health Sciences, East Carolina University, Greenville, NC

Abstract

Purpose: Voice disorders are best assessed by examining vocal fold dynamics in connected speech. This can be achieved using flexible laryngeal high-speed videoendoscopy (HSV), which captures vocal fold mechanics with high temporal resolution. Analysis of vocal fold vibration using HSV requires accurate segmentation of the vocal fold edges. This article presents an automated deep-learning scheme that segments the glottal area in HSV recordings of connected speech; the glottal edges are then derived from the segmented area.

Method: Using a custom-built HSV system, data were obtained from a vocally healthy participant reciting the “Rainbow Passage.” A deep neural network was designed for glottal area segmentation in the HSV data. A hybrid approach recently introduced by the authors was used as an automated labeling tool: it annotated the glottis region during vocal fold vibration in a set of HSV frames, and the network was trained on those frames. The network was then tested against manually segmented frames using two metrics, intersection over union (IoU) and Boundary F1 (BF) score, and its performance was assessed on various phonatory events in the HSV sequence.

Results: The designed network was successfully trained using the hybrid approach, without the need for manual labeling, and tested on the manually labeled data. The performance metrics showed a mean IoU of 0.82 and a mean BF score of 0.96. In addition, the assessment of the network's performance demonstrated accurate segmentation of the glottal edges and area even during complex nonstationary phonatory events and when the vocal folds were not vibrating, thus overcoming the limitation of the previous hybrid approach, which could be applied only to vibrating vocal folds.

Conclusions: The introduced automated scheme guarantees accurate glottis representation in challenging color HSV data with lower image quality and excessive laryngeal maneuvers during all instances of connected speech. This facilitates the future development of HSV-based measures to assess the running vibratory characteristics of the vocal folds in speakers with and without voice disorders.

Supplemental Material: https://doi.org/10.23641/asha.19798864
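The two evaluation metrics named in the Method section can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 4-neighbour boundary definition, and the distance tolerance `tol` are assumptions made here for illustration, and the abstract does not state which tolerance was used. IoU compares the overlap of the predicted and manually segmented glottal areas, while the BF score is the harmonic mean of boundary precision and recall, where a boundary pixel counts as matched if it lies within `tol` pixels of the other mask's boundary.

```python
from math import hypot


def iou(pred, truth):
    """Intersection over union of two same-sized binary masks (rows of 0/1)."""
    inter = union = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks (e.g. a fully closed glottis) are treated as a perfect match.
    return inter / union if union else 1.0


def boundary(mask):
    """Foreground pixels with a 4-neighbour that is background or off-image."""
    h, w = len(mask), len(mask[0])
    points = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < h and 0 <= cc < w) or not mask[rr][cc]:
                    points.append((r, c))
                    break
    return points


def bf_score(pred, truth, tol=2.0):
    """Boundary F1: harmonic mean of boundary precision and recall.

    `tol` (in pixels) is an assumed tolerance, not a value from the article.
    """
    bp, bt = boundary(pred), boundary(truth)
    if not bp and not bt:
        return 1.0
    if not bp or not bt:
        return 0.0

    def fraction_matched(src, dst):
        hits = sum(
            1 for (r, c) in src
            if any(hypot(r - r2, c - c2) <= tol for (r2, c2) in dst)
        )
        return hits / len(src)

    precision = fraction_matched(bp, bt)  # predicted edge pixels near the true edge
    recall = fraction_matched(bt, bp)     # true edge pixels near the predicted edge
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def block_mask(rows, cols, r0, r1, c0, c1):
    """Hypothetical helper: a rows x cols mask with a filled rectangle."""
    return [[1 if r0 <= r <= r1 and c0 <= c <= c1 else 0 for c in range(cols)]
            for r in range(rows)]


truth = block_mask(6, 8, 2, 3, 1, 4)  # stand-in for a manually segmented glottis
pred = block_mask(6, 8, 2, 3, 2, 5)   # same shape, shifted one pixel to the right
print(iou(pred, truth))               # 0.6
print(bf_score(pred, truth, tol=2.0)) # 1.0
```

In this toy example, a one-pixel shift lowers IoU to 0.6 while the BF score stays at 1.0, because every edge pixel is still within the tolerance. This shows how, under these assumptions, a BF score can be higher than the IoU for the same segmentation, as with the means reported in the Results.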

Publisher

American Speech-Language-Hearing Association

Subject

Speech and Hearing, Linguistics and Language, Language and Linguistics


Cited by 16 articles.