End‐to‐end deep learning classification of vocal pathology using stacked vowels

Published:2023-08-31 Issue:5 Volume:8 Page:1312-1318
ISSN:2378-8038
Container-title:Laryngoscope Investigative Otolaryngology
language:en
Short-container-title:Laryngoscope Investig Oto

Author:

Liu George S.¹², Hodges Jordan M.³, Yu Jingzhi⁴, Sung C. Kwang¹², Erickson‐DiRenzo Elizabeth¹², Doyle Philip C.¹²

Affiliation:

1. Department of Otolaryngology–Head and Neck Surgery, Stanford University School of Medicine, Stanford University, Stanford, California, USA

2. Division of Laryngology, Stanford University School of Medicine, Stanford University, Stanford, California, USA

3. Computer Science Department, School of Engineering, Stanford University, Stanford, California, USA

4. Biomedical Informatics, Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California, USA

Abstract

Objectives: Advances in artificial intelligence (AI) technology have increased the feasibility of classifying voice disorders using voice recordings as a screening tool. This work develops upon previous models that take in single vowel recordings by analyzing multiple vowel recordings simultaneously to enhance prediction of vocal pathology.

Methods: Voice samples from the Saarbruecken Voice Database, including three sustained vowels (/a/, /i/, /u/) from 687 healthy human participants and 334 dysphonic patients, were used to train 1‐dimensional convolutional neural network models for multiclass classification of healthy, hyperfunctional dysphonia, and laryngitis voice recordings. Three models were trained: (1) a baseline model that analyzed individual vowels in isolation, (2) a stacked vowel model that analyzed three vowels (/a/, /i/, /u/) in the neutral pitch simultaneously, and (3) a stacked pitch model that analyzed the /a/ vowel in three pitches (low, neutral, and high) simultaneously.

Results: For multiclass classification of healthy, hyperfunctional dysphonia, and laryngitis voice recordings, the stacked vowel model demonstrated higher performance compared with the baseline and stacked pitch models (F1 score 0.81 vs. 0.77 and 0.78, respectively). Specifically, the stacked vowel model achieved higher performance for class‐specific classification of hyperfunctional dysphonia voice samples compared with the baseline and stacked pitch models (F1 score 0.56 vs. 0.49 and 0.50, respectively).

Conclusions: This study demonstrates the feasibility and potential of analyzing multiple sustained vowel recordings simultaneously to improve AI‐driven screening and classification of vocal pathology. The stacked vowel model architecture in particular offers promise to enhance such an approach.

Lay Summary: AI analysis of multiple vowel recordings can improve classification of voice pathologies compared with models using a single sustained vowel and offer a strategy to enhance AI‐driven screening of voice disorders.

Level of Evidence: 3
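
The idea of feeding several vowel recordings to a 1‐dimensional convolutional network at once can be illustrated with a small sketch. The abstract does not say how the recordings were combined, so the code below assumes they are stacked as parallel input channels after being padded or truncated to a common length. It is not the authors' implementation. All function names, the toy signals, and the kernel values are hypothetical, and a real model would learn many filters over audio of many thousands of samples.

```python
# Illustrative sketch only, not the authors' code. Assumption: the three
# recordings are stacked as input channels. All names here are hypothetical.


def fit_length(signal, length):
    """Truncate or zero-pad a 1-D signal to exactly `length` samples."""
    return list(signal[:length]) + [0.0] * max(0, length - len(signal))


def stack_vowels(recordings, length):
    """Stack recordings (e.g. /a/, /i/, /u/) as channels: [channels][length]."""
    return [fit_length(r, length) for r in recordings]


def conv1d(stacked, kernel):
    """Apply one multi-channel 1-D filter without padding.

    This is cross-correlation, the operation most deep learning libraries
    call "convolution". stacked: [channels][length]; kernel: [channels][width].
    Returns length - width + 1 output values.
    """
    channels = len(stacked)
    length = len(stacked[0])
    width = len(kernel[0])
    out = []
    for t in range(length - width + 1):
        total = 0.0
        for c in range(channels):
            for k in range(width):
                total += stacked[c][t + k] * kernel[c][k]
        out.append(total)
    return out


# Toy stand-ins for three sustained-vowel recordings of unequal length.
a = [1.0, 2.0, 3.0, 4.0]
i = [0.0, 1.0, 0.0]            # shorter: zero-padded to 4 samples
u = [2.0, 2.0, 2.0, 2.0, 2.0]  # longer: truncated to 4 samples

stacked = stack_vowels([a, i, u], length=4)

# One hypothetical filter with a separate weight row per vowel channel.
kernel = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
features = conv1d(stacked, kernel)
print(features)  # [4.0, 4.0, 5.0]
```

Under this assumption, the baseline model corresponds to a single input channel, while the stacked vowel and stacked pitch variants differ only in which three recordings fill the channels. Each output value then mixes information from all three recordings at the same time offset.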

Publisher

Wiley

Subject

General Medicine

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/lio2.1144

Reference: 39 articles.

1. Current role of stroboscopy in laryngeal imaging

2. Perceptual Evaluation of Voice Quality

3. When and why listeners disagree in voice quality assessment tasks

4. The perceptual structure of pathologic voice quality

5. The multidimensional nature of pathologic vocal quality

Cited by 3 articles.

1. Artificial Intelligence in Laryngology, Broncho-Esophagology, and Sleep Surgery; Otolaryngologic Clinics of North America; 2024-10

2. A Scoping Review of Artificial Intelligence Detection of Voice Pathology: Challenges and Opportunities; Otolaryngology–Head and Neck Surgery; 2024-05-13

3. Deep Learning-Based Voice Pathology Detection From Electroglottography; Advances in Medical Technologies and Clinical Practice; 2024-05-10