A novel hybrid model integrating MFCC and acoustic parameters for voice disorder detection

Published:2023-12-20 Issue:1 Volume:13 Page:
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Verma Vyom, Benjwal Anish, Chhabra Amit, Singh Sunil K., Kumar Sudhakar, Gupta Brij B., Arya Varsha, Chui Kwok Tai

Abstract

Voice is an essential component of human communication, serving as a fundamental medium for expressing thoughts, emotions, and ideas. Disruptions in vocal fold vibratory patterns can lead to voice disorders, which can have a profound impact on interpersonal interactions. Early detection of voice disorders is crucial for improving voice health and quality of life. This research proposes a novel methodology called VDDMFS [voice disorder detection using MFCC (Mel-frequency cepstral coefficients), fundamental frequency and spectral centroid], which combines an artificial neural network (ANN) trained on acoustic attributes and a long short-term memory (LSTM) model trained on MFCC attributes. The probabilities generated by the ANN and LSTM models are then stacked and used as input for XGBoost, which classifies a voice as disordered or not, resulting in more accurate voice disorder detection. This approach achieved promising results, with an accuracy of 95.67%, sensitivity of 95.36%, specificity of 96.49% and F1 score of 96.9%, outperforming existing techniques.
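The stacking step and the four reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the probabilities and labels are made-up toy values, and a simple average-and-threshold rule stands in for the XGBoost meta-learner so that the example needs only the Python standard library.

```python
from typing import Dict, List, Sequence, Tuple


def stack_probabilities(p_ann: Sequence[float],
                        p_lstm: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair the two base-model probabilities into one meta-feature row per sample."""
    if len(p_ann) != len(p_lstm):
        raise ValueError("both base models must score the same samples")
    return list(zip(p_ann, p_lstm))


def stand_in_meta_classifier(rows: Sequence[Tuple[float, float]],
                             threshold: float = 0.5) -> List[int]:
    """Placeholder for the XGBoost meta-learner (an assumption of this sketch):
    average the two probabilities and threshold the result."""
    return [1 if (a + b) / 2 >= threshold else 0 for a, b in rows]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and F1, with 1 = disordered voice."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }


# Toy values, invented for illustration only.
labels = [1, 1, 1, 0, 0, 1, 0, 1]
ann_probs = [0.9, 0.8, 0.4, 0.2, 0.6, 0.7, 0.1, 0.3]   # hypothetical ANN outputs
lstm_probs = [0.8, 0.9, 0.7, 0.3, 0.3, 0.6, 0.2, 0.4]  # hypothetical LSTM outputs

meta_features = stack_probabilities(ann_probs, lstm_probs)
predictions = stand_in_meta_classifier(meta_features)
print(binary_metrics(labels, predictions))
```

In a fuller pipeline, the rows returned by `stack_probabilities` would be the training input of a gradient-boosted classifier rather than a fixed averaging rule; the metric definitions stay the same.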

Funder

Kwok Tai Chui

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary

Link

https://www.nature.com/articles/s41598-023-49869-6.pdf

References: 72 articles.

1. Bhattacharyya, N. The prevalence of voice problems among adults in the United States. Laryngoscope 124, 2359–2362. https://doi.org/10.1002/lary.24740 (2014).

2. Morris, M. A., Meier, S. K., Griffin, J. M., Branda, M. E. & Phelan, S. M. Prevalence and etiologies of adult communication disabilities in the United States: Results from the 2012 National Health Interview Survey. Disabil. Health J. 9, 140–144. https://doi.org/10.1016/j.dhjo.2015.07.004 (2016).

3. Heinen, M. M. et al. Waist circumference improves obesity models but social disadvantage remains significant: Results among 10,766 children of the Childhood Obesity Surveillance Initiative (COSI) in the Republic of Ireland. Int. J. Epidemiol. 44, i260–i260. https://doi.org/10.1093/ije/dyv096.490 (2015).

4. About 1 in 12 children has a disorder related to voice, speech, language, or swallowing—nidcd.nih.gov. https://www.nidcd.nih.gov/news/2015/about-1-12-children-has-disorder-related-voice-speech-language-or-swallowing (2015).

5. Wang, J. & Jo, C. Performance of Gaussian mixture models as a classifier for pathological voice. In Proceedings of the 11th Australian International Conference on Speech Science and Technology, Vol. 107, 122–131 (2006).

Cited by 8 articles.

1. Developing a multi-variate prediction model for COVID-19 from crowd-sourced respiratory voice data;Exploration of Digital Health Technologies;2024-08-11

2. Voice pathology detection on spontaneous speech data using deep learning models;International Journal of Speech Technology;2024-08-10

3. Applying Visual Cryptography to Decrypt Data Using Human Senses;Advances in Information Security, Privacy, and Ethics;2024-07-12

4. Next Gen Security With Quantum-Safe Cryptography;Advances in Information Security, Privacy, and Ethics;2024-07-12

5. Homomorphic Encryption in Smart City Applications for Balancing Privacy and Utility;Advances in Information Security, Privacy, and Ethics;2024-07-12