Author:
Verma Vyom, Benjwal Anish, Chhabra Amit, Singh Sunil K., Kumar Sudhakar, Gupta Brij B., Arya Varsha, Chui Kwok Tai
Abstract
Voice is an essential component of human communication, serving as a fundamental medium for expressing thoughts, emotions, and ideas. Disruptions in vocal fold vibratory patterns can lead to voice disorders, which can have a profound impact on interpersonal interactions. Early detection of voice disorders is crucial for improving voice health and quality of life. This research proposes a novel methodology called VDDMFS [voice disorder detection using MFCC (Mel-frequency cepstral coefficients), fundamental frequency and spectral centroid], which combines an artificial neural network (ANN) trained on acoustic attributes and a long short-term memory (LSTM) model trained on MFCC attributes. The probabilities generated by the ANN and LSTM models are then stacked and used as input for XGBoost, which classifies a voice as disordered or not, resulting in more accurate voice disorder detection. This approach achieved promising results, with an accuracy of 95.67%, sensitivity of 95.36%, specificity of 96.49% and F1 score of 96.9%, outperforming existing techniques.
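The stacking step and the reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the ANN and LSTM are not trained here, their output probabilities are made-up toy values, and a small logistic regression stands in for XGBoost so that the example runs with the standard library alone. All function names (`stack_probabilities`, `fit_meta_learner`, `predict`, `binary_metrics`) are hypothetical.

```python
import math


def stack_probabilities(p_ann, p_lstm):
    """Pair the two base-model probabilities into one meta-feature row per sample."""
    if len(p_ann) != len(p_lstm):
        raise ValueError("both models must score the same samples")
    return [[a, b] for a, b in zip(p_ann, p_lstm)]


def fit_meta_learner(X, y, lr=0.5, epochs=500):
    """Fit a logistic regression by gradient descent (assumed stand-in for XGBoost)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, label in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, row)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - label  # predicted probability minus target
            grad_w = [g + err * xi for g, xi in zip(grad_w, row)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Label a sample 1 (disordered) when the meta-learner's score is non-negative."""
    return [1 if sum(wi * xi for wi, xi in zip(w, row)) + b >= 0 else 0 for row in X]


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F1 from the binary confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # recall on the disordered class
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Toy values only: placeholders for the probabilities the ANN and LSTM would emit.
p_ann = [0.9, 0.8, 0.1, 0.2]
p_lstm = [0.8, 0.9, 0.2, 0.1]
labels = [1, 1, 0, 0]  # 1 = disordered, 0 = healthy

X_meta = stack_probabilities(p_ann, p_lstm)
w, b = fit_meta_learner(X_meta, labels)
print(binary_metrics(labels, predict(X_meta, w, b)))
```

In a fuller pipeline the meta-learner would be fit on base-model probabilities from held-out data rather than on the samples the base models were trained on, so that the second stage does not simply learn their overfitting.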
Publisher
Springer Science and Business Media LLC