Voice disorder classification using convolutional neural network based on deep transfer learning

Published:2023-05-04 Issue:1 Volume:13
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Peng Xiangyu,Xu Huoyao,Liu Jie,Wang Junlang,He Chaoming

Abstract

Voice disorders are very common in the global population. Many researchers have studied the identification and classification of voice disorders using machine learning. As a data-driven approach, machine learning requires a large number of training samples. However, because medical data are sensitive and specialized, it is difficult to obtain enough samples for model learning. To address this challenge, this paper proposes a pretrained OpenL3-SVM transfer learning framework for the automatic recognition of multi-class voice disorders. The framework combines a pretrained convolutional neural network, OpenL3, with a support vector machine (SVM) classifier. The Mel spectrum of the given voice signal is first extracted and then input into the OpenL3 network to obtain a high-level feature embedding. Because redundant and harmful high-dimensional features can easily cause model overfitting, linear local tangent space alignment (LLTSA) is used for feature dimension reduction. Finally, the dimension-reduced features are used to train the SVM for voice disorder classification. Fivefold cross-validation is used to verify the classification performance of OpenL3-SVM. The experimental results show that OpenL3-SVM can effectively classify voice disorders automatically, and that its performance exceeds that of existing methods. With continued improvements in research, it is expected to be considered as an auxiliary diagnostic tool for physicians in the future.
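
The fivefold cross-validation protocol mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the nearest-centroid classifier is a simple stand-in for the OpenL3 + LLTSA + SVM pipeline, the 2-D points are synthetic placeholders for real embeddings, and the class names (`healthy`, `class_a`, `class_b`) and all function names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def nearest_centroid_fit(features, labels):
    """Stand-in classifier (NOT the paper's SVM): one mean vector per class."""
    sums, counts = {}, defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def nearest_centroid_predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(vec, center))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def cross_validate(features, labels, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and return per-fold accuracy."""
    folds = stratified_folds(labels, k, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_x = [features[i] for i in range(len(labels)) if i not in held]
        train_y = [labels[i] for i in range(len(labels)) if i not in held]
        model = nearest_centroid_fit(train_x, train_y)
        correct = sum(
            nearest_centroid_predict(model, features[i]) == labels[i]
            for i in held_out
        )
        accuracies.append(correct / len(held_out))
    return accuracies


# Synthetic 2-D "embeddings" for three hypothetical classes (placeholder data only).
data_rng = random.Random(42)
centers = {"healthy": (0.0, 0.0), "class_a": (5.0, 0.0), "class_b": (0.0, 5.0)}
features, labels = [], []
for name, (cx, cy) in centers.items():
    for _ in range(20):
        features.append([cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5)])
        labels.append(name)

fold_accuracies = cross_validate(features, labels, k=5)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
print("per-fold accuracy:", fold_accuracies)
print("mean accuracy:", mean_accuracy)
```

Stratifying the folds keeps each class represented in every held-out set, which matters when some disorder classes have few samples. The sketch assumes every fold receives at least one sample; with very small classes and a large `k`, an empty fold would need separate handling.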

Funder

Sichuan Province Science and Technology Support Program

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary

Link

https://www.nature.com/articles/s41598-023-34461-9.pdf

References: 43 articles.

1. Vilkman, E. Voice problems at work: A challenge for occupational safety and health arrangement. FPL 52, 120–125 (2000).

2. Zhou, C. et al. Gammatone spectral latitude features extraction for pathological voice detection and classification. Appl. Acoust. 185, 108417 (2022).

3. Marques da Rocha, L., Behlau, M. & Dias de Mattos Souza, L. Behavioral dysphonia and depression in elementary school teachers. J. Voice 29, 712–717 (2015).

4. Delcor, N. S. et al. Condições de trabalho e saúde dos professores da rede particular de ensino de Vitória da Conquista, Bahia, Brasil. Cad. Saúde Pública 20, 187–196 (2004).

5. Roy, N., Merrill, R. M., Thibeault, S., Gray, S. D. & Smith, E. M. Voice disorders in teachers and the general population. J. Speech Lang. Hear. Res. 47, 542–551 (2004).

Cited by 14 articles.

1. Innovative Speech-Based Deep Learning Approaches for Parkinson’s Disease Classification: A Systematic Review;Applied Sciences;2024-09-04

2. Analyzing Wav2Vec 1.0 Embeddings for Cross-Database Parkinson’s Disease Detection and Speech Features Extraction;Sensors;2024-08-26

3. Patho VoiceAI: Classifying Pathology Types in Human Voices;2024 Fifteenth International Conference on Ubiquitous and Future Networks (ICUFN);2024-07-02

4. Pathological voice classification system based on CNN-BiLSTM network using speech enhancement and multi-stream approach;International Journal of Speech Technology;2024-06

5. MFCC in audio signal processing for voice disorder: a review;Multimedia Tools and Applications;2024-04-27