Analyzing wav2vec embedding in Parkinson’s disease speech: A study on cross-database classification and regression tasks

Published: 2024-04-12

Author:

Klempir Ondrej, Krupicka Radim

Abstract

Advancements in deep learning speech representations have facilitated the effective use of extensive datasets composed of unlabeled speech signals, and have achieved success in modeling tasks associated with Parkinson’s disease (PD) with minimal annotated data. This study focuses on the non-fine-tuned wav2vec 1.0 architecture applied to PD speech. Using features derived from wav2vec embedding, we develop machine learning models tailored for clinically relevant PD speech diagnosis tasks, such as cross-database classification and regression to predict demographic and articulation characteristics, for instance, the subjects’ age and number of characters per second. The primary aim is to conduct feature importance analysis on both classification and regression tasks, investigating whether latent discrete speech representations in PD are shared across models, particularly for related tasks. The proposed wav2vec-based models were evaluated on PD versus healthy controls using three multi-language, multi-task PD datasets. Results indicated that wav2vec accurately detected PD based on speech, outperforming feature extraction using mel-frequency cepstral coefficients in the proposed cross-database scenarios. Furthermore, wav2vec proved effective in regression, modeling various quantitative speech characteristics related to intelligibility and aging. Subsequent analysis of important features, obtained using scikit-learn's built-in feature importance tools and the Shapley additive explanations method, examined whether the important features of the classification and regression models overlap significantly. The feature importance experiments revealed shared features across trained models, with increased sharing for related tasks, further suggesting that wav2vec contributes to improved generalizability. In conclusion, the study proposes wav2vec embedding as a promising step toward a speech-based universal model to assist in the evaluation of PD.
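
The overlap analysis described in the abstract can be illustrated with a minimal sketch. The abstract does not state which overlap measure the authors used, so the Jaccard index over top-k features below is an assumption, and the importance scores are made-up values for six hypothetical embedding dimensions. In practice such scores might come from a fitted tree ensemble's `feature_importances_` attribute in scikit-learn or from mean absolute Shapley values.

```python
def top_k_features(importances, k):
    """Return the indices of the k largest importance scores as a set."""
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)
    return set(ranked[:k])


def top_k_overlap(imp_a, imp_b, k):
    """Jaccard overlap between the top-k feature sets of two models.

    Assumption: Jaccard is used here for illustration only; the
    abstract does not specify the authors' overlap measure.
    """
    if len(imp_a) != len(imp_b):
        raise ValueError("both models must score the same feature set")
    top_a = top_k_features(imp_a, k)
    top_b = top_k_features(imp_b, k)
    return len(top_a & top_b) / len(top_a | top_b)


# Hypothetical importance scores for six embedding dimensions.
clf_importance = [0.30, 0.05, 0.25, 0.10, 0.20, 0.10]  # e.g. PD vs. control classifier
reg_importance = [0.28, 0.02, 0.10, 0.30, 0.22, 0.08]  # e.g. age regressor

overlap = top_k_overlap(clf_importance, reg_importance, k=3)
print(f"top-3 Jaccard overlap: {overlap:.2f}")
```

A higher value for a pair of related tasks than for a pair of unrelated tasks would be consistent with the sharing pattern the abstract reports, although the choice of k strongly affects the result.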

Publisher

Cold Spring Harbor Laboratory
