Abstract
The presence of degradations in speech signals, which causes acoustic mismatch between training and operating conditions, deteriorates the performance of many speech-based systems. A variety of enhancement techniques have been developed to compensate for this acoustic mismatch in speech-based applications. To apply these signal enhancement techniques, however, it is necessary to have prior information about the presence and the type of degradations in speech signals. In this paper, we propose a new convolutional neural network (CNN)-based approach to automatically identify the major types of degradations commonly encountered in speech-based applications, namely additive noise, nonlinear distortion, and reverberation. In this approach, a set of parallel CNNs, each detecting a certain degradation type, is applied to the log-mel spectrogram of audio signals. Experimental results using two different speech types, namely pathological voice and normal running speech, show the effectiveness of the proposed method in detecting the presence and the type of degradations in speech signals, outperforming the state-of-the-art method. Using score-weighted class activation mapping, we provide a visual analysis of how the network makes decisions when identifying different types of degradation in speech signals, by highlighting the regions of the log-mel spectrogram that are most influential for the target degradation.
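The pipeline described above (log-mel spectrogram front-end feeding a set of parallel, per-degradation binary detectors) can be sketched as below. This is a minimal illustration, not the paper's implementation: the mel filterbank and framing parameters are common defaults, and `DegradationDetector` is a hypothetical stand-in whose placeholder score replaces the trained CNN.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(x, sr=16000, n_fft=512, hop=160, n_mels=40):
    # Frame, window, power spectrum, mel projection, log compression.
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)          # shape: (n_frames, n_mels)

class DegradationDetector:
    """One binary detector per degradation type. Illustrative only:
    the paper uses a trained CNN; here a placeholder mean-energy
    score with a hypothetical threshold stands in for it."""
    def __init__(self, name, threshold=0.0):
        self.name = name
        self.threshold = threshold

    def detect(self, logmel):
        return float(logmel.mean()) > self.threshold

# Parallel application: each detector independently flags its type.
detectors = [DegradationDetector(n)
             for n in ("additive_noise", "distortion", "reverberation")]
signal = np.random.randn(16000)          # 1 s of dummy audio at 16 kHz
features = log_mel_spectrogram(signal)
flags = {d.name: d.detect(features) for d in detectors}
```

Because the detectors run in parallel rather than as one multi-class classifier, a signal can be flagged with several degradation types at once (e.g. noisy and reverberant).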
Funder
Danmarks Frie Forskningsfond
Publisher
Springer Science and Business Media LLC
Subject
Electrical and Electronic Engineering, Acoustics and Ultrasonics
Cited by
4 articles.