Time frequency domain deep CNN for automatic background classification in speech signals

Published: 2023-09, Volume: 26, Issue: 3, Pages: 695–706
ISSN: 1381-2416
Journal: International Journal of Speech Technology
Language: English
Abbreviated title: Int J Speech Technol

Authors:

Yakkati Rakesh Reddy, Yeduri Sreenivasa Reddy, Tripathy Rajesh Kumar, Cenkeramaddi Linga Reddy

Abstract

Automatic background classification from speech signals is used in many application areas, such as background identification, predictive maintenance in industrial applications, smart home applications, assisting deaf people with their daily activities, and content-based multimedia indexing and retrieval. Accurately predicting the background environment from the information in a speech signal is challenging. This paper therefore proposes a novel synchrosqueezed wavelet transform (SWT)-based deep learning (DL) approach for automatically classifying the background information embedded in speech signals. The SWT is used to obtain a time-frequency representation of each speech signal, and these time-frequency representations are fed to a deep convolutional neural network (DCNN) that classifies the embedded background. The proposed DCNN model consists of three convolution layers, one batch-normalization layer, three max-pooling layers, one dropout layer, and one fully connected layer. The method is tested on speech signals with eleven kinds of embedded background: airport, airplane, drone, street, babble, car, helicopter, exhibition, station, restaurant, and train sounds. The proposed SWT-based DCNN approach achieves an overall classification accuracy of 97.96 (± 0.53)% on this task. Finally, its performance is compared with that of existing methods.
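The abstract describes a two-stage pipeline: an SWT time-frequency image followed by a small DCNN. A minimal NumPy sketch of that pipeline follows. The synchrosqueezing step uses the reassignment rule of Daubechies, Lu, and Wu (2011): each continuous wavelet transform (CWT) coefficient is moved to the frequency estimated from its phase derivative. Everything the abstract does not specify is an assumption made here for illustration: an analytic Morlet wavelet (w0 = 6), a 64-bin log-spaced frequency grid, a 64 × 64 input image obtained by averaging along time, 3 × 3 kernels with 8/16/32 channels, ReLU activations, batch normalization placed after the first convolution, and random untrained weights. The function names (`synchrosqueezed_cwt`, `dcnn_forward`, and the helpers) are hypothetical. This is not the authors' implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# The eleven background classes listed in the abstract.
CLASSES = ("airport", "airplane", "drone", "street", "babble", "car",
           "helicopter", "exhibition", "station", "restaurant", "train")


def synchrosqueezed_cwt(x, fs, freqs, w0=6.0, gamma=1e-3):
    """Magnitude of a synchrosqueezed CWT, shape (len(freqs), len(x)).

    Assumes an analytic Morlet wavelet; freqs must be increasing and in Hz.
    """
    n = len(x)
    f = np.fft.fftfreq(n, d=1.0 / fs)                # FFT bin frequencies (Hz)
    X = np.fft.fft(x)
    scales = w0 / (2 * np.pi * freqs)                 # Morlet peak of scale a sits at w0/(2*pi*a)
    psi_hat = np.exp(-0.5 * (2 * np.pi * scales[:, None] * f - w0) ** 2) * (f > 0)
    norm = np.sqrt(scales)[:, None]
    W = norm * np.fft.ifft(X * psi_hat, axis=1)                    # CWT W(a, b)
    dW = norm * np.fft.ifft(X * psi_hat * 2j * np.pi * f, axis=1)  # dW/db

    # Instantaneous frequency (dW/db) / (2*pi*i*W), only where |W| is not negligible.
    mask = np.abs(W) > gamma * np.abs(W).max()
    f_inst = np.real(dW / (2j * np.pi * np.where(mask, W, 1.0)))
    mask &= (f_inst >= freqs[0]) & (f_inst <= freqs[-1])

    # Squeeze: add each coefficient to the frequency bin nearest f_inst (log axis).
    log_edges = 0.5 * (np.log(freqs[:-1]) + np.log(freqs[1:]))
    k, b = np.nonzero(mask)
    bins = np.searchsorted(log_edges, np.log(f_inst[k, b]))
    da = np.abs(np.gradient(scales))
    T = np.zeros((len(freqs), n), dtype=complex)
    np.add.at(T, (bins, b), W[k, b] * scales[k] ** -1.5 * da[k])
    return np.abs(T)


def conv2d(x, w, bias):
    """Valid 2-D cross-correlation: x (C_in, H, W), w (C_out, C_in, kh, kw)."""
    win = sliding_window_view(x, w.shape[2:], axis=(1, 2))  # (C_in, H', W', kh, kw)
    return np.einsum("chwij,ocij->ohw", win, w) + bias[:, None, None]


def max_pool2(x):
    """2x2 max pooling, stride 2; an odd trailing row/column is dropped."""
    c, h, w = x.shape
    x = x[:, : h // 2 * 2, : w // 2 * 2]
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def batch_norm(x, mean, var, gain, shift, eps=1e-5):
    """Inference-mode batch normalization with per-channel running statistics."""
    s = (-1, 1, 1)
    return gain.reshape(s) * (x - mean.reshape(s)) / np.sqrt(var.reshape(s) + eps) + shift.reshape(s)


def init_params(rng, channels=(1, 8, 16, 32), flat_dim=32 * 6 * 6):
    """Hypothetical layer sizes with random, untrained weights (shape checking only)."""
    p = {}
    for i in range(3):
        p[f"w{i}"] = rng.normal(0.0, 0.1, (channels[i + 1], channels[i], 3, 3))
        p[f"b{i}"] = np.zeros(channels[i + 1])
    c = channels[1]
    p["bn"] = (np.zeros(c), np.ones(c), np.ones(c), np.zeros(c))  # mean, var, gain, shift
    p["fc_w"] = rng.normal(0.0, 0.01, (len(CLASSES), flat_dim))
    p["fc_b"] = np.zeros(len(CLASSES))
    return p


def dcnn_forward(img, p):
    """3 conv + 1 batch-norm + 3 max-pool + dropout + 1 fully connected layer."""
    h = img[None]                                                   # add a channel axis
    h = max_pool2(np.maximum(batch_norm(conv2d(h, p["w0"], p["b0"]), *p["bn"]), 0))
    h = max_pool2(np.maximum(conv2d(h, p["w1"], p["b1"]), 0))
    h = max_pool2(np.maximum(conv2d(h, p["w2"], p["b2"]), 0))
    # Dropout only acts during training; at inference it is the identity.
    logits = p["fc_w"] @ h.ravel() + p["fc_b"]
    e = np.exp(logits - logits.max())
    return e / e.sum()                                              # softmax probabilities


# Demo on a synthetic 1 kHz tone standing in for a noisy speech frame.
fs = 8000
x = np.cos(2 * np.pi * 1000 * np.arange(1024) / fs)
freqs = np.geomspace(50, 4000, 64)
tf = synchrosqueezed_cwt(x, fs, freqs)             # (64, 1024) time-frequency image
img = tf.reshape(64, 64, 16).mean(axis=2)          # average-pool time to 64 x 64
img /= img.max() + 1e-12
probs = dcnn_forward(img, init_params(np.random.default_rng(0)))
print(freqs[tf.sum(axis=1).argmax()], CLASSES[probs.argmax()])
```

For a pure tone, the squeezed energy collapses into the frequency bin nearest 1 kHz instead of spreading across neighbouring scales as in a plain CWT scalogram; that sharpening is the motivation for using the SWT as the network input. The predicted class printed by the demo carries no meaning until the weights are trained on labelled background recordings.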

Funder

Norges Forskningsråd (Research Council of Norway)

University of Agder

Publisher

Springer Science and Business Media LLC

Subject

Computer Vision and Pattern Recognition, Linguistics and Language, Human-Computer Interaction, Language and Linguistics, Software

Link

https://link.springer.com/content/pdf/10.1007/s10772-023-10042-z.pdf

References (27 articles)

1. Al-Emadi, S., Al-Ali, A., Mohammad, A., & Al-Ali, A. (2019). Audio based drone detection and identification using deep learning. In Proceedings of the International Wireless Communications & Mobile Computing Conference (pp. 459–464). Tangier, Morocco, June 2019.

2. Cao, X., Togneri, R., Zhang, X., & Yu, Y. (2019). Convolutional neural network with second-order pooling for underwater target classification. IEEE Sensors Journal, 19(8), 3058–3066. https://doi.org/10.1109/JSEN.2018.2886368

3. Chaki, J. (2021). Pattern analysis based acoustic signal processing: A survey of the state-of-art. International Journal of Speech Technology, 24(4), 913–955.

4. Das, N., Chakraborty, S., Chaki, J., Padhy, N., & Dey, N. (2021). Fundamentals, present and future perspectives of speech enhancement. International Journal of Speech Technology, 24(4), 883–901.

5. Daubechies, I., Lu, J., & Wu, H. T. (2011). Synchrosqueezed wavelet transforms: An empirical mode decomposition-like tool. Applied and Computational Harmonic Analysis, 30(2), 243–261.