Affiliations:
1. Vishwakarma Institute of Information Technology, Pune
2. Siddhant College of Engineering, Pune
Abstract
This paper explores the use of the pre-trained VGGish model for feature extraction in the context of speech enhancement. The objective is to investigate how effectively VGGish captures speech features that can be used to enhance speech quality and reduce noise interference. Experiments are conducted on the MUSAN dataset, and the results demonstrate the model's ability to extract rich, discriminative features encompassing spectral, temporal, and perceptual characteristics of speech. These features are then employed in several speech enhancement techniques to improve intelligibility, enhance spectral clarity, and reduce artifacts caused by noise and distortion. Comparative analysis with traditional methods shows that the VGGish model captures a more comprehensive representation of the speech signal, leading to better discrimination between speech and noise components. The findings highlight the potential of VGGish features for speech enhancement, with applications in communication systems, automatic speech recognition, and audio processing across diverse domains. Future research directions include optimizing the VGGish model for specific speech enhancement tasks, exploring novel feature fusion techniques, and integrating other deep learning architectures to further improve system performance and flexibility. Overall, this work advances speech processing and provides a foundation for enhancing speech quality, reducing noise interference, and improving the overall listening experience.
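As a concrete illustration of the feature-extraction step, the following is a minimal sketch of pulling 128-dimensional VGGish embeddings from a waveform using the publicly released TensorFlow Hub checkpoint. The synthetic input signal is an assumption standing in for an actual MUSAN clip, and how the embeddings feed the downstream enhancement stage is not specified by the paper.

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Load the published VGGish model from TensorFlow Hub.
vggish = hub.load("https://tfhub.dev/google/vggish/1")

# Stand-in input (assumption): 3 seconds of random noise at 16 kHz,
# scaled to [-1.0, 1.0]. In practice this would be a mono MUSAN clip
# resampled to 16 kHz.
waveform = tf.constant(
    np.random.uniform(-1.0, 1.0, 3 * 16000), dtype=tf.float32
)

# VGGish frames the signal into 0.96 s log-mel patches and returns one
# 128-dimensional embedding per patch, i.e. a tensor of shape
# (num_patches, 128).
embeddings = vggish(waveform)
print(embeddings.shape)  # (3, 128) for a 3-second clip

These per-patch embeddings would then serve as the input features for the enhancement models compared in the paper.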
Publisher
Research Square Platform LLC
Cited by
2 articles.