Abstract
Robust detection of Lombard speech in noise is challenging. This study proposes a machine learning strategy for detecting Lombard speech in applications, such as public address systems, that work in near real time. The paper begins with background on the Lombard effect. The assumptions underlying the Lombard speech detection work are then outlined. The proposed framework combines convolutional neural networks (CNNs) with various two-dimensional (2D) speech signal representations. To reduce the computational cost without abandoning the 2D representation-based approach, a strategy for threshold-based averaging of the Lombard effect detection results is introduced. The pseudocode of the averaging process is also included. A series of experiments is performed to determine the most effective network structure and 2D speech signal representation. Investigations are carried out on German and Polish recordings containing Lombard speech. All 2D speech signal representations are tested with and without augmentation. Here, augmentation means using the alpha channel of the 2D representation to store additional data: the speaker's gender, the fundamental frequency (F0), and the first two mel-frequency cepstral coefficients (MFCCs). The experimental results show that Lombard and neutral speech recordings can be clearly distinguished with high detection accuracy. It is also demonstrated that the proposed detection process is capable of working in near real time. These are the key contributions of this work.
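The abstract does not give the details of the threshold-based averaging, so the following is only a minimal sketch of one plausible reading, not the authors' pseudocode. It assumes the CNN produces one score per analysis frame in the range 0 to 1, that the most recent scores are averaged over a sliding window, and that the mean is compared with a fixed threshold. The function name `averaged_lombard_decisions` and the parameters `window` and `threshold` are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


def averaged_lombard_decisions(
    frame_probs: Iterable[float],
    window: int = 5,
    threshold: float = 0.5,
) -> Iterator[bool]:
    """Yield one smoothed Lombard/neutral decision per incoming frame score.

    frame_probs: hypothetical per-frame CNN outputs in [0, 1], where higher
                 values mean "more likely Lombard speech" (an assumption).
    window:      number of most recent scores to average (assumed value).
    threshold:   mean score at or above which the decision is "Lombard"
                 (assumed value).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # keeps only the last `window` scores
    for p in frame_probs:
        recent.append(p)
        # Average over the scores seen so far, at most `window` of them.
        mean_score = sum(recent) / len(recent)
        yield mean_score >= threshold
```

For example, with `window=3` and `threshold=0.5`, the scores `[0.9, 0.2, 0.8, 0.7, 0.1, 0.1, 0.1]` give `True` for the first five frames and `False` for the last two: the isolated low score of 0.2 does not flip the decision, while the sustained run of low scores does. Averaging of this kind is one way a detector can stay stable in near real time without classifying longer signal segments.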
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry
Cited by
1 article.
1. Noisy Phoneme Recognition Using 2D Convolution Neural Network;2023 IEEE 10th Jubilee Workshop on Advances in Information, Electronic and Electrical Engineering (AIEEE);2023-04-27