Affiliation:
1. School of Physics and Electronic Information, Yantai University, Yantai, China
2. School of Computer and Control Engineering, Yantai University, Yantai, China
Abstract
Acoustic echo cancellation (AEC) methods aim to suppress the acoustic coupling between loudspeaker and microphone in hands‐free speech communication. Traditional AEC works by identifying the acoustic impulse response using adaptive algorithms. With recent research advances, deep learning has become an attractive choice for AEC. This paper introduces a two‐stage bidirectional long short‐term memory (TS‐BLSTM) framework that incorporates a multi‐head self‐attention mechanism after each BLSTM block. This design is aimed at better capturing contextual information and further enhancing the ability of the model to handle complex acoustic scenarios. The BLSTM blocks are used to aggregate magnitude spectrum information, modelling both time and frequency dependencies. Additionally, dilated convolution is introduced to broaden the range of information covered by each convolution output. The magnitude decoder estimates a mask for the input, which is applied to generate an estimated magnitude spectrum for the near‐end speech. Experimental results indicate that the proposed method achieves promising outcomes.
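Two of the ideas in the abstract can be illustrated with a minimal sketch: how dilation widens the span of input frames seen by a stack of convolutions, and how an estimated mask is applied element-wise to a magnitude spectrum. The function names, the kernel size, the dilation rates, and the toy values below are assumptions made for illustration only; the abstract does not specify them, and this is not the paper's implementation.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in frames, of stacked stride-1 1-D convolutions.

    Each layer with dilation d adds (kernel_size - 1) * d frames.
    """
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


def apply_mask(mixture_magnitude, mask):
    """Multiply a magnitude spectrogram by a mask, element by element.

    Both arguments are lists of frames; each frame is a list of frequency bins.
    """
    return [
        [m * x for m, x in zip(mask_frame, mag_frame)]
        for mask_frame, mag_frame in zip(mask, mixture_magnitude)
    ]


# Hypothetical settings: four layers with kernel size 3.
plain = receptive_field(3, [1, 1, 1, 1])    # no dilation
dilated = receptive_field(3, [1, 2, 4, 8])  # exponentially growing dilation

# Toy 2-frame, 2-bin microphone magnitude and a hypothetical mask in [0, 1].
mixture = [[1.0, 2.0], [0.5, 4.0]]
mask = [[0.5, 0.0], [1.0, 0.25]]
estimate = apply_mask(mixture, mask)

print(plain, dilated)  # 9 31
print(estimate)        # [[0.5, 0.0], [0.5, 1.0]]
```

With the same number of layers and parameters, the dilated stack in this toy setting covers 31 frames instead of 9, which is the sense in which dilation broadens the range of information in each output.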
Funder
Natural Science Foundation of Shandong Province
Publisher
Institution of Engineering and Technology (IET)