Abstract
Accurately separating each speaker’s clean speech in multi-speaker scenarios is a critical problem. In most cases, however, smart devices such as smartphones interact with only one specific user, so the speech separation models deployed on these devices only need to extract the target speaker’s speech. A voiceprint, which reflects the speaker’s voice characteristics, provides prior knowledge for target speech separation. How to efficiently integrate voiceprint features into existing speech separation models to improve their target-speech-separation performance is therefore an interesting problem that has not been fully explored. This paper addresses this issue, and our contributions are as follows. First, two different voiceprint features (i.e., MFCCs and the d-vector) are explored for enhancing the performance of three speech separation models. Second, three feature fusion methods are proposed to efficiently fuse the voiceprint features with the magnitude spectrograms originally used by the speech separation models. Third, a target speech extraction method that utilizes the fused features is proposed for two speaker-independent models. Experiments demonstrate that speech separation models integrated with voiceprint features via the three fusion methods can effectively extract the target speaker’s speech.
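The abstract describes fusing a fixed-length voiceprint feature (e.g., a d-vector) with the frame-level magnitude spectrograms consumed by a separation model. The paper does not specify the fusion details here; a minimal sketch of one common approach, frame-wise concatenation, is shown below. The function name, dimensions, and use of NumPy are illustrative assumptions, not the authors’ implementation.

```python
import numpy as np

def concat_fusion(magnitude_spec, d_vector):
    """Hypothetical concatenation fusion: tile a fixed-length speaker
    embedding across time and append it to each spectrogram frame.

    magnitude_spec: array of shape (T, F) -- T frames, F frequency bins
    d_vector:       array of shape (D,)  -- fixed-length speaker embedding
    returns:        array of shape (T, F + D)
    """
    num_frames = magnitude_spec.shape[0]
    tiled = np.tile(d_vector, (num_frames, 1))            # (T, D)
    return np.concatenate([magnitude_spec, tiled], axis=1)  # (T, F + D)

# Example with illustrative sizes: 100 frames, 257 STFT bins, 256-dim d-vector
spec = np.random.rand(100, 257)
dvec = np.random.rand(256)
fused = concat_fusion(spec, dvec)
print(fused.shape)  # (100, 513)
```

The fused features can then replace the plain magnitude spectrogram at the model input, conditioning the network on the target speaker.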
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by 3 articles.