Abstract
Accurately separating each speaker’s clean speech in multi-speaker scenarios is a critical problem. In most cases, however, smart devices such as smartphones interact with only one specific user, so the speech separation models deployed on these devices only need to extract the target speaker’s speech. A voiceprint, which reflects the speaker’s voice characteristics, provides prior knowledge for target speech separation. How to efficiently integrate voiceprint features into existing speech separation models to improve their target-speech-separation performance is therefore an interesting problem that has not been fully explored. This paper addresses this issue, and our contributions are as follows. First, two different voiceprint features (i.e., MFCCs and the d-vector) are explored for enhancing the performance of three speech separation models. Second, three feature fusion methods are proposed to efficiently fuse the voiceprint features with the magnitude spectrograms originally used by the speech separation models. Third, a target speech extraction method that utilizes the fused features is proposed for two speaker-independent models. Experiments demonstrate that speech separation models integrated with voiceprint features via the three fusion methods can effectively extract the target speaker’s speech.
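The abstract describes fusing a fixed-length voiceprint feature (e.g., a d-vector) with the frame-level magnitude spectrograms consumed by a separation model. The paper does not specify the fusion details here; a minimal sketch of one common approach, frame-wise concatenation, is shown below. The function name, dimensions, and use of NumPy are illustrative assumptions, not the authors’ implementation.

```python
import numpy as np

def concat_fusion(magnitude_spec, d_vector):
    """Hypothetical concatenation fusion: tile a fixed-length speaker
    embedding across time and append it to each spectrogram frame.

    magnitude_spec: array of shape (T, F) -- T frames, F frequency bins
    d_vector:       array of shape (D,)  -- fixed-length speaker embedding
    returns:        array of shape (T, F + D)
    """
    num_frames = magnitude_spec.shape[0]
    tiled = np.tile(d_vector, (num_frames, 1))            # (T, D)
    return np.concatenate([magnitude_spec, tiled], axis=1)  # (T, F + D)

# Example with illustrative sizes: 100 frames, 257 STFT bins, 256-dim d-vector
spec = np.random.rand(100, 257)
dvec = np.random.rand(256)
fused = concat_fusion(spec, dvec)
print(fused.shape)  # (100, 513)
```

The fused features can then replace the plain magnitude spectrogram at the model input, conditioning the network on the target speaker.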
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by 3 articles.