Affiliation:
1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou, China
2. School of Information Science & Engineering, Lanzhou University, Lanzhou, China
Abstract
In practical applications, short-utterance speaker verification (SV) is the task of accepting or rejecting a speaker's identity claim based on only a few enrollment utterances. Traditional methods use deep neural networks to extract speaker representations for verification. More recently, several meta-learning approaches have learned a deep distance metric to distinguish speakers within meta-tasks. Among them, a prototypical network learns a metric space in which speaker identity is classified by computing the distance to each speaker's prototype center. We use emphasized channel attention, propagation and aggregation in TDNN (ECAPA-TDNN) to implement the function the prototypical network requires, namely a nonlinear mapping from the input space to the metric space, for the few-shot SV task. However, optimizing only for the speakers in a given meta-task may not be sufficient to learn distinctive speaker features. We therefore use an episodic training strategy in which the classes of the support and query sets correspond to the classes of the entire training set, which further improves model performance. The proposed model outperforms the comparison models on the VoxCeleb1 dataset and has a wide range of practical applications.
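The prototype-distance classification described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the embeddings are hand-written toy vectors standing in for ECAPA-TDNN outputs, squared Euclidean distance is assumed as the metric, and the function names (`compute_prototypes`, `prototypical_log_probs`, `episode_loss`) are hypothetical.

```python
import numpy as np

def compute_prototypes(support_emb, support_labels, n_speakers):
    """Prototype of each speaker = mean of that speaker's support embeddings."""
    return np.stack([
        support_emb[support_labels == k].mean(axis=0)
        for k in range(n_speakers)
    ])

def prototypical_log_probs(query_emb, prototypes):
    """Log-softmax over negative squared Euclidean distances to the prototypes."""
    diff = query_emb[:, None, :] - prototypes[None, :, :]  # (queries, speakers, dim)
    logits = -np.sum(diff ** 2, axis=-1)                   # closer prototype = larger logit
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

def episode_loss(log_probs, query_labels):
    """Mean negative log-likelihood of the true speakers over the query set."""
    return -log_probs[np.arange(len(query_labels)), query_labels].mean()

# Toy episode: 2 speakers, 2 support utterances each, 2-dimensional embeddings.
# In the setting of the abstract these vectors would come from the embedding
# network (ECAPA-TDNN); here they are made-up numbers for illustration only.
support_emb = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
support_labels = np.array([0, 0, 1, 1])
query_emb = np.array([[0.0, 1.5], [9.0, 11.0]])
query_labels = np.array([0, 1])

prototypes = compute_prototypes(support_emb, support_labels, n_speakers=2)
log_probs = prototypical_log_probs(query_emb, prototypes)
predicted = log_probs.argmax(axis=1)
loss = episode_loss(log_probs, query_labels)
```

In this sketch each query is assigned to the speaker whose prototype is nearest. The abstract's training strategy additionally ties the support and query classes to the classes of the whole training set, which this toy episode does not model.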
Funder
The National Science Foundation of China
The Science and Technology project of Gansu Province
The Gansu Province Department of Education: Outstanding Graduate Student “Innovation Star” Project