Few-shot short utterance speaker verification using meta-learning

Published: 2023-04-21
Volume: 9
Page: e1276
ISSN: 2376-5992
Journal: PeerJ Computer Science
Language: English

Authors:

Wang Weijie¹, Zhao Hong¹, Yang Yikun², Chang YouKang¹, You Haojie¹

Affiliations:

1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou, China

2. School of Information Science & Engineering, Lanzhou University, Lanzhou, China

Abstract

Short-utterance speaker verification (SV) in practical applications is the task of accepting or rejecting a speaker's identity claim based on only a few enrollment utterances. Traditional methods use deep neural networks to extract speaker representations for verification. More recently, several meta-learning approaches have learned a deep distance metric to distinguish speakers within meta-tasks. Among them, the prototypical network learns a metric space in which speaker identity is classified by computing the distance from an utterance embedding to each speaker's prototype (class center). We use emphasized channel attention, propagation and aggregation in TDNN (ECAPA-TDNN) as the embedding function of the prototypical network, that is, the nonlinear mapping from the input space to the metric space, for few-shot SV tasks. In addition, optimizing only over the speakers in a given meta-task may not be sufficient to learn distinctive speaker features. We therefore use an episodic training strategy in which the classes of the support and query sets correspond to the classes of the entire training set, which further improves model performance. The proposed model outperforms the comparison models on the VoxCeleb1 dataset and has a wide range of practical applications.
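The prototypical-network scoring described in the abstract can be sketched in plain Python. This is a minimal, hypothetical sketch, not the authors' implementation. It assumes speaker embeddings have already been produced by an encoder (ECAPA-TDNN in the paper; here they are hand-written 2-D vectors), and it assumes squared Euclidean distance as in the original prototypical-network formulation. The paper's actual distance function, its global-class episodic training strategy, and any verification threshold are not specified in this abstract. The function names (`prototypes`, `class_probabilities`, `episode_loss`, `verify`) and the `threshold` parameter are illustrative only.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of equally sized vectors: the speaker prototype."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def squared_euclidean(a: Vector, b: Vector) -> float:
    # Assumption: squared Euclidean distance, as in the original prototypical networks.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototypes(support: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """One prototype per speaker: the mean of that speaker's support (enrollment) embeddings."""
    return {spk: mean_vector(embs) for spk, embs in support.items()}


def class_probabilities(query: Vector, protos: Dict[str, List[float]]) -> Dict[str, float]:
    """Softmax over negative distances from the query embedding to each prototype."""
    logits = {spk: -squared_euclidean(query, c) for spk, c in protos.items()}
    m = max(logits.values())  # subtract the max logit for numerical stability
    exps = {spk: math.exp(l - m) for spk, l in logits.items()}
    total = sum(exps.values())
    return {spk: e / total for spk, e in exps.items()}


def episode_loss(support: Dict[str, List[Vector]], queries: Dict[str, List[Vector]]) -> float:
    """Mean negative log-probability of the true speaker over all query embeddings in one episode."""
    protos = prototypes(support)
    losses = [
        -math.log(class_probabilities(q, protos)[spk])
        for spk, embs in queries.items()
        for q in embs
    ]
    return sum(losses) / len(losses)


def verify(test_emb: Vector, claimed: str, protos: Dict[str, List[float]], threshold: float) -> bool:
    """Accept the identity claim if the test embedding lies within `threshold` (hypothetical) of the claimed prototype."""
    return squared_euclidean(test_emb, protos[claimed]) <= threshold


# Toy episode with two speakers, two support embeddings each (2-shot), one query each.
support = {"spk_a": [[1.0, 0.0], [0.8, 0.2]], "spk_b": [[0.0, 1.0], [0.2, 0.8]]}
queries = {"spk_a": [[0.9, 0.1]], "spk_b": [[0.1, 0.9]]}
protos = prototypes(support)
loss = episode_loss(support, queries)
```

Averaging the support embeddings is what lets such a model enroll an unseen speaker from a few short utterances: at test time the encoder stays fixed and only the prototype is computed. For verification, the same distance is compared against a decision threshold rather than turned into a softmax over speakers.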

Funder

The National Science Foundation of China

The Science and Technology project of Gansu Province

The Gansu Province Department of Education: Outstanding Graduate Student “Innovation Star” Project

Publisher

PeerJ

Subject

General Computer Science

Link

https://peerj.com/articles/cs-1276.pdf

References (45 articles)

1. Avila, 2021. Automatic speaker verification from affective speech using Gaussian mixture model based estimation of neutral speech characteristics. Speech Communication.

2. Bai, 2021. Speaker recognition based on deep learning: an overview. Neural Networks.

3. Baik, 2021. Meta-learning with task-adaptive loss function for few-shot learning.

4. Cai, 2018. Exploring the encoding layer and loss function in end-to-end speaker and language recognition system.

5. Chang, 2022. MGNet: mutual-guidance network for few-shot semantic segmentation. Engineering Applications of Artificial Intelligence.

Cited by 2 articles

1. Design of intelligent behavior analysis software based on speaker identity classification algorithm in microgrid mode. Advanced Control for Applications, 2024-04-18.

2. Multi-task learning for X-vector based speaker recognition. International Journal of Speech Technology, 2023-10-28.