Abstract
Recently, the increasing demand for voice-based authentication systems has encouraged researchers to investigate methods for verifying users with short, randomized pass-phrases drawn from a constrained vocabulary. The conventional i-vector framework, which has proven to be a state-of-the-art utterance-level feature extraction technique for speaker verification, is not optimal for this task because it suffers severe performance degradation on short-duration speech utterances. More recent approaches that employ deep-learning techniques to embed speaker variability in a non-linear fashion have shown impressive performance in various speaker verification tasks. However, since most of these techniques are trained in a supervised manner and therefore require speaker labels for the training data, they are difficult to use when only a small amount of labeled data is available. In this paper, we propose a novel technique for extracting an i-vector-like feature based on the variational autoencoder (VAE), which is trained in an unsupervised manner to obtain a latent variable representing the variability within a Gaussian mixture model (GMM) distribution. The proposed framework is compared with the conventional i-vector method on the TIDIGITS dataset. Experimental results showed that the proposed method could cope with the performance deterioration caused by short utterance duration. Furthermore, the performance improved significantly when the proposed approach was applied in conjunction with the conventional i-vector framework.
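To make the general idea concrete, the sketch below shows a minimal VAE that encodes an utterance-level GMM supervector into a low-dimensional latent embedding used in place of an i-vector. This is an illustrative assumption-laden sketch, not the authors' exact architecture: the supervector input, layer sizes, latent dimension, and loss weighting are all placeholders chosen for clarity.

```python
# Illustrative sketch only: a VAE mapping a GMM supervector (concatenated per-mixture
# statistics) to a latent vector used as an i-vector-like utterance embedding.
# All dimensions and architectural choices here are assumptions, not the paper's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupervectorVAE(nn.Module):
    def __init__(self, input_dim=2048, hidden_dim=512, latent_dim=200):
        super().__init__()
        # Encoder: supervector -> hidden -> (mean, log-variance) of the latent posterior
        self.enc = nn.Linear(input_dim, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder: latent -> hidden -> reconstructed supervector
        self.dec1 = nn.Linear(latent_dim, hidden_dim)
        self.dec2 = nn.Linear(hidden_dim, input_dim)

    def encode(self, x):
        h = F.relu(self.enc(x))
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)  # sample z ~ N(mu, sigma^2)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        x_hat = self.dec2(F.relu(self.dec1(z)))
        return x_hat, mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction term plus KL divergence to the standard normal prior
    recon = F.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

At test time, the posterior mean produced by the encoder would serve as the utterance-level embedding, which could then be scored with, for example, cosine similarity or PLDA, analogously to an i-vector.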
Funder
Samsung Research Funding Center of Samsung Electronics
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by
5 articles.