Affiliation:
1. Chengdu University of Information Technology
Abstract
In this paper, we combine a generative adversarial mechanism with a multi-task optimization strategy to propose an architecture that improves the accuracy of speaker verification. Built on the variational autoencoder (VAE) and the generative adversarial network (GAN), the proposed model recognizes both a speaker's original features and their reconstructed features. We term our method Multitasking Variational Autoencoder Generative Adversarial Networks (MTVAEGAN). The encoder in the VAE extracts target speech features and classifies them to a specific target. The generator in the GAN produces perturbations that cause the target network to misclassify the input as a specific target, while simultaneously fooling the discriminators into treating the adversarial examples as genuine examples. The discriminator distinguishes crafted examples from genuine samples. The classification task is to classify the reconstructed data, which drives the generator to produce more precise speech features. Experiments on short utterances demonstrate that MTVAEGAN increases the verification accuracy (ACC) by a relative 11.52% and 2.78% over the conventional 3DCNN method and the MTGAN method, respectively.
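The gains above are reported as relative improvements, not absolute percentage-point differences. The short sketch below shows how such a figure is usually computed. It assumes the standard definition (new minus baseline, divided by baseline); the function name and the accuracy values are hypothetical illustrations and are not taken from the paper.

```python
def relative_improvement(baseline_acc: float, new_acc: float) -> float:
    """Return the relative gain of new_acc over baseline_acc, in percent.

    Assumes the usual definition: (new - baseline) / baseline * 100.
    """
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (new_acc - baseline_acc) / baseline_acc * 100.0


# Hypothetical accuracies in percent, NOT results from the paper.
baseline = 80.0
improved = 88.0

# Absolute gain is 8 percentage points; the relative gain is larger
# because it is measured against the baseline value.
absolute_gain = improved - baseline
relative_gain = relative_improvement(baseline, improved)
print(f"absolute: {absolute_gain:.2f} points, relative: {relative_gain:.2f}%")
```

Under this definition, a relative gain of a given size corresponds to a smaller absolute gain whenever the baseline accuracy is below 100%.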
Publisher
Research Square Platform LLC