Speaker Recognition Based on Fusion of a Deep and Shallow Recombination Gaussian Supervector

Published:2020-12-25 Issue:1 Volume:10 Page:20
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Sun Linhui, Bu Yunyi, Zou Bo, Fu Sheng, Li Pingan

Abstract

Extracting a speaker’s personalized feature parameters is vital for speaker recognition, but a single kind of feature cannot fully reflect the speaker’s personality information. To represent the speaker’s identity more comprehensively and improve the speaker recognition rate, we propose a speaker recognition method based on the fusion feature of a deep and shallow recombination Gaussian supervector. In this method, deep bottleneck features are first extracted by a Deep Neural Network (DNN) and used as the input of a Gaussian Mixture Model (GMM) to obtain the deep Gaussian supervector. In parallel, Mel-Frequency Cepstral Coefficients (MFCC) are input to a GMM directly to extract the traditional Gaussian supervector. Finally, the two categories of features are combined by horizontal dimension augmentation. In addition, to prevent the system recognition rate from falling sharply as the number of speakers to be recognized increases, we introduce an optimization algorithm to find the optimal weight before feature fusion. The experimental results indicate that the speaker recognition rate based on the directly fused feature can reach 98.75%, which is 5% and 0.62% higher than that of the traditional feature and the deep bottleneck feature, respectively. When the number of speakers increases, the fusion feature based on optimized weight coefficients can improve the recognition rate by 0.81%. These results validate that the proposed fusion method effectively exploits the complementarity of the different types of features and improves the speaker recognition rate.
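
The fusion step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a Gaussian supervector is formed by stacking the GMM component mean vectors, which is a common construction but is not stated in the abstract, and that the optimized weight scales the deep part by `deep_weight` and the shallow part by `1 - deep_weight`, which is likewise an assumption. The function names `gaussian_supervector` and `fuse_supervectors` are hypothetical, and the toy numbers are illustrative only.

```python
def gaussian_supervector(component_means):
    """Stack per-component GMM mean vectors into one flat supervector.

    Assumption: the supervector is the concatenation of the component means.
    """
    return [value for mean in component_means for value in mean]


def fuse_supervectors(deep, shallow, deep_weight=None):
    """Join two supervectors end to end (horizontal dimension augmentation).

    With deep_weight=None the vectors are concatenated directly.
    Otherwise the deep part is scaled by deep_weight and the shallow part
    by (1 - deep_weight) before concatenation. This weighting scheme is an
    assumption; the abstract does not give the exact form.
    """
    if deep_weight is None:
        return list(deep) + list(shallow)
    if not 0.0 <= deep_weight <= 1.0:
        raise ValueError("deep_weight must lie in [0, 1]")
    scaled_deep = [deep_weight * x for x in deep]
    scaled_shallow = [(1.0 - deep_weight) * x for x in shallow]
    return scaled_deep + scaled_shallow


# Toy GMM means: two 2-dimensional components for the bottleneck-feature GMM,
# one 3-dimensional component for the MFCC GMM (illustrative values only).
deep_sv = gaussian_supervector([[1.0, 2.0], [3.0, 4.0]])
shallow_sv = gaussian_supervector([[0.5, 0.5, 0.5]])

direct = fuse_supervectors(deep_sv, shallow_sv)
weighted = fuse_supervectors(deep_sv, shallow_sv, deep_weight=0.5)
print(len(direct), len(weighted))
```

In both cases the fused vector has the combined dimensionality of the two inputs (here 4 + 3 = 7). In the weighted case, `deep_weight` would be chosen by whatever optimization algorithm is used, which this sketch does not model.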

Funder

National Natural Science Foundation of China

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/10/1/20/pdf

References: 39 articles.

1. SVM classification for fake biometric detection using image quality assessment: Application to iris, face and palm print

2. Footprint Recognition with Principal Component Analysis and Independent Component Analysis

3. Image Quality Assessment for Fake Biometric Detection: Application to Iris, Fingerprint, and Face Recognition

4. A Study on Speech Recognition Control for a Surgical Robot

5. Dynamic Fixed-Point Arithmetic Design of Embedded SVM-Based Speaker Identification System

Cited by 1 article.

1. Speaker identification using hybrid neural network support vector machine classifier;International Journal of Speech Technology;2022-11-30