A Deep Neural Network Model for Speaker Identification

Published:2021-04-16 Issue:8 Volume:11 Page:3603
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Ye Feng, Yang Jun

Abstract

Speaker identification is a classification task that aims to identify a subject from time-series data. Since the speech signal is a continuous one-dimensional time series, most current research methods are based on convolutional neural networks (CNNs) or recurrent neural networks (RNNs). These methods perform well in many tasks, but no attempt has been made to combine the two network models for speaker identification. The spectrogram of a speech signal contains the spatial features of the voiceprint (which corresponds to the voice spectrum), and CNNs are effective for spatial feature extraction (which corresponds to modeling spectral correlations in acoustic features). At the same time, the speech signal is a time series, and deep RNNs can represent long utterances better than shallow networks. Considering the advantage of the gated recurrent unit (GRU) over the traditional RNN in the segmentation of sequence data, we use stacked GRU layers in our model for frame-level feature extraction. In this paper, we propose a deep neural network (DNN) model based on a two-dimensional convolutional neural network (2-D CNN) and GRU for speaker identification. In the network design, the convolutional layer extracts voiceprint features and reduces dimensionality in both the time and frequency domains, allowing for faster GRU layer computation. In addition, the stacked GRU layers can learn a speaker's acoustic features. During this research, we tried various neural network structures, including 2-D CNN, deep RNN, and deep long short-term memory (LSTM) networks. These network models were evaluated on the Aishell-1 speech dataset. The experimental results showed that our proposed DNN model, which we call deep GRU, achieved a high recognition accuracy of 98.96%. The results also demonstrate the effectiveness of the proposed deep GRU network model versus other models for speaker identification. With further optimization, this method could be applied to other research similar to speaker identification.
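The abstract's claim that the convolutional layer shrinks both the time and frequency axes, and so speeds up the GRU layers, can be illustrated with a small shape calculation. The sketch below is not the authors' configuration: the spectrogram size, kernel, stride, padding, and channel count are hypothetical values chosen only for illustration, and the helper names are made up for this example. It uses the standard output-size formula for a strided convolution.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Output length along one axis of a strided convolution (floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def gru_input_shape(frames, freq_bins, conv_layers, channels):
    """Return (time_steps, features_per_step) that a GRU stack would receive.

    conv_layers is a list of (kernel, stride, padding) tuples applied
    identically along the time and frequency axes. After the last layer,
    the frequency axis and the channel axis are flattened into one
    feature vector per time step.
    """
    t, f = frames, freq_bins
    for kernel, stride, padding in conv_layers:
        t = conv_output_size(t, kernel, stride, padding)
        f = conv_output_size(f, kernel, stride, padding)
    return t, f * channels


# Hypothetical input: 300 frames x 64 frequency bins (assumed, not from the paper).
frames, freq_bins = 300, 64
# One hypothetical 5x5 convolution with stride 2 and padding 2, 32 output channels.
conv_layers = [(5, 2, 2)]
channels = 32

steps, features = gru_input_shape(frames, freq_bins, conv_layers, channels)
print(steps, features)  # 150 1024
```

Under these assumed settings the recurrent layers process 150 time steps instead of 300. Because a recurrent layer runs sequentially over time steps, halving the sequence length roughly halves the number of sequential GRU updates.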

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/11/8/3603/pdf

References: 57 articles.

1. An Overview of Text-Independent Speaker Recognition: From Features to Supervectors;Tomi;Speech Commun.,2010

2. Recent Advances in Speaker Recognition;Sadaoki;Pattern Recognit. Lett.,1997

3. Speaker recognition: a tutorial

4. Robust text-independent speaker identification using Gaussian mixture speaker models

Cited by 61 articles.

1. Optimizing speaker identification: a comprehensive study with deep neural networks;Studies in Engineering and Exact Sciences;2024-09-09

2. Text-independent voiceprint recognition via compact embedding of dilated deep convolutional neural networks;Computers and Electrical Engineering;2024-09

3. An optimized attention based hybrid deep learning framework for automatic speaker identification from speech signals;Multimedia Tools and Applications;2024-08-23

4. An effective speaker adaption using deep learning for the identification of speakers in emergency situation;Multimedia Tools and Applications;2024-07-02

5. Empowering Speaker Verification with Deep Convolutional Neural Network Vectors;Studies in Informatics and Control;2024-06-27