Analysis and Investigation of Speaker Identification Problems Using Deep Learning Networks and the YOHO English Speech Dataset

Published: 2023-08-24 Issue: 17 Volume: 13 Page: 9567
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Almarshady Nourah M.¹, Alashban Adal A.¹, Alotaibi Yousef A.¹

Affiliation:

1. Computer Engineering Department, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia

Abstract

The rapid momentum of deep neural networks (DNNs) in recent years has yielded state-of-the-art performance in various machine-learning tasks, including speaker identification. Speaker identification is based on speech signals and the features that can be extracted from them. In this article, we propose a speaker identification system using the developed DNN models. The system is based on acoustic and prosodic features of the speech signal, such as pitch frequency (vocal cord vibration rate), energy (loudness of speech), their derivatives, and additional acoustic and prosodic features. Additionally, the article investigates existing recurrent neural network (RNN) models and adapts them to design a speaker identification system using the public YOHO LDC dataset. In the best experiment, the system achieved an average speaker identification accuracy of 91.93%. Furthermore, the paper analyzes the speakers and tokens that yield major errors to help uncover their causes and to increase the system’s robustness through feature selection and system tune-up.

Funder

King Saud University

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/17/9567/pdf

References: 34 articles.

1. Kacur, J., and Truchly, P. (2015, January 28–30). Acoustic and auxiliary speech features for speaker identification system. Proceedings of the 2015 57th International Symposium ELMAR (ELMAR), Zadar, Croatia.

2. Bharali, S.S., and Kalita, S.K. (2017, January 22–24). Speaker identification using vector quantization and I-vector with reference to Assamese language. Proceedings of the 2017 International Conference on Wireless Communications, Signal Processing and Networking, WiSPNET 2017, Chennai, India.

3. Zeinali (2017). HMM-based phrase-independent i-vector extractor for text-dependent speaker verification. IEEE/ACM Trans. Audio Speech Lang. Process.

4. Chang, J., and Wang, D. (2017, January 5–9). Robust speaker recognition based on DNN/i-vectors and speech separation. Proceedings of the ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing–Proceedings, New Orleans, LA, USA.

5. (2023, June 25). YOHO Speaker Verification–Linguistic Data Consortium. Available online: https://catalog.ldc.upenn.edu/LDC94S16.

Cited by 4 articles.

1. Identification of true speakers from disguised voices in anti-forensic scenarios using an efficient framework; Signal, Image and Video Processing; 2024-07-23

2. An effective speaker adaption using deep learning for the identification of speakers in emergency situation; Multimedia Tools and Applications; 2024-07-02

3. Speaker Identification Using CNN-LSTM Model on RAVDESS Dataset: A Deep Learning Approach; 2023 4th International Conference on Intelligent Technologies (CONIT); 2024-06-21

4. Speaker Identification Using Hybrid Subspace, Deep Learning and Machine Learning Classifiers; 2024