Affiliation:
1. Computer Engineering Department, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia
Abstract
The rapid progress of deep neural networks (DNNs) in recent years has yielded state-of-the-art performance in various machine-learning tasks, including speaker identification. Speaker identification relies on speech signals and the features that can be extracted from them. In this article, we propose a speaker identification system using DNN models developed for this task. The system is based on acoustic and prosodic features of the speech signal, such as pitch frequency (the vibration rate of the vocal cords), energy (the loudness of speech), their derivatives, and additional acoustic and prosodic features. The article also investigates existing recurrent neural network (RNN) models and adapts them to design a speaker identification system using the public YOHO dataset from the Linguistic Data Consortium (LDC). In the best experiment, the system achieved an average speaker identification accuracy of 91.93%. Furthermore, the article analyzes the speakers and tokens that yield major errors, which helps uncover the causes of those errors and increase the system's robustness through feature selection and system tuning.
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by
4 articles.