Authors:
Anitha Mummireddygari, N Ananda Reddy
Abstract
This study develops a speaker recognition system that combines Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction, a Convolutional Neural Network (CNN) for embedding generation, and K-Nearest Neighbor (KNN) for classification. The proposed system aims to improve accuracy by refining the fine-tuning layer of the CNN architecture. Treating the human voice as a biometric identifier, the system extracts features from voice data using MFCC, then applies a CNN trained with triplet loss to generate 128-dimensional embeddings, which are classified using KNN. The system was evaluated on 50 speakers from the TIMIT dataset and 60 speakers from live recordings made with a smartphone, and it demonstrated high accuracy. The study highlights the potential of combining CNN and MFCC for robust speaker recognition and suggests that future research could further improve recognition accuracy by integrating multimodal biometric systems, which combine different types of biometric data for more comprehensive identification.
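The last two stages of the pipeline described above, triplet loss on embeddings and KNN classification, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the margin of 0.2, the value of k, the speaker labels, and the toy 2-dimensional vectors (standing in for the 128-dimensional embeddings) are all assumptions made for illustration, and MFCC extraction and the CNN itself are omitted.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on squared Euclidean distances.

    The loss is zero once the negative (different speaker) is farther
    from the anchor than the positive (same speaker) by at least
    `margin`. The margin value is an assumed hyperparameter.
    """
    d_pos = euclidean(anchor, positive) ** 2
    d_neg = euclidean(anchor, negative) ** 2
    return max(d_pos - d_neg + margin, 0.0)


def knn_predict(query, enrolled, k=3):
    """Majority vote among the k enrolled embeddings nearest to `query`.

    `enrolled` is a list of (embedding, speaker_id) pairs.
    """
    nearest = sorted(enrolled, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(speaker for _, speaker in nearest)
    return votes.most_common(1)[0][0]


# Toy enrolled embeddings for two hypothetical speakers.
enrolled = [
    ([0.0, 0.1], "spk_a"),
    ([0.1, 0.0], "spk_a"),
    ([1.0, 1.0], "spk_b"),
    ([0.9, 1.1], "spk_b"),
]
predicted = knn_predict([0.05, 0.05], enrolled, k=3)
```

In this sketch the triplet loss would drive training of the embedding network, while `knn_predict` is applied only at recognition time, to embeddings the trained network produces for enrolled and query utterances.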