Abstract
Speaker identification (SI) is the task of identifying the unknown speaker of an utterance by comparing that speaker's voice biometrics with previously stored models of known speakers. Although deep learning algorithms have been successful in a range of speech and speaker recognition systems, they are computationally expensive and require considerable run-time resources. This paper addresses the issue by proposing an optimized text-independent SI system based on convolutional neural networks (CNNs) that not only delivers accuracies on par with state-of-the-art benchmarks but also requires significantly fewer trainable parameters. The proposed system integrates an Enhanced Multi-Active Learner framework, which distributes the complexity of the learning task among an array of learners, with a novel SI approach in which speakers are identified from a single sound segment of voice biometrics. Experiments were conducted with all 1881 VoxCeleb 1 and TIMIT speakers, and results were compared with SI systems reported in the literature that were assessed on the same speakers' data. The results indicate that, first, the proposed system outperformed the benchmark systems by up to 2.43% in top-1 accuracy and, second, it reduced the number of trainable parameters by up to 95%. The proposed SI system could bring offline, large-scale speaker identification to low-end computing machines without dedicated deep learning hardware and make the technology more affordable.
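The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of spreading an identification task over an array of learners and scoring the outcome by top-1 accuracy. The round-robin split, the highest-score merging rule, and every function name here (`partition_speakers`, `identify`, `top1_accuracy`) are assumptions made for illustration; they are not the paper's Enhanced Multi-Active Learner framework.

```python
from typing import Dict, List, Sequence


def partition_speakers(speakers: Sequence[str], n_learners: int) -> List[List[str]]:
    """Split speaker IDs round-robin into n_learners groups (hypothetical scheme)."""
    if n_learners < 1:
        raise ValueError("n_learners must be at least 1")
    return [list(speakers[i::n_learners]) for i in range(n_learners)]


def identify(scores_per_learner: Sequence[Dict[str, float]]) -> str:
    """Return the speaker with the highest score across all learners.

    Each dict maps the speakers handled by one learner to that learner's
    score for a single sound segment. Assumes the groups are disjoint.
    """
    merged: Dict[str, float] = {}
    for scores in scores_per_learner:
        merged.update(scores)
    if not merged:
        raise ValueError("no scores supplied")
    return max(merged, key=merged.get)


def top1_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of segments whose top-ranked speaker is the true speaker."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

In this sketch each learner only has to discriminate among its own subset of speakers, which is one plausible way a smaller per-learner model could suffice; whether the paper's system works this way is not stated in the abstract.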
Publisher
Springer Science and Business Media LLC
Subject
Computer Networks and Communications, Hardware and Architecture, Media Technology, Software
Cited by
1 article.
1. Forensic Perspective on Voice Biometrics and AI: A Review; International Journal of Scientific Research in Science and Technology; 2024-09-04