Analysis of Deep Generative Model Impact on Feature Extraction and Dimension Reduction for Short Utterance Text-Independent Speaker Verification

Published: 2024-04-13
Journal: Circuits, Systems, and Signal Processing (Circuits Syst Signal Process)
Volume: 43, Issue: 7, Pages: 4547-4564
ISSN: 0278-081X
Language: English

Authors:

Aref Farhadipour, Hadi Veisi

Abstract

Speaker verification is a biometric method for individual authentication. However, achieving high performance under short-utterance, text-independent conditions remains challenging, possibly because the available speaker-specific features are weak. Deep learning algorithms have recently been used extensively in speech processing. This manuscript uses a deep belief network (DBN) as a deep generative model for feature extraction in speaker verification systems. The study aims to show the impact of the proposed method on several challenging conditions: short utterances, text independence, language variation, and large-scale speaker verification. The proposed DBN takes MFCCs as input and aims to extract more effective features. This new representation of speaker information is evaluated in two popular speaker verification systems: GMM-UBM and i-vector-PLDA. For the i-vector-PLDA system, the proposed features reduce the equal error rate (EER) considerably, from 15.24% to 10.97%. In another experiment, the DBN is used for feature dimension reduction, which substantially decreases computational time and increases system response speed. As a case study, all evaluations are performed on 1270 speakers from the NIST SRE 2008 dataset. We show that deep belief networks can be used with state-of-the-art acoustic modeling methods and on more challenging datasets.
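The abstract does not give the network topology or training details, so the following NumPy code is only a rough sketch of the general idea behind DBN-based feature extraction, not the authors' implementation. A stack of restricted Boltzmann machines (RBMs) is pretrained greedily, layer by layer, with one-step contrastive divergence (CD-1), and the top layer's hidden activations serve as new frame-level features. The first RBM uses unit-variance Gaussian visible units because MFCCs are real-valued. The 39-dimensional input (assumed here to be 13 MFCCs plus deltas and delta-deltas), the layer sizes, the learning rate, and the random frames standing in for real MFCCs are all hypothetical choices. Making the top layer narrower than the input also illustrates the dimension-reduction use mentioned in the abstract.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def standardize(X):
    # Zero mean, unit variance per dimension (required by the Gaussian visible layer).
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)


class RBM:
    """RBM trained with CD-1. gaussian_visible=True assumes standardized real-valued inputs."""

    def __init__(self, n_visible, n_hidden, gaussian_visible=False, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.gaussian_visible = gaussian_visible
        self.recon_errors = []  # mean squared reconstruction error per epoch

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_mean(self, h):
        act = h @ self.W.T + self.b_v
        return act if self.gaussian_visible else sigmoid(act)

    def cd1_step(self, v0, lr):
        # Positive phase: hidden probabilities and a binary sample given the data.
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step using the mean visible reconstruction.
        v1 = self.visible_mean(h0)
        ph1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (ph0 - ph1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))

    def fit(self, X, epochs=5, batch_size=64, lr=0.01):
        for _ in range(epochs):
            idx = self.rng.permutation(len(X))
            errs = [self.cd1_step(X[idx[s:s + batch_size]], lr)
                    for s in range(0, len(X), batch_size)]
            self.recon_errors.append(float(np.mean(errs)))
        return self


def train_dbn(X, layer_sizes, epochs=5, lr=0.01):
    """Greedy layer-wise pretraining: each RBM models the previous layer's activations."""
    rbms, inp = [], X
    for i, n_hidden in enumerate(layer_sizes):
        rbm = RBM(inp.shape[1], n_hidden, gaussian_visible=(i == 0), seed=i)
        rbm.fit(inp, epochs=epochs, lr=lr)
        rbms.append(rbm)
        inp = rbm.hidden_probs(inp)  # mean activations feed the next layer
    return rbms


def dbn_features(rbms, X):
    # Propagate frames through all layers; the top layer is the new feature vector.
    for rbm in rbms:
        X = rbm.hidden_probs(X)
    return X


if __name__ == "__main__":
    # Random frames stand in for 39-dim MFCC vectors (hypothetical data).
    rng = np.random.default_rng(0)
    frames = standardize(rng.standard_normal((1000, 39)))
    rbms = train_dbn(frames, layer_sizes=[64, 20], epochs=3)
    feats = dbn_features(rbms, frames)
    print(feats.shape)  # (1000, 20)
```

In a full system, such frame-level features would replace or complement MFCCs as input to a back end such as GMM-UBM or i-vector-PLDA; that back end is not shown here.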

Funder

University of Zurich

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s00034-024-02671-9.pdf

Cited by 1 article.

1. Design of a Digital Exhibition Service System Under the Deep Belief Network Models. IEEE Access, 2024.