Restricted Boltzmann Machine Vectors for Speaker Clustering and Tracking Tasks in TV Broadcast Shows

Published: 2019-07-09
Volume: 9, Issue: 13, Page: 2761
ISSN: 2076-3417
Journal: Applied Sciences
Language: English

Author:

Khan Umair, Safari Pooyan, Hernando Javier

Abstract

Restricted Boltzmann Machines (RBMs) have shown success in both the front-end and back-end of speaker verification systems. In this paper, we propose applying RBMs to the front-end for the tasks of speaker clustering and speaker tracking in TV broadcast shows. RBMs are trained to transform utterances into a vector-based representation. Because little data is available for each test speaker, we propose adapting a global model to each speaker. First, the global model, referred to as the universal RBM, is trained with all the available background data. Then an adapted RBM model is trained with the data of each test speaker. The visible-to-hidden weight matrices of the adapted models are concatenated with the bias vectors and whitened to generate the vector representation of speakers. These vectors, referred to as RBM vectors, are shown to preserve speaker-specific information and are used in the tasks of speaker clustering and speaker tracking. The evaluation was performed on audio recordings of Catalan TV broadcast shows. The experimental results show that the proposed speaker clustering system achieves up to 12% relative improvement, in terms of Equal Impurity (EI), over the baseline system. In the speaker tracking task, our system achieves relative improvements of 11% and 7% over the baseline system using cosine and Probabilistic Linear Discriminant Analysis (PLDA) scoring, respectively.
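The pipeline described in the abstract (universal RBM, per-speaker adaptation, concatenation of weights and biases, whitening, cosine scoring) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The following are assumptions made only for illustration: Gaussian visible units with unit variance, one-step contrastive divergence (CD-1), the toy layer sizes and learning rate, the use of both visible and hidden biases in the concatenation, PCA whitening fitted on background-speaker vectors, and the synthetic `fake_features` data that stands in for real acoustic features.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, init=None, epochs=20, lr=0.01, seed=0):
    """Full-batch CD-1 for an RBM with Gaussian visible units (unit variance,
    an assumption) and Bernoulli hidden units. `init` optionally supplies
    (W, b_vis, b_hid) to start from, which is how adaptation is done here."""
    rng = np.random.default_rng(seed)
    n_vis = data.shape[1]
    if init is None:
        W = 0.01 * rng.standard_normal((n_vis, n_hidden))
        b_vis, b_hid = np.zeros(n_vis), np.zeros(n_hidden)
    else:
        W, b_vis, b_hid = (p.copy() for p in init)
    n = data.shape[0]
    for _ in range(epochs):
        h_prob = sigmoid(data @ W + b_hid)                    # positive phase
        h_samp = (rng.random(h_prob.shape) < h_prob).astype(float)
        v_recon = h_samp @ W.T + b_vis                        # reconstruction mean
        h_recon = sigmoid(v_recon @ W + b_hid)                # negative phase
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n
        b_vis += lr * (data - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_vis, b_hid

def supervector(params):
    """Concatenate the weight matrix with the bias vectors (both biases assumed)."""
    W, b_vis, b_hid = params
    return np.concatenate([W.ravel(), b_vis, b_hid])

def fit_whitening(X, eps=1e-8):
    """PCA whitening fitted on the rows of X; returns (mean, projection)."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    keep = s > eps * s.max()                                  # drop null directions
    return mean, Vt[keep].T * (np.sqrt(X.shape[0] - 1) / s[keep])

def cosine_score(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy demonstration on synthetic data (sizes are not the paper's configuration)
rng = np.random.default_rng(1)
n_vis, n_hidden = 8, 4

def fake_features(shift, n=200):
    # Hypothetical stand-in for the feature frames of one speaker
    return rng.standard_normal((n, n_vis)) + shift

background = [fake_features(rng.standard_normal(n_vis)) for _ in range(6)]
urbm = train_rbm(np.vstack(background), n_hidden)             # universal RBM

def rbm_supervector(frames):
    # Adapt the universal RBM with a few epochs on the speaker's own data
    return supervector(train_rbm(frames, n_hidden, init=urbm, epochs=5))

mean, P = fit_whitening(np.stack([rbm_supervector(f) for f in background]))

def to_rbm_vector(frames):
    return (rbm_supervector(frames) - mean) @ P

shift_a, shift_b = rng.standard_normal(n_vis), rng.standard_normal(n_vis)
a1, a2, b1 = (to_rbm_vector(fake_features(s)) for s in (shift_a, shift_a, shift_b))
print("same speaker:", cosine_score(a1, a2))
print("different speakers:", cosine_score(a1, b1))
```

Starting each adapted model from the universal RBM parameters, rather than from a random initialization, is what lets a short training run on limited speaker data produce a usable model. The PLDA scoring mentioned in the abstract is not covered by this sketch.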

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/9/13/2761/pdf

Cited by 4 articles.

1. Investigating Activation Functions to Enhance Speaker Identification with LSTM Networks;2023 26th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA);2023-12-04

2. Demagnetization Fault Diagnosis of a PMSM Using Auto-Encoder and K-Means Clustering;Energies;2020-08-30

3. I-Vector Transformation Using K-Nearest Neighbors for Speaker Verification;ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2020-05

4. Editorial for Special Issue “IberSPEECH2018: Speech and Language Technologies for Iberian Languages”;Applied Sciences;2020-01-04