An Analysis of the Short Utterance Problem for Speaker Characterization

Published:2019-09-05 Issue:18 Volume:9 Page:3697
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Viñals Ignacio, Ortega Alfonso, Miguel Antonio, Lleida Eduardo

Abstract

Speaker characterization has always been conditioned by the length of the evaluated utterances. Although systems perform well with large amounts of audio, performance degrades significantly when short utterances are considered. In this work we present an analysis of the short utterance problem that provides an alternative point of view. From our perspective, performance on short utterances is highly influenced by the phonetic similarity between enrollment and test utterances. Both enrollment and test should contain similar phonemes to discriminate properly; otherwise performance degrades. In this study we also interpret short utterances as incomplete long utterances in which some acoustic units are either unbalanced or missing altogether. These missing units make the speaker representations unreliable. The unreliable representations are biased with respect to their reference counterparts, which are obtained from long utterances. These undesired shifts increase the intra-speaker variability, causing a significant loss of performance. According to our experiments, short utterances (3–60 s) can perform as accurately as long utterances simply by ensuring that the phonetic distributions of enrollment and test match. This analysis is determined by the current embedding extraction approach, which is based on the accumulation of local short-time information. Thus it is applicable to most state-of-the-art embeddings, including traditional i-vectors and Deep Neural Network (DNN) x-vectors.
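
The bias mechanism described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' experimental setup: it assumes that an utterance-level representation is a plain mean over frame-level vectors (a simplified stand-in for i-vector or x-vector statistics pooling), and that each frame is a speaker offset plus an acoustic-unit mean plus noise. All names (`pooled_embedding`, `simulate_frames`, `unit_means`) and all numeric settings are hypothetical.

```python
import math
import random


def pooled_embedding(frames):
    """Mean-pool frame-level vectors into one utterance-level vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def simulate_frames(speaker_offset, unit_means, unit_ids, rng, noise=0.1):
    """Draw one frame per unit id: speaker offset + unit mean + Gaussian noise."""
    return [
        [s + m + rng.gauss(0.0, noise) for s, m in zip(speaker_offset, unit_means[u])]
        for u in unit_ids
    ]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(0)  # fixed seed so the toy run is repeatable
dim, n_units = 8, 10    # assumed toy sizes, not taken from the paper
unit_means = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_units)]
speaker = [rng.gauss(0.0, 1.0) for _ in range(dim)]

# Long reference utterance: every acoustic unit appears equally often.
long_ids = [u for u in range(n_units) for _ in range(200)]
reference = pooled_embedding(simulate_frames(speaker, unit_means, long_ids, rng))

# Short utterance with missing units: only units 0..2 are observed (30 frames).
missing_ids = [u for u in range(3) for _ in range(10)]
short_missing = pooled_embedding(simulate_frames(speaker, unit_means, missing_ids, rng))

# Short utterance of the same length (30 frames) with all units balanced.
balanced_ids = [u for u in range(n_units) for _ in range(3)]
short_balanced = pooled_embedding(simulate_frames(speaker, unit_means, balanced_ids, rng))

shift_missing = distance(short_missing, reference)
shift_balanced = distance(short_balanced, reference)
print(f"shift with missing units:  {shift_missing:.3f}")
print(f"shift with balanced units: {shift_balanced:.3f}")
```

In this toy model both short utterances have the same number of frames, so any difference in their shift from the reference comes from phonetic coverage rather than from duration, which is the effect the abstract attributes to missing or unbalanced acoustic units.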

Funder

Ministerio de Economia, Industria y Competitividad

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/9/18/3697/pdf

References: 37 articles.

1. Fifty years of progress in speech and speaker recognition

2. Evaluation of a vector quantization talker recognition system in text independent and text dependent modes

3. Robust text-independent speaker identification using Gaussian mixture speaker models

4. Joint Factor Analysis of Speaker and Session Variability: Theory and Algorithms;Kenny,2005

5. Front-End Factor Analysis for Speaker Verification

Cited by 7 articles.

1. Comparison of Modern Deep Learning Models for Speaker Verification;Applied Sciences;2024-02-06

2. A short utterance speaker recognition method with improved cepstrum–CNN;SN Applied Sciences;2022-11-22

3. Enhancing Speech Privacy with Slicing;Interspeech 2022;2022-09-18

4. Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition;IEEE Transactions on Dependable and Secure Computing;2022

5. The Domain Mismatch Problem in the Broadcast Speaker Attribution Task;Applied Sciences;2021-09-14