The Domain Mismatch Problem in the Broadcast Speaker Attribution Task

Published: 2021-09-14 Issue: 18 Volume: 11 Page: 8521
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Viñals Ignacio, Ortega Alfonso, Miguel Antonio, Lleida Eduardo

Abstract

The demand for high-quality metadata for the available multimedia content requires the development of new techniques able to correctly identify more and more information, including speaker information. The task known as speaker attribution aims at identifying all or part of the speakers in the audio under analysis. In this work, we study the speaker attribution problem in the broadcast domain. Through our experiments, we illustrate the positive impact of diarization on the final performance. Additionally, we show the influence of the variability present in broadcast data, depicting the broadcast domain as a collection of subdomains with particular characteristics. Taking these two factors into account, we also propose alternative approaches that are robust against domain mismatch. These include a semisupervised alternative as well as a new, totally unsupervised hybrid solution fusing diarization and speaker assignment. Thanks to these two approaches, performance improves by around 50% in relative terms. The analysis has been carried out using the corpus for the Albayzín 2020 challenge, a diarization and speaker attribution evaluation working with broadcast data. These data, provided by Radio Televisión Española (RTVE), the Spanish public radio and TV corporation, include multiple shows and genres to analyze the impact of new speech technologies in real-world scenarios.
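To make the task description concrete, the following is a minimal sketch of one common way to combine diarization with speaker assignment: each diarization cluster is represented by an embedding and compared against enrolled speaker embeddings, and a cluster stays unlabeled when no enrolled speaker scores above a threshold. This is an illustration only, not the method evaluated in the paper. The function names, the cosine scoring, the threshold value of 0.5, and the three-dimensional toy embeddings are all assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def attribute_speakers(clusters, enrolled, threshold=0.5):
    """Map each diarization cluster to an enrolled speaker name, or None.

    clusters:  dict of cluster id -> embedding (hypothetical diarization output)
    enrolled:  dict of speaker name -> enrollment embedding
    threshold: assumed minimum similarity for accepting an identity
    """
    labels = {}
    for cluster_id, embedding in clusters.items():
        best_name, best_score = None, threshold
        for name, reference in enrolled.items():
            score = cosine(embedding, reference)
            if score > best_score:
                best_name, best_score = name, score
        # None means "not one of the enrolled speakers" (open-set decision)
        labels[cluster_id] = best_name
    return labels


# Toy data: two enrolled speakers, three clusters found by diarization
enrolled = {"anchor": [1.0, 0.0, 0.0], "reporter": [0.0, 1.0, 0.0]}
clusters = {
    "c0": [0.9, 0.1, 0.0],
    "c1": [0.1, 0.8, 0.1],
    "c2": [0.0, 0.1, 1.0],
}
result = attribute_speakers(clusters, enrolled)
print(result)
```

In this toy setup, scoring whole clusters rather than individual segments is what lets diarization help the assignment step, and a fixed threshold tuned on one kind of show is the sort of component that can degrade when the data shifts to a different subdomain.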

Funder

Spanish Ministry of Economy and Competitiveness and the European Social Fund

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/11/18/8521/pdf

Cited by 2 articles.

1. An Online Diarization Approach for Streaming Applications Based on Tree-Clustering and Bayesian Resegmentation;Text, Speech, and Dialogue;2023

2. Multimodal Diarization Systems by Training Enrollment Models as Identity Representations;Applied Sciences;2022-01-21