An Overview of the IberSpeech-RTVE 2022 Challenges on Speech Technologies

Published: 2023-07-25
Volume: 13
Issue: 15
Page: 8577
ISSN: 2076-3417
Container-title: Applied Sciences
Short-container-title: Applied Sciences
Language: en
Authors:
Lleida, Eduardo (1); Rodriguez-Fuentes, Luis Javier (2); Tejedor, Javier (3); Ortega, Alfonso (1); Miguel, Antonio (1); Bazán, Virginia (4); Pérez, Carmen (4); de Prada, Alberto (4); Penagarikano, Mikel (2); Varona, Amparo (2); Bordel, Germán (2); Torre-Toledano, Doroteo (5); Álvarez, Aitor (6); Arzelus, Haritz (6)
Affiliations:
1. Vivolab, Aragon Institute for Engineering Research (I3A), University of Zaragoza, 50018 Zaragoza, Spain
2. Department of Electricity and Electronics, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), Barrio Sarriena, 48940 Leioa, Spain
3. Institute of Technology, Universidad San Pablo-CEU, CEU Universities, Urbanización Montepríncipe, 28668 Boadilla del Monte, Spain
4. Corporación Radiotelevisión Española, 28223 Madrid, Spain
5. AUDIAS, Electronic and Communication Technology Department, Escuela Politécnica Superior, Universidad Autónoma de Madrid, Av. Francisco Tomás y Valiente, 11, 28049 Madrid, Spain
6. Fundación Vicomtech, Basque Research and Technology Alliance (BRTA), Mikeletegi 57, 20009 Donostia-San Sebastián, Spain
Abstract
Evaluation campaigns provide a common framework within which progress in speech technologies can be effectively measured. This paper presents a detailed overview of the IberSpeech-RTVE 2022 Challenges, organized as part of the IberSpeech 2022 conference within the ongoing series of Albayzin evaluation campaigns. The 2022 edition launched four challenges: (1) speech-to-text transcription; (2) speaker diarization and identity assignment; (3) text and speech alignment; and (4) search on speech. Databases covering several domains (e.g., broadcast news, conference talks, parliament sessions) were released for these challenges. The submitted systems span a wide range of speech processing methods, including hidden Markov model-based approaches, end-to-end neural network-based methods, and hybrid approaches. This paper describes the databases, tasks, and performance metrics used in the four challenges, summarizes the most relevant features of the submitted systems, and briefly presents and discusses the results obtained. Although the submitted systems employ state-of-the-art technology, the relatively poor performance attained in some of the challenges shows that there is still room for improvement, which encourages us to continue the Albayzin evaluation campaigns in the coming years.
Funder
Corporación Radio Televisión Española
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science