An Overview of the IberSpeech-RTVE 2022 Challenges on Speech Technologies

Authors:

Eduardo Lleida 1, Luis Javier Rodriguez-Fuentes 2, Javier Tejedor 3, Alfonso Ortega 1, Antonio Miguel 1, Virginia Bazán 4, Carmen Pérez 4, Alberto de Prada 4, Mikel Penagarikano 2, Amparo Varona 2, Germán Bordel 2, Doroteo Torre-Toledano 5, Aitor Álvarez 6, Haritz Arzelus 6

Affiliation:

1. Vivolab, Aragon Institute for Engineering Research (I3A), University of Zaragoza, 50018 Zaragoza, Spain

2. Department of Electricity and Electronics, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), Barrio Sarriena, 48940 Leioa, Spain

3. Institute of Technology, Universidad San Pablo-CEU, CEU Universities, Urbanización Montepríncipe, 28668 Boadilla del Monte, Spain

4. Corporación Radiotelevisión Española, 28223 Madrid, Spain

5. AUDIAS, Electronic and Communication Technology Department, Escuela Politécnica Superior, Universidad Autónoma de Madrid, Av. Francisco Tomás y Valiente, 11, 28049 Madrid, Spain

6. Fundación Vicomtech, Basque Research and Technology Alliance (BRTA), Mikeletegi 57, 20009 Donostia-San Sebastián, Spain

Abstract

Evaluation campaigns provide a common framework with which the progress of speech technologies can be effectively measured. The aim of this paper is to present a detailed overview of the IberSpeech-RTVE 2022 Challenges, which were organized as part of the IberSpeech 2022 conference under the ongoing series of Albayzin evaluation campaigns. In the 2022 edition, four challenges were launched: (1) speech-to-text transcription; (2) speaker diarization and identity assignment; (3) text and speech alignment; and (4) search on speech. Databases covering several domains (e.g., broadcast news, conference talks, parliament sessions) were released for these challenges. The submitted systems span a wide range of speech processing methods, including hidden Markov model-based approaches, end-to-end neural network-based methods, and hybrid approaches. This paper describes the databases, the tasks, and the performance metrics used in the four challenges. It also summarizes the most relevant features of the submitted systems and briefly presents and discusses the results obtained. Despite employing state-of-the-art technology, the relatively poor performance attained in some of the challenges reveals that there is still room for improvement. This encourages us to carry on with the Albayzin evaluation campaigns in the coming years.

Funder

Corporación Radio Televisión Española

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

