Mi-Go: tool which uses YouTube as data source for evaluating general-purpose speech recognition machine learning models

Published: 2024-05-01 Issue: 1 Volume: 2024
ISSN: 1687-4722
Container-title: EURASIP Journal on Audio, Speech, and Music Processing
Language: en
Short-container-title: J AUDIO SPEECH MUSIC PROC.

Author:

Wojnar Tomasz, Hryszko Jarosław, Roman Adam

Abstract

This article introduces Mi-Go, a tool for evaluating the performance and adaptability of general-purpose speech recognition machine learning models across diverse real-world scenarios. The tool uses YouTube as a rich and continuously updated data source that covers multiple languages, accents, dialects, speaking styles, and audio quality levels. To demonstrate the tool's effectiveness, an experiment was conducted in which Mi-Go was used to evaluate state-of-the-art automatic speech recognition machine learning models on a total of 141 randomly selected YouTube videos. The results underscore the value of YouTube as a data source for evaluating speech recognition models and for checking their robustness, accuracy, and adaptability to diverse languages and acoustic conditions. Additionally, by contrasting machine-generated transcriptions with human-made subtitles, Mi-Go can help pinpoint potential misuse of YouTube subtitles, such as their use for search engine optimization.

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s13636-024-00343-9.pdf

References (32 articles)

1. S. Abu-El-Haija, N. Kothari, J. Lee, P. Natsev, G. Toderici, B. Varadarajan, S. Vijayanarasimhan, YouTube-8M: A large-scale video classification benchmark. (2016). arXiv preprint arXiv:1609.08675

2. T. Afouras, J.S. Chung, A. Zisserman, LRS3-TED: A large-scale dataset for visual speech recognition. (2018). arXiv preprint arXiv:1809.00496

3. S. Allen, How many videos are on YouTube? 33+ interesting stats. (2023). https://www.nichepursuits.com/how-many-videos-are-on-youtube/. Accessed 17 Dec 2023

4. R. Ardila, M. Branson, K. Davis, M. Henretty, M. Kohler, J. Meyer, R. Morais, L. Saunders, F.M. Tyers, G. Weber, Common Voice: A massively-multilingual speech corpus, in Proceedings of the Twelfth Language Resources and Evaluation Conference. (European Language Resources Association, Marseille, 2020), pp. 4218–4222

5. A. Baevski, Y. Zhou, A. Mohamed, M. Auli, wav2vec 2.0: A framework for self-supervised learning of speech representations. Adv. Neural Inf. Process. Syst. 33, 12449–12460 (2020)