
The ASR Post-Processor Performance Challenges of BackTranScription (BTS): Data-Centric and Model-Centric Approaches

Published: 2022-10-02 Issue: 19 Volume: 10 Page: 3618
ISSN: 2227-7390
Container-title: Mathematics
Language: en
Short-container-title: Mathematics

Author:

Park Chanjun, Seo Jaehyung, Lee Seolhwa, Lee Chanhee, Lim Heuiseok

Abstract

Training a sequence-to-sequence (S2S) automatic speech recognition (ASR) post-processor requires parallel pairs (e.g., a speech recognition result and its human post-edited sentence) to construct the dataset, which demands a great deal of human labor. BackTranScription (BTS) is a data-building method that mitigates this limitation of existing S2S-based ASR post-processors: it automatically generates large amounts of training data, reducing the time and cost of data construction. Despite this novel approach, BTS-based ASR post-processors still face open research challenges and remain largely untested across diverse approaches. In this study, we highlight these challenges through detailed experiments that analyze a data-centric approach (i.e., controlling the amount of data without altering the model) and a model-centric approach (i.e., modifying the model). In other words, we point out problems with the current research trend of pursuing model-centric approaches and caution against ignoring the importance of data. Our experimental results show that the data-centric approach outperformed the model-centric approach by +11.69, +17.64, and +19.02 points in F1-score, BLEU, and GLEU, respectively.
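The abstract describes BTS only at a high level. Assuming BTS builds data through a text-to-speech-to-text round trip, as in the earlier BTS work (this abstract does not spell the procedure out), clean sentences are synthesized with TTS, transcribed back with ASR, and each transcript is paired with its source sentence as a (noisy input, clean target) example. The following is a minimal sketch of that idea, not the authors' implementation. `build_bts_pairs`, `fake_tts`, `fake_asr`, and `nested_subsets` are hypothetical names; the toy TTS/ASR stubs only imitate ASR-style noise (lowercasing, dropped punctuation) instead of calling real engines, and `nested_subsets` merely illustrates a data-centric sweep in which the model stays fixed and only the amount of training data changes.

```python
import random
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class ParallelPair:
    source: str  # noisy ASR-style transcript (post-processor input)
    target: str  # clean original sentence (post-processor output)


def build_bts_pairs(
    corpus: Iterable[str],
    tts: Callable[[str], bytes],
    asr: Callable[[bytes], str],
) -> List[ParallelPair]:
    """Text -> speech -> text: pair each ASR transcript with its original sentence."""
    pairs = []
    for sentence in corpus:
        audio = tts(sentence)      # synthesize speech from clean text
        transcript = asr(audio)    # recognize it back into (noisy) text
        pairs.append(ParallelPair(source=transcript, target=sentence))
    return pairs


# Hypothetical stand-ins: a real pipeline would call actual TTS and ASR engines.
def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_asr(audio: bytes) -> str:
    # Imitate typical ASR output: lowercase, no punctuation, single spaces.
    text = audio.decode("utf-8").lower()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def nested_subsets(
    pairs: List[ParallelPair], sizes: List[int], seed: int = 0
) -> Dict[int, List[ParallelPair]]:
    """Hypothetical data-centric sweep: same model, growing nested training sets."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    # Each smaller subset is a prefix of every larger one; oversized requests are skipped.
    return {n: shuffled[:n] for n in sizes if n <= len(shuffled)}


if __name__ == "__main__":
    corpus = ["Hello, world!", "The meeting starts at 10 a.m."]
    for pair in build_bts_pairs(corpus, fake_tts, fake_asr):
        print(pair)
```

Injecting `tts` and `asr` as callables keeps the pairing logic independent of any particular speech engine, so the stubs can be swapped for real systems without touching `build_bts_pairs`.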

Funder

Ministry of Science and ICT

National Research Foundation of Korea

Publisher

MDPI AG

Subject

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Link

https://www.mdpi.com/2227-7390/10/19/3618/pdf

References (32 articles)

1. A Gaussian Mixture Model Spectral Representation for Speech Recognition; Stuttle; Ph.D. Thesis, 2003

2. The Application of Hidden Markov Models in Speech Recognition

3. wav2vec 2.0: A framework for self-supervised learning of speech representations; Baevski; arXiv, 2020

4. The Relevance of the Source Language in Transfer Learning for ASR; Hjortnæs; Proceedings of the Workshop on Computational Methods for Endangered Languages, 2021

5. XLST: Cross-lingual Self-training to Learn Multilingual Representation for Low Resource Speech Recognition; Zhang; arXiv, 2021