Authors:
Chen Yaqi, Zhang Hao, Zhang Wenlin, Qu Dan, Yang Xukui
Abstract
Meta-learning has proven to be a powerful paradigm for transferring knowledge from prior tasks to facilitate the quick learning of new tasks in automatic speech recognition. However, differences between languages (tasks) lead to variations in task learning directions, causing harmful competition for the model's limited resources. To address this challenge, we introduce task-agreement multilingual meta-learning (TAMML), which adopts the gradient agreement algorithm to guide the model parameters toward a direction on which tasks exhibit greater consistency. However, the computation and storage costs of TAMML grow dramatically as model depth increases. To address this, we further propose a simplification, TAMML-Light, which uses only the output layer for gradient calculation. Experiments on three datasets demonstrate that TAMML and TAMML-Light outperform existing meta-learning approaches. Furthermore, TAMML-Light reduces the relative increase in computation cost by at least 80% compared to TAMML.
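To make the gradient agreement idea concrete, below is a minimal, illustrative PyTorch sketch of an agreement-weighted meta-update. It assumes one common formulation in which each task's meta-gradient is weighted by its inner product with the summed gradient across tasks; the function names, the normalization, and the learning rate here are assumptions for illustration, not the paper's exact TAMML update.

```python
import torch

def gradient_agreement_weights(task_grads):
    """Weight each task's gradient by its agreement with the summed
    gradient across tasks (illustrative formulation, not necessarily
    the exact TAMML update).

    task_grads: list over tasks; each entry is a list of per-parameter
    gradient tensors for that task.
    """
    # Flatten each task's gradients into a single vector.
    flat = [torch.cat([g.reshape(-1) for g in grads]) for grads in task_grads]
    g_sum = torch.stack(flat).sum(dim=0)                       # summed direction
    inner = torch.stack([torch.dot(g, g_sum) for g in flat])   # agreement scores
    return inner / (inner.abs().sum() + 1e-12)                 # normalized weights

def meta_update(params, task_grads, meta_lr=1e-3):
    """Apply the agreement-weighted combination of per-task gradients.
    Restricting `params` and `task_grads` to the output layer's
    parameters would correspond to the TAMML-Light simplification,
    which computes agreement from output-layer gradients only.
    """
    w = gradient_agreement_weights(task_grads)
    with torch.no_grad():
        for i, p in enumerate(params):
            combined = sum(w[t] * task_grads[t][i] for t in range(len(task_grads)))
            p.sub_(meta_lr * combined)
```

Because TAMML-Light needs agreement scores only for the output layer, the flattened gradient vectors it compares are far shorter than the full-model vectors used above, which is where the reported savings in computation and storage come from.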
Funder
Natural Science Foundation of Henan Province
National Natural Science Foundation of China
Henan Zhongyuan Science and Technology Innovation Leading Talent Project
Publisher
Springer Science and Business Media LLC