Listen, Understand and Translate: Triple Supervision Decouples End-to-end Speech-to-text Translation

Published:2021-05-18 Issue:14 Volume:35 Page:12749-12759
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Dong Qianqian,Ye Rong,Wang Mingxuan,Zhou Hao,Xu Shuang,Xu Bo,Li Lei

Abstract

End-to-end speech-to-text translation (ST) takes audio in a source language and outputs text in a target language. Existing methods are limited by the amount of parallel data available. Can we build a system that fully utilizes the signals in a parallel ST corpus? We are inspired by the human understanding system, which is composed of auditory perception and cognitive processing. In this paper, we propose Listen-Understand-Translate (LUT), a unified framework with triple supervision signals to decouple the end-to-end speech-to-text translation task. LUT guides the acoustic encoder to extract as much information as possible from the auditory input. In addition, LUT utilizes a pre-trained BERT model to force the upper encoder to produce as much semantic information as possible, without extra data. We perform experiments on a diverse set of speech translation benchmarks, including Librispeech English-French, IWSLT English-German and TED English-Chinese. Our results demonstrate that LUT achieves state-of-the-art performance, outperforming previous methods. The code is available at https://github.com/dqqcasia/st.

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 13 articles.

1. Speaker voice normalization for end-to-end speech translation;Expert Systems with Applications;2024-08

2. Soft Alignment of Modality Space for End-to-End Speech Translation;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14

3. A multitask co-training framework for improving speech translation by leveraging speech recognition and machine translation tasks;Neural Computing and Applications;2024-02-27

4. Improving Speech Translation by Understanding the Speech From Latent Code;IEEE Signal Processing Letters;2024

5. Learning Semantic Information from Machine Translation to Improve Speech-to-Text Translation;2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC);2023-10-31