SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

Published: 2024 Volume: 32 Page: 3355-3364
ISSN:2329-9290
Container-title:IEEE/ACM Transactions on Audio, Speech, and Language Processing
Short-container-title:IEEE/ACM Trans. Audio Speech Lang. Process.

Author:

Wang Xiaofei¹, Thakker Manthan¹, Chen Zhuo¹, Kanda Naoyuki¹, Eskimez Sefik Emre¹, Chen Sanyuan², Tang Min¹, Liu Shujie², Li Jinyu¹, Yoshioka Takuya¹

Affiliation:

1. Microsoft Corporation, Redmond, WA, USA

2. Microsoft Research Asia, Beijing, China

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Link

http://xplorestaging.ieee.org/ielx8/6570655/10304349/10577150.pdf?arnumber=10577150

Reference: 43 articles.

1. Language models are few-shot learners; Brown, 2020

2. GPT-4 technical report, 2023

3. High-Resolution Image Synthesis with Latent Diffusion Models

4. AudioLM: A Language Modeling Approach to Audio Generation

5. Flamingo: A visual language model for few-shot learning; Alayrac, 2022

Cited by 3 articles.

1. Mapache: Masked Parallel Transformer for Advanced Speech Editing and Synthesis; ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2024-04-14

2. SELM: Speech Enhancement using Discrete Tokens and Language Models; ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2024-04-14

3. uSee: Unified Speech Enhancement And Editing with Conditional Diffusion Models; ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2024-04-14