AudioLDM 2: Learning Holistic Audio Generation With Self-Supervised Pretraining

Published: 2024
Volume: 32
Pages: 2871-2883
ISSN: 2329-9290
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Abbreviated journal title: IEEE/ACM Trans. Audio Speech Lang. Process.

Authors:

Haohe Liu¹, Yi Yuan¹, Xubo Liu¹, Xinhao Mei¹, Qiuqiang Kong², Qiao Tian³, Yuping Wang³, Wenwu Wang¹, Yuxuan Wang³, Mark D. Plumbley¹

Affiliations:

1. Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, U.K.

2. Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China

3. Speech, Audio & Music Intelligence (SAMI) Group, ByteDance Inc., Beijing, China

Funders

British Broadcasting Corporation Research and Development

Engineering and Physical Sciences Research Council

Centre for Vision, Speech and Signal Processing

Faculty of Engineering and Physical Science

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Link

http://xplorestaging.ieee.org/ielx7/6570655/10304349/10530074.pdf?arnumber=10530074

References (93 articles; first 5 shown)

1. A comprehensive survey of AI-generated content: A history of generative AI from GAN to ChatGPT. Cao, 2023.

2. NaturalSpeech: End-to-End Text-to-Speech Synthesis With Human-Level Quality.

3. AudioGen: Textually guided audio generation. Kreuk, 2022.

4. AudioLDM: Text-to-audio generation with latent diffusion models. Liu, 2023.

5. SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities.

Cited by 10 articles (first 5 shown)

1. Diffusion-based diverse audio captioning with retrieval-guided Langevin dynamics. Information Fusion, 2025-02.

2. EDAVS: Emotion-Driven Audiovisual Synthesis Experience. ACM SIGGRAPH 2024 Posters, 2024-07-25.

3. Video and Audio Deepfake Datasets and Open Issues in Deepfake Technology: Being Ahead of the Curve. Forensic Sciences, 2024-07-13.

4. From Large Language Models to Large Multimodal Models: A Literature Review. Applied Sciences, 2024-06-11.

5. Intelligent design of shear wall layout based on diffusion models. Computer-Aided Civil and Infrastructure Engineering, 2024-05-17.