SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody

Published: 2023-10-26
Container-title: Proceedings of the 31st ACM International Conference on Multimedia

Author:

Lu Hui¹, Wu Xixin¹, Wu Zhiyong², Meng Helen¹

Affiliation:

1. The Chinese University of Hong Kong, Hong Kong SAR, China

2. Tsinghua University, Shenzhen, China

Funder

National Natural Science Foundation of China

The Center for Perceptual and Interactive Intelligence (CPII) Ltd under the Innovation and Technology Commission's InnoHK Scheme

Shenzhen Science and Technology Program

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3581783.3612485

References: 24 articles.

1. Christopher P. Burgess, Irina Higgins, Arka Pal, Loïc Matthey, Nick Watters, Guillaume Desjardins, and Alexander Lerchner. 2018. Understanding disentangling in β-VAE. CoRR, Vol. abs/1804.03599 (2018). arXiv:1804.03599 http://arxiv.org/abs/1804.03599

2. Chak Ho Chan, Kaizhi Qian, Yang Zhang, and Mark Hasegawa-Johnson. 2022. SpeechSplit2.0: Unsupervised speech disentanglement for voice conversion without tuning autoencoder bottlenecks. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 6332-6336.

3. Ju-Chieh Chou and Hung-yi Lee. 2019. One-Shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization. In Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019, Gernot Kubin and Zdravko Kacic (Eds.). ISCA, 664-668.

4. Irina Higgins, Loïc Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. 2017. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net.