Disentangling Content and Fine-Grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion

Published: 2022-05-23
Container-title: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Author:

Zhao Xintao¹, Liu Feng², Song Changhe¹, Wu Zhiyong¹, Kang Shiyin², Tuo Deyi², Meng Helen¹

Affiliation:

1. Tsinghua University, Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems, Shenzhen International Graduate School, Shenzhen, China

2. Huya Inc, Guangzhou, China

Funder

National Natural Science Foundation of China

Publisher

IEEE

Link

http://xplorestaging.ieee.org/ielx7/9745891/9746004/09747625.pdf?arnumber=9747625

Reference: 26 articles.

1. Sequence-to-Sequence Acoustic Modeling for Voice Conversion

2. Connectionist temporal classification

3. One-shot voice conversion by separating speaker and content representations with instance normalization; Chou, 2019

4. Domain-adversarial training of neural networks; Ganin; The Journal of Machine Learning Research, 2016

5. Attention-based models for speech recognition; Chorowski, 2015

Cited by 16 articles.

1. EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning; 2024 International Joint Conference on Neural Networks (IJCNN); 2024-06-30

2. Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval; ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2024-04-14

3. Accurate Semi-supervised Automatic Speech Recognition via Multi-hypotheses-Based Curriculum Learning; Lecture Notes in Computer Science; 2024

4. NVCGAN: Leveraging Generative Adversarial Networks for Robust Voice Conversion; Lecture Notes in Computer Science; 2024

5. CLN-VC: Text-Free Voice Conversion Based on Fine-Grained Style Control and Contrastive Learning with Negative Samples Augmentation; 2023 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom); 2023-12-21