Cross-lingual voice conversion based on F0 multi-scale modeling with VITS

Published: 2024-03 Volume: 2212 Page: 375-379
Container-title: Proceedings of the 2024 3rd International Conference on Cyber Security, Artificial Intelligence and Digital Economy

Author:

Cao Danyang¹, Zhang Zeyi¹

Affiliation:

1. School of Information and Technology, North China University of Technology, China

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3672919.3672988

References: 16 articles.

1. Yi Zhou, Xiaohai Tian, Haihua Xu, Rohan Kumar Das, and Haizhou Li. Cross-lingual voice conversion with bilingual phonetic posteriorgram and average modeling. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6790–6794. IEEE, 2019.

2. Jingyi Li, Weiping Tu, and Li Xiao. FreeVC: Towards high-quality text-free one-shot voice conversion. In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2023.

3. Houjian Guo, Chaoran Liu, Carlos Toshinori Ishi, and Hiroshi Ishiguro. QuickVC: Many-to-any voice conversion using inverse short-time Fourier transform for faster conversion. arXiv preprint arXiv:2302.08296, 2023.

4. Disong Wang, Liqun Deng, Yu Ting Yeung, Xiao Chen, Xunying Liu, and Helen Meng. VQMIVC: Vector quantization and mutual information-based unsupervised speech representation disentanglement for one-shot voice conversion. arXiv preprint arXiv:2106.10132, 2021.

5. Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, et al. Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4779–4783. IEEE, 2018.