1. B. Sisman, J. Yamagishi, S. King, and H. Li, “An overview of voice conversion and its challenges: From statistical modeling to deep learning,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 132–157, 2020.
2. E. Casanova, J. Weber, C. D. Shulby, A. C. Junior, E. Gölge, and M. A. Ponti, “YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone,” in Proc. ICML. PMLR, 2022, pp. 2709–2720.
3. S. Gao, X. Wu, C. Xiang, and D. Huang, “Development of a computationally efficient voice conversion system on mobile phones,” APSIPA Transactions on Signal and Information Processing, vol. 8, pp. 1–20, 2019.
4. T. Saeki, Y. Saito, S. Takamichi, and H. Saruwatari, “Real-time, full-band, online DNN-based voice conversion system using a single CPU,” in Proc. INTERSPEECH, 2020, pp. 1021–1022.
5. P. L. Tobing and T. Toda, “Low-latency real-time nonparallel voice conversion based on cyclic variational autoencoder and multiband WaveRNN with data-driven linear prediction,” in Proc. SSW, 2021, pp. 142–147.