Author:
Shi Haoxiang, Wang Jianzong, Zhang Xulong, Cheng Ning, Yu Jun, Xiao Jing
Publisher:
Springer Nature Singapore