1. Bänziger, T., Scherer, K.R.: The role of intonation in emotional expressions. Speech Commun. 46(3–4), 252–267 (2005)
2. Chu, W., Alwan, A.: Reducing f0 frame error of f0 tracking algorithms under noisy conditions with an unvoiced/voiced classification frontend. In: 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3969–3972. IEEE (2009)
3. Dhariwal, P., Nichol, A.: Diffusion models beat GANs on image synthesis. Adv. Neural Inf. Process. Syst. 34, 8780–8794 (2021)
4. Fant, G.: Acoustic theory of speech production: with calculations based on X-ray studies of Russian articulations, No. 2. Walter de Gruyter (1971)
5. Guo, Y., Du, C., Chen, X., Yu, K.: EmoDiff: intensity controllable emotional text-to-speech with soft-label guidance. In: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. IEEE (2023)