1. Baevski, A., Zhou, Y., Mohamed, A., Auli, M.: wav2vec 2.0: A framework for self-supervised learning of speech representations. In: Advances in Neural Information Processing Systems, vol. 33, pp. 12449–12460 (2020)
2. Beckman, M.E., Ayers Elam, G.: Guidelines for ToBI Labelling, Version 3. The Ohio State University Research Foundation, Ohio State University (1997)
3. Bredin, H.: TristouNet: triplet loss for speaker turn embedding. In: Proceedings of ICASSP 2017, pp. 5430–5434 (2017)
4. Christodoulides, G., Avanzi, M., Simon, A.C.: Automatic labelling of prosodic prominence, phrasing and disfluencies in French speech by simulating the perception of Naïve and expert listeners. In: Proceedings of InterSpeech 2017, pp. 3936–3940 (2017)
5. Cooper, E., Huang, W.C., Toda, T., Yamagishi, J.: Generalization ability of MOS prediction networks. In: Proceedings of ICASSP 2022, pp. 8442–8446 (2022)