1. ISO/IEC 15504-2. (2003). Retrieved 23 Oct 2023, from https://www.iso.org/standard/37458.html
2. Alumäe, T., Tilk, O., & Ullah, A. (2018). Advanced rich transcription system for Estonian speech. Frontiers in Artificial Intelligence and Applications, 307, 8.
3. Baevski, A., Zhou, H., Mohamed, A., & Auli, M. (2020). Wav2vec 2.0: A framework for self-supervised learning of speech representations. In 34th Conference on Neural Information Processing Systems (NeurIPS 2020), (Vol. 2020), Vancouver, Canada.
4. Bain, M., Huh, J., Han, T., & Zisserman, A. (2023). WhisperX: Time-accurate speech transcription of long-form audio. https://doi.org/10.48550/arXiv.2303.00747
5. Bredin, H., Yin, R., Coria, J. M., Gelly, G., Korshunov, P., Lavechin, M., Fustes, D., Titeux, H., Bouaziz, W., & Gill, M.-P. (2019). pyannote.audio: Neural building blocks for speaker diarization.