1. Baevski, A., Zhou, Y., Mohamed, A., & Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in Neural Information Processing Systems (pp. 12449-12460). Online Conference.
2. Bang, J. U., Yun, S., Kim, S. H., Choi, M. Y., Lee, M. K., Kim, Y. J., Kim, D. H., ... Kim, S. H. (2020). KsponSpeech: Korean spontaneous speech corpus for automatic speech recognition. Applied Sciences, 10(19), 6936. 10.3390/app10196936
3. Chang, K. W., Tseng, W. C., Li, S. W., & Lee, H. Y. (2022). SpeechPrompt: An exploration of prompt tuning on generative spoken language model for speech processing tasks. Retrieved from https://arxiv.org/abs/2203.16773. 10.21437/Interspeech.2022-10610
4. Chen, S., Wang, C., Chen, Z., Wu, Y., Liu, S., Chen, Z., Li, J., ... Wei, F. (2022). WavLM: Large-scale self-supervised pre-training for full stack speech processing. IEEE Journal of Selected Topics in Signal Processing, 16(6), 1505-1518. 10.1109/JSTSP.2022.3188113
5. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. Retrieved from https://arxiv.org/abs/1810.04805