1. RADFORD A, KIM J W, XU T, et al. Robust speech recognition via large-scale weak supervision [DB/OL]. (2022-12-06) [2023-12-19]. http://arxiv.org/abs/2212.04356
2. BAEVSKI A, ZHOU Y, MOHAMED A, et al. wav2vec 2.0: A framework for self-supervised learning of speech representations [C]//34th Conference on Neural Information Processing Systems. Vancouver: NeurIPS, 2020: 12449–12460.
3. HSU W N, BOLTE B, TSAI Y H H, et al. HuBERT: Self-supervised speech representation learning by masked prediction of hidden units [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 3451–3460.
4. CHEN S Y, WANG C Y, CHEN Z Y, et al. WavLM: Large-scale self-supervised pre-training for full stack speech processing [J]. IEEE Journal of Selected Topics in Signal Processing, 2022, 16(6): 1505–1518.
5. BAEVSKI A, HSU W N, XU Q T, et al. data2vec: A general framework for self-supervised learning in speech, vision and language [DB/OL]. (2022-02-07) [2023-12-19]. http://arxiv.org/abs/2202.03555