1. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), pp. 4171–4186 (2019)
2. Fan, A., Grave, E., Joulin, A.: Reducing transformer depth on demand with structured dropout. arXiv preprint arXiv:1909.11556 (2019)
3. Gasmi, K., Dilek, S., Tosun, S., Ozdemir, S.: A survey on computation offloading and service placement in fog computing-based IoT. J. Supercomput. 78(2), 1983–2014 (2022). https://doi.org/10.1007/s11227-021-03941-y
4. Hu, Z., Dong, Y., Wang, K., Chang, K.W., Sun, Y.: GPT-GNN: generative pre-training of graph neural networks. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1857–1867 (2020)
5. Kong, J., Wang, J., Zhang, X.: Accelerating pretrained language model inference using weighted ensemble self-distillation. In: Proceedings of the CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC), pp. 224–235 (2021)