1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., and Polosukhin, I. (2017, December 4–9). Attention Is All You Need. Proceedings of the Advances in Neural Information Processing Systems 30, Long Beach, CA, USA.
2. Lin, T., Wang, Y., Liu, X., and Qiu, X. (2021). A Survey of Transformers. arXiv.
3. Tay, Y., Dehghani, M., Gupta, J.P., Aribandi, V., Bahri, D., Qin, Z., and Metzler, D. (2021, August 1–6). Are Pretrained Convolutions Better than Pretrained Transformers? Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online.
4. Gawlikowski, J., Tassi, C.R.N., Ali, M., Lee, J., Humt, M., Feng, J., Kruspe, A., Triebel, R., Jung, P., and Roscher, R. (2021). A Survey of Uncertainty in Deep Neural Networks. arXiv.
5. Gal, Y., and Ghahramani, Z. (2016, June 19–24). Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA.