1. Hochreiter, S., and Schmidhuber, J. (1997). Long Short-Term Memory. Neural Comput., 9, 1735–1780.
2. Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. (2014, December 12–13). Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. Proceedings of the NIPS 2014 Deep Learning and Representation Learning Workshop, Montreal, QC, Canada.
3. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, December 4–9). Attention Is All You Need. Proceedings of the Advances in Neural Information Processing Systems 30, Long Beach, CA, USA.
4. Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.
5. Li, Y. (2023). Transformer for object detection: Review and benchmark. Eng. Appl. Artif. Intell.