1. Burges, C., Bottou, L., Welling, M., Ghahramani, Z., and Weinberger, K. (2013, January 5–10). Distributed Representations of Words and Phrases and their Compositionality. Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA.
2. Long Short-Term Memory;Hochreiter;Neural Comput.,1997
3. Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (2017, January 4–9). Attention is All you Need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA.
4. Atkinson-Abutridy, J. (2024). Large Language Models: Concepts, Techniques and Applications, CRC Press.
5. Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 6–11). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the North American Chapter of the Association for Computational Linguistics, Online.