1. Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint. arXiv:1409.0473
2. Bengio, S., Vinyals, O., Jaitly, N., & Shazeer, N. (2015). Scheduled sampling for sequence prediction with recurrent neural networks. Advances in Neural Information Processing Systems, 28
3. Brady, K. K., Evmenova, A. S., Regan, K. S., Ainsworth, M. K., & Gafurov, B. S. (2022). Using a technology-based graphic organizer to improve the planning and persuasive paragraph writing by adolescents with disabilities and writing difficulties. The Journal of Special Education, 55(4), 222–233.
4. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint. arXiv:2010.11929
5. Fan, A., Grave, E., & Joulin, A. (2019). Reducing transformer depth on demand with structured dropout. arXiv preprint. arXiv:1909.11556