1. Aksoy, Ç., Ahmetoğlu, A., Güngör, T.: Hierarchical multitask learning approach for BERT. arXiv preprint arXiv:2011.04451 (2020)
2. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4171–4186 (2019)
3. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
4. Li, Y., Du, N., Bengio, S.: Time-dependent representation for neural event sequence prediction. arXiv preprint arXiv:1708.00065 (2017)
5. Liang, Y., et al.: TrajFormer: efficient trajectory classification with transformers. In: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pp. 1229–1237 (2022)