1. Shi, X., Chen, Z., Wang, H., et al. (2015). Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Advances in Neural Information Processing Systems, 28.
2. Kalfaoglu, M. E., Kalkan, S., Alatan, A. A. (2020). Late temporal modeling in 3D CNN architectures with BERT for action recognition. In: Computer Vision–ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part V, 16. Springer International Publishing, pp. 731-747.
3. Cho, K., Van Merrienboer, B., Gulcehre, C., et al. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
4. Zhang, X. (2022). Research on traffic incident recognition method based on video classification and video description. Master's thesis.
5. Liu, C. (2023). Traffic scenario understanding and video captioning via guidance attention captioning network. IEEE Transactions on Intelligent Transportation Systems.