1. Ai, J., Yang, Y., Xu, X., Zhou, J., & Shen, H. T. (2020). CC-LSTM: Cross and conditional long-short time memory for video captioning. In: A. D. Bimbo, R. Cucchiara, S. Sclaroff, G. M. Farinella, T. Mei, M. Bertini, H. J. Escalante, R. Vezzani (Eds.), Pattern Recognition. ICPR International Workshops and Challenges, Virtual Event, January 10-15, 2021, Proceedings, Part VI, Lecture Notes in Computer Science, vol. 12666, Springer, pp. 353–365.
2. Ba, J., Mnih, V., & Kavukcuoglu, K. (2015). Multiple object recognition with visual attention. In: Y. Bengio, Y. LeCun (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
3. Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In: Y. Bengio, Y. LeCun (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
4. Bello, I. (2021). LambdaNetworks: Modeling long-range interactions without attention. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021, OpenReview.net.
5. Brehm, S., Scherer, S., & Lienhart, R. (2020). High-resolution dual-stage multi-level feature aggregation for single image and video deblurring. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR Workshops 2020, Seattle, WA, USA, June 14-19, 2020, IEEE, pp. 1872–1881.