1. Athar, A., Mahadevan, S., Osep, A., Leal-Taixé, L., & Leibe, B. (2020). STEm-Seg: Spatio-temporal embeddings for instance segmentation in videos. In ECCV.
2. Awais, M., Naseer, M., Khan, S., Anwer, R.M., Cholakkal, H., Shah, M., Yang, M.H., & Khan, F.S. (2023). Foundational models defining a new era in vision: A survey and outlook. arXiv preprint arXiv:2307.13721.
3. Bertasius, G., & Torresani, L. (2020). Classifying, segmenting, and tracking object instances in video with mask propagation. In CVPR.
4. Caelles, A., Meinhardt, T., Brasó, G., & Leal-Taixé, L. (2022). DeVIS: Making deformable transformers work for video instance segmentation. arXiv preprint arXiv:2207.11103.
5. Cao, J., Anwer, R.M., Cholakkal, H., Khan, F.S., Pang, Y., & Shao, L. (2020). SipMask: Spatial information preservation for fast image and video instance segmentation. In ECCV.