Author:
Yan Yichen, He Xingjian, Chen Sihan, Lu Shichen, Liu Jing
Publisher:
Springer Nature Singapore
References: 22 articles.
1. Cheng, Z., et al.: Parallel vertex diffusion for unified visual grounding. arXiv preprint arXiv:2303.07216 (2023)
2. Cho, Y., Yu, H., Kang, S.J.: Cross-aware early fusion with stage-divided vision and language transformer encoders for referring image segmentation. IEEE Trans. Multimedia (2023)
3. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
4. Ding, H., Liu, C., Wang, S., Jiang, X.: VLT: vision-language transformer and query generation for referring segmentation. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
5. Feng, G., Hu, Z., Zhang, L., Lu, H.: Encoder fusion network with co-attention embedding for referring image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15506–15515 (2021)