1. Hu, R., Rohrbach, M., & Darrell, T. (2016). Segmentation from natural language expressions. In B. Leibe, J. Matas, N. Sebe, et al. (Eds.), Proceedings of the 14th European conference on computer vision (pp. 108–124). Cham: Springer.
2. Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3431–3440). Piscataway: IEEE.
3. Zhou, Y., Ji, R., Luo, G., Sun, X., Su, J., Ding, X., et al. (2023). A real-time global inference network for one-stage referring expression comprehension. IEEE Transactions on Neural Networks and Learning Systems, 34(1), 134–143.
4. Luo, G., Zhou, Y., Sun, J., Sun, X., & Ji, R. (2024). A survivor in the era of large-scale pretraining: an empirical study of one-stage referring expression comprehension. IEEE Transactions on Multimedia, 26, 3689–3700.
5. He, S., Ding, H., Liu, C., & Jiang, X. (2023). GREC: generalized referring expression comprehension. arXiv preprint. arXiv:2308.16182.