1. Conditional image-text embedding networks;Proceedings of the European Conference on Computer Vision (ECCV),2018
2. MAttNet: Modular attention network for referring expression comprehension;Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,2018
3. nuScenes: A multimodal dataset for autonomous driving;H Caesar;Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,2020
4. Ref-NMS: Breaking proposal bottlenecks in two-stage referring expression grounding;L Chen;Proceedings of the AAAI Conference on Artificial Intelligence,2021