1. A benchmark and baseline for language-driven image editing;Shi,2020
2. CAISE: Conversational agent for image search and editing;Kim,2022
3. Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments;Anderson,2018
4. Grounding linguistic commands to navigable regions;Rufus,2021
5. Dynamic multimodal instance segmentation guided by natural language queries;Margffoy-Tuay,2018