1. Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., Zhang, L., 2017. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6077–6086.
2. TypeFormer: Multiscale transformer with type controller for remote sensing image caption;Chen;IEEE Geosci. Remote Sens. Lett.,2022
3. Remote sensing image scene classification: Benchmark and state of the art;Cheng;Proc. IEEE,2017
4. From plane to hierarchy: Deformable transformer for remote sensing image captioning;Du;IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.,2023
5. Improving visual question answering for remote sensing via alternate-guided attention and combined loss;Feng;Int. J. Appl. Earth Obs. Geoinf.,2023