1. Bai, J., Chen, R., Liu, M.: Feature-attention module for context-aware image-to-image translation. Vis. Comput. 36, 2145–2159 (2020). https://doi.org/10.1007/s00371-020-01943-0
2. Chang, H., Zhang, H., Jiang, L., Liu, C., Freeman, W.T.: MaskGIT: masked generative image transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11315–11325 (2022)
3. Chen, M., et al.: Generative pretraining from pixels. In: International Conference on Machine Learning, pp. 1691–1703. PMLR (2020)
4. Ding, M., et al.: CogView: mastering text-to-image generation via transformers. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
5. Dong, X., et al.: PeCo: perceptual codebook for BERT pre-training of vision transformers. arXiv preprint arXiv:2111.12710 (2021)