1. Zhou, B., et al. (2019). Semantic understanding of scenes through the ADE20K dataset. International Journal of Computer Vision.
2. Chȩciński, K., & Wawrzyński, P. (2020). DCT-Conv: Coding filters in convolutional networks with Discrete Cosine Transform. In Proceedings of the International Joint Conference on Neural Networks.
3. Chen, X., Qin, Y., Xu, W., Bur, A. M., Zhong, C., & Wang, G. (2022). Explicitly Increasing Input Information Density for Vision Transformers on Small Datasets. In Proceedings of the 36th Conference on Neural Information Processing Systems.
4. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
5. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations.