1. Scene text visual question answering by using YOLO and STN;International Journal of Speech Technology;2024-01-03
2. FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions;2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV);2024-01-03
3. CLIPAG: Towards Generator-Free Text-to-Image Generation;2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV);2024-01-03
4. PreSTU: Pre-Training for Scene-Text Understanding;2023 IEEE/CVF International Conference on Computer Vision (ICCV);2023-10-01