Funder
University College London
The Chinese University of Hong Kong