1. NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space
2. SimVLM: Simple visual language model pretraining with weak supervision; Wang
3. Flamingo: a visual language model for few-shot learning; Alayrac, 2022
4. Improving depth gradient continuity in transformers: A comparative study on monocular depth estimation with CNN; Yao, 2023
5. Woodpecker: Hallucination correction for multimodal large language models; Yin, 2023