1. Learning transferable visual models from natural language supervision;Radford
2. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models;Li,2023
3. ConceptFusion: Open-set multimodal 3D mapping
4. OpenScene: 3D Scene Understanding with Open Vocabularies
5. ALVINN: An autonomous land vehicle in a neural network;Pomerleau;Advances in Neural Information Processing Systems,1988