1. VideoBERT: A Joint Model for Video and Language Representation Learning
2. Passive radar using Starlink transmissions: A theoretical study
3. Learning transferable visual models from natural language supervision;radford;ICML,0
4. Im2text: Describing images using 1 million captioned photographs;ordonez;NeurIPS,0
5. 12-in-1: Multitask vision and language representation learning;lu;CVPR,0