1. Sara Atito, Muhammad Awais, and Josef Kittler. 2021. SiT: Self-supervised vision transformer. arXiv preprint arXiv:2104.03602 (2021).
2. Learning Spatial Knowledge for Text to 3D Scene Generation
3. Angel X Chang, Mihail Eric, Manolis Savva, and Christopher D Manning. 2017. SceneSeer: 3D scene design with natural language. arXiv preprint arXiv:1703.00050 (2017).
4. Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, and Ilya Sutskever. 2020. Generative pretraining from pixels. In International conference on machine learning. 1691--1703.
5. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).