1. An image is worth 16x16 words: Transformers for image recognition at scale;Dosovitskiy,2020
2. C. Godard, O. Mac Aodha, G.J. Brostow, Unsupervised monocular depth estimation with left-right consistency, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 270–279.
3. Training language models to follow instructions with human feedback;Ouyang,2022
4. Language models are few-shot learners;Brown;Adv. Neural Inf. Process. Syst.,2020
5. Weight uncertainty in neural network;Blundell,2015