1. S. Y. Min et al., “FILM: Following instructions in language with modular methods,” arXiv preprint arXiv:2110.07342 (2021).
2. H. Liu et al., “LEBP—language expectation & binding policy: A two-stream framework for embodied vision-and-language interaction task learning agents,” arXiv preprint arXiv:2203.04637 (2022).
3. J. Devlin et al., “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805 (2018).
4. A. Chowdhery et al., “PaLM: Scaling language modeling with pathways,” arXiv preprint arXiv:2204.02311 (2022).
5. T. Brown et al., “Language models are few-shot learners,” Advances in Neural Information Processing Systems 33, 1877–1901 (2020).