1. Anderson, P., et al.: Bottom-up and top-down attention for image captioning and visual question answering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6077–6086 (2018)
2. Bao, H., et al.: UniLMv2: pseudo-masked language models for unified language model pre-training. In: International Conference on Machine Learning, pp. 642–652. PMLR (2020)
3. Bush, V.: As we may think. The Atlantic Monthly 176(1), 101–108 (1945)
4. Byrne, D., Kelliher, A., Jones, G.J.: Life editing: third-party perspectives on lifelog content. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1501–1510 (2011)
5. Castro, S., Azab, M., Stroud, J., Noujaim, C., Wang, R., Deng, J., Mihalcea, R.: LifeQA: a real-life dataset for video question answering. In: Proceedings of the 12th Language Resources and Evaluation Conference, pp. 4352–4358 (2020)