1. David A Bulkin and Jennifer M Groh. 2006. Seeing sounds: visual and auditory interactions in the brain. Current opinion in neurobiology, Vol. 16, 4 (2006), 415--419.
2. Santiago Castro, Mahmoud Azab, Jonathan Stroud, Cristina Noujaim, Ruoyao Wang, Jia Deng, and Rada Mihalcea. 2020. LifeQA: A Real-life Dataset for Video Question Answering. In Proceedings of the 12th Language Resources and Evaluation Conference. European Language Resources Association, Marseille, France, 4352--4358. https://aclanthology.org/2020.lrec-1.536
3. Vggsound: A Large-Scale Audio-Visual Dataset
4. Seongho Choi, Kyoung-Woon On, Yu-Jung Heo, Ahjeong Seo, Youwon Jang, Minsu Lee, and Byoung-Tak Zhang. 2020. Dramaqa: Character-centered video story understanding with hierarchical qa. arXiv preprint arXiv:2005.03356 (2020).
5. Anthony Colas, Seokhwan Kim, Franck Dernoncourt, Siddhesh Gupte, Daisy Zhe Wang, and Doo Soon Kim. 2019. TutorialVQA: Question answering dataset for tutorial videos. arXiv preprint arXiv:1912.01046 (2019).