Author:
Xu Feifei, Zhou Wang, Sun Tao, Lu Jiahao, Yu Ziheng, Li Guangzhen
Publisher:
Springer Nature Switzerland
References: 34 articles
1. Das, A., et al.: Visual dialog. In: CVPR, pp. 326–335 (2017)
2. Lei, J., Yu, L., Bansal, M., Berg, T.L.: TVQA: localized, compositional video question answering. arXiv preprint arXiv:1809.01696 (2018)
3. Alamri, H., et al.: Audio visual scene-aware dialog. In: CVPR, pp. 7558–7567 (2019)
4. Hori, C., et al.: End-to-end audio visual scene-aware dialog using multimodal attention-based video features. In: ICASSP 2019, IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2352–2356. IEEE (2019)
5. Nguyen, D.T., Sharma, S., Schulz, H., Asri, L.E., et al.: From film to video: multi-turn question answering with multi-modal context. arXiv preprint arXiv:1812.07023 (2018)