1. Tensor fusion network for multimodal sentiment analysis;Zadeh,2017
2. Efficient low-rank multimodal fusion with modality-specific factors;Liu,2018
3. Unifying the video and question attentions for open-ended video question answering;Xue;IEEE Transactions on Image Processing,2017
4. Multi-modal dual attention memory for video story question answering;Kim,2018
5. Bilinear attention networks;Kim,2018