1. Anderson et al. (2018). Bottom-up and top-down attention for image captioning and visual question answering.
2. Yu et al. (2018). Beyond bilinear: Generalized multimodal factorized high-order pooling for visual question answering. IEEE Trans. Neural Netw. Learn. Syst.
3. Cao et al. (2019). Interpretable visual question answering by reasoning on dependency trees. IEEE Trans. Pattern Anal. Mach. Intell.
4. He et al. (2016). Deep residual learning for image recognition.
5. Molnar (2020). Interpretable Machine Learning.