1. Zhang C, Lu Y (2021) Study on artificial intelligence: the state of the art and future prospects. J Ind Inf Integr 23:100224
2. Sharma H, Srivastava S (2021) Visual question-answering model based on the fusion of multimodal features by a two-way co-attention mechanism. Imaging Sci J 69(1–4):177–189
3. Bhatt D, Patel C, Talsania H, Patel J, Vaghela R, Pandya S, ..., Ghayvat H (2021) CNN variants for computer vision: history, architecture, application, challenges, and future scope. Electronics 10(20):2470
4. Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. Advances in neural information processing systems, 28, Montreal Convention Center, Montreal, Canada, December 7–10
5. Schwartz I, Schwing A, Hazan T (2017) High-order attention models for visual question answering. Advances in Neural Information Processing Systems, 30. Long Beach, California, USA, December 4–9, 3667–3677