Improving visual question answering for bridge inspection by pre‐training with external data of image–text pairs

Published: 2023-08-18 Volume: 39 Issue: 3 Pages: 345–361
ISSN: 1093-9687
Journal: Computer-Aided Civil and Infrastructure Engineering
Language: English
Journal abbreviation: Computer aided Civil Eng

Author:

Kunlamai Thannarot¹, Yamane Tatsuro², Suganuma Masanori¹³, Chun Pang‐Jo², Okatani Takayuki¹³

Affiliation:

1. Graduate School of Information Sciences, Tohoku University, Miyagi, Japan

2. Department of Civil Engineering, The University of Tokyo, Tokyo, Japan

3. Center for Advanced Intelligence Project, RIKEN, Miyagi, Japan

Abstract

This paper explores the application of visual question answering (VQA) to bridge inspection, building on recent advances in multimodal artificial intelligence (AI) systems. In VQA, an AI model answers natural-language questions about the content of an input image. Applying VQA to bridge inspection is challenging, however, because the training data must be created by experts and is therefore costly to produce. To address this, we propose using existing bridge inspection reports, which already contain image–text pairs, as external knowledge to improve VQA performance. Our approach first pre-trains the model on a large collection of such image–text pairs and then fine-tunes it on a limited amount of training data created specifically for the VQA task. The results show that this approach significantly improves VQA accuracy. These findings suggest that VQA models could become valuable tools for assessing the condition of bridges.
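The two-stage scheme summarized above (pre-train on report image–text pairs, then fine-tune on a small VQA set) can be illustrated with a deliberately tiny Python sketch. Everything in it is an assumption made for illustration, not the authors' model: each image is reduced to a single hypothetical "crack score" `s`, report text is turned into a weak binary label by the hypothetical keyword check `report_label`, the only question is "Is there a crack?", and the model is a two-parameter logistic regression trained by full-batch gradient descent. What the sketch does show is the mechanism: stage 2 starts from the weights learned in stage 1 rather than from scratch.

```python
import math

# Toy stand-in (hypothetical): each "image" is reduced to one pre-extracted
# crack score s in [-1, 1]; the real system is a multimodal neural network.
Example = tuple[float, int]  # (crack score, binary label)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: list[float], s: float) -> float:
    """Probability of 'yes' from bias w[0] and slope w[1]."""
    return sigmoid(w[0] + w[1] * s)


def train(w: list[float], data: list[Example], lr: float, steps: int) -> list[float]:
    """Full-batch gradient descent on the mean logistic loss, starting from w."""
    b, a = w
    n = len(data)
    for _ in range(steps):
        gb = sum(predict([b, a], s) - y for s, y in data) / n
        ga = sum((predict([b, a], s) - y) * s for s, y in data) / n
        b, a = b - lr * gb, a - lr * ga
    return [b, a]


def report_label(text: str) -> int:
    """Hypothetical weak label from report text: 1 if a crack is mentioned."""
    return 1 if "crack" in text.lower() else 0


def answer(w: list[float], s: float) -> str:
    """Answer the single toy question 'Is there a crack?' for an image score s."""
    return "yes" if predict(w, s) >= 0.5 else "no"


# Stage 1: many image-text pairs from past reports (synthetic here).
pairs = []
for k in range(1, 51):
    s = k / 50
    pairs.append((s, "Crack observed on the girder."))
    pairs.append((-s, "No visible damage."))
pretrain_data = [(s, report_label(t)) for s, t in pairs]

# Stage 2: a few expert-labelled VQA examples (hypothetical values).
vqa_data = [(0.6, 1), (-0.6, 0), (0.3, 1), (-0.3, 0)]

w_init = [0.0, 0.0]
w_pre = train(w_init, pretrain_data, lr=1.0, steps=200)
w_ft = train(w_pre, vqa_data, lr=0.1, steps=50)  # starts from pre-trained weights

print(answer(w_ft, 0.5), answer(w_ft, -0.5))  # prints: yes no
```

Because both toy datasets are symmetric around zero, the bias stays near zero and only the slope grows. In a real system the pre-training stage would learn a far richer image–text representation; the smaller learning rate used for fine-tuning here only mimics a common practice and is not taken from the paper.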

Publisher

Wiley

Subject

Computational Theory and Mathematics, Computer Graphics and Computer-Aided Design, Computer Science Applications, Civil and Structural Engineering, Building and Construction

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1111/mice.13086

Cited by 7 articles.

1. Implementation of explanatory texts output for bridge damage in a bridge inspection web system;Advances in Engineering Software;2024-09

2. Deep learning-based corrosion inspection of long-span bridges with BIM integration;Heliyon;2024-08

3. Self‐training with Bayesian neural networks and spatial priors for unsupervised domain adaptation in crack segmentation;Computer-Aided Civil and Infrastructure Engineering;2024-07-29

4. Vision transformer-based visual language understanding of the construction process;Alexandria Engineering Journal;2024-07

5. Beyond chat-GPT: a BERT-AO approach to custom question answering system;Multimedia Tools and Applications;2024-06-03