Multimodal Classification of Safety-Report Observations

Published: 2022-06-07 Issue: 12 Volume: 12 Page: 5781
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en

Authors:

Paraskevopoulos Georgios, Pistofidis Petros, Banoutsos Georgios, Georgiou Efthymios, Katsouros Vassilis

Abstract

Modern businesses are obligated to comply with regulations that aim to prevent physical injury and ill health for anyone present on a site under their responsibility, such as customers, employees, and visitors. Safety officers (SOs) are engineers who audit business sites, record observations about possible safety issues, and make appropriate recommendations. In this work, we develop a multimodal machine-learning architecture that analyzes and categorizes safety observations from textual descriptions and images taken at the inspected sites. To this end, we use a new multimodal dataset, Safety4All, which contains 5344 safety-related observations created by 86 SOs at 486 sites. Each observation consists of a short issue description written by the SO, accompanied by images showing the issue, relevant metadata, and a priority score. Our proposed architecture is based on jointly fine-tuning large pretrained language and image neural network models. Specifically, we propose a joint objective that combines the task loss with a contrastive loss; the contrastive term aligns the text and vision representations in a joint multimodal space. The contrastive loss preserves inter-modality representation distances, so that the vision and language representations of similar samples lie close together in the shared multimodal space. We evaluate the proposed model on three tasks, namely priority classification of input observations, observation assessment, and observation categorization. Our experiments show that inspection scene images and textual descriptions provide complementary information, signifying the importance of both modalities. Furthermore, the joint contrastive loss produces strong multimodal representations and outperforms a simple baseline fusion model. In addition, we train and release a large transformer-based language model for the Greek language based on the Electra architecture.
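The abstract does not spell out the exact form of the joint task and contrastive loss. As a rough illustration only, the sketch below assumes a CLIP-style symmetric InfoNCE term added to a standard cross-entropy task loss. The function names, the weighting factor `weight`, and the `temperature` value are hypothetical choices, not the authors' settings. It uses plain Python lists so it runs without a deep-learning framework; in a real model the embeddings would come from the fine-tuned text and image encoders.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy(logits: Vector, target: int) -> float:
    """Negative log-softmax probability of `target`, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def contrastive_loss(text_emb: Sequence[Vector], image_emb: Sequence[Vector],
                     temperature: float = 0.07) -> float:
    """Symmetric InfoNCE (assumed, CLIP-style): text i should match image i.

    Every other image (or text) in the batch acts as a negative example.
    """
    if len(text_emb) != len(image_emb):
        raise ValueError("text and image batches must have the same size")
    t = [l2_normalize(x) for x in text_emb]
    v = [l2_normalize(x) for x in image_emb]
    n = len(t)
    # sim[i][j]: cosine similarity of text i and image j, divided by temperature.
    sim = [[sum(a * b for a, b in zip(t[i], v[j])) / temperature
            for j in range(n)] for i in range(n)]
    text_to_image = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    image_to_text = sum(cross_entropy([sim[j][i] for j in range(n)], i)
                        for i in range(n)) / n
    return 0.5 * (text_to_image + image_to_text)


def joint_loss(class_logits: Sequence[Vector], labels: Sequence[int],
               text_emb: Sequence[Vector], image_emb: Sequence[Vector],
               weight: float = 0.5, temperature: float = 0.07) -> float:
    """Task cross-entropy plus a weighted contrastive alignment term.

    `weight` and `temperature` are hypothetical hyperparameters.
    """
    task = sum(cross_entropy(logits, y)
               for logits, y in zip(class_logits, labels)) / len(labels)
    return task + weight * contrastive_loss(text_emb, image_emb, temperature)


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(contrastive_loss(text, matched))  # close to 0
    print(contrastive_loss(text, swapped))  # large, roughly 1 / temperature
```

With matched text and image embeddings the contrastive term is near zero, while swapping the image order makes it large; during fine-tuning this gradient pressure is what pulls the two modalities of the same observation together in the shared space.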

Funder

European Regional Development Fund

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/12/12/5781/pdf

Cited by 4 articles.

1. Navigating the Multimodal Landscape: A Review on Integration of Text and Image Data in Machine Learning Architectures; Machine Learning and Knowledge Extraction; 2024-07-09

2. Scoping Review on Image-Text Multimodal Machine Learning Models; 2023 International Conference on Computational Science and Computational Intelligence (CSCI); 2023-12-13

3. Harnessing the Multimodal Data Integration and Deep Learning for Occupational Injury Severity Prediction; IEEE Access; 2023

4. A Systematic Literature Review on Multimodal Machine Learning: Applications, Challenges, Gaps and Future Directions; IEEE Access; 2023