Research on the Visual Perception of Ship Engine Rooms Based on Deep Learning
Journal: Journal of Marine Science and Engineering (JMSE)
Published: 2023-07-20
Volume: 11, Issue: 7, Page: 1450
ISSN: 2077-1312
Language: en
Authors:
Wang Yongkang 1, Zhang Jundong 1, Zhu Jinting 1, Ge Yuequn 1, Zhai Guanyu 1
Affiliation:
1. College of Marine Engineering, Dalian Maritime University, Dalian 116026, China
Abstract
In an intelligent engine room, visual perception of engine room equipment is a prerequisite for defect identification and for replacing manual operation. This paper improves YOLOv5 to address the problems of mutual occlusion among engine room equipment, an unbalanced number of samples across categories, and a large proportion of small targets. First, a coordinate attention (CA) mechanism is introduced into the backbone feature-extraction network to improve its ability to extract salient features. Second, the neck network is improved so that it learns the relative importance of each resolution during feature fusion and enriches the semantic information exchanged between layers. In addition, the Swin Transformer is used as the prediction head (SPH), which allows the network to establish global connections in complex environments and thereby improves detection accuracy. To handle engine room equipment occluding one another, the original non-maximum suppression (NMS) is replaced with Soft-NMS. Finally, a K-means algorithm combined with a genetic algorithm is used to cluster new anchor boxes that better match the dataset. The method is evaluated on the laboratory’s engine room equipment dataset (EMER) and the public PASCAL VOC dataset. Compared with YOLOv5m, the mAP of CBS-YOLOv5m increases by 3.34% and 1.8%, respectively.
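The Soft-NMS step mentioned in the abstract can be illustrated with a minimal NumPy sketch of the Gaussian variant: instead of discarding every box whose overlap with the top-scoring box exceeds a threshold (which suppresses valid detections when equipment occludes each other), the score of an overlapping box is decayed by a Gaussian of its IoU. This is a generic sketch, not the authors' implementation; the `[x1, y1, x2, y2]` box format and the `sigma`/`score_thresh` values are assumptions.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, format [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay scores of overlapping boxes instead of removing them."""
    scores = scores.astype(float).copy()  # decayed in place below
    keep = []
    idxs = np.arange(len(scores))
    while idxs.size > 0:
        best = idxs[np.argmax(scores[idxs])]   # highest remaining score
        keep.append(int(best))
        idxs = idxs[idxs != best]
        if idxs.size == 0:
            break
        overlaps = iou(boxes[best], boxes[idxs])
        scores[idxs] *= np.exp(-(overlaps ** 2) / sigma)  # Gaussian penalty
        idxs = idxs[scores[idxs] > score_thresh]          # prune near-zero scores
    return keep
```

With hard NMS, a heavily overlapped (occluded) detection would be dropped outright; here it survives with a reduced score, which is why the paper adopts Soft-NMS for mutually occluding cabin equipment.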
Funder
High-technology Ship Research Program
Subject
Ocean Engineering,Water Science and Technology,Civil and Structural Engineering