Affiliation:
1. College of Shipbuilding Engineering, Harbin Engineering University, Harbin 150001, China
Abstract
In crowded waters where the motion of obstacles is unknown, traditional methods often fail to ensure safe, autonomous collision avoidance. To address the challenges of information acquisition and decision delay, this study proposes an optimized autonomous navigation strategy based on deep reinforcement learning with both intrinsic and extrinsic rewards. Incorporating random network distillation (RND) into proximal policy optimization (PPO) strengthens the autonomous ship's incentive to explore unknown environments and allows intrinsic reward signals for its actions to be generated autonomously. For multi-ship collision avoidance scenarios, an environmental reward is designed based on the International Regulations for Preventing Collisions at Sea (COLREGs). This reward system categorizes dynamic obstacles into four collision avoidance situations. The experimental results show that the proposed algorithm outperforms the widely used PPO algorithm, achieving more efficient and safer collision avoidance decisions in crowded ocean environments with unknown motion information. This research provides a theoretical foundation and a methodological reference for the route deployment of autonomous ships.
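The idea of an RND-style intrinsic reward added to an environmental reward can be illustrated with a minimal sketch. It is not the authors' implementation: the linear "networks", the class name `RNDBonus`, the function `total_reward`, and the weighting factor `beta` are all assumptions made for illustration. A fixed random target maps observations to features, a predictor is trained to match it on visited observations, and the prediction error serves as the novelty bonus.

```python
import numpy as np


class RNDBonus:
    """Toy RND bonus with linear target and predictor maps (illustrative assumption)."""

    def __init__(self, obs_dim, feat_dim=16, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed, randomly initialised target map; it is never trained.
        self.w_target = rng.normal(size=(obs_dim, feat_dim))
        # Predictor map, trained to imitate the target on visited observations.
        self.w_pred = rng.normal(size=(obs_dim, feat_dim))
        self.lr = lr

    def intrinsic_reward(self, obs):
        """Mean squared prediction error; larger for rarely seen observations."""
        err = obs @ self.w_pred - obs @ self.w_target
        return float(np.mean(err ** 2))

    def update(self, obs):
        """One gradient-descent step on the predictor for a single observation."""
        err = obs @ self.w_pred - obs @ self.w_target
        # Gradient of mean(err**2) with respect to the predictor weights.
        grad = np.outer(obs, 2.0 * err) / err.size
        self.w_pred -= self.lr * grad


def total_reward(r_ext, r_int, beta=0.5):
    """Environmental (extrinsic) reward plus a weighted intrinsic bonus; beta is hypothetical."""
    return r_ext + beta * r_int


if __name__ == "__main__":
    rnd = RNDBonus(obs_dim=4)
    obs = np.array([1.0, 0.5, -0.5, 2.0])  # hypothetical ship observation
    print("bonus before training:", rnd.intrinsic_reward(obs))
    for _ in range(50):
        rnd.update(obs)
    print("bonus after training: ", rnd.intrinsic_reward(obs))
```

In this sketch the bonus for a repeatedly visited observation shrinks as the predictor catches up with the target, so unexplored states keep a comparatively high reward. In a PPO setting, the combined value from `total_reward` would stand in for the per-step reward used to compute advantages.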
Funder
National Key R&D Program of China
Natural Science Foundation of Heilongjiang Province of China
Subject
Ocean Engineering, Water Science and Technology, Civil and Structural Engineering