Affiliation:
1. The 54th Research Institute of China Electronics Technology Group Corporation, Shijiazhuang 050081, China
2. School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150080, China
Abstract
In light of the increasing scarcity of frequency spectrum resources for satellite communication systems based on the transparent transponder, fast and efficient satellite resource allocation algorithms have become key to improving the overall resource occupancy. In this paper, we propose a reinforcement learning-based Multi-Branch Deep Q-Network (MBDQN), which introduces TL-Branch and RP-Branch to extract features of satellite resource pool state and task state simultaneously, and Value-Branch to calculate the action-value function. On the one hand, MBDQN improves the average resource occupancy performance (AOP) through the selection of multiple actions, including task selection and resource priority actions. On the other hand, the trained MBDQN is more suitable for online deployment and significantly reduces the runtime overhead due to the fact that MBDQN does not need iteration in the test phase. Experiments on both non-zero waste and zero waste datasets demonstrate that our proposed method achieves superior performance compared to the greedy or heuristic methods on the generated task datasets.
Funder
National Natural Science Foundation of China
Natural Science Foundation for Outstanding Young Scholars of Heilongjiang Province
Subject
Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering