Affiliation:
1. The College of Systems Engineering and the College of Electronic Sciences, National University of Defense Technology, Changsha, Hunan, China
2. The College of Systems Engineering, National University of Defense Technology, Changsha, Hunan, China
Abstract
In cognitive radios, wideband sequential sensing plays an important role: by adaptively allocating sensing resources, it can quickly identify temporarily available transmission opportunities. This paper proposes a Markov decision process for modelling the optimal control of sequential sensing, which provides a general formulation capturing various practical features, including sampling cost, sensing requirement, and sensing budget. To solve for the optimal sensing policy, a model‐augmented deep reinforcement learning algorithm is proposed, which offers higher learning stability and efficiency than conventional reinforcement learning algorithms.
Funder
National Natural Science Foundation of China
China Postdoctoral Science Foundation
Publisher
Institution of Engineering and Technology (IET)
Subject
Electrical and Electronic Engineering