Proximal Policy Optimization (PPO)-Based Resource Allocation for Energy Harvesting Industrial Wireless Sensor

Published: 2023-08-18

Author:

Li Rongzhen¹, Xu Lei¹, Tang Chengming¹, Wang Ping², Liu Wanli³, Gu Junjie¹, Cai Zhicheng¹, Jiang Rui⁴

Affiliation:

1. Nanjing University of Science and Technology

2. Nanyang Technological University

3. Nanjing University of Chinese Medicine

4. Nanjing University of Posts and Telecommunications

Abstract

To overcome the challenges of charging wireless sensors in complicated industrial environments, researchers are increasingly focusing on sensor networks that can harvest energy. This paper studies a wirelessly powered industrial sensor network in which each sensor harvests energy from a specific radio frequency (RF) energy source and uses it to transmit data to a receiver. Two working modes are discussed. The first is the frequency division multiplexing (FDM) mode, in which the sensor transmits data and harvests RF energy simultaneously over orthogonal frequency bands. The second is the time division multiplexing (TDM) mode, which divides each time slot into two successive intervals: data is transmitted and energy is harvested in the same frequency band, but in distinct intervals. Because the channel conditions and the energy harvesting process are unpredictable, the sensors require an efficient resource allocation algorithm. We propose a novel resource allocation algorithm based on reinforcement learning. By using Proximal Policy Optimization (PPO), the proposed algorithm achieves continuous resource allocation and is applicable to continuous states. We also use entropy regularization, online state normalization, reward scaling, and advantage normalization to improve the performance of the resource allocation algorithm in real-world scenarios. Numerical simulations show that, in both the FDM and TDM working modes, the proposed algorithm outperforms the greedy and random algorithms in terms of long-term throughput.
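The abstract names the training ingredients but gives no formulas, so the following is only a minimal sketch of the standard PPO clipped surrogate loss combined with advantage normalization, an entropy bonus, and online state normalization. It is not the authors' implementation. The function and class names, the scalar state, and the values clip_eps=0.2 and ent_coef=0.01 are assumptions chosen for illustration, and reward scaling is omitted.

```python
import math


def normalize_advantages(advantages, eps=1e-8):
    """Shift and scale a batch of advantages to zero mean, unit variance."""
    n = len(advantages)
    mean = sum(advantages) / n
    var = sum((a - mean) ** 2 for a in advantages) / n
    std = math.sqrt(var)
    return [(a - mean) / (std + eps) for a in advantages]


def ppo_clip_loss(new_logp, old_logp, advantages, entropies,
                  clip_eps=0.2, ent_coef=0.01):
    """Clipped surrogate loss (to be minimized) minus an entropy bonus.

    clip_eps and ent_coef are illustrative defaults, not values
    reported in the abstract.
    """
    adv = normalize_advantages(advantages)
    total = 0.0
    for lp_new, lp_old, a in zip(new_logp, old_logp, adv):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        total += min(ratio * a, clipped * a)  # pessimistic bound
    policy_loss = -total / len(adv)
    mean_entropy = sum(entropies) / len(entropies)
    return policy_loss - ent_coef * mean_entropy


class RunningNorm:
    """Online normalization of a scalar state (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def normalize(self, x, eps=1e-8):
        # Fall back to unit variance until two samples have been seen.
        var = self.m2 / self.n if self.n > 1 else 1.0
        return (x - self.mean) / (math.sqrt(var) + eps)
```

In a training loop along these lines, each observed state (for example a battery level or channel gain) would be passed through update and normalize before reaching the policy network, and ppo_clip_loss would be evaluated on mini-batches of log-probabilities collected under the old policy.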

Publisher

Research Square Platform LLC

References: 33 articles.

1. Ciuonzo, Domenico and Gelli, Giacinto and Pescapé, Antonio and Verde, Francesco (2019) Decision fusion rules in ambient backscatter wireless sensor networks. 2019 IEEE 30th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC): 1-6 IEEE

2. Lu, Xiao and Niyato, Dusit and Jiang, Hai and Kim, Dong In and Xiao, Yong and Han, Zhu (2018) Ambient backscatter assisted wireless powered communications. IEEE Wireless Communications 25(2): 170-177 IEEE

3. Ku, Meng-Lin and Li, Wei and Chen, Yan and Liu, KJ Ray (2015) Advances in energy harvesting communications: Past, present, and future challenges. IEEE Communications Surveys & Tutorials 18(2): 1384-1412 IEEE

4. Sudevalayam, Sujesha and Kulkarni, Purushottam (2010) Energy harvesting sensor nodes: Survey and implications. IEEE Communications Surveys & Tutorials 13(3): 443-461 IEEE

5. Lu, Xiao and Wang, Ping and Niyato, Dusit and Kim, Dong In and Han, Zhu (2014) Wireless networks with RF energy harvesting: A contemporary survey. IEEE Communications Surveys & Tutorials 17(2): 757-789 IEEE