Abstract
As a promising sequential decision-making algorithm, deep reinforcement learning (RL) has been applied in many fields. However, the related methods often demand a large amount of training time before they achieve acceptable performance. Learning from demonstration has greatly improved the efficiency of reinforcement learning, but it poses its own challenges: it requires demonstration data collected in advance from demonstrators (human experts or programmed controllers), and such data are not always available in sparse-reward tasks. Most importantly, unknown differences exist between how agents and human experts observe the environment, which means that not all human demonstration data conform to a Markov decision process (MDP). In this paper, a method of reinforcement learning from generated data (RLfGD) is presented; it consists of a generative model and a learning model. The generative model generates demonstration data with a one-dimensional deep convolutional generative adversarial network (1D DCGAN). The learning model applies the generated demonstration data to the reinforcement learning process to greatly improve the effectiveness of training. Two complex traffic scenarios were used to evaluate the proposed algorithm. The experimental results demonstrate that RLfGD obtains higher scores more quickly than double DQN (DDQN) in both scenarios. This approach can greatly improve the performance of reinforcement learning algorithms on sparse-reward problems.
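The abstract describes the generative component only at a high level. Below is a minimal sketch, in PyTorch, of how a 1D DCGAN generator for synthetic demonstration transitions might look; the latent dimension, layer sizes, and the length of the flattened transition vector (`LATENT_DIM`, `TRANSITION_LEN`, and the `Generator` architecture) are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of the generative component of RLfGD (assumed PyTorch
# implementation). All layer sizes, the latent dimension, and the length of
# the flattened (state, action, reward, next_state) transition vector are
# hypothetical; the paper specifies only "a one-dimensional DCGAN".
import torch
import torch.nn as nn

LATENT_DIM = 100      # assumed size of the noise input z
TRANSITION_LEN = 64   # assumed length of a flattened transition vector

class Generator(nn.Module):
    """Maps noise z to a synthetic transition via 1-D transposed convolutions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            # project z to (256 channels, length 8)
            nn.ConvTranspose1d(LATENT_DIM, 256, kernel_size=8, stride=1, padding=0),
            nn.BatchNorm1d(256),
            nn.ReLU(inplace=True),
            # length 8 -> 16
            nn.ConvTranspose1d(256, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm1d(128),
            nn.ReLU(inplace=True),
            # length 16 -> 32
            nn.ConvTranspose1d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm1d(64),
            nn.ReLU(inplace=True),
            # length 32 -> 64, single output channel
            nn.ConvTranspose1d(64, 1, kernel_size=4, stride=2, padding=1),
            nn.Tanh(),  # assumes transitions normalized to [-1, 1]
        )

    def forward(self, z):
        # z: (batch, LATENT_DIM) -> (batch, LATENT_DIM, 1) for 1-D convolutions
        return self.net(z.unsqueeze(-1)).squeeze(1)

# Usage: sample a batch of synthetic demonstration transitions, e.g. to seed
# the replay buffer of the learning model (a DDQN-style agent in the paper).
z = torch.randn(32, LATENT_DIM)
fake_transitions = Generator()(z)  # shape: (32, TRANSITION_LEN)
```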
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
Cited by 2 articles.