E2HRL: An Energy-efficient Hardware Accelerator for Hierarchical Deep Reinforcement Learning

Published:2022-09-21 Issue:5 Volume:27 Page:1-19
ISSN:1084-4309
Container-title:ACM Transactions on Design Automation of Electronic Systems
language:en
Short-container-title:ACM Trans. Des. Autom. Electron. Syst.

Author:

Shiri Aidin¹, Kallakuri Uttej¹, Rashid Hasib-Al¹, Prakash Bharat¹, Waytowich Nicholas R.², Oates Tim¹, Mohsenin Tinoosh¹

Affiliation:

1. University of Maryland Baltimore County, USA

2. Army Research Laboratory, USA

Abstract

Recently, Reinforcement Learning (RL) has shown great performance in solving sequential decision-making and control problems in dynamic environments. Despite its achievements, deploying Deep Neural Network (DNN)-based RL is expensive in terms of time and power due to the large number of episodes required to train agents with high-dimensional image representations. Additionally, at inference the large energy footprint of deep neural networks can be a major drawback. Embedded edge devices, the main platform for deploying RL applications, are intrinsically resource-constrained, and deploying deep neural network-based RL on them is a challenging task. As a result, reducing the number of actions the RL agent must take to learn the desired policy, along with energy-efficient deployment of RL, is crucial. In this article, we propose Energy Efficient Hierarchical Reinforcement Learning (E2HRL), a scalable hardware architecture for RL applications. E2HRL utilizes a cross-layer design methodology for achieving better energy efficiency, smaller model size, higher accuracy, and system integration at the software and hardware layers. Our proposed RL agent model is designed around learning hierarchical policies, which makes the network architecture more efficient to implement on mobile devices. We evaluated our model in three RL environments with different levels of complexity. Simulation results and our analysis illustrate that hierarchical policy learning with several levels of control improves the training efficiency of RL agents, and that the agent learns the desired policy faster than a non-hierarchical model. This improvement is more observable as the environment or the task becomes more complex, with multiple objective subgoals. We tested our model with different hyperparameters to maximize the reward achieved by the RL agent while minimizing the model size, parameters, and required number of operations.
The E2HRL model enables efficient deployment of an RL agent on resource-constrained embedded devices with the proposed custom hardware architecture, which is scalable and fully parameterized with respect to the number of input channels, filter size, and depth. The number of processing engines (PEs) in the proposed hardware can vary between 1 and 8, which provides flexibility in trading off factors such as latency, throughput, power, and energy efficiency. By performing a systematic hardware parameter analysis and design space exploration, we implemented the most energy-efficient hardware architectures of E2HRL on a Xilinx Artix-7 FPGA and an NVIDIA Jetson TX2. Comparing the implementation results shows that the Jetson TX2 board achieves 0.1 to 1.3 GOP/S/W energy efficiency while the Artix-7 FPGA achieves 1.1 to 11.4 GOP/S/W, which denotes 8.8× to 11× better energy efficiency for E2HRL when the model is implemented on the FPGA. Additionally, compared to similar works, our design shows better performance and energy efficiency.
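
The 8.8× to 11× figure appears to follow from dividing the endpoints of the two reported efficiency ranges. The sketch below reproduces that arithmetic. It assumes the endpoints are paired low-with-low and high-with-high; the abstract does not state this pairing, and the function and variable names are illustrative only.

```python
# Energy-efficiency ranges in GOP/S/W, as quoted in the abstract.
JETSON_TX2 = (0.1, 1.3)
ARTIX7_FPGA = (1.1, 11.4)


def improvement_factors(baseline, accelerated):
    """Return (low/low, high/high) ratios.

    Assumption: range endpoints are compared pairwise; the abstract
    does not say which configurations each endpoint corresponds to.
    """
    return tuple(a / b for a, b in zip(accelerated, baseline))


low_ratio, high_ratio = improvement_factors(JETSON_TX2, ARTIX7_FPGA)
print(f"{low_ratio:.1f}x at the low end, {high_ratio:.1f}x at the high end")
```

Under that pairing, the low ends give a ratio of about 11 and the high ends about 8.8, which matches the reported range.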

Funder

U.S. Army Research Laboratory

Publisher

Association for Computing Machinery (ACM)

Subject

Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Computer Science Applications

Link

https://dl.acm.org/doi/pdf/10.1145/3498327

Cited by 2 articles.

1. Towards an international regulatory framework for AI safety: lessons from the IAEA’s nuclear safety regulations;Humanities and Social Sciences Communications;2024-04-12

2. Deploying Deep Reinforcement Learning Systems: A Taxonomy of Challenges;2023 IEEE International Conference on Software Maintenance and Evolution (ICSME);2023-10-01