Abstract
Deep Reinforcement Learning (DRL) is a promising data-driven approach for traffic signal control, especially because DRL can learn to adapt to varying traffic demands. To do so, DRL agents maximize a scalar reward by interacting with an environment. However, formulating a suitable reward that aligns agent behavior with user objectives remains an open research problem. We investigate this problem in the context of traffic signal control with the objective of minimizing CO2 emissions at intersections. Because CO2 emissions can be affected by multiple factors outside the agent's control, it is unclear whether an emission-based metric works well as a reward or whether a proxy reward is needed. To find a suitable reward, we evaluate several rewards and combinations of rewards. For each reward, we train a Deep Q-Network (DQN) on homogeneous and on heterogeneous traffic scenarios. We use the SUMO (Simulation of Urban MObility) simulator and its default emission model to monitor each agent's performance on the specified reward and on CO2 emission. Our experiments show that a CO2 emission-based reward is inefficient for training a DQN, that the agent's performance is sensitive to variations in the parameters of combined rewards, and that some reward formulations do not work equally well in different scenarios. Based on these results, we identify desirable reward properties that have implications for reward design in reinforcement learning-based traffic signal control.
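To make the reward formulations concrete, the sketch below shows how an emission-based reward and a weighted combined reward could be computed from SUMO's default emission model via the TraCI Python API. This is a minimal illustration under stated assumptions, not the authors' implementation: the scenario file, traffic light ID, simulation horizon, and weighting factor alpha are hypothetical, and in practice the emission and queue terms would need normalization before being combined.

import traci

SUMO_CMD = ["sumo", "-c", "intersection.sumocfg"]  # hypothetical scenario file
TLS_ID = "center"                                   # hypothetical traffic light ID
ALPHA = 0.5                                         # hypothetical weighting factor

def co2_reward(lanes):
    # SUMO's default emission model reports CO2 in mg for the last
    # simulation step; negate so that maximizing reward minimizes emissions.
    return -sum(traci.lane.getCO2Emission(lane) for lane in lanes)

def combined_reward(lanes, alpha=ALPHA):
    # Example of a combined reward: CO2 term plus a queue-length proxy.
    # The two terms live on very different scales, so alpha only behaves
    # sensibly after normalization; the abstract notes that performance
    # is sensitive to such parameters.
    queue = sum(traci.lane.getLastStepHaltingNumber(lane) for lane in lanes)
    return alpha * co2_reward(lanes) - (1.0 - alpha) * queue

traci.start(SUMO_CMD)
incoming = sorted(set(traci.trafficlight.getControlledLanes(TLS_ID)))
for step in range(3600):
    # A trained DQN would select the signal phase here; this sketch
    # simply advances the simulation under the default signal program.
    traci.simulationStep()
    reward = combined_reward(incoming)
traci.close()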
Funder
Hasso-Plattner-Institut, Universität Potsdam