Toll-based reinforcement learning for efficient equilibria in route choice

Published:2020 Volume:35
ISSN:0269-8889
Container-title:The Knowledge Engineering Review
language:en

Author:

Ramos Gabriel de O., Da Silva Bruno C., Rădulescu Roxana, Bazzan Ana L. C., Nowé Ann

Abstract

The problem of traffic congestion incurs numerous social and economic repercussions and has thus become a central issue in every major city in the world. In this work, we look at the transportation domain from a multiagent system perspective, where every driver can be seen as an autonomous decision-making agent. We explore how learning approaches can help achieve an efficient outcome, even when agents interact in a competitive environment for sharing common resources. To this end, we consider the route choice problem, where self-interested drivers need to independently learn which routes minimise their expected travel costs. Such selfish behaviour results in the so-called user equilibrium, which is inefficient from the system’s perspective. In order to mitigate the impact of selfishness, we present Toll-based Q-learning (TQ-learning, for short). TQ-learning employs the idea of marginal-cost tolling (MCT), where each driver is charged according to the cost it imposes on others. The use of MCT leads agents to behave in a socially desirable way such that the system optimum is attainable. In contrast to previous works, however, our tolling scheme is distributed (i.e., each agent can compute its own toll), is charged a posteriori (i.e., at the end of each trip), and is fairer (i.e., agents pay exactly their marginal costs). Additionally, we provide a general formulation of the toll values for univariate, homogeneous polynomial cost functions. We present a theoretical analysis of TQ-learning, proving that it converges to a system-efficient equilibrium (i.e., an equilibrium aligned to the system optimum) in the limit. Furthermore, we perform an extensive empirical evaluation on realistic road networks to support our theoretical findings, showing that TQ-learning indeed converges to the optimum, which translates into a reduction of the congestion levels by 9.1%, on average.
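The idea of marginal-cost tolling mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's TQ-learning algorithm: it is a textbook two-route (Pigou-style) example with simple one-driver-at-a-time switching instead of Q-learning. The function names, the two routes, and the monomial cost form `a * (flow / n) ** k` are assumptions made for illustration. For such a monomial, the standard marginal-cost toll `flow * d(cost)/d(flow)` reduces to `k * cost`, so a driver could in principle derive it from the cost experienced on the trip.

```python
def cost(a, k, flow, n):
    """Travel cost a * (flow / n) ** k of a route used by `flow` of `n` drivers."""
    return a * (flow / n) ** k


def marginal_cost_toll(a, k, flow, n):
    """Toll flow * d(cost)/d(flow), which equals k * cost for this monomial."""
    return k * cost(a, k, flow, n)


# Hypothetical network: route A gets slower with load, route B has a fixed cost.
ROUTE_A = (1.0, 1)  # cost = flow / n
ROUTE_B = (1.0, 0)  # cost = 1, so its marginal-cost toll is zero


def drivers_on_route_a(n, tolled):
    """Start with everyone on B and move drivers to A one at a time,
    as long as switching strictly lowers the switching driver's perceived cost."""
    on_a = 0
    while on_a < n:
        perceived_a = cost(*ROUTE_A, on_a + 1, n)
        if tolled:
            perceived_a += marginal_cost_toll(*ROUTE_A, on_a + 1, n)
        perceived_b = cost(*ROUTE_B, n - on_a, n)
        if perceived_a >= perceived_b:
            break
        on_a += 1
    return on_a


def average_travel_cost(n, on_a):
    """Average travel cost over all drivers, excluding any tolls paid."""
    total = on_a * cost(*ROUTE_A, on_a, n)
    total += (n - on_a) * cost(*ROUTE_B, n - on_a, n)
    return total / n


if __name__ == "__main__":
    n = 100
    for tolled in (False, True):
        on_a = drivers_on_route_a(n, tolled)
        print(tolled, on_a, average_travel_cost(n, on_a))
```

With `n = 100`, the untolled dynamics put nearly all drivers on the congestible route, while the tolled dynamics stop near an even split, which is where the average travel cost in this toy network is lowest.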

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence, Software

References: 88 articles.

1. Zinkevich, M. 2003. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the Twentieth International Conference on Machine Learning, AAAI Press, 928–936.

2. Zhang, J., Pourazarm, S., Cassandras, C. G. & Paschalidis, I. C. 2016. The price of anarchy in transportation networks by estimating user cost functions from actual traffic data. In 2016 IEEE 55th Conference on Decision and Control (CDC), IEEE, 789–794.

3. Trial-and-error implementation of marginal-cost pricing on networks in the absence of demand functions

4. Collective intelligence, data routing and Braess’ paradox;Wolpert;Journal of Artificial Intelligence Research,2002

5. Finding the K Shortest Loopless Paths in a Network

Cited by 7 articles.

1. Multi-objective reinforcement learning based on nonlinear scalarization and long-short-term optimization;Robotic Intelligence and Automation;2024-05-08

2. Multi-objective Reinforcement Learning – Concept, Approaches and Applications;Procedia Computer Science;2023

3. Multi-objective prioritization for data center vulnerability remediation;2022 IEEE Congress on Evolutionary Computation (CEC);2022-07-18

4. A practical guide to multi-objective reinforcement learning and planning;Autonomous Agents and Multi-Agent Systems;2022-04

5. Accelerating route choice learning with experience sharing in a commuting scenario: An agent-based approach;AI Communications;2021-02-15