Affiliations:
1. Northwestern Polytechnical University, China
2. Peking University, China
3. Télécom SudParis, France
4. Harbin Engineering University; Northwestern Polytechnical University, China
Abstract
As an emerging mobility-on-demand service, bike-sharing systems (BSS) have spread all over the world by providing a flexible, cost-efficient, and environmentally friendly transportation mode for citizens. Demand-supply imbalance is one of the main challenges in BSS because the existing bike repositioning strategy is inefficient: it reallocates bikes according to a pre-defined periodic schedule without considering highly dynamic user demand. While reinforcement learning has been used in some repositioning problems to mitigate demand-supply imbalance, there are significant barriers to extending it to BSS because of the curse of dimensionality in the action space, which results from the dynamic number of workers and bikes in the city. In this paper, we study these barriers and address them by proposing a novel bike repositioning system, namely BikeBrain, which consists of a demand prediction model and a spatio-temporal bike repositioning algorithm. Specifically, to obtain accurate, real-time usage demand for efficient bike repositioning, we first present a prediction model, ST-NetPre, which directly predicts user demand while accounting for its highly dynamic spatio-temporal characteristics. Furthermore, we propose a spatio-temporal cooperative multi-agent reinforcement learning method (ST-CBR) for learning the worker-based bike repositioning strategy, in which each worker in the BSS is treated as an agent. In particular, ST-CBR adopts the centralized learning and decentralized execution paradigm, based on Mean Field Reinforcement Learning (MFRL), to achieve effective cooperation among large-scale dynamic agents while avoiding a prohibitively high-dimensional action space. For the dynamic action space, ST-CBR uses a SoftMax selector to select the specific action. Meanwhile, to account for the benefits and costs of the agents' operations, an efficient reward function is designed to seek an optimal control policy that considers both immediate and future rewards.
Extensive experiments are conducted on large-scale real-world datasets, and the results show significant improvements of the proposed method over several state-of-the-art baselines in terms of the demand-supply gap and operation cost.
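The abstract mentions a SoftMax selector over a dynamic action space but gives no details. The following is only a minimal sketch of the general idea, not the authors' implementation: it assumes each worker has a variable-length list of candidate actions, each already scored by some value estimate. The names `softmax_probs`, `select_action`, and `temperature` are hypothetical and do not come from the paper.

```python
import math
import random


def softmax_probs(q_values, temperature=1.0):
    """Turn a variable-length list of action scores into probabilities.

    Assumption: q_values holds one score per currently available action;
    the list length may differ from step to step.
    """
    if not q_values:
        raise ValueError("need at least one candidate action")
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(q_values)
    exps = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, temperature=1.0, rng=random):
    """Sample an index into the current candidate action list."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

Because the probabilities are computed from whatever candidate list is passed in, the same selector works when the number of available actions changes between steps. A lower `temperature` concentrates probability on the highest-scoring action, and a higher one makes the choice more exploratory.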
Publisher
Association for Computing Machinery (ACM)