Affiliation:
1. Bureau d’Economie Théorique et Appliquée (BETA), University of Strasbourg, 67085 Strasbourg, France
2. Department of Economics, Democritus University of Thrace, 69100 Komotini, Greece
Abstract
This study forecasts New York and Los Angeles gasoline spot prices at a daily frequency. The dataset comprises gasoline prices and a large set of 128 other relevant variables, spanning the period from 17 February 2004 to 26 March 2022. These variables were fed to three tree-based machine learning algorithms: decision trees, random forest, and XGBoost. In addition, a variable importance measure (VIM) technique was applied to identify and rank the most important explanatory variables. The optimal model, a trained random forest, achieves an out-of-sample mean absolute percentage error (MAPE) of 3.23% for the New York and 3.78% for the Los Angeles gasoline spot prices. The first lag, AR(1), of the gasoline price is the most important variable in both markets, and the top five variables are all energy-related. The results strengthen the understanding of gasoline price determinants and can inform strategic decisions and policy directions within the energy sector, making them relevant to both industry practitioners and policymakers.
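As an illustration of the two quantities the abstract relies on, the sketch below computes MAPE and builds the first-lag, AR(1), feature from a daily price series. It is a minimal example and not the authors' code: the function names `mape` and `lag1_pairs` and the four sample prices are hypothetical, and the "forecast" here is a naive one that simply repeats the previous day's price, not the random forest used in the study.

```python
from typing import List, Sequence, Tuple


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def lag1_pairs(prices: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair each day's price with the previous day's price (the AR(1) feature)."""
    return [(prices[t - 1], prices[t]) for t in range(1, len(prices))]


# Hypothetical daily spot prices, for illustration only.
prices = [2.00, 2.10, 2.05, 2.20]

pairs = lag1_pairs(prices)
naive_forecast = [lag for lag, _ in pairs]  # predict today's price = yesterday's price
targets = [today for _, today in pairs]

naive_mape = mape(targets, naive_forecast)
print(f"Naive AR(1) baseline MAPE: {naive_mape:.2f}%")
```

A naive baseline of this kind is a common reference point: since the first lag is reported as the most important variable, a model's out-of-sample MAPE is most informative when compared against what the lag alone achieves.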
Cited by
2 articles.