Abstract
We present a novel online decision-making solution in which the optimal path through a given decision tree is found dynamically using contextual bandit analysis. In each round, the learner selects a path through the tree by making a sequence of decisions that follows the tree structure, and it receives an outcome when a terminal node is reached. At each decision node, the learner observes information about the environment that hints at which child node to visit to obtain a better outcome. The objective is to learn the context-specific optimal decision at each decision node so as to maximize the accumulated outcome. In this paper, we propose Dynamic Path Identifier (DPI), a learning algorithm that applies a contextual bandit at every decision node and uses the outcome observed at the end of a round as the reward for all decisions made earlier in that round. The main technical difficulty for DPI is the heavy exploration burden caused by the width of the tree (i.e., the number of paths) as well as by the large context space. We mathematically prove that DPI’s regret per round approaches zero as the number of rounds approaches infinity. We also prove that the regret does not depend on the number of paths in the tree. Numerical evaluations are provided to complement the theoretical analysis.
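The round structure described above (a contextual bandit at each decision node, with the terminal outcome credited to every decision on the path) can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: the abstract does not say which contextual bandit DPI uses, so a disjoint LinUCB learner is swapped in here; the same context vector is observed at every node within a round; and the tree, payoffs, and the names `LinUCBNode`, `run_round`, and `toy_demo` are hypothetical. It is a sketch of the idea, not the authors' implementation.

```python
import math
import random


class LinUCBNode:
    """Disjoint LinUCB over the children of one decision node (assumed bandit, not DPI's)."""

    def __init__(self, n_children, dim, alpha=1.0):
        self.alpha = alpha
        # Per child: inverse design matrix A^{-1} (A starts as the identity) and vector b.
        self.a_inv = [[[float(i == j) for j in range(dim)] for i in range(dim)]
                      for _ in range(n_children)]
        self.b = [[0.0] * dim for _ in range(n_children)]

    @staticmethod
    def _matvec(m, x):
        return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

    @staticmethod
    def _dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def select(self, x):
        """Return the child index with the highest upper confidence bound for context x."""
        scores = []
        for a_inv, b in zip(self.a_inv, self.b):
            theta = self._matvec(a_inv, b)                 # ridge estimate A^{-1} b
            width = math.sqrt(max(self._dot(x, self._matvec(a_inv, x)), 0.0))
            scores.append(self._dot(theta, x) + self.alpha * width)
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, k, x, reward):
        """Add (x, reward) to child k's model via a Sherman-Morrison rank-one update."""
        a_inv = self.a_inv[k]
        u = self._matvec(a_inv, x)                         # A^{-1} x (A^{-1} is symmetric)
        denom = 1.0 + self._dot(x, u)
        for i in range(len(u)):
            for j in range(len(u)):
                a_inv[i][j] -= u[i] * u[j] / denom
        self.b[k] = [bi + reward * xi for bi, xi in zip(self.b[k], x)]


def run_round(tree, bandits, context_fn, outcome_fn, root="root"):
    """One round: walk root -> leaf, then credit the terminal outcome to every decision."""
    node, decisions = root, []
    while tree.get(node):                 # internal node: it has children
        x = context_fn(node)              # context observed at this decision node
        k = bandits[node].select(x)
        decisions.append((node, k, x))
        node = tree[node][k]
    reward = outcome_fn(node)             # outcome is revealed only at the terminal node
    for n, k, x in decisions:             # the same reward updates all decisions of the round
        bandits[n].update(k, x, reward)
    return node, reward


def toy_demo(rounds=2000, seed=0):
    """Hypothetical two-level tree whose best leaf depends on a binary context c."""
    rng = random.Random(seed)
    tree = {"root": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2"]}
    payoff = {0: {"a1": 1.0, "a2": 0.2, "b1": 0.3, "b2": 0.0},
              1: {"a1": 0.0, "a2": 0.3, "b1": 0.2, "b2": 1.0}}
    bandits = {n: LinUCBNode(len(kids), dim=2) for n, kids in tree.items()}
    for _ in range(rounds):
        c = rng.randint(0, 1)             # environment state for this round (assumed binary)
        ctx = [1.0, float(c)]             # bias term plus the observed state
        run_round(tree, bandits, lambda node: ctx, lambda leaf: payoff[c][leaf])
    return tree, bandits
```

Because every node's learner is updated with the single end-of-round outcome, the upper-level bandits see rewards that shift as the lower-level bandits improve; this nonstationarity is one reason the exploration problem described above is harder than a flat contextual bandit.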
Funder
Australian Research Council
University of Sydney
Publisher
Springer Science and Business Media LLC
Subject
Computer Networks and Communications, Hardware and Architecture, Software