A Reinforcement Learning Approach for Solving the Mean Variance Customer Portfolio in Partially Observable Models

Published:2018-12 Issue:08 Volume:27 Page:1850034
ISSN:0218-2130
Container-title:International Journal on Artificial Intelligence Tools
language:en
Short-container-title:Int. J. Artif. Intell. Tools

Author:

Asiain Erick¹, Clempner Julio B.²³, Poznyak Alexander S.¹

Affiliation:

1. Department of Control Automatics, Center for Research and Advanced Studies, Av. IPN 2508, Col. San Pedro Zacatenco, Mexico City, 07360, Mexico

2. Escuela Superior de Física y Matemáticas, Instituto Politécnico Nacional, Building 9, Av. Instituto Politécnico Nacional, San Pedro Zacatenco, 07738, Gustavo A. Madero, Mexico City, Mexico

3. School of Physics and Mathematics, National Polytechnic Institute, Mexico

Abstract

In problems involving the control of financial processes, it is usually difficult to quantify the state variables exactly. Acquiring the exact value of a given state may be expensive, even when it is physically possible. In such cases, it can be useful to base the decision-making process on inaccurate information about the system state. In addition, modeling real-world applications requires computing the parameters of the environment (transition probabilities and observation probabilities) and the reward functions, which are typically hand-tuned by experts in the field until they reach satisfactory values. This is an undesirable process. To address these shortcomings, this paper provides a new Reinforcement Learning (RL) framework for computing the mean-variance customer portfolio with transaction costs in controllable Partially Observable Markov Decision Processes (POMDPs). The solution is restricted to finite state, action, and observation sets and to average-reward problems. To solve this problem, a controller/actor-critic architecture is proposed, which balances the difficult tasks of exploitation and exploration of the environment. The architecture consists of three modules: a controller, a fast-tracked portfolio learning module, and an actor-critic module. Each module involves the design of a convergent Temporal Difference (TD) learning algorithm. We employ three different learning rules to estimate the real values of: (a) the transition matrices [Formula: see text], (b) the rewards [Formula: see text] and (c) the resources destined for carrying out a promotion [Formula: see text]. We present a proof for the estimated transition matrix rule [Formula: see text], showing that it converges as t → ∞. To solve the optimization problem, we extend the c-variable method to partially observable Markov chains. The c-variable is conceptualized as a joint strategy given by the product of the control policy, the observation kernel Q(y|s) and the stationary distribution vector. A major advantage of this procedure is that it can be implemented efficiently for real settings in controllable POMDPs. A numerical example illustrates the results of the proposed method.
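As a rough illustration of the c-variable described in the abstract, the sketch below forms the product of a control policy, an observation kernel Q(y|s) and a stationary distribution for a tiny finite model. This is not the paper's algorithm. The function names, the choice to condition the policy on the observation y, and all numeric values are assumptions made only for this example.

```python
# Minimal sketch (assumed notation, not taken from the paper):
#   policy[y][a]      ~ d(a|y), probability of action a given observation y
#   obs_kernel[s][y]  ~ Q(y|s), probability of observation y in state s
#   stationary[s]     ~ P(s), stationary probability of state s

def c_variable(policy, obs_kernel, stationary):
    """Return c[s][y][a] = policy[y][a] * obs_kernel[s][y] * stationary[s]."""
    return [
        [
            [policy[y][a] * obs_kernel[s][y] * stationary[s]
             for a in range(len(policy[y]))]
            for y in range(len(obs_kernel[s]))
        ]
        for s in range(len(stationary))
    ]


def recover_policy(c):
    """Recover policy[y][a] from c by summing over states and normalizing.

    Assumes every observation y has positive total probability.
    """
    n_s, n_y, n_a = len(c), len(c[0]), len(c[0][0])
    policy = []
    for y in range(n_y):
        # Mass of (y, a) after summing out the hidden state s.
        mass = [sum(c[s][y][a] for s in range(n_s)) for a in range(n_a)]
        total = sum(mass)
        policy.append([m / total for m in mass])
    return policy


# Hypothetical 2-state, 2-observation, 2-action example.
stationary = [0.6, 0.4]
obs_kernel = [[0.9, 0.1], [0.2, 0.8]]
policy = [[0.7, 0.3], [0.5, 0.5]]

c = c_variable(policy, obs_kernel, stationary)
total_mass = sum(c[s][y][a] for s in range(2) for y in range(2) for a in range(2))
recovered = recover_policy(c)
```

Because each factor is a probability distribution, the entries of `c` sum to one, so `c` can itself be treated as a joint distribution over states, observations and actions. Under the stated assumptions, the policy can be read back from `c` by marginalizing over states and normalizing, which is what `recover_policy` does.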

Publisher

World Scientific Pub Co Pte Lt

Subject

Artificial Intelligence

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0218213018500343

References: 35 articles.

1. State of the Art—A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms

2. A survey of solution techniques for the partially observed Markov decision process

3. The Complexity of Markov Decision Processes

4. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon

Cited by 12 articles.

1. Reinforcement learning for continuous-time mean-variance portfolio selection in a regime-switching market;Journal of Economic Dynamics and Control;2024-01

2. Joint Observer and Mechanism Design;Optimization and Games for Controllable Markov Chains;2023-12-14

3. Partially Observable Markov Chains;Optimization and Games for Controllable Markov Chains;2023-12-14

4. Revealing perceived individuals’ self-interest;Journal of the Operational Research Society;2023-04-05

5. Optimal Constrained Portfolio Analysis for Incomplete Information and Transaction Costs;Economic Computation and Economic Cybernetics Studies and Research;2022-12-17