Affiliation:
1. University of Central Florida
2. Air Force Research Laboratory
Abstract
The formal synthesis of automated or autonomous agents has elicited strong interest from the artificial intelligence community in recent years. This problem space broadly entails the derivation of decision-making policies for agents acting in an environment such that a formal specification of behavior is satisfied. Popular formalisms for such specifications include the quintessential Linear Temporal Logic (LTL) and Computation Tree Logic (CTL), which reason over infinite sequences and infinite trees of states, respectively. However, the related problem of reasoning over the frequency with which states are visited infinitely often, and of enforcing behavioral specifications on those frequencies, has received little attention. That problem, known as Steady-State Policy Synthesis (SSPS) or steady-state control, is the focus of this paper. Prior related work has been mostly confined to unichain Markov Decision Processes (MDPs), while a tractable solution for the general multichain setting has so far remained elusive. Here we provide a solution for multichain MDPs over a class of policies that account for all possible transitions in the given MDP. The solution policy is derived from a novel linear program (LP) that encodes constraints on the limiting distributions of the Markov chain induced by that policy. We establish a one-to-one correspondence between the feasible solutions of the LP and the stationary distributions of the induced Markov chains. The derived policy is shown to maximize the reward among the constrained class of stationary policies and to satisfy the specification constraints even when it does not exercise all possible transitions.
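To make "the limiting distribution of the Markov chain induced by a policy" concrete, the following is a minimal sketch that evaluates a fixed stationary policy; it does not reproduce the paper's LP-based synthesis. It assumes the induced chain has a single recurrent class (the unichain situation), so its stationary distribution is unique. In the multichain setting the paper targets, the limiting distribution can depend on the initial state and this sketch does not apply. The toy MDP, the steady-state bounds, and the helper names (`induced_chain`, `stationary_distribution`, `average_reward`, `satisfies_spec`) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def induced_chain(P, policy):
    """Markov chain induced by a stationary policy.

    P[s][a][t] is the probability of moving from s to t under action a;
    policy[s][a] is the probability of choosing action a in state s.
    """
    n = len(P)
    return [[sum(policy[s][a] * P[s][a][t] for a in range(len(P[s])))
             for t in range(n)] for s in range(n)]


def stationary_distribution(M):
    """Solve x M = x with sum(x) = 1 by exact Gauss-Jordan elimination.

    Assumption: the chain has one recurrent class, so the solution is unique.
    """
    n = len(M)
    # Rows of (M^T - I) x = 0; the last (redundant) row becomes sum(x) = 1.
    A = [[Fraction(M[j][i]) - (1 if i == j else 0) for j in range(n)]
         for i in range(n)]
    A[-1] = [Fraction(1)] * n
    b = [Fraction(0)] * (n - 1) + [Fraction(1)]
    aug = [row + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("stationary distribution is not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def average_reward(x, policy, R):
    """Long-run average reward: sum_s x_s * sum_a policy[s][a] * R[s][a]."""
    return sum(x[s] * sum(p * r for p, r in zip(policy[s], R[s]))
               for s in range(len(x)))


def satisfies_spec(x, lower, upper):
    """Steady-state specification: lower[s] <= x_s <= upper[s] for every s."""
    return all(lo <= xs <= hi for xs, lo, hi in zip(x, lower, upper))


# Hypothetical two-state MDP: action 0 stays put, action 1 switches state.
P = [[[1, 0], [0, 1]],
     [[0, 1], [1, 0]]]
R = [[0, 0], [1, 1]]  # reward 1 for any action taken in state 1
policy = [[Fraction(1, 2), Fraction(1, 2)],
          [Fraction(3, 4), Fraction(1, 4)]]
lower = [0, Fraction(3, 5)]  # spend at least 60% of the time in state 1
upper = [1, 1]

x = stationary_distribution(induced_chain(P, policy))
print(x, average_reward(x, policy, R), satisfies_spec(x, lower, upper))
```

Here the induced chain has rows (1/2, 1/2) and (1/4, 3/4), so its stationary distribution is (1/3, 2/3): state 1 is occupied two thirds of the time, which meets the 60% lower bound and yields an average reward of 2/3. Exact `Fraction` arithmetic is used so that boundary cases of the steady-state bounds are not blurred by floating-point error.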
Publisher
International Joint Conferences on Artificial Intelligence Organization
Cited by
2 articles.