Abstract
We study countably infinite Markov decision processes (MDPs) with real-valued
transition rewards. Every infinite run induces the following sequences of
payoffs: 1. Point payoff (the sequence of directly seen transition rewards), 2.
Mean payoff (the sequence of the sums of all rewards so far, divided by the
number of steps), and 3. Total payoff (the sequence of the sums of all rewards
so far). For each payoff type, the objective is to maximize the probability
that the $\liminf$ is non-negative. We establish the complete picture of the
strategy complexity of these objectives, i.e., how much memory is necessary and
sufficient for $\varepsilon$-optimal (resp. optimal) strategies. Some cases can
be won with memoryless deterministic strategies, while others require a step
counter, a reward counter, or both.
Publisher
Centre pour la Communication Scientifique Directe (CCSD)
Subject
General Computer Science,Theoretical Computer Science