Abstract
It is known that the value function of a Markov decision process, as a function of the discount factor λ, is the maximum of finitely many rational functions in λ. Moreover, each root of the denominators of the rational functions either lies outside the unit ball in the complex plane, or is a unit root with multiplicity 1. We prove the converse of this result, namely, every function that is the maximum of finitely many rational functions in λ, satisfying the property that each root of the denominators of the rational functions either lies outside the unit ball in the complex plane, or is a unit root with multiplicity 1, is the value function of some Markov decision process. We thereby provide a characterization of the set of value functions of Markov decision processes.
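As a small illustration of the known direction of this result, the sketch below uses a hypothetical two-state deterministic Markov decision process (not taken from the paper; the states, actions, rewards, and the names `TRANSITIONS`, `value_iteration`, and `closed_form` are all assumptions made for this example). In state 0, staying forever yields 1/(1 − λ), while moving to the absorbing state 1 yields 2λ/(1 − λ), so the value at state 0 should be the maximum of these two rational functions, each with a simple denominator root at λ = 1. The code compares that closed form with plain value iteration.

```python
# Hypothetical two-state MDP, chosen only for illustration:
#   state 0: "stay" -> reward 1, remain in state 0
#            "go"   -> reward 0, move to state 1
#   state 1: "loop" -> reward 2, remain in state 1
# Each action maps to (reward, next_state); transitions are deterministic.
TRANSITIONS = {
    0: {"stay": (1.0, 0), "go": (0.0, 1)},
    1: {"loop": (2.0, 1)},
}


def value_iteration(lam, sweeps=200):
    """Approximate the discounted value of every state for discount factor lam."""
    v = {s: 0.0 for s in TRANSITIONS}
    for _ in range(sweeps):
        # Synchronous Bellman update: the new values depend only on the old v.
        v = {
            s: max(r + lam * v[s_next] for r, s_next in actions.values())
            for s, actions in TRANSITIONS.items()
        }
    return v


def closed_form(lam):
    """Value at state 0 as the maximum of two rational functions in lam."""
    stay_forever = 1.0 / (1.0 - lam)     # reward 1 at every step
    go_then_loop = 2.0 * lam / (1.0 - lam)  # reward 0 once, then 2 at every step
    return max(stay_forever, go_then_loop)


if __name__ == "__main__":
    for lam in (0.25, 0.5, 0.9):
        print(lam, value_iteration(lam)[0], closed_form(lam))
```

The two rational functions cross at λ = 1/2, so the optimal stationary policy in this toy example switches from "stay" to "go" as λ increases past that point.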