Continuous Time Markov Decision Processes with Unbounded Rewards under the Discounted Criterion

  • Abstract: This paper studies continuous-time Markov decision processes (CTMDPs) under the discounted criterion, where the state space and the action set are countable, the reward functions are unbounded, and the family of transition rates is uniformly bounded. A new class of unbounded reward functions is introduced, and, within a new class of Markov policies, all results that hold under bounded rewards are shown to hold equally under unbounded rewards. By studying the structure of optimal policies, a necessary and sufficient condition for a policy to be optimal in this model is obtained.

     
