A Non-stationary Discounted Vector-valued Markovian Decision Model with Unbounded Reward

Abstract: Zhang Yun et al. [2] studied a class of non-stationary discounted vector-valued Markovian decision models (abbreviated as VMDP) whose vector-valued reward functions are absolute average relatively bounded. They obtained a sufficient condition for the existence of an optimal policy and discussed the relation between strongly optimal policies and optimal policies. Zhang Sheng et al. [3] derived several further properties of this model.
In this paper, we continue the study of the non-stationary discounted VMDP under a class of absolute average relatively bounded reward conditions. We extend the main results for non-stationary scalar-valued Markovian decision models to this setting: a necessary and sufficient condition for a policy to be optimal, the optimality equation, and the dominance of Markovian policies, among others.