Abstract:
Zhang Yun et al. have discussed a non-stationary discounted vector-valued Markovian decision model (abbreviated as VMDP) with an absolute average relatively bounded vector-valued reward function, and obtained a sufficient condition for the existence of an optimal policy. They also discussed the relation between strong optimal policies and optimal policies. Zhang Sheng et al. derived some properties of the model.
In this paper, the investigation of the non-stationary discounted VMDP is continued. The major results for the scalar-valued Markovian decision model (the necessary and sufficient condition for a policy to be optimal, the optimality equation, the dominating property of Markovian policies, etc.) are extended to this setting.