The Uniqueness of Optimal Policies for General MDP

GUO Xianping. The Uniqueness of Optimal Policies for General MDP[J]. Chinese Journal of Applied Probability and Statistics, 1998, 14(3): 258-265.

Abstract

For the general MDP model, we prove that for any convex combination of the strategic measures generated by a given class of randomized history-dependent policies, there exists a strategic measure generated by a randomized Markov policy such that the corresponding values of the average expected criterion, the discounted criterion, and the expected total reward criterion are respectively equal. This generalizes the corresponding results obtained by E. B. Dynkin and Yushkevich [1], M. Puterman [2], E. Feinberg and A. Shwartz [3], R. Strauch [4], and Dong Zeqing et al. [5]. Finally, we also prove that, for each of the average expected criterion, the discounted criterion, and the expected total reward criterion, the optimal policy is either unique or there are infinitely many optimal policies.
