GUO Xianping. The Uniqueness of Optimal Policies for General MDP[J]. Chinese Journal of Applied Probability and Statistics, 1998, 14(3): 258-265.

The Uniqueness of Optimal Policies for General MDP

  • For the general MDP model, we prove that for any convex combination of strategic measures generated by a given class of randomized history-dependent policies, there exists a strategic measure generated by a randomized Markov policy such that the two measures yield equal values under each of the average expected criterion, the discounted criterion, and the expected total reward criterion. This generalizes the corresponding results obtained by E. B. Dynkin and Yushkevich [1], M. Puterman [2], E. Feinberg and A. Shwartz [3], R. Strauch [4], and Dong Zeqing et al. [5]. Finally, we also prove that, under each of the average expected, discounted, and expected total reward criteria, the optimal policy is either unique or there are infinitely many optimal policies.