The Uniqueness of Optimal Policies for General MDPs
Graphical Abstract
For the general MDP model, we prove that for any convex combination of the strategic measures induced by a given class of randomized history-dependent policies, there exists a strategic measure induced by a randomized Markov policy such that the corresponding values of the expected average criterion, the discounted criterion, and the expected total reward criterion are respectively equal. This generalizes the corresponding results obtained by E. B. Dynkin and Yushkevich [1], M. Puterman [2], E. Feinberg and A. Shwartz [3], R. Strauch [4], and Dong Zeqing et al. [5]. Finally, we also prove that, for each of the expected average, discounted, and expected total reward criteria, the optimal policy is either unique or there are infinitely many optimal policies.
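The reduction from mixtures of history-dependent policies to a single randomized Markov policy can be illustrated on a toy example. The sketch below is a hedged illustration only: it assumes a finite two-state, two-action MDP with made-up transition probabilities and rewards, a finite horizon `T`, and two hypothetical history-dependent policies (`policy_first_state`, `policy_visit_parity`). It uses the classical marginal-matching construction φ_t(a|s) = P(S_t = s, A_t = a) / P(S_t = s), which is not necessarily the construction used in this paper. It mixes the two strategic measures with weight λ = 0.3, builds the Markov policy from the mixed marginals, and checks that both yield the same state-action marginals and hence the same truncated discounted value.

```python
from itertools import product

# Toy finite MDP (hypothetical numbers, for illustration only)
STATES = (0, 1)
ACTIONS = (0, 1)
P = {  # P[(s, a)][s2] = probability of moving from s to s2 under action a
    (0, 0): (0.9, 0.1),
    (0, 1): (0.2, 0.8),
    (1, 0): (0.5, 0.5),
    (1, 1): (0.3, 0.7),
}
R = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 2.0, (1, 1): 0.5}  # one-step rewards
MU = (0.6, 0.4)  # initial state distribution
T = 4            # finite horizon used to truncate the discounted sum
GAMMA = 0.9      # discount factor

# A history is a tuple (s0, a0, s1, a1, ..., s_t).
def policy_first_state(history):
    """Hypothetical history-dependent policy: always play the initial state as the action."""
    a = history[0]
    return {a: 1.0, 1 - a: 0.0}

def policy_visit_parity(history):
    """Hypothetical history-dependent policy: play the parity of visits to state 1 so far."""
    a = sum(1 for s in history[0::2] if s == 1) % 2
    return {a: 1.0, 1 - a: 0.0}

def occupation(policy):
    """x[t][(s, a)] = P(S_t = s, A_t = a), computed by enumerating all histories."""
    x = [{sa: 0.0 for sa in product(STATES, ACTIONS)} for _ in range(T)]
    frontier = [((s,), MU[s]) for s in STATES]  # pairs (history, probability)
    for t in range(T):
        nxt = []
        for hist, p in frontier:
            s = hist[-1]
            for a, pa in policy(hist).items():
                if p * pa == 0.0:
                    continue
                x[t][(s, a)] += p * pa
                for s2 in STATES:
                    q = p * pa * P[(s, a)][s2]
                    if q > 0.0:
                        nxt.append((hist + (a, s2), q))
        frontier = nxt
    return x

def mix(x1, x2, lam):
    """Marginals of the convex combination lam * P1 + (1 - lam) * P2 of strategic measures."""
    return [{sa: lam * x1[t][sa] + (1 - lam) * x2[t][sa] for sa in x1[t]} for t in range(T)]

def markov_from_marginals(x):
    """Randomized Markov policy phi_t(a|s) = x_t(s, a) / sum_a' x_t(s, a')."""
    phi = []
    for t in range(T):
        rule = {}
        for s in STATES:
            mass = sum(x[t][(s, a)] for a in ACTIONS)
            for a in ACTIONS:
                # Unreachable states get an arbitrary (uniform) action distribution.
                rule[(s, a)] = x[t][(s, a)] / mass if mass > 0 else 1.0 / len(ACTIONS)
        phi.append(rule)
    return phi

def markov_occupation(phi):
    """Forward recursion for the marginals induced by a Markov policy."""
    d = list(MU)
    x = []
    for t in range(T):
        xt = {(s, a): d[s] * phi[t][(s, a)] for s in STATES for a in ACTIONS}
        x.append(xt)
        d = [sum(xt[(s, a)] * P[(s, a)][s2] for s in STATES for a in ACTIONS) for s2 in STATES]
    return x

def discounted_value(x):
    """Truncated discounted value: sum_t GAMMA**t * E[r(S_t, A_t)]."""
    return sum(GAMMA ** t * x[t][sa] * R[sa] for t in range(T) for sa in x[t])

mixed = mix(occupation(policy_first_state), occupation(policy_visit_parity), lam=0.3)
markov = markov_occupation(markov_from_marginals(mixed))
print(abs(discounted_value(mixed) - discounted_value(markov)) < 1e-12)  # True
```

The equality holds because transitions depend only on the current state and action, so matching the state marginal at time t and the conditional action distribution given the state propagates to time t + 1. Since the discounted, expected total reward, and expected average criteria depend on the policy only through these one-step state-action marginals, the same argument would apply to each of them in this finite toy setting.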