WU Xiao, GUO Zhenbin. Convergence Problem of a Sequence of First Passage Markov Decision Processes with Varying Discount Factors[J]. Chinese Journal of Applied Probability and Statistics, 2021, 37(6): 598-610. DOI: 10.3969/j.issn.1001-4268.2021.06.004

Convergence Problem of a Sequence of First Passage Markov Decision Processes with Varying Discount Factors

Abstract: This paper studies the convergence problem of a sequence of first passage Markov decision processes with multiple constraints and varying discount factors on a countable state space. Using occupation measures and their properties, we transform the constrained optimization problems of the first passage models into equivalent constrained linear programming problems on the set of occupation measures (i.e., the convex analytic approach). Under suitable conditions, we then prove that the optimal values and optimal policies of the sequence of first passage models converge to the optimal value and optimal policies of the "limit" model.
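To make the convex analytic idea in the abstract concrete, here is a minimal Python sketch. It is not the authors' model: it assumes a finite state space, a single target state B, and a discount factor alpha(x, a) that depends on the current state and action, which is only one possible reading of "varying discount factors". All names and numbers (P, ALPHA, COST, GAMMA, PI, occupation_measure, ...) are hypothetical. For a fixed stationary policy, the sketch computes the discounted occupation measure mu(x, a) accumulated up to the first passage into B, checks that it satisfies the linear balance equations sum_a mu(y, a) = gamma(y) + sum_{x,a} alpha(x, a) p(y | x, a) mu(x, a) for every y outside B, and checks that the first passage cost equals the linear functional sum_{x,a} mu(x, a) c(x, a), matching ordinary policy evaluation.

```python
# Toy sketch of the occupation-measure view of a first passage MDP.
# Assumptions (hypothetical, not from the paper): finite states, a single
# target state B = 2, and a discount factor alpha(x, a) depending on (x, a).

LIVE = [0, 1]      # states outside the target set B
ACTIONS = [0, 1]

# P[x][a][y]: probability of moving from x to y (y = 0, 1, 2) under action a
P = {
    0: {0: [0.5, 0.3, 0.2], 1: [0.1, 0.4, 0.5]},
    1: {0: [0.2, 0.5, 0.3], 1: [0.0, 0.2, 0.8]},
}
ALPHA = {0: {0: 0.9, 1: 0.8}, 1: {0: 0.95, 1: 0.7}}  # discount factors
COST = {0: {0: 1.0, 1: 2.0}, 1: {0: 0.5, 1: 3.0}}    # one-step costs
GAMMA = {0: 1.0, 1: 0.0}                             # initial law, no mass on B
PI = {0: [0.6, 0.4], 1: [0.3, 0.7]}                  # stationary randomized policy


def occupation_measure(pi, tol=1e-13, max_iter=100_000):
    """mu(x, a) = E[sum_{t < tau_B} (prod_{k < t} alpha(x_k, a_k)) 1{x_t = x, a_t = a}]."""
    nu = dict(GAMMA)  # discounted visits to each live state
    for _ in range(max_iter):
        # nu_{n+1}(y) = gamma(y) + sum_{x,a} nu_n(x) pi(a|x) alpha(x,a) p(y|x,a)
        new = {
            y: GAMMA[y] + sum(nu[x] * pi[x][a] * ALPHA[x][a] * P[x][a][y]
                              for x in LIVE for a in ACTIONS)
            for y in LIVE
        }
        done = max(abs(new[y] - nu[y]) for y in LIVE) < tol
        nu = new
        if done:
            break
    return {(x, a): nu[x] * pi[x][a] for x in LIVE for a in ACTIONS}


def balance_residual(mu):
    """Largest violation of the linear balance equations (the LP constraints)."""
    return max(
        abs(sum(mu[(y, a)] for a in ACTIONS) - GAMMA[y]
            - sum(ALPHA[x][a] * P[x][a][y] * mu[(x, a)]
                  for x in LIVE for a in ACTIONS))
        for y in LIVE
    )


def linear_cost(mu):
    """First passage cost written as a linear functional of mu."""
    return sum(mu[(x, a)] * COST[x][a] for x in LIVE for a in ACTIONS)


def cost_by_policy_evaluation(pi, tol=1e-13, max_iter=100_000):
    """Same cost via the fixed point V = c_pi + P_alpha V, with V = 0 on B."""
    v = {x: 0.0 for x in LIVE}
    for _ in range(max_iter):
        new = {
            x: sum(pi[x][a] * (COST[x][a] + ALPHA[x][a]
                               * sum(P[x][a][y] * v[y] for y in LIVE))
                   for a in ACTIONS)
            for x in LIVE
        }
        done = max(abs(new[x] - v[x]) for x in LIVE) < tol
        v = new
        if done:
            break
    return sum(GAMMA[x] * v[x] for x in LIVE)


if __name__ == "__main__":
    mu = occupation_measure(PI)
    print("balance residual:", balance_residual(mu))
    print("linear cost:", linear_cost(mu), "DP cost:", cost_by_policy_evaluation(PI))
```

Because both the objective and any additional constraint costs d_k enter only through sums of the form sum mu(x, a) d_k(x, a), a constrained problem in this toy setting becomes a linear program over nonnegative mu satisfying the balance equations. The sketch only verifies these identities for one policy; it does not solve that LP or reproduce the paper's convergence argument.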
