An Infinite-Horizon Exponential Utility Optimality Model Based on Semi-Markov Processes

The Exponential Utility Optimality for Infinite Horizon Semi-Markov Decision Processes

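Summary: This paper considers the exponential utility optimality problem for semi-Markov decision processes in which the state and action spaces are both Borel sets and the reward function is nonnegative. The optimality criterion is to maximize the expected exponential utility of the total reward accumulated by the system over an infinite horizon. First, we impose a standard regularity condition, which ensures that the state process is non-explosive, and a continuity-compactness condition, which ensures that an optimal policy exists. Second, under these conditions, we use value iteration and the embedded chain technique to prove that the value function is the unique solution of the corresponding optimality equation and that an optimal policy exists. Finally, an example illustrates how the value iteration algorithm can be used to compute the value function and an optimal policy.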

Abstract: This paper concerns the exponential utility maximization problem for semi-Markov decision processes with Borel state and action spaces and nonnegative rewards. The optimality criterion is to maximize the expected exponential utility of the total rewards over an infinite horizon. Under regularity and compactness-continuity conditions, we establish the corresponding optimality equation and prove the existence of an exponential utility optimal stationary policy by an invariant embedding technique. Moreover, we provide an iterative algorithm for calculating the value function as well as the optimal policies. Finally, we illustrate the computation of an optimal policy with an example.