A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: multichain cases


Authors: Xi-Ren Cao; Xianping Guo
Keywords: Policy iteration; Potentials; Perturbation analysis; Performance sensitivity; Reinforcement learning
Journal: Automatica
Publication year: 2004
Publication date: October 2004
Volume: 40
Issue: 10
Pages: 1749-1759
Full-text size: 311 K

Abstract

We propose a unified framework to Markov decision problems and performance sensitivity analysis for multichain Markov processes with both discounted and average-cost performance criteria. With the fundamental concept of performance potentials, we derive both performance-gradient and performance-difference formulas, which play the central role in performance optimization. The standard policy iteration algorithms for both discounted- and average-reward MDPs can be established using the performance-difference formulas in a simple and intuitive way; and the performance-gradient formulas together with stochastic approximation may lead to new optimization schemes. This sensitivity-based point of view of performance optimization provides some insights that link perturbation analysis, Markov decision processes, and reinforcement learning together. The research is an extension of the previous work on ergodic Markov chains (Cao, Automatica 36 (2000) 771).
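As a rough illustration of what a performance-difference formula looks like, the sketch below checks the discounted-reward identity η' − η = (I − βP')⁻¹[(f' − f) + β(P' − P)η] numerically on a two-state chain. It is a minimal sketch under stated assumptions, not the paper's construction: the discounted value vector η is used directly as the potential (the paper may normalize potentials differently), the multichain average-reward case is not covered, and the matrices P, P2, the rewards f, f2, the discount factor beta, and the helper discounted_value are all hypothetical names and numbers chosen for the example.

```python
def discounted_value(P, f, beta, iters=2000):
    """Approximate (I - beta*P)^{-1} f by iterating x <- f + beta * P x.

    The map is a contraction for 0 <= beta < 1, so the iteration converges.
    """
    n = len(f)
    x = [0.0] * n
    for _ in range(iters):
        x = [f[i] + beta * sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


# Hypothetical two-state example: (P, f) is the current policy, (P2, f2) a candidate.
beta = 0.9
P = [[0.5, 0.5], [0.2, 0.8]]
f = [1.0, 0.0]
P2 = [[0.9, 0.1], [0.2, 0.8]]
f2 = [0.8, 0.0]

eta = discounted_value(P, f, beta)     # performance of the current policy
eta2 = discounted_value(P2, f2, beta)  # performance of the candidate policy

# Bracketed term of the difference formula: h = (f' - f) + beta * (P' - P) * eta.
n = len(f)
h = [
    (f2[i] - f[i]) + beta * sum((P2[i][j] - P[i][j]) * eta[j] for j in range(n))
    for i in range(n)
]

# Right-hand side of the formula: (I - beta*P')^{-1} h.
diff = discounted_value(P2, h, beta)

# Direct difference, for comparison with diff.
direct = [eta2[i] - eta[i] for i in range(n)]
```

Because (I − βP')⁻¹ has nonnegative entries, a candidate policy whose vector h is nonnegative in every state cannot perform worse than the current one. Here h is computed from the current policy's η alone, which is the observation behind the policy improvement step of policy iteration.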
