Semi-Markov decision processes with variance minimization criterion

Authors: Qingda Wei (1)
Xianping Guo (2)

1. School of Economics and Finance ; Huaqiao University ; Quanzhou ; 362021 ; People's Republic of China
2. School of Mathematics and Computational Science ; Sun Yat-Sen University ; Guangzhou ; 510275 ; People's Republic of China
Keywords: Semi-Markov decision processes ; State-dependent discount factors ; Discount optimality equation ; Discount variance minimal policy ; 93E20 ; 90C40
Journal: 4OR: A Quarterly Journal of Operations Research
Publication year: 2015
Publication date: March 2015
Year: 2015
Volume: 13
Issue: 1
Pages: 59-79
Journal category: Business and Economics
Journal subjects: Economics
Operation Research and Decision Theory
Optimization
Industrial and Production Engineering
Publisher: Springer Berlin / Heidelberg
ISSN: 1614-2411

Abstract

We consider a variance minimization problem for semi-Markov decision processes with state-dependent discount factors in Borel spaces. The reward function may be unbounded from below and from above. Under suitable conditions, we first prove that the discount variance minimization criterion can be transformed into an equivalent expected discount criterion, and then show the existence of a discount variance minimal policy over the class of expected discount optimal stationary policies. Furthermore, we also give a value iteration algorithm for calculating the expected discount optimal value function. Finally, two examples are used to illustrate our results.
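
The value iteration idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes a finite state and action set (the paper works in Borel spaces with possibly unbounded rewards), exponentially distributed sojourn times, and a reward r[x][a] that already stands for the expected discounted reward earned during one sojourn. The names value_iteration, r, P, lam and alpha are hypothetical and chosen only for this illustration.

```python
def value_iteration(r, P, lam, alpha, tol=1e-10, max_iter=100_000):
    """Value iteration for a finite semi-Markov decision model (illustrative only).

    r[x][a]     assumed expected discounted reward for one sojourn in state x under action a
    P[x][a][y]  assumed transition probabilities of the embedded chain
    lam[x][a]   assumed rate of the exponential sojourn time
    alpha[x]    state-dependent discount rate, assumed strictly positive
    """
    n_states = len(r)
    # For T ~ Exp(lam), E[exp(-alpha * T)] = lam / (lam + alpha), which is < 1.
    beta = [[lam[x][a] / (lam[x][a] + alpha[x]) for a in range(len(r[x]))]
            for x in range(n_states)]

    def q_values(V, x):
        # One-step lookahead values for every action available in state x.
        return [r[x][a] + beta[x][a] * sum(p * v for p, v in zip(P[x][a], V))
                for a in range(len(r[x]))]

    V = [0.0] * n_states
    for _ in range(max_iter):
        V_new = [max(q_values(V, x)) for x in range(n_states)]
        gap = max(abs(new - old) for new, old in zip(V_new, V))
        V = V_new
        if gap < tol:
            break

    # Greedy stationary policy with respect to the final value estimate.
    policy = []
    for x in range(n_states):
        qs = q_values(V, x)
        policy.append(qs.index(max(qs)))
    return V, policy


if __name__ == "__main__":
    # Two states, two actions; all numbers are made up for illustration.
    r = [[1.0, 0.5], [0.0, 2.0]]
    P = [[[0.5, 0.5], [1.0, 0.0]], [[0.0, 1.0], [0.7, 0.3]]]
    lam = [[1.0, 4.0], [2.0, 1.0]]
    alpha = [0.5, 1.0]
    V, policy = value_iteration(r, P, lam, alpha)
    print(V, policy)
```

Because every effective discount factor lam / (lam + alpha) is strictly below one under these assumptions, the update is a contraction in the sup norm and the iterates converge. The general setting of the paper needs the weaker conditions stated there.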
