Abstract
To address the slow convergence of traditional reinforcement learning algorithms such as Sarsa, this paper proposes SA-Sarsa, a Sarsa algorithm based on a simulated annealing strategy. In selecting actions, the simulated annealing strategy replaces the ε-greedy strategy, and the annealing rate controls the convergence speed of the algorithm. This overcomes the tendency of the original Sarsa algorithm to fall into a local optimum, which arises because it chooses its strategy by directly comparing a random number with the greedy value, so the optimal solution is preserved while convergence is accelerated. In simulations of a maze path planning problem, SA-Sarsa is compared with two traditional algorithms, Q-Learning and Sarsa. The experiments show that, while reaching the same optimal solution, SA-Sarsa explores more efficiently and converges faster.
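The abstract does not spell out the selection rule, so the following is only a minimal sketch of how Sarsa might be combined with simulated-annealing action selection. It assumes a Metropolis-style acceptance test between a random candidate action and the greedy action, a geometric cooling schedule applied once per episode, and a small hand-made maze. The names `sa_select`, `step`, `train`, `greedy_path`, `MAZE`, and all reward and hyperparameter values are hypothetical and are not taken from the paper.

```python
import math
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

# Hypothetical maze: S = start, G = goal, # = wall, . = free cell.
MAZE = ["S..#",
        ".#..",
        "...G"]
START = (0, 0)


def sa_select(q_row, temperature, rng):
    """Assumed Metropolis-style rule: accept a random candidate action with
    probability exp((Q_candidate - Q_greedy) / T), else take the greedy one."""
    greedy = max(range(len(q_row)), key=q_row.__getitem__)
    candidate = rng.randrange(len(q_row))
    delta = q_row[candidate] - q_row[greedy]  # never positive
    if temperature > 0 and rng.random() < math.exp(delta / temperature):
        return candidate
    return greedy


def step(maze, state, action):
    """Apply an action; return (next_state, reward, done). Rewards are assumed."""
    r = state[0] + ACTIONS[action][0]
    c = state[1] + ACTIONS[action][1]
    inside = 0 <= r < len(maze) and 0 <= c < len(maze[0])
    if not inside or maze[r][c] == "#":
        return state, -1.0, False      # bumped into a wall or the border
    if maze[r][c] == "G":
        return (r, c), 10.0, True      # reached the goal
    return (r, c), -0.1, False         # ordinary move


def train(maze, start, episodes=300, alpha=0.5, gamma=0.9,
          t0=1.0, cooling=0.97, max_steps=200, seed=0):
    """Tabular Sarsa whose behaviour policy is sa_select instead of epsilon-greedy."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS)
         for r in range(len(maze)) for c in range(len(maze[0]))}
    temperature = t0
    for _ in range(episodes):
        state = start
        action = sa_select(q[state], temperature, rng)
        for _ in range(max_steps):
            nxt, reward, done = step(maze, state, action)
            nxt_action = sa_select(q[nxt], temperature, rng)
            # On-policy target: uses the action that will actually be taken next.
            target = reward + (0.0 if done else gamma * q[nxt][nxt_action])
            q[state][action] += alpha * (target - q[state][action])
            state, action = nxt, nxt_action
            if done:
                break
        temperature *= cooling  # annealing rate: smaller values cool faster
    return q


def greedy_path(maze, q, start, max_steps=50):
    """Follow the greedy policy from start and return the visited cells."""
    state, path = start, [start]
    for _ in range(max_steps):
        action = max(range(len(ACTIONS)), key=q[state].__getitem__)
        state, _, done = step(maze, state, action)
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    q_table = train(MAZE, START)
    print(greedy_path(MAZE, q_table, START))
```

In this sketch the `cooling` parameter plays the role of the annealing rate: at high temperature nearly any candidate action is accepted, so the agent explores broadly, and as the temperature falls the rule approaches pure greedy selection. Whether this matches the paper's exact acceptance rule or cooling schedule cannot be determined from the abstract alone.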