Research on Mobile Robot Path Planning Based on Reinforcement Learning
Abstract
With the development of robot technology, robots are increasingly applied in unknown environments. Compared with path planning in known environments, the exploration of unknown environments poses new challenges for mobile robot path planning. Because the robot has no prior knowledge of the environment, it will inevitably encounter a variety of obstacles during exploration. Research on mobile robots capable of flexible planning and obstacle avoidance in unknown environments is therefore of great practical significance.
     In this paper, reinforcement learning is applied to path planning for mobile robots exploring unknown environments. The existing Q-learning and Q(λ) algorithms can realize mobile robot path planning, but in large and complex environments they struggle to achieve satisfactory results; their biggest drawbacks are long learning times and slow convergence. To address these problems, this paper proposes a state-chain sequential backtracking Q-learning algorithm for quickly finding the optimal path of a mobile robot in complex unknown static environments. A state chain is built during the search process; after an action is chosen and its reward received, the Q-values of the state-action pairs along the previously built chain are sequentially updated with one-step Q-learning backups. This backtracking mitigates the delayed propagation of value information in standard Q-learning, so that the action decision at the current state is quickly influenced by subsequent decisions. The algorithm is applied to the path planning of both single and multiple robots in unknown environments, addressing slow learning as well as obstacle avoidance and inter-robot collision avoidance. Extensive simulations validate the efficiency of the proposed approach: it converges quickly, and the robot finds a collision-free optimal path in complex unknown static environments in much less time.
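     To make the backtracking update concrete, the following is a minimal Python sketch of the state-chain idea as described above. Everything environment-specific is an assumption made for illustration: the `env` interface, the ε-greedy exploration policy, and the hyper-parameter values are not taken from the thesis.

```python
import random
from collections import defaultdict

# Minimal sketch of state-chain sequential backtracking Q-learning as
# described in the abstract. The environment interface and all
# hyper-parameter values below are illustrative assumptions.

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # four grid moves (assumed)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1         # assumed learning parameters

def td_backup(Q, s, a, r, s_next):
    """One-step Q-learning backup:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in range(len(ACTIONS)))
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def run_episode(env, Q):
    """Assumed interface: env.start is the start state and
    env.step(s, a) returns (next_state, reward, done)."""
    chain = []  # the state chain built during the search
    s = env.start
    done = False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda a2: Q[(s, a2)])
        s_next, r, done = env.step(s, a)
        chain.append((s, a, r, s_next))
        # Key idea: after receiving the reward, sequentially re-apply
        # one-step backups backwards along the chain so that fresh value
        # information propagates immediately to earlier decisions,
        # instead of trickling back one episode at a time.
        for (s0, a0, r0, s1) in reversed(chain):
            td_backup(Q, s0, a0, r0, s1)
        s = s_next
    return chain

Q = defaultdict(float)  # Q-table over (state, action) pairs, initialized to 0
```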
     Firstly, the thesis analyzes the research background and significance of mobile robot path planning, surveys the state of research at home and abroad together with the main open problems, and briefly outlines the content and chapter organization of the thesis.
     Secondly, the main types of mobile robot path planning techniques are introduced, and global and local path planning algorithms are described in detail. For the reinforcement learning approach adopted in this thesis, this part reviews the research status, development trends, and open problems of reinforcement learning, explains its basic concepts, principles, and methods, and describes its application to path planning.
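     For reference, the one-step Q-learning update that underlies both the classical algorithms and the proposed variant is the standard Watkins rule, where α is the learning rate, γ the discount factor, and r_{t+1} the received reward:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```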
     Thirdly, to address the long learning times, slow convergence, and poor scalability to larger and more complex environments of the widely used Q-learning and Q(λ) algorithms, the state-chain sequential backtracking Q-learning algorithm, a high-performance algorithm that uses backtracking to update state data, is applied to mobile robot path planning in complex environments. Experiments in environments of different sizes and complexities verify its fast convergence and its practicality in large environments, providing a new method for the mobile robot path planning problem.
     Fourthly, taking a multi-robot system as the research object, the proposed high-performance reinforcement learning algorithm is used to solve the path planning problem during exploration through the robots' learning strategies in an uncertain environment, realizing obstacle avoidance for each robot and resolving conflicts between robots, and thus improving the efficiency of reaching the target point.
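     The abstract does not detail the multi-robot coordination strategy, so the following is only a hedged sketch of one simple possibility: each robot keeps its own Q-table (reusing `td_backup` from the sketch above) and treats cells currently occupied by other robots as temporary obstacles. The robot objects, the `env.next_pos` helper, and the wait-in-place rule are hypothetical.

```python
# Hedged sketch of one possible multi-robot extension: cells occupied by
# other robots are treated as temporary obstacles. This illustrates the
# collision-avoidance idea only, not the thesis's actual strategy.

def multi_robot_step(robots, Q_tables, env):
    """robots: objects with a .pos attribute (assumed);
    env.next_pos(s, a) returns the cell reached by action a (assumed)."""
    occupied = {r.pos for r in robots}
    for robot, Q in zip(robots, Q_tables):
        occupied.discard(robot.pos)  # a robot does not block itself
        # consider only moves that do not enter a cell held by another robot
        legal = [a for a in range(len(ACTIONS))
                 if env.next_pos(robot.pos, a) not in occupied]
        if legal:  # otherwise wait in place this step (exploration omitted)
            a = max(legal, key=lambda a2: Q[(robot.pos, a2)])
            s_next, r, _ = env.step(robot.pos, a)
            td_backup(Q, robot.pos, a, r, s_next)
            robot.pos = s_next
        occupied.add(robot.pos)
```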
     Finally, the work of the thesis is summarized and directions for future research are proposed.