Research on Mobile Robot Path Planning Based on State Prediction and Reinforcement Learning
Abstract
With the development of robotics, robots are increasingly deployed in unknown environments. Compared with path planning in static, known environments, exploration of dynamic unknown environments poses new challenges: the environment is both unknown and changing. Because the robot has no prior knowledge of its surroundings, it inevitably encounters obstacles of all kinds while exploring, so mobile robots with flexible planning and obstacle-avoidance capabilities, and their path planning in unknown environments, are of significant practical importance. This thesis combines state prediction with reinforcement learning and studies mobile robot path planning in unknown environments containing static obstacles and dynamic obstacles respectively.
First, the thesis surveys the research topics of path planning for mobile robot environment exploration, reviews the current state and development of the field, and outlines the structure of the thesis.
Second, it presents the relevant background, current research, and open problems of mobile robot environment exploration, including the construction of grid maps, the computation of the cost and utility of reaching candidate target points, and the definition and allocation of target points. It then discusses path planning methods, sensor systems, and conflict resolution for multi-robot path planning in detail. A simple formulation of the cost-versus-utility trade-off is sketched below.
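The following is a minimal sketch of one common way to resolve that trade-off: greedy selection of the frontier cell whose utility best outweighs the travel cost to reach it. The names `path_cost`, `beta`, and the utility-minus-weighted-cost rule are illustrative assumptions, not necessarily the thesis's exact formulation.

```python
from typing import Callable, Dict, Tuple

Cell = Tuple[int, int]

def select_target(robot: Cell,
                  frontiers: Dict[Cell, float],
                  path_cost: Callable[[Cell, Cell], float],
                  beta: float = 1.0) -> Cell:
    """Greedy target selection: pick the frontier cell whose expected
    utility best outweighs the (weighted) travel cost to reach it."""
    return max(frontiers, key=lambda cell: frontiers[cell] - beta * path_cost(robot, cell))

# Example usage: Manhattan distance as a stand-in for the grid path cost.
target = select_target((0, 0), {(3, 4): 2.0, (1, 1): 0.5},
                       path_cost=lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1]))
```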
Third, it introduces the basic concepts, principles, methods, algorithms, and research status of reinforcement learning. Starting from a single robot planning paths in a static-obstacle environment, and taking the Q-learning algorithm as the basis, it partitions the state and action spaces appropriately, designs the reward function, and describes how the algorithm is applied to path planning, as sketched below.
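As an illustration of the tabular Q-learning backbone this chapter builds on, here is a minimal sketch with a discretized action set and epsilon-greedy exploration. The constants and the state/action encodings are illustrative assumptions rather than the thesis's exact design.

```python
import random
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]      # discretized action space
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1          # learning rate, discount factor, exploration rate

Q = defaultdict(float)                         # Q[(state, action)] -> estimated value

def choose_action(state):
    """Epsilon-greedy action selection over the discrete action set."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """One-step Q-learning backup:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

In the path-planning setting, the state is typically the robot's grid cell and the reward penalizes collisions while rewarding arrival at the goal.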
Fourth, it combines reinforcement learning with the idea of prediction for single-robot path planning in dynamic environments, addressing obstacle avoidance both for obstacles moving along regular trajectories and for environments containing static and dynamic obstacles simultaneously. Because every earlier decision of the robot influences the final success or failure, an eligibility trace is introduced into the algorithm, and an improved Q-learning algorithm is used for control; a sketch of the trace mechanism follows.
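A minimal sketch of how an eligibility trace spreads the one-step TD error back over earlier decisions (replacing-trace variant); the parameter values and data structures are illustrative assumptions, not the thesis's exact algorithm.

```python
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]
ALPHA, GAMMA, LAMBDA = 0.1, 0.9, 0.8           # learning rate, discount, trace-decay parameter

Q = defaultdict(float)                         # action values Q[(state, action)]
E = defaultdict(float)                         # eligibility traces e[(state, action)]

def q_lambda_update(state, action, reward, next_state):
    """Apply the one-step TD error to every recently visited state-action pair,
    weighted by its decaying eligibility trace (replacing traces)."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    delta = reward + GAMMA * best_next - Q[(state, action)]
    E[(state, action)] = 1.0                   # replacing trace for the current pair
    for key in list(E):
        Q[key] += ALPHA * delta * E[key]
        E[key] *= GAMMA * LAMBDA               # older decisions receive exponentially less credit
```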
Fifth, borrowing the predictive mechanism humans use to decide their next move in dynamic, complex environments, the thesis combines state prediction with reinforcement learning for path planning in multi-robot environment exploration. Compared with earlier approaches that rely on reinforcement learning alone, this method achieves more reasonable collision avoidance among robots, and the prediction function reduces the dimensionality of the group reinforcement learning space and speeds up the convergence of the group learning algorithm; the idea is sketched below.
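A minimal sketch of the underlying idea: predict each teammate's next cell and mask out the actions that would move into it, so every robot keeps learning over its own state instead of the joint multi-robot state. The constant-velocity prediction and the helper names are illustrative assumptions, not the thesis's prediction function.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
MOVES: Dict[str, Cell] = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def predict_next(pos: Cell, last_pos: Cell) -> Cell:
    """Constant-velocity prediction: assume a teammate repeats its last displacement."""
    return (2 * pos[0] - last_pos[0], 2 * pos[1] - last_pos[1])

def safe_actions(my_pos: Cell, teammates: List[Tuple[Cell, Cell]]) -> List[str]:
    """Keep only actions whose resulting cell is not a predicted teammate cell,
    so collision avoidance never requires representing the joint state space."""
    predicted = {predict_next(pos, last) for pos, last in teammates}
    return [a for a, (dx, dy) in MOVES.items()
            if (my_pos[0] + dx, my_pos[1] + dy) not in predicted]
```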
Finally, the work and results of the thesis are summarized, possible improvements are analyzed, and directions for future research are discussed.
