Research on Collision Avoidance of a Robotic Manipulator Based on Reinforcement Learning
Abstract
Practical applications place ever higher demands on the intelligence of robots, and reinforcement learning (RL) has become an important means of enhancing it. RL requires no prior knowledge: through the robot's (or agent's) exploration of the environment and the responses the environment returns, the learner can, after a period of training, acquire enough knowledge to avoid obstacles. Because RL relies only on the robot's sensory perception of the environment and needs no precise model of either the environment or the manipulator, it has been widely applied in robotics and other fields.
Building on a study of existing approaches to robot obstacle avoidance and of reinforcement learning theory, this thesis applies RL to the collision avoidance problem of robotic manipulators and establishes a multi-agent collision avoidance system for a planar three-degree-of-freedom (3-DOF) manipulator. Each agent perceives two kinds of information, which also serve as the state variables of the system: the distance to the nearest obstacle, which exerts a repelling influence, and the angular deviation of the current posture from the goal, which exerts an attracting influence. The agents move under the combined effect of the two.
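The abstract describes the two state components only in prose; the short Python sketch below illustrates one plausible way to compute them for a single joint agent. The function name, the argument layout, and the use of sampled workspace obstacle points are hypothetical illustrations, not taken from the thesis.

```python
import math

def joint_state(joint_pos, goal_angle, current_angle, obstacles):
    """Hypothetical per-agent state: (distance to nearest obstacle, angular deviation).

    joint_pos      -- (x, y) point controlled by this agent (e.g. a link endpoint)
    goal_angle     -- joint angle that would orient the link toward the target
    current_angle  -- current joint angle
    obstacles      -- list of (x, y) obstacle points sampled in the workspace
    """
    # Repelling cue: Euclidean distance from the link point to the closest obstacle.
    d_min = min(math.dist(joint_pos, obs) for obs in obstacles)

    # Attracting cue: signed angular deviation, wrapped to (-pi, pi].
    dev = (goal_angle - current_angle + math.pi) % (2 * math.pi) - math.pi

    return d_min, dev
```

Wrapping the deviation angle keeps the attracting cue bounded and sign-consistent, which simplifies any later discretization of the state space.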
To meet the real-time requirements of manipulator control, Sarsa(λ), one of the principal RL methods and one well suited to on-line learning, is adopted as the basic control strategy of the collision avoidance system, and the concrete implementation of the system algorithm is presented. Simulation experiments demonstrate that the RL approach is feasible and effective for manipulator collision avoidance.
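The thesis names Sarsa(λ) as the control strategy, but the abstract does not reproduce the implementation. The following is a minimal tabular Sarsa(λ) sketch with accumulating eligibility traces and ε-greedy action selection; the `env` interface, the action set, and all parameter values are hypothetical placeholders (the thesis's own platform was built on .NET), and states are assumed to be already discretized.

```python
import random
from collections import defaultdict

def sarsa_lambda(env, actions, episodes=500,
                 alpha=0.1, gamma=0.95, lam=0.9, epsilon=0.1):
    """Tabular Sarsa(lambda) with accumulating eligibility traces.

    `env` is assumed to expose reset() -> state and step(action) -> (state, reward, done),
    with states already discretized (e.g. by the K-means partition sketched further below).
    """
    Q = defaultdict(float)                       # Q[(state, action)] value table

    def choose(s):                               # epsilon-greedy action selection
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        e = defaultdict(float)                   # eligibility traces, reset every episode
        s = env.reset()
        a = choose(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = choose(s2)
            # TD error; the bootstrap term is dropped at terminal states.
            delta = r + gamma * Q[(s2, a2)] * (not done) - Q[(s, a)]
            e[(s, a)] += 1.0                     # accumulating trace
            for key in list(e):                  # propagate the TD error along the trace
                Q[key] += alpha * delta * e[key]
                e[key] *= gamma * lam
            s, a = s2, a2
    return Q
```

Because the update is on-policy and incremental, the manipulator can keep improving its policy while it acts, which is the on-line property the abstract cites as the reason for choosing Sarsa(λ).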
Because the collision avoidance system of the manipulator has a continuous state space, a fixed (hard) partition of that space often fails to reflect the true properties of the states. This thesis therefore combines a clustering method with the RL algorithm and uses K-means clustering to partition the continuous state space adaptively. Simulation experiments show that, in the same environments, the adaptive partition achieves better collision avoidance than the hard partition.
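How the thesis couples K-means with the learner is detailed in the thesis itself; the sketch below only illustrates the general idea of turning logged continuous states (distance, deviation angle) into a finite set of cluster indices that a tabular learner such as the one above can use. The function names and the choice of k are hypothetical.

```python
import numpy as np

def kmeans_partition(samples, k=20, iters=50, seed=0):
    """Partition logged (distance, deviation-angle) samples into k discrete states.

    samples: (n, 2) array of visited continuous states; returns the k centroids.
    """
    rng = np.random.default_rng(seed)
    centers = samples[rng.choice(len(samples), size=k, replace=False)]
    for _ in range(iters):
        # Assign each sample to its nearest centroid.
        labels = np.argmin(np.linalg.norm(samples[:, None] - centers[None], axis=2), axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        centers = np.array([samples[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return centers

def state_index(x, centers):
    """Map a continuous state x = (d_min, dev) to the index of its nearest centroid."""
    return int(np.argmin(np.linalg.norm(centers - np.asarray(x), axis=1)))
```

A state is then identified with its nearest centroid, so frequently visited regions of the state space end up with finer cells than a uniform grid would give them; this is the adaptive partition the abstract contrasts with hard partitioning.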
A simulation platform for the RL-based collision avoidance system of the planar 3-DOF manipulator was developed on Microsoft's .NET platform; it is used to display the results of the collision avoidance trials and to analyze each performance index and component of the system algorithm in detail. A series of simulation experiments verify that the system has a strong collision avoidance capability: even in some complex environments the manipulator can successfully avoid obstacles and reach its target.
