Research on Learning and Cooperation in RoboCup Multi-Agent Systems
Abstract
With the development of computer technology, the theory and application of multi-agent systems (MAS) in distributed artificial intelligence have become a research focus. RoboCup (the Robot World Cup), a robot soccer championship, is a typical multi-agent system: its environment is dynamic, cooperation and competition coexist among multiple agents, communication bandwidth is limited, and the simulator injects random noise. As a test platform of general significance, it allows theories and algorithms for multi-agent systems to be studied and evaluated in depth, and the results to be extended to many other fields.

     The main work of this thesis is as follows:

     1) To address the complexity of the agent's decision task in RoboCup, a decision framework based on layered learning is designed. The framework divides the decision task into several layers from high level to low level; the decision at each layer is implemented by a suitable machine-learning method and builds on the learning result of the layer below. To counter the accumulation of errors across layers, an improved layer structure is adopted: a coordination layer is added to evaluate decision information and correct obviously wrong information.

     2) To improve the intelligence of individual agent skills, a genetic neural network is trained off-line to implement the agent's ball-interception skill. Experiments show that this technique copes well with the interference caused by noise. The agent's kicking skill is trained off-line with Q-learning.

     3) For the learning problem of cooperative attacking decisions in an agent team, the single-agent Q-learning algorithm is extended. The main idea is to introduce a learning agent and to combine statistical learning with reinforcement learning, so that the learning agent learns the other agents' behavior policies from statistics over their joint actions.

     The experiments were conducted in the RoboCup simulation-league environment; the results show that the proposed learning algorithms effectively realize intelligent agent decision-making in a complex environment.
With the development of computer technology, research on the theory and application of multi-agent systems (MAS) has become a hot topic in artificial intelligence. The Robot World Cup (RoboCup) is a typical MAS, characterized by a dynamic environment, the coexistence of cooperation and competition among several agents, limited communication bandwidth, and injected random noise. On this test platform of general significance, various MAS theories can be studied and applied to many fields.
     Considering the complexity of the agent's decision task in RoboCup, a decision framework based on layered learning is designed. The framework divides the full decision task into several layers from high level to low level, each layer building on the learning result of the layer below. To limit the accumulation of errors across layers, an improved layer structure with a coordination layer is adopted; the coordination layer evaluates decision information and corrects obviously inaccurate information.
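The layered structure with a coordination step can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: all class and function names are assumptions, and the coordination layer is modeled here as a per-layer sanity check with a fallback value.

```python
class LayeredDecider:
    """Runs decision layers from low level to high level; a coordination
    check can veto a layer's output and substitute a fallback value,
    limiting error accumulation across layers."""

    def __init__(self):
        self.layers = []  # (name, decide, sanity, fallback) tuples

    def add_layer(self, name, decide, sanity=None, fallback=None):
        self.layers.append((name, decide, sanity, fallback))

    def decide(self, state):
        result = None
        trace = []
        for name, decide, sanity, fallback in self.layers:
            result = decide(state, result)
            if sanity is not None and not sanity(state, result):
                result = fallback(state)  # correct obviously wrong information
            trace.append((name, result))
        return result, trace

# Toy usage: a high-level action choice built on a low-level speed estimate.
decider = LayeredDecider()
decider.add_layer("estimate_speed",
                  lambda s, _: s["ball_speed"],
                  sanity=lambda s, v: 0.0 <= v <= 3.0,  # assumed plausible range
                  fallback=lambda s: 3.0)
decider.add_layer("choose_action",
                  lambda s, speed: "intercept" if speed > 0.5 else "approach")

action, trace = decider.decide({"ball_speed": 9.9})  # impossible noisy reading
print(action)  # "intercept", computed from the corrected estimate of 3.0
```

The key design point is that the coordination check sits between layers, so an obviously wrong low-level value is repaired before the higher layer consumes it.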
     To improve the intelligence of individual skills, off-line learning is used for basic techniques such as ball interception. After analyzing two candidate solutions, an improved dichotomy algorithm based on a neural network and a genetic algorithm is proposed to realize ball interception, and Q-learning is used to train the basic kicking skill.
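A minimal sketch of off-line genetic training of a small neural network follows, on a toy interception-time task with noisy observations. The network size, GA parameters, fitness function, and the `AGENT_SPEED` constant are all illustrative assumptions, not the thesis's actual design.

```python
import math
import random

random.seed(42)
AGENT_SPEED = 1.0  # assumed constant agent speed for the toy task

def net(w, dist, speed):
    """A tiny 2-2-1 MLP whose 9 weights are evolved by the GA."""
    h1 = math.tanh(w[0] * dist + w[1] * speed + w[2])
    h2 = math.tanh(w[3] * dist + w[4] * speed + w[5])
    return w[6] * h1 + w[7] * h2 + w[8]

def fitness(w, samples):
    """Negative squared error of the predicted interception time."""
    return -sum((net(w, d, s) - t) ** 2 for d, s, t in samples)

def evolve(samples, pop_size=40, gens=60, sigma=0.3):
    pop = [[random.uniform(-1, 1) for _ in range(9)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda w: fitness(w, samples), reverse=True)
        parents = pop[:pop_size // 4]          # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(9)           # one-point crossover
            child = a[:cut] + b[cut:]
            child[random.randrange(9)] += random.gauss(0, sigma)  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, samples))

# Noisy training data: the time for the agent to meet a ball rolling straight
# at it is dist / (AGENT_SPEED + speed); the observed distance is noisy,
# mimicking the random noise the simulation server adds to perception.
samples = [(d + random.gauss(0, 0.1), s, d / (AGENT_SPEED + s))
           for d, s in ((random.uniform(1, 10), random.uniform(0.2, 2.0))
                        for _ in range(200))]

best = evolve(samples)
```

Because the fittest individuals survive unchanged each generation, the evolved network is at least as good as any random initialization, which is the property that makes off-line GA training robust to the noise in the samples.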
     For the learning problem of agent team cooperation, the basic Q-learning algorithm is extended by introducing the concept of a learning agent. The learning agent learns the other agents' action policies by observing and counting their joint actions; a concise but useful hypothesis is adopted to represent the other agents' optimal policies, and the full joint probability distribution over policies guarantees that the learning agent chooses the optimal action.
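The extended Q-learning idea, estimating a teammate's policy from joint-action statistics and maximizing expected Q under that empirical model, can be sketched as follows. All names, the action set, and the toy coordination game are illustrative assumptions in the spirit of joint-action learning, not the thesis's exact algorithm.

```python
import random
from collections import defaultdict

random.seed(1)

class JointActionLearner:
    """Q-learning over joint actions: the learning agent counts the other
    agent's observed actions per state and picks its own action with the
    highest expected Q under that empirical distribution."""

    def __init__(self, my_actions, other_actions, alpha=0.2, gamma=0.9, eps=0.1):
        self.my_actions = my_actions
        self.other_actions = other_actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.q = defaultdict(float)     # (state, my_a, other_a) -> value
        self.counts = defaultdict(int)  # (state, other_a) -> frequency

    def other_prob(self, state, other_a):
        total = sum(self.counts[(state, a)] for a in self.other_actions)
        if total == 0:
            return 1.0 / len(self.other_actions)  # uniform prior
        return self.counts[(state, other_a)] / total

    def expected_q(self, state, my_a):
        return sum(self.other_prob(state, oa) * self.q[(state, my_a, oa)]
                   for oa in self.other_actions)

    def choose(self, state):
        if random.random() < self.eps:              # epsilon-greedy exploration
            return random.choice(self.my_actions)
        return max(self.my_actions, key=lambda a: self.expected_q(state, a))

    def update(self, state, my_a, other_a, reward, next_state):
        self.counts[(state, other_a)] += 1          # joint-action statistics
        best_next = max(self.expected_q(next_state, a) for a in self.my_actions)
        key = (state, my_a, other_a)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])

# Toy coordination game: reward 1 only when both agents pick the same action;
# the simulated teammate passes 80% of the time.
learner = JointActionLearner(["pass", "shoot"], ["pass", "shoot"], gamma=0.0)
for _ in range(500):
    my_a = learner.choose("s")
    other_a = "pass" if random.random() < 0.8 else "shoot"
    reward = 1.0 if my_a == other_a else 0.0
    learner.update("s", my_a, other_a, reward, "s")

best_action = max(learner.my_actions, key=lambda a: learner.expected_q("s", a))
print(best_action)  # "pass": matches the teammate's dominant action
```

The counting table plays the role of the statistical component: the Q update itself is standard, but action selection weights each joint-action value by how often the other agent has actually been seen to choose that action.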
     All experiments were carried out on the RoboCup simulation platform. The results show that the learning methods proposed in this thesis effectively improve the intelligence of agent decision-making in a complex domain.
