Research on Multi-Agent Cooperation Strategies and Their Application in RoboCup
Abstract
Multi-agent cooperation is a core problem in multi-agent system (MAS) research. For complex, dynamic, and uncertain multi-agent environments, and to meet the requirements of local coordination and limited communication, this thesis studies agent strategy optimization, behavior coordination, and action planning, and builds multi-agent cooperation strategies suited to different situations. The strategies are evaluated on a typical MAS platform, the RoboCup robot soccer simulation system.
     First, to optimize each agent's behavior selection and achieve local cooperation among agents, a multi-agent cooperation strategy based on joint behavior optimization is proposed. Each agent uses modular fuzzy Q-learning to evaluate the behaviors of other agents and treats those behaviors as constraints on its own choices, thereby optimizing its own behavior decisions; behavior conflicts among agents are then resolved by a coordination method based on shared joint intentions, yielding each agent's optimal behavior strategy.
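The following is a minimal Python sketch of the modular fuzzy Q-learning component described above, assuming a toy one-dimensional state with two fuzzy sets, a hypothetical behavior set, and a hypothetical "attack"/"defend" module split; these details and the module-weighting scheme are illustrative assumptions, not specifics from the thesis, and the joint-intention conflict-resolution step is not shown.

```python
"""Minimal sketch of modular fuzzy Q-learning for behavior selection.

The behavior set, the "attack"/"defend" modules, and the one-dimensional
state with two fuzzy sets are illustrative assumptions, not details
taken from the thesis.
"""
import random
from collections import defaultdict

BEHAVIORS = ["pass", "dribble", "shoot"]   # hypothetical behavior set
MODULES = ["attack", "defend"]             # hypothetical learning modules


def fuzzy_memberships(x):
    """Triangular membership degrees of a state x in [0, 1] for 'near'/'far'."""
    x = min(1.0, max(0.0, x))
    return {"near": 1.0 - x, "far": x}


class ModularFuzzyQ:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        # one Q-table per module, indexed by (fuzzy set, behavior)
        self.q = {m: defaultdict(float) for m in MODULES}
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def q_value(self, module, x, behavior):
        """Defuzzified Q-value: membership-weighted sum over the fuzzy sets."""
        return sum(mu * self.q[module][(s, behavior)]
                   for s, mu in fuzzy_memberships(x).items())

    def select(self, x, weights):
        """Epsilon-greedy choice of the behavior maximizing the weighted sum
        of module Q-values; `weights` expresses how strongly each module
        (e.g. constraints implied by teammates' behaviors) influences it."""
        if random.random() < self.epsilon:
            return random.choice(BEHAVIORS)
        return max(BEHAVIORS,
                   key=lambda b: sum(weights[m] * self.q_value(m, x, b)
                                     for m in MODULES))

    def update(self, module, x, behavior, reward, x_next):
        """Standard Q-learning update, spread over the fuzzy sets of x."""
        best_next = max(self.q_value(module, x_next, b) for b in BEHAVIORS)
        for s, mu in fuzzy_memberships(x).items():
            key = (s, behavior)
            target = reward + self.gamma * best_next
            self.q[module][key] += self.alpha * mu * (target - self.q[module][key])


if __name__ == "__main__":
    learner = ModularFuzzyQ()
    learner.update("attack", 0.2, "pass", reward=1.0, x_next=0.6)
    print(learner.select(0.2, weights={"attack": 0.7, "defend": 0.3}))
```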
     Second, for the case of limited communication, a layered planning cooperation strategy based on a multi-agent behavior graph is proposed. Using the local environment information it perceives, each agent plans its behavior process ahead of time with a behavior graph; combined with the prior knowledge of behavior selection obtained from modular fuzzy Q-learning, the initial plan is adjusted layer by layer to produce a consistent action planning sequence, enabling each agent to make effective decisions quickly in the current environment and cooperate with the other agents.
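A minimal Python sketch of this idea follows, assuming a small hand-written behavior graph and scalar priors standing in for the knowledge learned by modular fuzzy Q-learning; the behavior names, edge costs, and the concrete layer-by-layer adjustment rule are illustrative assumptions rather than the thesis's actual algorithm.

```python
"""Minimal sketch of planning over a behavior graph and adjusting the
initial plan with learned priors.

The graph, behavior names, and the scalar `prior` values (standing in for
knowledge gained from modular fuzzy Q-learning) are illustrative assumptions.
"""
import heapq

# hypothetical behavior graph: behavior -> [(successor behavior, cost), ...]
BEHAVIOR_GRAPH = {
    "intercept": [("dribble", 1.0), ("pass", 2.0)],
    "dribble":   [("pass", 1.0), ("shoot", 3.0)],
    "pass":      [("receive", 1.0)],
    "receive":   [("shoot", 1.5)],
    "shoot":     [],
}


def initial_plan(graph, start, goal):
    """Cheapest behavior sequence from start to goal (Dijkstra search)."""
    frontier = [(0.0, start, [start])]
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, w in graph.get(node, []):
            heapq.heappush(frontier, (cost + w, nxt, path + [nxt]))
    return []


def adjust_plan(graph, plan, prior, goal):
    """Layer-by-layer adjustment: at each layer, if an alternative successor
    has a higher learned prior than the planned behavior, switch to it and
    re-plan the remaining layers toward the goal."""
    adjusted = [plan[0]]
    for planned in plan[1:]:
        options = [nxt for nxt, _ in graph.get(adjusted[-1], [])]
        best = max(options, key=lambda b: prior.get(b, 0.5), default=planned)
        if best != planned and prior.get(best, 0.5) > prior.get(planned, 0.5):
            tail = initial_plan(graph, best, goal)   # re-plan the rest
            if tail:
                return adjusted + tail
        adjusted.append(planned)
    return adjusted


if __name__ == "__main__":
    prior = {"dribble": 0.4, "pass": 0.9, "receive": 0.7, "shoot": 0.6}
    plan = initial_plan(BEHAVIOR_GRAPH, "intercept", "shoot")
    print("initial plan: ", plan)   # intercept -> dribble -> shoot
    print("adjusted plan:", adjust_plan(BEHAVIOR_GRAPH, plan, prior, "shoot"))
```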
     The proposed multi-agent cooperation strategies have been applied to the CSU_YunLu robot soccer simulation team of Central South University, and their effectiveness has been verified in training and competitive matches.
