Research on High-Level Action Skills of Players in the Robot Soccer Simulation Competition
Abstract
The RoboCup Robot Soccer World Cup (RoboCup) involves frontier research and technology integration in artificial intelligence, robotics, sensing, communication, and many other fields. As a simulation platform, the RoboCup 2D simulation game system is used to test intelligent algorithms and related techniques through the study of individual agents' action skills and team cooperation strategies. The platform is dynamic and real-time, and only incomplete environmental information is available to the players (agents); under these conditions many factors affect whether a team wins, so choosing which factors to address, and with which methods, is a necessary research question. Taking RoboCup 2D as the testbed, this thesis starts from two approaches, manual coding and machine learning, studies comprehensive evaluation methods and machine learning algorithms, and focuses on the high-level action skills of the agents in our team. The main work consists of three parts, as follows.
     First, we briefly introduce RoboCup, its research status, and the two major research paradigms, manual coding and machine learning, which are then applied to the study of the agents' high-level action skills. We also describe the RoboCup 2D simulation platform together with its sensing, movement, and action models, and analyze the overall structure and agent model of our team, GDUT-TiJi.
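To make the agent-server interaction concrete, the following is a minimal sketch of how a player client typically talks to the RoboCup 2D soccer server (rcssserver) over UDP. The server address, team name, protocol version, and the particular commands are illustrative; the exact message grammar is specified in the Soccer Server manual.

import socket

SERVER = ("127.0.0.1", 6000)          # default rcssserver address and UDP port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(2.0)

# Register the player; the reply (init Side Unum PlayMode) arrives from a
# per-client port, to which all later commands must be sent.
sock.sendto(b'(init GDUT-TiJi (version 15))\x00', SERVER)
reply, server_addr = sock.recvfrom(8192)
print("server:", reply.decode(errors="replace"))

# Before kick-off the player can be placed with (move x y); afterwards the
# sensing model delivers (see ...) and (sense_body ...) messages each cycle,
# and the action model accepts one primary command such as (dash ...) or
# (turn ...) per cycle, so a real agent spreads commands over successive cycles.
sock.sendto(b'(move -20 0)\x00', server_addr)
sock.sendto(b'(dash 80)\x00', server_addr)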
     Second, we study the shooting skill. By analyzing the team's low-level code and reviewing game logs, we summarize the weaknesses of the current shooting skill and the causes of shooting failures, and we design the shooting decision with a manual-coding, evaluation-based approach. After comparing the main comprehensive evaluation methods, we adopt grey comprehensive evaluation based on grey relational degree to build the agent's shooting decision in the RoboCup 2D simulation game. We describe the principle of this method, establish an evaluation index system, and set combination weights. Game experiments show that the method raises the shooting success rate, which verifies its feasibility.
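As an illustration of the evaluation idea, the sketch below computes weighted grey relational degrees for a few candidate shot points using the standard grey relational analysis formula. The three indices and the combination weights are hypothetical stand-ins chosen for the example, not the thesis's actual index system.

import numpy as np

def grey_relational_degree(samples, ideal, weights, rho=0.5):
    """samples: (m, n) matrix of m candidates x n normalized indices (larger = better).
    ideal: length-n reference sequence (the best value of each index).
    weights: length-n combination weights summing to 1."""
    samples, ideal, weights = map(np.asarray, (samples, ideal, weights))
    delta = np.abs(samples - ideal)                      # deviation from the reference sequence
    d_min, d_max = delta.min(), delta.max()              # two-level minimum and maximum
    xi = (d_min + rho * d_max) / (delta + rho * d_max)   # grey relational coefficients
    return xi @ weights                                  # weighted grey relational degree per candidate

# Three candidate shot points, with indices already normalized to [0, 1]:
# [goalkeeper distance from the shot line, opening angle, closeness to the goal]
candidates = np.array([[0.9, 0.4, 0.7],
                       [0.5, 0.8, 0.6],
                       [0.3, 0.9, 0.9]])
ideal = candidates.max(axis=0)           # per-index best value as the reference sequence
weights = np.array([0.4, 0.35, 0.25])    # illustrative combination weights

degrees = grey_relational_degree(candidates, ideal, weights)
best = int(np.argmax(degrees))           # shoot at the candidate with the highest degree
print(degrees, "-> choose candidate", best)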
     Finally, we investigate the passing skill. We first compare the advantages and disadvantages of manual coding and machine learning and discuss the significance of machine learning. Passing is a local cooperative action and is therefore well suited to a machine learning approach. After comparing various learning strategies in light of the characteristics of the RoboCup environment, we adopt an agent autonomous learning method based on dynamic fuzzy logic (DFL). The thesis then describes the concept of agent autonomous learning, the structure of the DFL-based agent mental model, and the agent autonomous learning model, and verifies the feasibility and effectiveness of the method with an example. The method is then applied in our team and tested against opponents with strong defensive strategies, achieving good results in learning the passing skill.
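The DFL-based mental model itself is not reproduced here; the rough sketch below only illustrates the outcome-driven learning loop in which a pass-selection skill can sit: each candidate receiver is scored from situation features, and the scoring weights are nudged after every pass according to whether it succeeded. All feature names, weights, and the update rule are hypothetical placeholders rather than the thesis's DFL method.

import random

FEATURES = ["receiver_openness", "pass_lane_safety", "forward_progress"]
weights = {f: 1.0 for f in FEATURES}      # learned preference over situation features
ALPHA = 0.1                               # step size of the weight update

def score(receiver_features):
    """Higher score = more attractive pass option (features normalized to [0, 1])."""
    return sum(weights[f] * receiver_features[f] for f in FEATURES)

def choose_receiver(options):
    """options: dict receiver_id -> feature dict; pick the best-scored teammate."""
    return max(options, key=lambda r: score(options[r]))

def learn_from_outcome(receiver_features, succeeded):
    """Reinforce the feature weights after observing whether the pass reached the teammate."""
    signal = 1.0 if succeeded else -1.0
    for f in FEATURES:
        weights[f] += ALPHA * signal * receiver_features[f]

# One illustrative training step with made-up data:
options = {7: {"receiver_openness": 0.8, "pass_lane_safety": 0.6, "forward_progress": 0.4},
           9: {"receiver_openness": 0.3, "pass_lane_safety": 0.7, "forward_progress": 0.9}}
target = choose_receiver(options)
learn_from_outcome(options[target], succeeded=random.random() < 0.7)
print("pass to", target, "updated weights:", weights)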
