Research on Mathematical Modeling and Quantitative Analysis Methods for Collective Robot Behavior
Abstract
The collective behavior of a robot group emerges from the interactions among robots and between robots and their environment. It is a highly complex, dynamic process whose motion is often chaotic, so existing methods for modeling and designing robot behavior cannot adequately describe the complexity of collective behavior at the level of its underlying mechanisms. A scientific approach to collective robot behavior must therefore provide mathematical modeling and quantitative analysis of that behavior, a key theoretical and technical problem that remains to be solved in practical robot behavior learning.
     This dissertation obtains mathematical descriptions of the parameters relevant to robot behavior through task modeling and robot-environment interaction modeling, and establishes a chaotic dynamics model of collective robot behavior; studying and analyzing these models helps reveal the laws at work inside the behavior system. Collective behavior learning chiefly concerns the mechanisms of social interaction among robots and between robots and the environment, out of which complex collective behaviors emerge. Through quantitative analysis and mathematical modeling of collective robot behavior, the dissertation builds a complete theoretical framework for the social interactions among robots, tasks, and the environment. The main contributions are as follows:
     (1) To address the slow convergence and combinatorial explosion that arise in reinforcement learning, an initialization method based on a neural network is proposed for mobile robot path planning. The neural network has the same topology as the robot's workspace, each neuron corresponding to one discrete state of the state space. The network is first evolved according to the known part of the environment until it reaches equilibrium, at which point each neuron's output represents the maximum cumulative reward attainable from that state. Every state-action pair Q(s,a) is then given a reasonable initial value: the immediate reward for executing the selected action in the current state plus the maximum discounted cumulative reward obtained by following the optimal policy from the successor state (the maximum cumulative reward multiplied by the discount factor). Initializing the Q values in this way incorporates prior knowledge into the learning system, optimizes the robot's learning in the initial stage, and gives it a better foundation for learning.
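     To make this concrete, here is a minimal Python sketch of the scheme in (1), assuming a small grid world with a single goal cell. A value-iteration loop stands in for the neural network's relaxation to equilibrium (its fixed point is exactly the maximum discounted return V(s) that the neuron outputs encode), after which the Q table is seeded as r(s,a) + GAMMA*V(s'). The grid size, the reward of 1 at the goal, and the discount factor are illustrative assumptions, not the dissertation's settings.

```python
import numpy as np

GAMMA = 0.95                                   # discount factor (assumed)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # the four grid moves

def next_state(s, a, rows, cols, obstacles):
    """Move to the neighboring cell unless it is off-grid or an obstacle."""
    t = (s[0] + a[0], s[1] + a[1])
    return t if 0 <= t[0] < rows and 0 <= t[1] < cols and t not in obstacles else s

def reward(s2, goal):
    """Immediate reward: 1 on reaching the goal, 0 elsewhere (assumed shaping)."""
    return 1.0 if s2 == goal else 0.0

def relax_values(rows, cols, goal, obstacles, iters=200):
    """Stand-in for the network's equilibrium: V(s) = max_a [r + GAMMA * V(s')]."""
    V = np.zeros((rows, cols))
    for _ in range(iters):
        for s in np.ndindex(rows, cols):
            if s == goal or s in obstacles:
                continue
            V[s] = max(reward(next_state(s, a, rows, cols, obstacles), goal)
                       + GAMMA * V[next_state(s, a, rows, cols, obstacles)]
                       for a in ACTIONS)
    return V

def init_q(rows, cols, goal, obstacles):
    """Seed every state-action pair: Q0(s, a) = r(s, a) + GAMMA * V(s')."""
    V = relax_values(rows, cols, goal, obstacles)
    Q = np.zeros((rows, cols, len(ACTIONS)))
    for s in np.ndindex(rows, cols):
        for i, a in enumerate(ACTIONS):
            s2 = next_state(s, a, rows, cols, obstacles)
            Q[s][i] = reward(s2, goal) + GAMMA * V[s2]
    return Q

Q0 = init_q(rows=10, cols=10, goal=(9, 9), obstacles={(4, 4), (4, 5)})
```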
     (2) To speed up the convergence of reinforcement learning for collective robot behavior, a sequential Q-learning algorithm based on knowledge sharing is proposed. The pursuing robots first form pursuit teams by clustering according to the motion state of the targets; the robots within each team then learn one after another in a fixed order. Each robot perceives the current environment state through its sensors and checks whether another robot has already encountered the same state. If a matching rule exists in the behavior rule repository, the robot selects an action according to the knowledge base and the rule repository and applies reinforcement learning to the corresponding behavior weight vector; otherwise the new behavior rule is added to the repository. When reinforcing a behavior weight vector, the learning robot assigns each robot a weight by weighted strategy sharing and updates the behavior weights with the weighted sum of all robots' experience values.
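     The sketch below illustrates one robot's turn in the sequential learning order of (2), assuming a dict-based rule repository keyed by discretized sensor states, per-robot Q tables stored as dicts, and scalar expertness scores (e.g., accumulated reward) that drive the weighted strategy sharing; all names and data structures are illustrative assumptions.

```python
def learning_step(robot_id, state, rule_db, q_tables, expertness, default_rule):
    """One robot's turn: rule matching, then weighted strategy sharing."""
    # 1. Match the sensed state against rules teammates have already stored.
    if state not in rule_db:
        rule_db[state] = default_rule          # a new rule joins the repository
    action = rule_db[state]

    # 2. Weighted strategy sharing: refine this robot's behavior weights with
    #    the weighted sum of every teammate's experience, the weights being
    #    proportional to relative expertness.
    total = sum(expertness) or 1.0
    w = [e / total for e in expertness]
    keys = set().union(*q_tables)
    q_tables[robot_id] = {k: sum(wj * q.get(k, 0.0)
                                 for wj, q in zip(w, q_tables))
                          for k in keys}
    return action
```

A full implementation would also apply the ordinary Q-learning update to q_tables[robot_id] once the chosen action's reward is observed; only the sharing step is shown here.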
     (3) Taking the robot behaviors realized in the first two parts as the modeling object, a complete mathematical model of cooperative pursuit behavior is established using fractal modeling ideas. The modeling exploits the functional self-similarity between the cooperative pursuit system as a whole and its parts, refining the behavior model layer by layer from the macroscopic level down to the microscopic. The overall system goal is first determined from the specific task; a state-level mathematical model of multi-robot cooperative pursuit is then built by macroscopic modeling and used to analyze how individual parameters affect the collective behavior; finally, a behavior-level mathematical model of the robot-environment interaction is built by polynomial modeling. Modeling the collective behavior in this way makes it possible to analyze how key parameters influence system behavior and to select optimal parameters by mathematical analysis, providing the theoretical basis needed for the design and analysis of collective robot behavior.
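     To give a flavor of the state-level macroscopic modeling in (3), the sketch below tracks only the number of robots in each behavioral state rather than individual trajectories, in the rate-equation style of Lerman and Galstyan's macroscopic models. The two-state (searching/pursuing) abstraction, the encounter rate alpha, and the mean pursuit duration tau are illustrative assumptions chosen to show how an individual parameter enters the collective-level model.

```python
import numpy as np

def simulate(N=10, alpha=0.2, tau=5.0, dt=0.01, T=100.0):
    """Integrate dNs/dt = -alpha*Ns + (N - Ns)/tau with forward Euler.

    Ns: number of robots searching; N - Ns: number currently pursuing.
    alpha: per-robot rate of encountering a target; tau: mean pursuit time.
    """
    steps = int(T / dt)
    Ns = np.empty(steps)
    Ns[0] = N                        # all robots start in the searching state
    for k in range(steps - 1):
        Ns[k + 1] = Ns[k] + dt * (-alpha * Ns[k] + (N - Ns[k]) / tau)
    return Ns

# Setting the rate to zero gives the steady state Ns* = N / (1 + alpha*tau),
# so raising either parameter shifts the group toward pursuit -- exactly the
# kind of parameter effect a state-level model makes analyzable.
Ns = simulate()
```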
     (4) Dynamical systems theory is used to analyze the interactions among robots and between robots and the environment, studying the motion of the system in a multidimensional phase space through the evolution trajectory of a single robot. Data points at different times are first sampled along one robot's trajectory, and a phase space equivalent to the original system is reconstructed by choosing a suitable embedding dimension and delay time. The state information in this phase space describes the multi-robot system adequately and contains everything needed to predict the state of the dynamical system. The properties of the attractor in the phase space are then analyzed and its characteristic quantities, including the Lyapunov exponent, the correlation dimension, and the Kolmogorov entropy, are computed, yielding a quantitative description and analysis of the collective behavior. Finally, these quantitative measures are used to study the key factors influencing robot interaction, deepening the understanding of the interaction mechanism.
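     The following is a minimal sketch of the analysis pipeline in (4), assuming a scalar signal x(t) sampled from one robot's trajectory: delay-embed it into an m-dimensional phase space, then estimate the correlation integral C(r), whose log-log slope at small r approximates the attractor's correlation dimension (the Grassberger-Procaccia estimator). In practice the delay and embedding dimension would be chosen with mutual information and false-nearest-neighbor tests; the fixed m=3 and tau=10 and the synthetic stand-in signal are placeholders.

```python
import numpy as np

def delay_embed(x, m=3, tau=10):
    """Phase space reconstruction: y_i = (x_i, x_{i+tau}, ..., x_{i+(m-1)tau})."""
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(m)])

def correlation_sum(Y, r):
    """Fraction of distinct point pairs closer than r: the correlation integral C(r)."""
    d = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    n = len(Y)
    return (np.count_nonzero(d < r) - n) / (n * (n - 1))  # drop self-pairs

x = np.sin(0.05 * np.arange(1000)) + 0.01 * np.random.randn(1000)  # stand-in signal
Y = delay_embed(x)
radii = np.logspace(-2, 0, 10)
C = np.array([correlation_sum(Y, r) for r in radii])
slope = np.polyfit(np.log(radii), np.log(C + 1e-12), 1)[0]  # ~ correlation dimension
print(f"estimated correlation dimension: {slope:.2f}")
```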