Distributed Cooperative Control Methods and Cooperative Behavior Analysis for Swarm Robotic Systems
Abstract
This dissertation studies the coadaptivity of swarm robotic systems. The goal is to enable swarm robotic systems to adapt to complex dynamic environments through distributed control, optimization, and learning based on local information exchange, and thereby to reveal the laws governing emergent behavior in swarm intelligent systems and its controllability. In-depth research is carried out from three perspectives, namely the kinematic model, the dynamic model, and distributed reinforcement learning of swarm robotic systems, organized into the following four topics.
1. For cooperation under the kinematic model, synchronized (flocking) motion of swarm robots in complex environments is achieved by combining the Vicsek model with the artificial potential field method, and obstacle-avoiding motion is achieved with the artificial coordinating field method.
2. For cooperation under the dynamic model, distributed controllers based on local information exchange are designed for static and dynamic environments respectively by combining virtual forces, external environmental factors, and the connection matrix; the corresponding stability analyses are given, and synchronized motion of the swarm robotic system is achieved. An improved particle swarm optimization algorithm is used to optimize the controller parameters, reducing the energy consumption of the system.
3. A distributed controller design method based on the internal average kinetic energy is proposed, together with its stability analysis; it effectively accomplishes the foraging task of swarm robots in complex environments.
4. For distributed reinforcement learning, the large state space and the large number of robots make reinforcement learning converge too slowly, whereas a moderate amount of communication can speed up learning. Several cooperative reinforcement learning algorithms based on blackboard-architecture communication are therefore proposed; they make full use of the distributed sensing capability of the multi-robot system to explore the learning space and collect experience, thereby improving the convergence speed of reinforcement learning.
This work was supported by the National Natural Science Foundation of China project "Research on Coadaptivity Theory and Methods for Swarm Robotic Systems in Complex Environments" (60675057) and the Jilin University 2009 Graduate Innovation Research Program project "Research on Cooperative Methods for Swarm Robots in Dynamic Environments" (20091020).
With the development of computing technology, sensor technology, communication technology, control theory, and artificial intelligence, robotics research, which draws on a number of interdisciplinary fields, has also entered a new stage. Swarm robotics emerged from this development, inspired by social insects in nature. A swarm robotic system is a special class of multi-robot system characterized by robustness, adaptability, and scalability.
Research on swarm robotic systems is of great significance in both theory and practice. In theory, further research on swarm robotics will help reveal the fundamental mechanism behind the emergence of intelligent behavior. In practice, a mature swarm robotic system can independently complete dangerous tasks in ship manufacturing, product assembly, transportation systems, military equipment, aerospace, and other areas, and therefore has high application potential.
Coadaptivity of a swarm robotic system refers to the ability of the autonomous robots, in a complex dynamic environment, to constantly optimize their own control strategies and adjust their behavior according to dynamic changes in the environment and the characteristics of the task, and ultimately to achieve overall optimality through local information exchange with other robots and the external environment.
This dissertation studies several problems of coadaptivity for swarm robotic systems, centered on the tasks of foraging and flocking control. The work is supported by the National Natural Science Foundation of China under Grant 60675057, "Coadaptivity Theory and Methods Research for Swarm Robotic Systems in a Complex Dynamic Environment". The major work of this dissertation covers the following four aspects:
1. A distributed control strategy is studied for flocking under the kinematic model of a swarm robotic system. In an obstacle-free environment, the achievement and maintenance of flocking formation is implemented by combining the Vicsek model with an improved artificial potential field. To achieve flocking while avoiding static obstacles, the Vicsek update rule is combined with the artificial coordinating field to design the corresponding control strategy, and an improved particle swarm optimization algorithm is presented to optimize the corresponding parameters so that a stable flocking behavior is obtained (a minimal sketch of the Vicsek-style update appears after this list). Simulation results show that the distributed control strategy can effectively implement flocking behavior for the swarm robotic system in environments with or without static obstacles.
2. A distributed control strategy based on local information exchange is designed for flocking under the dynamic model of a swarm robotic system. The stability of the distributed controller is analyzed and the corresponding finishing time of the flocking behavior is estimated. An improved particle swarm optimization algorithm is presented to optimize the corresponding parameters so as to minimize the energy consumption during motion. To make the swarm robotic system flock in a dynamic environment, a distributed control strategy based on local information exchange is likewise designed; to analyze the stability of the resulting non-autonomous system, Barbalat's lemma is introduced under the corresponding assumptions, so that the velocities of all individuals converge to the same curve (a virtual-force controller sketch and a particle swarm optimization sketch follow this list). Simulation results show that in both cases the designed distributed control strategy can achieve stable flocking behavior for the swarm robotic system effectively and rapidly.
3. A distributed control strategy based on the internal average kinetic energy is designed for social swarm foraging in a consistent environment under the dynamic model of a swarm robotic system, so that the swarm robotic system can finish the foraging task efficiently under the non-flocking condition using only local information exchange; it is proved that the internal average kinetic energy eventually converges to the prior expected value in a damping environment (a sketch of this quantity appears after this list). Simulation results show that a lower convergence value of the internal average kinetic energy makes the swarm as a whole cover a smaller area of the search space, whereas a higher convergence value makes it cover a larger area, so that the swarm robotic system can find the extreme values of the environment function more efficiently and finish the swarm social foraging task.
4. A cooperative Q-learning method based on a blackboard architecture is presented to address the following shortcomings: the poor scalability of point-to-point communication, excessive communication traffic, and the slow convergence of reinforcement learning. The learning process is executed on the blackboard architecture, making use of the number of robots and their distributed sensing capability in the training scenario to explore the learning space and collect experience. Communication is essential for a swarm robotic system: it can be used to share experience, parameters, and control policies, and recent research has shown that appropriate communication can greatly improve system performance. Experience sharing among otherwise independent reinforcement learners is achieved by means of learning automata and an improved particle swarm optimization algorithm (a blackboard-based Q-learning sketch appears after this list). Simulation results show that the model can improve the learning speed and reduce communication traffic.
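To make item 1 concrete, the following is a minimal sketch, assuming unit-speed point robots on the plane, of a Vicsek-style heading-alignment update combined with a simple repulsive potential-field term for static obstacles. The radius r, gains k_rep and d_safe, and noise level eta are illustrative parameters, not the dissertation's actual controller.

```python
import numpy as np

def vicsek_potential_step(pos, theta, obstacles, v=0.03, r=1.0,
                          eta=0.1, k_rep=0.5, d_safe=0.5, dt=1.0):
    """One synchronous update: Vicsek heading alignment plus obstacle repulsion.

    pos:       (N, 2) robot positions
    theta:     (N,) robot headings
    obstacles: (M, 2) static obstacle centres
    """
    n = len(pos)
    new_theta = np.empty(n)
    for i in range(n):
        # Vicsek rule: average the headings of all neighbours within radius r
        # (the robot itself is included, as in the original model).
        neigh = np.linalg.norm(pos - pos[i], axis=1) < r
        avg = np.arctan2(np.sin(theta[neigh]).mean(),
                         np.cos(theta[neigh]).mean())
        # Repulsive potential-field term pushing away from nearby obstacles.
        rep = np.zeros(2)
        for ob in obstacles:
            diff = pos[i] - ob
            d = np.linalg.norm(diff)
            if 1e-9 < d < d_safe:
                rep += k_rep * (1.0 / d - 1.0 / d_safe) * diff / d**3
        # Blend the aligned heading with the repulsion direction, add noise.
        desired = np.array([np.cos(avg), np.sin(avg)]) + rep
        noise = eta * (np.random.rand() - 0.5)
        new_theta[i] = np.arctan2(desired[1], desired[0]) + noise
    new_pos = pos + v * dt * np.column_stack((np.cos(new_theta),
                                              np.sin(new_theta)))
    return new_pos, new_theta
```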
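For item 2, the sketch below illustrates the general idea of a virtual-force distributed controller over a connection (adjacency) matrix, assuming double-integrator robot dynamics; the alignment and attraction/repulsion gains are placeholders, and the controller is a sketch of the approach rather than the dissertation's exact design.

```python
import numpy as np

def flocking_control(pos, vel, adjacency, k_align=1.0,
                     k_att=0.8, k_rep=1.5, d0=1.0):
    """Virtual-force controller for double-integrator agents (a sketch).

    Each robot's input is the sum of a velocity-alignment term and an
    attraction/repulsion term toward/away from its neighbours, using only
    local information given by the 0/1 adjacency (connection) matrix.
    """
    n = len(pos)
    u = np.zeros_like(vel)
    for i in range(n):
        for j in range(n):
            if i == j or not adjacency[i, j]:
                continue
            diff = pos[j] - pos[i]
            d = np.linalg.norm(diff)
            if d < 1e-9:
                continue
            # Velocity consensus (alignment) with neighbour j.
            u[i] += k_align * (vel[j] - vel[i])
            # Attraction at long range, repulsion at short range (virtual force).
            u[i] += (k_att * (d - d0) - k_rep / d**2) * diff / d
    return u

def simulate(pos, vel, adjacency, steps=200, dt=0.05):
    """Euler integration of the double-integrator dynamics under the controller."""
    for _ in range(steps):
        u = flocking_control(pos, vel, adjacency)
        vel = vel + dt * u
        pos = pos + dt * vel
    return pos, vel
```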
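For item 3, the following sketch shows one plausible reading of the internal average kinetic energy (the kinetic energy of motion relative to the swarm centroid) together with a damping/excitation term that steers it toward a prescribed target value; the gain k_e, target e_target, and unit mass are assumptions for illustration only.

```python
import numpy as np

def internal_average_kinetic_energy(vel, mass=1.0):
    """Average kinetic energy of the robots' motion relative to the swarm centroid."""
    rel = vel - vel.mean(axis=0)              # velocity relative to the group
    return 0.5 * mass * (rel ** 2).sum(axis=1).mean()

def energy_regulating_control(vel, e_target, k_e=0.5, mass=1.0):
    """Per-robot term that steers the internal energy toward e_target.

    If the current internal energy exceeds the target, the term damps relative
    motion; if it is below the target, it excites it, so the swarm spreads over
    a smaller or larger area of the search space.
    """
    e = internal_average_kinetic_energy(vel, mass)
    rel = vel - vel.mean(axis=0)
    return k_e * (e_target - e) * rel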
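For item 4, this is a minimal sketch of blackboard-style cooperative Q-learning: every robot posts its experience tuples to a shared blackboard and reads the common Q-table from it, so experience sharing does not require point-to-point links. The names (Blackboard, robot_episode, env_step) and the plain Q-learning merge rule are hypothetical stand-ins for the dissertation's learning-automata and PSO based sharing scheme.

```python
import random
from collections import defaultdict

class Blackboard:
    """Shared structure on which robots post experience and read the common policy."""
    def __init__(self):
        self.q = defaultdict(float)     # global Q-table: (state, action) -> value
        self.posts = []                 # experience tuples posted this round

    def post(self, experience):
        self.posts.append(experience)

    def integrate(self, alpha=0.1, gamma=0.9, actions=(0, 1, 2, 3)):
        """Merge all posted experience into the global Q-table, then clear the board."""
        for s, a, r, s_next in self.posts:
            best_next = max(self.q[(s_next, a2)] for a2 in actions)
            self.q[(s, a)] += alpha * (r + gamma * best_next - self.q[(s, a)])
        self.posts.clear()

def robot_episode(board, env_step, start_state, actions=(0, 1, 2, 3),
                  epsilon=0.2, horizon=50):
    """One robot explores with an epsilon-greedy policy read from the blackboard."""
    s = start_state
    for _ in range(horizon):
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda a2: board.q[(s, a2)])
        s_next, r, done = env_step(s, a)   # environment model supplied by the user
        board.post((s, a, r, s_next))      # share experience via the blackboard
        s = s_next
        if done:
            break
```

In each learning round, all robots would run robot_episode against the same Blackboard instance and board.integrate() would then be called once, so the communication cost grows with the number of robots rather than with the number of robot pairs.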
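Items 1, 2, and 4 all rely on an improved particle swarm optimization algorithm for parameter tuning. As a reference point, the sketch below implements the common inertia-weight PSO variant (after Shi and Eberhart), minimizing a user-supplied cost such as simulated energy consumption over a vector of controller gains; the linearly decreasing inertia weight is only one of several possible "improvements" and may differ from the dissertation's variant.

```python
import numpy as np

def pso_optimize(cost, dim, n_particles=20, iters=100,
                 bounds=(-2.0, 2.0), c1=2.0, c2=2.0,
                 w_max=0.9, w_min=0.4):
    """Inertia-weight PSO: minimise `cost` over a `dim`-dimensional parameter vector."""
    lo, hi = bounds
    x = np.random.uniform(lo, hi, (n_particles, dim))   # particle positions (gains)
    v = np.zeros_like(x)                                # particle velocities
    pbest, pbest_val = x.copy(), np.array([cost(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()                # global best position
    for t in range(iters):
        w = w_max - (w_max - w_min) * t / iters          # linearly decreasing inertia
        r1, r2 = np.random.rand(*x.shape), np.random.rand(*x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([cost(p) for p in x])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        g = pbest[pbest_val.argmin()].copy()
    return g, pbest_val.min()
```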
In summary, several problems of coadaptivity for swarm robotic systems are studied in this dissertation. The main purpose of the work is to establish a complete theory of coadaptivity and its implementation for swarm robotic systems. Simulation experiments are then performed for verification and analysis.