Abstract
Classical reinforcement learning for multi-agent formation consumes large amounts of communication and computing resources. This paper introduces an event-triggered control mechanism so that agents' action decisions need not be made at a fixed period; instead, an agent's action is updated only when an event-triggered condition is satisfied. The condition is designed using not only each agent's cumulative reward but also the deviation between its own reward and those of its neighbors, and the agents interact to seek the optimal joint policy that achieves the formation. Numerical simulation results show that the event-triggered reinforcement learning algorithm for multi-agent formation control effectively reduces the frequency of agents' action decisions and the associated resource consumption while preserving system performance.
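The triggering idea described above can be sketched in code. This is a minimal illustration, not the paper's exact formulation: the function and class names, the additive form of the trigger (own cumulative-reward change plus deviation from the neighbors' average reward), and the threshold parameter are all assumptions introduced here for clarity.

```python
import numpy as np

def should_update(cum_reward, prev_cum_reward, neighbor_rewards, threshold):
    """Illustrative event-triggered condition: fire when the change in the
    agent's own cumulative reward, plus its deviation from the average
    neighbor reward, exceeds a threshold (additive form is an assumption)."""
    own_change = abs(cum_reward - prev_cum_reward)
    deviation = abs(cum_reward - np.mean(neighbor_rewards)) if len(neighbor_rewards) else 0.0
    return own_change + deviation > threshold

class EventTriggeredAgent:
    """Toy tabular Q-learning agent that re-decides its action only when the
    event-triggered condition fires; otherwise it repeats its last action,
    saving decision and communication steps."""
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, threshold=1.0):
        self.q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma, self.threshold = alpha, gamma, threshold
        self.cum_reward = 0.0       # accumulated reward since the start
        self.prev_cum_reward = 0.0  # value at the last triggering instant
        self.last_action = 0

    def act(self, state, neighbor_rewards):
        # Re-decide only on a trigger event; otherwise hold the old action.
        if should_update(self.cum_reward, self.prev_cum_reward,
                         neighbor_rewards, self.threshold):
            self.last_action = int(np.argmax(self.q[state]))
            self.prev_cum_reward = self.cum_reward
        return self.last_action

    def learn(self, s, a, r, s_next):
        # Standard tabular Q-learning update; also accumulates reward
        # so the trigger can compare against the last triggering instant.
        td = r + self.gamma * self.q[s_next].max() - self.q[s, a]
        self.q[s, a] += self.alpha * td
        self.cum_reward += r
```

In this sketch, a quiescent agent whose reward neither changes nor drifts away from its neighbors' keeps replaying its last action, which is the mechanism by which decision frequency is reduced.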