Confrontation simulation for network information system-of-systems based on Nash-Q
  • Chinese title: 基于Nash-Q的网络信息体系对抗仿真技术
  • Authors: YAN Xuefei; LI Xinming; LIU Dong; WANG Shoubiao (闫雪飞; 李新明; 刘东; 王寿彪)
  • Affiliation: Science and Technology on Complex Electronic System Simulation Laboratory, Equipment Academy (装备学院复杂电子系统仿真实验室)
  • Keywords: network information system-of-systems (NISoS); zero-sum game; Q-Learning; Nash equilibrium
  • Journal: Systems Engineering and Electronics (系统工程与电子技术); journal code: XTYD
  • Online publication date: 2017-09-14 14:54
  • Year: 2018; Volume: 40 (cumulative No. 460); Issue: 01
  • Funding: Equipment pre-research field fund; key laboratory basic research project (DXZT-JC-ZZ-2015-007)
  • Language: Chinese
  • CN: 11-2422/TN
  • Pages: 222-229 (8 pages)
  • Record ID: XTYD201801031
Abstract
Combat simulation of weapon-equipment systems-of-systems (SoS) belongs to the research field of complex systems. This paper presents the first exploratory study of adversarial cognitive decision-making in a network information system-of-systems (NISoS) based on Nash-Q. The Nash-Q algorithm has a form similar to joint Q-learning; the difference lies in how the joint policy is computed. For a zero-sum SoS combat model, Nash-Q obtains a mixed strategy by solving for the Nash equilibrium without needing the historical information of the other agents, and is therefore easier to implement and more efficient. A campaign-level zero-sum dynamic combat game model is established, and a method for solving the Nash equilibrium is given for the case where complete information about the other agents is unavailable. In addition, a Gaussian radial-basis-function neural network is used to discretize the Q-table, giving the algorithm better discretization performance and generalization ability. Finally, NISoS combat simulation experiments verify the effectiveness of the algorithm, show that it yields higher payoffs than Q-learning-based and rule-based decision algorithms, and demonstrate that it performs well in offline decision-making.
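The record does not include the paper's implementation, so the following minimal Python sketch only illustrates the mechanics the abstract describes: in a two-player zero-sum Markov game, the Nash-Q bootstrap target uses the Nash (maximin) value of the next state's Q-matrix, computed by linear programming, rather than the other agent's observed history. The function names, tabular Q layout, and hyperparameters are illustrative assumptions, not the authors' code.

    import numpy as np
    from scipy.optimize import linprog

    def nash_value(M):
        """Value and maximin mixed strategy of the zero-sum matrix game M,
        where M[a, b] is the row player's payoff: maximise v subject to
        x^T M[:, b] >= v for every opponent action b, sum(x) = 1, x >= 0."""
        n_a, n_b = M.shape
        c = np.zeros(n_a + 1)
        c[-1] = -1.0                                   # linprog minimises, so min -v
        A_ub = np.hstack([-M.T, np.ones((n_b, 1))])    # v - x^T M[:, b] <= 0
        b_ub = np.zeros(n_b)
        A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])  # sum(x) = 1
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, 1)] * n_a + [(None, None)], method="highs")
        return res.x[-1], res.x[:n_a]                  # game value, mixed strategy

    def nash_q_update(Q, s, a, b, r, s_next, alpha=0.1, gamma=0.9):
        """One Nash-Q backup. Unlike joint Q-learning, the target bootstraps
        from the Nash value of the next-state stage game, so no record of the
        opponent's past actions is needed."""
        v_next, _ = nash_value(Q[s_next])              # Q maps state -> payoff matrix
        Q[s][a, b] += alpha * (r + gamma * v_next - Q[s][a, b])

Because the stage-game equilibrium at s_next is computed from the learner's own Q-matrix alone, the update needs no model of the opponent's policy, which is the efficiency argument the abstract makes for Nash-Q over joint Q-learning.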
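The abstract also states that a Gaussian radial-basis-function network replaces the discrete Q-table to improve generalization. A sketch of such a linear-in-features approximator follows; the centre placement, shared width sigma, and flat joint-action indexing are assumptions for illustration, not the paper's published configuration.

    import numpy as np

    class RBFQ:
        """Q(s, joint_a) ~ phi(s) . w[:, joint_a], with Gaussian bases phi."""

        def __init__(self, centers, sigma, n_joint_actions):
            self.centers = np.asarray(centers)          # (n_centers, state_dim)
            self.sigma = sigma                          # shared kernel width
            self.w = np.zeros((len(self.centers), n_joint_actions))

        def features(self, s):
            d2 = np.sum((self.centers - np.asarray(s)) ** 2, axis=1)
            return np.exp(-d2 / (2.0 * self.sigma ** 2))

        def q(self, s):
            """Approximate Q-values of every joint action at state s."""
            return self.features(s) @ self.w

        def update(self, s, joint_a, target, lr=0.05):
            """Gradient step pulling Q(s, joint_a) toward a Nash-Q target,
            e.g. target = r + gamma * v, where v is the Nash value of the
            next state's Q-matrix (q(s_next) reshaped to A_own x A_opp)."""
            phi = self.features(s)
            td = target - phi @ self.w[:, joint_a]
            self.w[:, joint_a] += lr * td * phi

Reshaping q(s_next) into an (own actions x opponent actions) payoff matrix and passing it to nash_value above recovers the full approximate Nash-Q loop the abstract outlines.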
