A Defense Policy Learning Algorithm for Power Information Networks Based on Optimal Initial Value Q-learning

Original title (Chinese): 基于最优初始值Q学习的电力信息网络防御策略学习算法
Authors: JING Dong-sheng (景栋盛); YANG Yu (杨钰); XUE Jing-song (薛劲松); ZHU Fei (朱斐); WU Wen (吴文)
Keywords: power information network; optimal initial values; Q-learning; network defense
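Journal: Computer and Modernization (计算机与现代化; journal code JYXH)
Affiliations: Suzhou Power Supply Branch, State Grid Jiangsu Electric Power Limited Company; School of Computer Science and Technology, Soochow University
Publication date: 2018-11-15
Publisher: Computer and Modernization (计算机与现代化)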
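Year: 2018
Issue: No. 279
Funding: National Natural Science Foundation of China (61303108, 61373094); Major Program of the Natural Science Research Projects of Jiangsu Higher Education Institutions (17KJA520004)
Language: Chinese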
Record ID: JYXH201811006
Page count: 6
Issue within year: 11
CN: 36-1137/TP
Pages: 22-26+33

Abstract

The security and stability of power information networks underpin modern social development. As these networks grow larger and more complex, building an efficient and well-designed protection network has become a key concern for researchers. In automated power information networks, defense strategies typically lack coordinated management and protect only a small number of devices; they also suffer from slow updates, long update cycles, no automatic updating, and uneven resource allocation. This paper proposes a defense policy learning algorithm for power information networks based on optimal initial value Q-learning. The algorithm uses Q-learning, a reinforcement learning method, as its framework and borrows the idea of generative adversarial networks: a security policy is learned through simulated confrontation between an attack agent and a defense agent. The defense agent updates its defense policy with Q-learning, improving it online from historical defense experience and avoiding manual intervention. Introducing optimal (optimistic) initial values during training greatly accelerates the training of the system's defensive performance. Experimental results verify the effectiveness of the algorithm.
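
To make the role of the initial values concrete, the following is a minimal sketch of a tabular Q-learning defender, not the authors' implementation. This record does not describe the paper's states, actions, rewards, or attacker, so everything here is an assumption chosen for illustration: a hypothetical 4-node network, a scripted attacker that always moves to the next node, a reward of 1 for a blocked attack and 0 otherwise, and a purely greedy defender. The names `train_defender`, `attacker_target`, and `greedy_action` are hypothetical.

```python
"""Toy sketch: tabular Q-learning defender with optimistic initial values.

The environment (4 nodes, scripted attacker, 0/1 reward) is a hypothetical
stand-in for illustration only, not the setup used in the paper.
"""

N_NODES = 4   # hypothetical number of protectable devices
ALPHA = 0.5   # learning rate
GAMMA = 0.9   # discount factor


def attacker_target(last_target):
    """Scripted attacker (assumption): always moves on to the next node."""
    return (last_target + 1) % N_NODES


def greedy_action(q_row):
    """Index of the largest Q-value; ties go to the lowest index."""
    return max(range(len(q_row)), key=lambda a: q_row[a])


def train_defender(initial_q, steps=2000):
    """Run purely greedy Q-learning and return (Q-table, block rate).

    State  = node attacked in the previous step.
    Action = node the defender protects in this step.
    Reward = 1 if the attack is blocked, else 0.
    """
    q = [[initial_q] * N_NODES for _ in range(N_NODES)]
    state, blocked = 0, 0
    for _ in range(steps):
        action = greedy_action(q[state])
        target = attacker_target(state)
        reward = 1.0 if action == target else 0.0
        blocked += int(reward)
        next_state = target
        # Standard Q-learning update toward the one-step bootstrapped target.
        td_target = reward + GAMMA * max(q[next_state])
        q[state][action] += ALPHA * (td_target - q[state][action])
        state = next_state
    return q, blocked / steps


if __name__ == "__main__":
    # 1 / (1 - GAMMA) = 10 upper-bounds any discounted return in this toy
    # task, so 10.0 is an optimistic starting estimate for every action.
    for label, q0 in (("zero init", 0.0), ("optimistic init", 10.0)):
        q_table, rate = train_defender(q0)
        policy = [greedy_action(row) for row in q_table]
        print(f"{label}: block rate = {rate:.3f}, policy = {policy}")
```

In this toy setting the zero-initialized defender never sees an estimate that beats its first choice, so it keeps protecting node 0 and blocks one attack in four. The optimistically initialized defender is "disappointed" by each wrong action, tries the alternatives, and settles on protecting the node that will be attacked next. A practical system would more likely combine optimistic values with some residual exploration and a learned or adaptive attacker.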
