Dynamic Optimal Allocation Algorithm for CPS Orders in Interconnected Power Grids Based on Reinforcement Learning
Abstract
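The dynamic optimal allocation of automatic generation control (AGC) power orders from the dispatch center of an interconnected power grid to the various types of AGC units is a stochastic optimization problem. In 1997 the North American Electric Reliability Council (NERC) formally introduced new control performance standards (CPS) for AGC under the tie-line bias control (TBC) mode. The CPS standards place more emphasis on the medium- and long-term returns of the AGC system and change the traditional AGC control philosophy. How to design a dynamic optimal allocation strategy for AGC power orders under the CPS standards (CPS orders for short) has therefore become a new topic of theoretical research.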
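     First, this thesis outlines the basic principle of dynamic optimal CPS order allocation, reviews the state of research at home and abroad on AGC economic dispatch and unit commitment, and studies the statistical characteristics of the CPS assessment indices and the optimal control objectives. After analyzing the characteristics and control objectives of the dynamic CPS order allocation problem, it points out that an AGC system under the CPS standards can be regarded as an "uncertain stochastic system" whose mathematical model is a Gauss-Markov random process, and that the dynamic load allocation problem can be understood as a discrete-time Markov decision process (DTMDP). The Q-learning method, a reinforcement learning technique based on stochastic optimal control, is thereby introduced into the study of optimal CPS order allocation strategies.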
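     Second, taking the load frequency control (LFC) models of a standard two-area interconnected system and of the Guangdong power grid as study objects, the thesis systematically applies single-step Q-learning and multi-step backtracking Q(λ) learning and compares them in detailed simulations. Different reward functions are designed according to the different optimization objectives and introduced into the algorithms. The regulating characteristics of hydro and thermal units are combined and the regulating margin of hydro units is taken into account, which improves the regulating capability of the AGC system. Statistical simulation comparisons show that the Q-learning method achieves online self-learning and dynamic optimal decision-making for the allocation strategy, enhances the robustness and adaptability of the AGC system, and raises the CPS compliance rate. According to a domestic and international literature search, no optimization or control method based on Markov decision theory had previously appeared in the field of dynamic optimal CPS order allocation.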
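     Finally, the thesis points out that CPS order allocation algorithms based on classical reinforcement learning inevitably face the "curse of dimensionality", and proposes a hierarchical reinforcement learning approach: all units in the grid are first classified by frequency-regulation time delay, and the CPS order is allocated layer by layer to form a hierarchical task structure. A time-varying coordination factor is introduced between the layers of the hierarchical Q-learning algorithm, and the improved hierarchical Q-learning algorithm converges faster than the original. Different linear combinations of weights are designed in the reward function to show how the system's CPS control level and regulating cost vary under conservative and optimistic control. Statistical simulation studies on the China Southern Power Grid show that the improved hierarchical Q-learning shortens the average convergence time by 60% compared with hierarchical Q-learning, and that in an environment with complex random disturbances the improved algorithm raises the CPS compliance rate and reduces the regulating cost by 4%.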
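     This research was supported by the National Natural Science Foundation of China General Program project "Optimal Relaxed AGC Control under CPS Standards and Its Markov Decision Process" (50807016), the Guangdong Natural Science Foundation project "Stochastic Optimal Relaxed Generation Control in Non-Markovian Environments and Its Semi-Markov Decision Process" (9151064101000049), and the Fundamental Research Funds for the Central Universities project "Self-Learning and Self-Evolving Theory of Smart Grid Load Frequency Control Based on Hierarchical Semi-Markov Decision Processes" (No. 2009ZM0251).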
The dynamic optimization of automatic generation control (AGC) generating command dispatch is a stochastic optimization problem for the interconnected power system. In 1997 the North American Electric Reliability Council (NERC) formally released new Control Performance Standards (CPS) for AGC of interconnected power systems under Tie-line Bias Control (TBC) mode. The CPS standards pay more attention to the medium- and long-run returns of AGC performance and change the traditional AGC control philosophy, so how to design a dynamic optimal dispatch strategy for CPS orders has become a new topic of theoretical research.
     Firstly, the paper briefly states the principle of dynamic optimal CPS order dispatch and introduces the background and current research status of CPS order dispatch at home and abroad. It also gives a mathematical analysis of NARI's CPS control rules. Based on an in-depth study of CPS order dispatch characteristics and the optimal control objective, the paper suggests that an AGC system operating under NERC's CPS is a stochastic multistage decision process, and that the dispatch problem is suitably modeled as a reinforcement learning (RL) problem based on Discrete Time Markov Decision Process (DTMDP) theory. The Q-learning method, which is based on optimal stochastic control techniques, is introduced into the domain of CPS order dispatch for its solution.
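To make the Q-learning idea named above concrete, the following is a minimal sketch of the standard one-step tabular Q-learning update with epsilon-greedy action selection. The abstract does not specify the state or action spaces, so the discretization used here (three bins of total CPS order, four candidate sets of allocation factors), the learning rate, the discount factor, and the reward value are illustrative assumptions, not the thesis's settings.

```python
import random


def q_learning_step(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one step of the standard tabular Q-learning update and return the new value."""
    best_next = max(q[next_state])  # greedy value of the successor state
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)


# Hypothetical discretization (assumption): 3 order-level states, 4 allocation actions.
n_states, n_actions = 3, 4
q_table = [[0.0] * n_actions for _ in range(n_states)]

# One illustrative transition with a made-up penalty reward of -1.0.
new_value = q_learning_step(q_table, state=0, action=2, reward=-1.0, next_state=1)
greedy_action = epsilon_greedy(q_table, 0, 0.0, random.Random(0))
```

After this single update the penalized action has a lower value than the untried ones, so the greedy choice in state 0 moves away from it.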
     Secondly, using Matlab/Simulink and DTMDP simulation modeling, the load frequency control (LFC) models of a two-area power system and the Guangdong power grid are taken as examples for a detailed comparison and analysis of three Q-learning based CPS order dispatch algorithms. Reward functions in Q-learning are designed according to different optimization objectives. Thermal and hydro units are integrated, with the regulating margin of hydro units taken into account, to improve the regulating performance of the AGC system. The multi-step Q(λ) method with backtracking is also employed to overcome the long control time delay in the AGC control loop. The statistical experiment results show that the proposed dispatch methodology, with its online self-learning and dynamic optimization capability, clearly enhances the robustness and adaptability of AGC systems while ensuring CPS compliance.
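The multi-step backtracking mentioned above can be illustrated with eligibility traces. The abstract does not say which Q(λ) variant the thesis uses, so the sketch below swaps in Watkins's Q(λ) with accumulating traces as a stand-in. The two-state table, the rewards, and the parameter values are assumptions chosen only to show how a delayed reward is propagated back to an earlier state-action pair.

```python
def watkins_q_lambda_step(q, trace, s, a, r, s_next, a_next,
                          alpha=0.1, gamma=0.9, lam=0.8):
    """One step of Watkins's Q(lambda) with accumulating eligibility traces."""
    # Decide, before any update, whether the next action is greedy.
    next_is_greedy = q[s_next][a_next] == max(q[s_next])
    delta = r + gamma * max(q[s_next]) - q[s][a]  # one-step TD error
    trace[s][a] += 1.0                            # mark the visited pair
    for i in range(len(q)):
        for j in range(len(q[i])):
            q[i][j] += alpha * delta * trace[i][j]  # credit all traced pairs
            # Traces decay after a greedy action and are cut after an exploratory one.
            trace[i][j] = gamma * lam * trace[i][j] if next_is_greedy else 0.0


# Hypothetical 2-state, 2-action problem (assumption).
q = [[0.0, 0.0], [0.0, 0.0]]
trace = [[0.0, 0.0], [0.0, 0.0]]

# Step 1: no reward yet, e.g. the unit has not responded to the order.
watkins_q_lambda_step(q, trace, s=0, a=0, r=0.0, s_next=1, a_next=0)
# Step 2: the delayed reward arrives and also updates the pair visited in step 1.
watkins_q_lambda_step(q, trace, s=1, a=0, r=1.0, s_next=0, a_next=0)
```

With one-step Q-learning the pair visited in step 1 would still have value zero at this point. The trace lets it receive a discounted share of the later reward, which is why a multi-step method suits control loops with long delays.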
     Finally, this paper presents an improved hierarchical reinforcement learning (HRL) algorithm to address the curse of dimensionality in the multi-objective dynamic optimization of CPS order dispatch. The CPS order dispatch task is decomposed into several subtasks by classifying the committed AGC units according to the time delay of their power regulation response. A time-varying coordination factor is introduced between the layers of the HRL algorithm, which shortens its average convergence time by 60%. Several linear combinations of weights in the reward function are designed to optimize the hydro capacity margin and the AGC production cost. Applying the improved hierarchical Q-learning to the China Southern Power Grid model shows that the proposed method enhances the performance of AGC systems in CPS assessment and reduces the AGC regulating cost by 4%, compared with hierarchical Q-learning and a genetic algorithm.
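The layer-by-layer decomposition described above can be sketched as a two-level split of the total order: first across unit groups classified by response delay, then across the units inside each group. In the learning algorithm each layer would choose its own allocation factors, whereas here they are fixed by hand. The group names, unit names, and factor values are hypothetical, and the time-varying coordination factor is not modeled because the abstract does not define it.

```python
def allocate(order_mw, factors):
    """Split a power order by allocation factors that must sum to 1."""
    if abs(sum(factors.values()) - 1.0) > 1e-9:
        raise ValueError("allocation factors must sum to 1")
    return {name: order_mw * share for name, share in factors.items()}


def hierarchical_dispatch(order_mw, group_factors, unit_factors):
    """Allocate the total order to groups, then each group's share to its units."""
    unit_orders = {}
    for group, sub_order in allocate(order_mw, group_factors).items():
        unit_orders.update(allocate(sub_order, unit_factors[group]))
    return unit_orders


# Hypothetical grouping by response delay (assumption): fast hydro, slow thermal.
group_factors = {"hydro_fast": 0.5, "thermal_slow": 0.5}
unit_factors = {
    "hydro_fast": {"H1": 0.5, "H2": 0.5},
    "thermal_slow": {"T1": 0.25, "T2": 0.75},
}

orders = hierarchical_dispatch(100.0, group_factors, unit_factors)
```

Each layer only has to choose among its own few factors, so the decision space grows with the number of groups and the size of each group rather than with the full combination of units. Because every layer's factors sum to 1, the unit orders always add up to the total order.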
     This work was supported by the National Natural Science Foundation of China project "AGC Optimal Relaxed Control and its Markov Decision Process based on Control Performance Standards" (50807016), the Guangdong Natural Science Foundation project (9151064101000049), and the Fundamental Research Funds for the Central Universities (No. 2009ZM0251).