Evaluation and Promotion Methods with Improved k-means for Value Density of Electric Power Big Data

Original title (Chinese): 电力大数据的价值密度评价及结合改进k-means的提升方法研究
Authors: WANG Saiyi; YU Jianping; SUN Fengjie; WANG Chengmin; XIE Ning
Keywords: electric power big data; value density; evaluation indicator; k-means algorithm; three-layer filtering mechanism
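Journal: Smart Power (智慧电力)
CNKI journal code: XBDJ
Affiliations: Pudong Power Supply Company, State Grid Shanghai Municipal Electric Power Company; School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University
Publication date: 2019-03-20
Year: 2019
Volume: 47
Issue: 03 (cumulative No. 305)
Funding: National Natural Science Foundation of China (51777121)
Language: Chinese
CNKI file ID: XBDJ201903003
Pages: 14-21
Page count: 8
CN: 61-1512/TM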

Abstract

Research on the value density of electric power big data currently lacks both a definition and quantitative indicators, and relies on a single means of improvement, which limits its effect. To address these problems, this paper defines value density and proposes evaluation indicators computed along two dimensions: memory footprint (space) and running speed (time). It also proposes an improved k-means algorithm based on multiple initial cluster centers, which remedies the standard algorithm's heavy dependence on its initial cluster centers. Using this algorithm, the paper studies how to raise value density along different dimensions: "dirty data", records, and fields. Simulation tests on daily load forecasting show that the indicators reflect value density well, and that the improved clustering algorithm delivers better clustering quality and speed and can effectively raise the value density of the data.
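The abstract names two technical pieces: value density indicators measured along a space dimension (memory footprint) and a time dimension (running speed), and a k-means variant that uses multiple initial cluster centers to reduce dependence on any single initialization. The Python sketch below illustrates both under stated assumptions; it is not the authors' method. `kmeans_multi_init` assumes the improvement means running Lloyd's algorithm from several randomly drawn sets of initial centers and keeping the run with the smallest within-cluster sum of squared errors (SSE); the paper's actual procedure may differ. `footprint_and_runtime` is a hypothetical helper that only reports the two raw dimensions (bytes held, seconds spent), because the abstract does not give the formula that combines them into one indicator.

```python
import time

import numpy as np


def kmeans_once(X, centers, max_iter=100, tol=1e-6):
    """Standard Lloyd iterations starting from one given set of initial centers."""
    centers = centers.astype(float).copy()
    for _ in range(max_iter):
        # Squared Euclidean distance from every point to every center: shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # keep the previous center if a cluster becomes empty
                new_centers[j] = members.mean(axis=0)
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    # Final assignment and SSE for the converged centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sse = d2[np.arange(len(X)), labels].sum()
    return labels, centers, sse


def kmeans_multi_init(X, k, n_init=10, seed=0):
    """Run k-means from several candidate initial-center sets and keep the lowest SSE.

    ASSUMPTION: this multi-start scheme stands in for the paper's
    "multiple initial cluster centers" improvement; details may differ.
    """
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # Draw k distinct data points as one candidate set of initial centers.
        init = X[rng.choice(len(X), size=k, replace=False)]
        result = kmeans_once(X, init)
        if best is None or result[2] < best[2]:
            best = result
    return best


def footprint_and_runtime(X, k):
    """HYPOTHETICAL helper: raw space (bytes) and time (seconds) of one clustering run."""
    start = time.perf_counter()
    kmeans_multi_init(X, k)
    return X.nbytes, time.perf_counter() - start
```

Multi-start initialization is the same idea as the `n_init` parameter of scikit-learn's `KMeans`: it lowers the risk of settling on a poor local optimum from one unlucky draw, at the cost of roughly `n_init` times the clustering work. Comparing `footprint_and_runtime` on a raw data set and on a filtered one (for example, after removing dirty data, redundant records, or unneeded fields) gives the two quantities the abstract's indicators are built from.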
