Research on Abnormal Diagnosis for Power Grid Equipment Archival Data Based on Machine Learning

Original title (Chinese): 基于机器学习的电网设备档案数据异常诊断研究
Authors: LONG Jing (龙婧); LIU Wei (刘伟); YIN Sheng (殷胜)
Keywords: big data; machine learning; power grid equipment archival data; abnormal data; automatic diagnosis
Journal code: DXXH
Journal: Electric Power Information and Communication Technology (电力信息与通信技术)
Affiliation: Hubei Huazhong Electric Power Technology Development Co., Ltd. (湖北华中电力科技开发有限责任公司)
Publication date: 2018-07-15
Publisher: Electric Power Information and Communication Technology
Year: 2018
Volume and issue: v.16; No.179
Language: Chinese
Accession number: DXXH201807004
Page count: 7
Issue: 07
CN: 10-1164/TK
Pages: 25-31

Abstract

To automatically diagnose data problems in power grid equipment archival data for which no explicit error rules can be formulated, and thereby improve data quality, this paper applies big data machine learning techniques to detect such problems automatically. Using Spark's distributed in-memory computing, the K-Means clustering algorithm is trained on the archival data, and the clustered data are then analyzed and processed. Experiments show that an automatic diagnosis tool built on this method can substantially reduce the manual effort, workload, and cost of data governance work, and can produce more detailed and more accurate results than manual screening.
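
The abstract does not spell out how clustering results become a diagnosis, so the following is only a minimal single-machine sketch of the general idea: cluster the records with K-Means, then flag records that lie unusually far from their nearest cluster center. It uses plain Python in place of Spark, and the function names, the toy records (hypothetical rated voltage in kV and rated capacity in kVA), and the "3 times the median distance" threshold are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's K-Means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct records as initial centers
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids


def flag_anomalies(points, centroids, factor=3.0):
    """Return points whose nearest-centroid distance exceeds factor x the median distance."""
    distances = [min(math.dist(p, c) for c in centroids) for p in points]
    median = sorted(distances)[len(distances) // 2]
    return [p for p, d in zip(points, distances) if d > factor * median]


# Hypothetical toy records: (rated voltage in kV, rated capacity in kVA).
records = [
    (10.0, 400.0), (10.0, 410.0), (10.5, 395.0), (10.0, 405.0),
    (35.0, 2000.0), (35.0, 2010.0), (35.5, 1990.0), (35.0, 2005.0),
    (10.0, 1200.0),  # a record that fits neither group
]

centroids = kmeans(records, k=2)
suspects = flag_anomalies(records, centroids)
print(suspects)
```

In practice the features would normally be scaled to comparable ranges before clustering, since otherwise the attribute with the largest numeric range (capacity here) dominates the distance. On a Spark cluster, the hand-written `kmeans` function would be replaced by the platform's own distributed K-Means implementation.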
