Research on an Anomaly Pattern Detection Method for Hydrological Time Series
  • English title: An Anomaly Pattern Detection Method for Hydrological Time Series
  • Authors: LI Yun-xia (李云霞); YAO Jian-guo (姚建国); WAN Ding-sheng (万定生); ZHAO Qun (赵群)
  • Affiliations: School of Computer and Information, Hohai University (河海大学计算机与信息学院); Bureau of Hydrology of Huaihe River Commission (淮河水利委员会水文局)
  • Keywords: time series; piecewise linear representation; hierarchical clustering; outlier factor; anomaly pattern
  • Journal code: WJFZ
  • Journal: Computer Technology and Development (计算机技术与发展)
  • Publication date: 2019-03-21 10:21
  • Publisher: Computer Technology and Development (计算机技术与发展)
  • Year: 2019
  • Volume/No.: v.29; No.267
  • Funding: National Key Research and Development Program of China (2018YFC0407900); Public Welfare Industry Scientific Research Special Project (201501022)
  • Language: Chinese
  • Record ID: WJFZ201907033
  • Page count: 5
  • Issue: 07
  • CN: 61-1450/TP
  • Pages: 165-169
Abstract
Time series data are a common type of multi-dimensional complex data that objectively record the important information of an observed system at each observation time as it changes over time. Because time series data are massive, high-dimensional, and complex, running anomaly detection directly on raw hydrological time series is time-consuming. This paper therefore proposes a two-stage anomaly detection method for hydrological time series. The method first applies piecewise linear representation to the original series and extracts three features from each subsequence (slope, extreme-value difference, and mean) to represent it. In the first stage, with each subsequence represented as a triple of these features, a hierarchical clustering algorithm clusters the data. In the second stage, an outlier factor is computed for each cluster from the clustering result, and anomaly patterns are determined from the outlier factors. To verify the effectiveness of the method, experiments were run on measured data from Longmen Station and on synthetic data, and fairly good results were obtained.
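The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: fixed-length windows with a least-squares line stand in for the piecewise linear representation, clusters are merged by centroid distance until a hypothetical `threshold` is exceeded, and the outlier factor of a cluster is taken to be the size-weighted mean distance from its centroid to the centroids of the other clusters. The abstract does not give the paper's actual segmentation rule, linkage criterion, or outlier-factor formula. All function names are illustrative.

```python
import math

def segment_features(series, seg_len):
    """Return one (slope, extreme-value difference, mean) triple per window.

    Assumption: fixed-length windows replace a real PLR segmentation.
    """
    feats = []
    for start in range(0, len(series) - seg_len + 1, seg_len):
        seg = series[start:start + seg_len]
        n = len(seg)
        x_mean = (n - 1) / 2
        y_mean = sum(seg) / n
        # Least-squares slope of the window against its sample index.
        num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(seg))
        den = sum((i - x_mean) ** 2 for i in range(n))
        feats.append((num / den, max(seg) - min(seg), y_mean))
    return feats

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def hierarchical_clusters(points, threshold):
    """Stage 1: agglomerative clustering on the feature triples.

    Repeatedly merges the two clusters with the closest centroids and
    stops once that distance exceeds `threshold` (an assumed stop rule).
    Returns clusters as lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        cents = [centroid([points[i] for i in c]) for c in clusters]
        d, a, b = min(
            (math.dist(cents[a], cents[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters

def outlier_factors(points, clusters):
    """Stage 2: one outlier factor per cluster (assumed definition).

    Size-weighted mean distance from a cluster's centroid to every
    other cluster's centroid; small, remote clusters score highest.
    """
    cents = [centroid([points[i] for i in c]) for c in clusters]
    total = len(points)
    return [
        sum(len(clusters[j]) * math.dist(cents[i], cents[j])
            for j in range(len(clusters)) if j != i) / total
        for i in range(len(clusters))
    ]

# Synthetic demo: a repeating ramp with one steep segment (window index 5).
series = [1, 2, 3, 4] * 5 + [1, 20, 40, 60] + [1, 2, 3, 4] * 4
feats = segment_features(series, 4)
clusters = hierarchical_clusters(feats, threshold=5.0)
factors = outlier_factors(feats, clusters)
suspect = clusters[factors.index(max(factors))]
print("windows in the highest-scoring cluster:", suspect)
```

In this toy series the steep window ends up alone in its own cluster, which receives the largest factor. On real gauge data the three features have different scales, so some normalization before clustering would likely be needed; the sketch omits it.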