基于子序列全连接和最大团的时间序列模体发现算法

英文篇名：Time series motif discovery algorithm based on subsequence full join and maximum clique
作者：朱跃龙 ; 朱晓晓 ; 王继民
英文作者：ZHU Yuelong;ZHU Xiaoxiao;WANG Jimin;College of Computer and Information,Hohai University;
关键词：时间序列 ; 时间序列子序列 ; 子序列连接 ; 最大团 ; 模体发现
英文关键词：time series;;time series subsequence;;subsequence join;;maximum clique;;motif discovery
中文刊名：JSJY
英文刊名：Journal of Computer Applications
机构：河海大学计算机与信息学院;
出版日期：2018-09-20 15:33
出版单位：计算机应用
年：2019
期：v.39;No.342
语种：中文;
页：JSJY201902020
页数：7
CN：02
ISSN：51-1307/TP
分类号：110-116

摘要

针对时间序列模体发现算法计算复杂,并且无法发现多实例模体的问题,提出基于子序列全连接和最大团的时间序列模体发现(TSSJMC)算法。首先,使用快速时间序列子序列全连接算法求得所有子序列之间的距离,生成距离矩阵;然后,设置相似性阈值,将距离矩阵转化为邻接矩阵,构造子序列相似图;最后采用最大团搜索算法从相似图中搜索最大团,最大团的顶点对应的时间序列为包含最多实例的模体。在公开的时间序列数据集上进行实验,选用已有的能够发现多实例模体的Brute Force和Random Projection算法作为对比对象,分别从准确性、效率、可扩展性和鲁棒性对TSSJMC算法进行分析并获得了客观的评判结果。实验结果表明,与Random Projection算法相比,TSSJMC算法在效率、可扩展性和鲁棒性法方面均有明显优势;与Brute Force算法相比,TSSJMC算法发现的模体实例数量虽略低,但其效率和可扩展性都优于Brute Force算法。因此,TSSJMC是质量和效率相平衡的算法。
Existing time series motif discovery algorithms have high computational complexity and cannot find multi-instance motifs.To overcome these defects,a Time Series motif discovery algorithm based on Subsequence full Joins and Maximum Clique(TSSJMC)was proposed.Firstly,the fast time series subsequence full join algorithm was used to obtain the distance between all subsequences and generate the distance matrix.Then,the similarity threshold was set,the distance matrix was transformed into the adjacency matrix,and the sub-sequence similarity graph was constructed.Finally,the maximum clique in the similarity graph was extracted by the maximum clique search algorithm,and the time series corresponding to the vertices of the maximum clique were the motifs containing most instances.In the experiments on public time series datasets,TSSJMC algorithm was compared with Brute Force algorithm and Random Projection algorithm which also could find multi-instance motifs in accuracy,efficiency,scalability and robustness.The experimental results demonstrate that compared with Random Projection algorithm,TSSJMC algorithm has obvious advantages in terms of efficiency,scalability and robustness;compared with Bruce Force algorithm,TSSJMC algorithm finds slightly less motif instances,but its efficiency and scalability are better.Therefore,TSSJMC is an algorithm that balances quality and efficiency.

引文

[1]YEH C-C M,ZHU Y,ULANOVA L,et al.Matrix Profile I:all pairs similarity joins for time series:a unifying view that includes motifs,discords and shapelets[C]//Proceedings of the 2016IEEE 16th International Conference on Data Mining.Piscataway,NJ:IEEE,2016:1317-1322.
    [2]周博,严洪森.基于小波和多维泰勒网动力学模型的金融时间序列预测[J].系统工程理论与实践,2013,33(10):2654-2662.(ZHOU B,YAN H S.Financial time series forecasting based on wavelet and multi-dimensional Taylor network dynamics model[J].Systems Engineering―Theory and Practice,2013,33(10):2654-2662.)
    [3]AILLIOT P,BESSAC J,MONBET V,et al.Non-homogeneous hidden Markov-switching models for wind time series[J].Journal of Statistical Planning and Inference,2015,160:75-88.
    [4]张淑清,师荣艳,李盼,等.基于混沌关联积分的暂态电能质量扰动分类[J].仪器仪表学报,2015,36(1):160-166.(ZHANGS C,SHI R Y,LI P,et al.Transient power quality disturbance classification based on chaos-correlation-integral[J].Chinese Journal of Scientific Instrument,2015,36(1):160-166.)
    [5]ARES J,LARA J A,LIZCANO D,et al.A soft computing framework for classifying time series based on fuzzy sets of events[J].Information Sciences,2016,330:125-144.
    [6]PADMAVATHI S,RAMANUJAM E.Naive bayes classifier for ECG abnormalities using multivariate maximal time series motif[J].Procedia Computer Science,2015,47:222-228.
    [7]GE X,SMYTH P.Deformable Markov model templates for time-series pattern matching[C]//Proceedings of the 2000 6th ACMSIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2000:81-90.
    [8]KALPAKIS K,GADA D,PUTTAGUNTA V.Distance measures for effective clustering of ARIMA time-series[C]//Proceedings of the2001 IEEE International Conference on Data Mining.Washington,DC:IEEE Computer Society,2001:273-280.
    [9]KEOGH E,CHAKRABARTI K,PAZZANI M,et al.Dimensionality reduction for fast similarity search in large time series databases[J].Knowledge&Information Systems,2001,3(3):263-286.
    [10]CHAKRABARTI K,KEOGH E,MEHROTRA S,et al.Locally adaptive dimensionality reduction for indexing large time series databases[J].ACM Transactions on Database Systems,2002,27(2):188-228.
    [11]LIN J,KEOGH E,LONARDI S,et al.Finding motifs in time series[C]//Proceedings of the 2nd Workshop on Temporal Data Mining.New York:ACM,2002:53-68.
    [12]马百鸣.基于DTW度量的时间序列主旨模式提取[D].大连:大连理工大学,2011:1-3.(MA B M.Motif extraction algorithm of time series based on DTW[D].Dalian:Dalian University of Technology,2011:1-3.)
    [13]PATEL P,KEOGH E,LIN J,et al.Mining motifs in massive time series databases[C]//Proceedings of the 2002 IEEE International Conference on Data Mining.Washington,DC:IEEEComputer Society,2002:370-377.
    [14]LIN J,LI Y.Finding approximate frequent patterns in streaming medical data[C]//Proceedings of the 2010 IEEE 23rd International Symposium on Computer-Based Medical Systems(CBMS).Washington,DC:IEEE Computer Society,2010:13-18.
    [15]MURAKAMI K,DOKI S,OKUMA S,et al.A study of extraction method of motion patterns observed frequently from time-series posture data[C]//Proceeding of the 2005 IEEE International Conference on Systems,Man and Cybernetics.Piscataway,NJ:IEEE,2005:3610-3615.
    [16]FU T-C,CHUNG F-L,LUK R,et al.Preventing meaningless stock time series pattern discovery by changing perceptually important point detection[C]//Proceedings of the 2005 Second International Conference on Fuzzy Systems and Knowledge Discovery,LNCS 3613.Berlin:Springer,2005:1171-1174.
    [17]EAMONN KEOGH J L.Clustering of time series subsequences is meaningless:implications for past and future research[J].Knowledge&Information Systems,2003,8(2):154-177.
    [18]CHIU B,KEOGH E,et al.Probabilistic discovery of time series motifs[C]//Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2003:493-498.
    [19]MINNEN D,ISBELL C,ESSA I,et al.Detecting subdimensional motifs:an efficient algorithm for generalized multivariate pattern discovery[C]//Proceedings of the 2007 Seventh IEEE International Conference on Data Mining.Washington,DC:IEEE Computer Society,2007:601-606
    [20]LIN Y,MCCOOL M D,GHORBANI A A.Time series motif discovery and anomaly detection based on subseries join[J].IAENGInternational Journal of Computer Science,2010,37(3):259-271.
    [21]SILVA D F,YEH C-C M,ENRIQUE G,et al.Si MPle:assessing music similarity using subsequences joins[C]//Proceedings of the 17th International Society for Music Information Retrieval Conference,New York:ISMIR,2016:23-29.
    [22]GRABOCKA J,SCHILLING N,SCHMIDTTHIEME L.Latent time-series motifs[J].ACM Transactions on Knowledge Discovery from Data,2016,11(1):1-20.
    [23]YANKOV D,KEOGH E,MEDINA J,et al.Detecting timeseries motifs under uniform scaling[C]//Proceedings of the 13th ACMSIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2007:844-853.
    [24]LI Y,LIN J.Approximate variable-length time series motif discovery using grammar inference[C]//Proceedings of the 10th International Workshop on Multimedia Data Mining.New York:ACM,2010:Article No.10.
    [25]LI Y,LIN J,OATES T.Visualizing variable-length time series motifs[C]//Proceedings of the 12th SIAM International Conference on Data Mining.Philadelphia,PA:Society for Industrial and Applied Mathematics(SIAM),2012:895-906.
    [26]DUY T C,ANH D T.A fast method for motif discovery in large time series database under dynamic time warping[C]//Proceeding of the Sixth International Conference on Knowledge and Systems Engineering,AISC 326.Cham:Springer,2015:155-167.
    [27]MUEEN A,ZHU Y,YEH M,et al.The fastest similarity search algorithm for time series subsequences under euclidean distance[EB/OL].(2015-08-01)[2017-11-01].http://www.cs.unm.edu/~mueen/Fastest Similarity Search.html.
    [28]MUEEN A,NATH S,LIU J.Fast approximate correlation for massive time-series data[C]//Proceedings of the 2010 ACMSIGMOD International Conference on Management of Data.New York:ACM,2010:171-182.
    [29]SAKURAI Y,PAPADIMITRIOU S,FALOUTSOS C.BRAID:stream mining through group lag correlations[C]//Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data.New York:ACM,2005:599-610.
    [30]RAKTHANMANON T,CAMPANA B,MUEEN A,et al.Searching and mining trillions of time series subsequences under dynamic time warping[C]//Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2012:262-270.
    [31]BELACHEW M T,GILLIS N.Solving the maximum clique problem with symmetric rank-one non-negative matrix approximation[J].Journal of Optimization Theory and Applications,2017,173(1):279-296.
    [32]MUEEN A,KEOGH E.Online discovery and maintenance of time series motif[EB/OL].(2010-06-25)[2017-11-01].http://alumni.cs.ucr.edu/~mueen/Online Motif/.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700