摘要
对于事件序列中的时序依赖发现,传统的频繁情节发现方法一方面使用时间窗口机制挖掘事件之间简单的关联依赖,另一方面无法有效处理事件的交叉时序关联。针对以上问题,提出了时滞情节发现的概念,在频繁情节发现的基础上,设计了一种基于相邻事件匹配集(AEM)的时滞情节发现算法。首先,引入时滞的概率统计模型进行事件序列匹配,避免预先设定时间窗口,处理可能存在的交叉关联;然后,将时滞挖掘转化为最优化问题,使用迭代的方式得到时滞情节之间的时间间隔分布;最后,利用假设检验区分串行时滞情节和并行时滞情节。理论分析与实验结果表明,与目前最新的时滞挖掘方法迭代最近事件(ICE)算法相比,基于AEM的时滞情节发现算法模拟的时滞分布与真实时滞分布的平均KL距离为0. 056,缩短了20. 68%。基于AEM的时滞情节发现算法通过时滞的概率统计模型衡量事件多种匹配情况的可能性,获得一对多的相邻事件匹配集,比ICE算法中的一对一匹配更加有效地模拟了实际情况。
Concerning the problem that a predefined time window is usually used to mine simple association dependencies between events in the traditional frequent episode discovery,which cannot effectively handle interleaved temporal correlations between events,a concept of time-lag episode discovery was proposed.And on the basis of frequent episode discovery,Adjacent Event Matching set(AEM)based time-lag episode discovery algorithm was proposed.Firstly,a probability statistical model introduced with time-lag was introduced to realize event sequence matching and handle optional interleaved associations without a predefined time window.Then the discovery of time lag was formulated as an optimization problem which can be solved iteratively to obtain time interval distribution between time-lag episodes.Finally,the hypothesis test was used to distinguish serial and parallel time-lag episodes.The experimental results show that compared with Iterative Closest Event(ICE)algorithm which is the latest method of time-lag mining,the Kullback-Leibler(KL)divergence between true and experimental distributions discovered by AEM is 0.056 on average,which is decreased by 20.68%.AEM algorithm measures the possibility of multiple matches of events through a probability statistical model of time lag and obtains a one-to-many adjacent event matching set,which is more effective than the one-to-one matching set in ICE for simulating the actual situation.
引文
[1]李涛,刘峥,周绮凤.应用驱动的大数据挖掘[J].中兴通讯技术,2016,22(2):49-52.(LI T,LIU Z,ZHOU Q F.Application-driven big data mining[J].ZTE Communications,2016,22(2):49-52.)
[2]WU C-W,LIN Y-F,YU P S,et al.Mining high utility episodes in complex event sequences[C]//Proceedings of the 2013 ACMSIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2013:536-544.
[3]武健.时序Web数据挖掘方法[J].计算机应用,2014,34(S2):120-122.(WU J.Data mining method from time series Web data[J].Journal of Computer Applications,2014,34(S2):120-122.)
[4]郭跇秀,吕学强,李卓.基于突发词聚类的微博突发事件检测方法[J].计算机应用,2014,34(2):486-490.(GUO Y X,LYUX Q,LI Z.Bursty topics detection approach on Chinese microblog based on burst words clustering[J].Journal of Computer Applications,2014,34(2):486-490.)
[5]AGRAWAL R,IMIELINSKI T,SWAMI A N.Mining association rules between sets of items in large databases[C]//Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data.New York:ACM,1993:207-216.
[6]MANNILA H,TOIVONEN H,VERKAMO A I.Discovery of frequent episodes in event sequences[J].Data Mining&Knowledge Discovery,1997,1(3):259-289.
[7]TANG L,LI T,SHWARTZ L.Discovering lag intervals for temporal dependencies[C]//Proceedings of the 2012 18th ACM SIGK-DD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2012:633-641.
[8]AGRAWAL R,SRIKANT R.Mining sequential patterns[C]//Proceedings of the 1995 Eleventh International Conference on Data Engineering.Washington,DC:IEEE Computer Society,1995:3-14.
[9]李艳辉,刘浩,袁野,等.基于差分隐私的频繁序列模式挖掘算法[J].计算机应用,2017,37(2):316-321.(LI Y H,LIU H,YUANY.Frequent sequence pattern mining with differential privacy[J].Journal of Computer Applications,2017,37(2):316-321.)
[10]LIANG F,MA S,HELLERSTEIN J L.Discovering fully dependent patterns[C]//Proceedings of the 2002 SIAM International Conference on Data Mining.Philadelphia,PA:Society for Industrial and Applied Mathematics(SIAM),2002:511-527.
[11]MA S,HELLERSTEIN J L.Mining mutually dependent patterns[C]//Proceedings of the 2001 IEEE International Conference on Data Mining.Washington,DC:IEEE Computer Society,2001:409-416.
[12]MA S,HELLERSTEIN J L.Mining partially periodic event patterns with unknown periods[C]//Proceedings of the 2001 17th International Conference on Data Engineering.Piscataway,NJ:IEEE,2001:205-214.
[13]AO X,LUO P,WANG J,et al.Mining precise-positioning episode rules from event sequences[J].IEEE Transactions on Knowledge and Data Engineering,2018,30(3):530-543.
[14]LI T,MA S.Mining temporal patterns without predefined time windows[C]//Proceedings of the 4th IEEE International Conference on Data Mining.Washington,DC:IEEE Computer Society,2004:451-454.
[15]ZENG C,TANG L,LI T,et al.Mining temporal lag from fluctuating events for correlation and root cause analysis[C]//Proceedings of the 2014 10th International Conference on Network and Service Management.Washington,DC:IEEE Computer Society,2014:19-27.
[16]ZLLER M-A,BAUM M,HUBER M F.Framework for mining event correlations and time lags in large event sequences[C]//Proceedings of the IEEE 2017 15th International Conference of Industrial Informatics.Piscataway,NJ:IEEE,2017:3-4.
[17]BESL P J,Mc KAY N D.A method for registration of 3-D shapes[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2002,14(2):239-256.
[18]GUO S,LIU Z,CHEN W,et al.Event extration from streaming system logs[C]//Proceedings of the 9th International Conference on Information Science and Applications,LNEE 514.Singapore:Springer,2018:465-474.
[19]LI T,JIANG Y,ZENG C,et al.FLAP:an end-to-end event log analysis platform for system management[C]//Proceedings of the2017 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2017:1547-1556.
[20]FISCHLER M A,BOLLES R C.Random sample consensus:a paradigm for model fitting with applications to image analysis and automated cartography[J].Readings in Computer Vision,1987:726-740.
[21]LI T,LIANG F,MA S,et al.An integrated framework on mining logs files for computing system management[C]//Proceedings of the 2005 Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2005:776-781.